Re: Using procmail with SendMail for all users

Mark Parr writes on 23 September 1997 at 21:48:26

I'd like to use ProcMail with SendMail to filter out all undesirable
senders (ie spams) for everyone at my site -- not just individuals.
Therefore, I don't want every user having to configure ProcMail for
the spammers -- I would rather do it system wide.  From I want I can


and

Stanton McCandlish writes on 23 September 1997 at 16:20:42

I've set up a spam-killing procmail filter for myself and several other
users here that is smart enough to try to deliver mail locally for users
that want that, forward it on to another address if that is desired, or


I toyed with this idea a few weeks ago...the roadblock I came up
against was getting the "From " postmark "correct" on forwarded
messages.

At issue is the fact that the mail experts will tell you loud and
clear that in a forwarding situation, the postmark should be the
forwarding agent (envelope address) and NOT the original "From:"
header.  These experts will go on to even tell you that most
implementations of $HOME/.forward files behave incorrectly as they
preserve the original postmark.

These experts know far more than I do about this, and I can understand
the point they make about where bounces should go and the like.
Furthermore, I want to "do the right thing", especially if I'm going
to be sharing it on the net.

The result was that mail forwarded to remote addresses had a postmark
of "daemon", which is technically correct.  However, this created two
problems for which I couldn't find a decent solution:
   * if somebody ran procmail at the forwarded address, ^FROM_DAEMON
     would now match when it wouldn't before.  There are ways to
     work around this, but they require changes to an already working
     .procmailrc file.
   * there are numerous broken mail tools that incorrectly use the
     "From " postmark when they should use the From: header
     (/usr/ucb/from  under Solaris 2.5 for example).  Such forwarding
     "breaks" these utilities (or at least that's the way it appears).

With all those disclaimers in place, I've appended below what I had
set up (I now use an INCLUDERC file which I recently posted).  Many
thanks to David Tamkin who wrote much of the code and provided a lot
of good advice.

   Dan
------------------- message is author's opinion only ------------------
J. Daniel Smith <DanS(_at_)bristol(_dot_)com>        
http://www.bristol.com/~DanS
Bristol Technology B.V.                   +31 33 450 50 50, ...51 (FAX)
Amersfoort, The Netherlands               {info,jobs}(_at_)bristol(_dot_)com
-----
# 
# J. Daniel Smith
# 21 August 1997
#
# spam.rc
#
# Try to detect SPAM and take appropriate actions when found.
#
# This file is designed to be used from the /etc/aliases file to
# filter mail destined for some other address.  Typical usages would
# be as follows
#   logname: "| /path/to/procmail -m /this/file.rc subject logname"
#   fwd: "| /path/to/procmail -m /this/file.rc discard fwd@host"
#   prog: "| /path/to/procmail -m /this/file.rc header /some/prog args"
# The first argument is one of
#   subject - modify the Subject line to indicate SPAM was found
#   discard - send the message to /dev/null
#   header - add an X-SpamCheck-Disposition: header to the message (this
#            is also done for the "subject" option)
# which determines what should be done with detected SPAM.  The second
# argument specifies what to do with real mail (or SPAM that isn't
# discarded).  It can be a local file (user's mailbox), forward to
# another address, or a pipe to another program.
#

#####
##### Initial setup needed for *all* procmail invocations
#####
PATH=/bin:/usr/bin:/usr/local/bin:/usr/ucb
SHELL=/bin/sh

# should be SUID for -d to work properly
PROCMAIL=/usr/local/bin/procmail

# Like procmail's ^TO, but for From: and CC: lines
# The extra outer layer of parentheses is so that one can use forms like
# ${FROM}* or ${FROM}+ or ${FROM}?.
CC="(^((Original-)?(Resent-)?(Cc|Bcc)):(.*[^a-zA-Z])?)"
FROM="(^((X-(Envelope-)?)?(Apparently-|Resent-)*(From|Reply-To|Sender):\
(.*[^-a-z0-9_])?|From ([^       ]*[-_@!.])?))"

# log everything verbosely, since I want to see how all this works
# this needs to be near the beginning of the file to turn logging on ASAP
#LOGFILE=/etc/mail/spam.LOG
#LOGABSTRACT=all
#VERBOSE=on

#
# Much of the following is compliments of David Tamkin
# <dattier(_at_)wwa(_dot_)com>
#
IF_SPAM=$1 # subject, discard, or header
IF_KOSHER=$2 # local logname, remote address, or program without args

:0 # if it's a program, lose $1 so that "$@" will be what we need
* IF_KOSHER ?? ^^/
{ SHIFT=1 }

#####
##### Do SPAM detection
#####
##### These recipes could check for an existing X-SpamCheck-Reason: header
##### for improved efficiency, but for now it might be interesting to
##### see how many different heuristics catch a particular piece of SPAM.
#####
##### Each recipe should set IS_SPAM=yes and add a X-SpamCheck-Reason:
##### header
#####
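##### A minimal sketch of such a check (an illustration, not part of the
##### original file): add a condition that skips a test once an earlier
##### heuristic has tagged the message.  Applied to the Pegasus test
##### below, it might look like
#####   :0
#####   * ! ^X-SpamCheck-Reason:
#####   * ^X-Distribution:[     ]?(moderate|bulk|mass)
#####   {
#####     IS_SPAM=yes
#####     :0fwh
#####     | formail -A "X-SpamCheck-Reason: Pegasus moderate/bulk/mass mailing"
#####   }
##### The trade-off is that each message would then carry only the first
##### reason found.
#####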

IS_SPAM=no      # default

# Invalid Message-Id:s are likely SPAM
:0
* ! ^Message-Id:[       ]*<[^   <>@]+@[^   <>@]+>[         ]*$
{
  IS_SPAM=yes
  :0fwh
  | formail -A "X-SpamCheck-Reason: Invalid Message-Id"
}

# required headers
:0h
* ^From:
* ^(Apparently-)?To:
* ^Date:
{ }
:E
{
  IS_SPAM=yes
  :0fwh
  | formail -A "X-SpamCheck-Reason: Insufficient message headers"
}

# No large headers
:0
{
  MAX_COMMAS=45
  #
  # From David W. Tamkin <dattier(_at_)wwa(_dot_)com>  
  #
  :0h # H is implicit; this is h
  * ^Resent-(To|Cc):
  ADDRESSES=|formail -czxResent-To: -xResent-Cc:
  :0Eh
  ADDRESSES=|formail -czxTo: -xCc: -xApparently-To:

  # Now, the number of addressees should be the number of non-empty
  # lines (procmail always sees an extra empty line at the end of a
  # search area) plus the number of commas; this will still overcount
  # if someone has a comma inside a name comment (thus MAX_COMMAS
  # instead of MAX_ADDRESSES).
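  #
  # Worked example (hypothetical headers, for illustration only): given
  #   To: a, b
  #   Cc: c
  # ADDRESSES should hold two non-empty lines and one comma, so the two
  # 1^1 conditions below contribute 2 + 1 = 3.  The last condition adds
  # -45, giving 3 - 45 = -42.  A scored recipe only fires on a positive
  # total, so it takes more than MAX_COMMAS lines plus commas to trigger.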
  :0
  * 1^1 ADDRESSES ?? ^.+$
  * 1^1 ADDRESSES ?? ,
  * $-${MAX_COMMAS}^0
  {
    IS_SPAM=yes
    :0fwh
    | formail -A "X-SpamCheck-Reason: Too many commas in addresses"
  }
}

# spam-like addresses - let friends(_at_)planetall(_dot_)com fall through
:0
* $(${FROM}|^TO)(remove|delete|free|friend@)
{
  IS_SPAM=yes
  :0fwh
  | formail -A "X-SpamCheck-Reason: Suspicious addresses"
}

# Thanks to Pegasus mail, we have this:
:0
* ^X-Distribution:[     ]?(moderate|bulk|mass)
{
  IS_SPAM=yes
  :0fwh
  | formail -A "X-SpamCheck-Reason: Pegasus moderate/bulk/mass mailing"
}

# This is too easy :-)
:0
* ^X-(Advertisement|[0-9]):
{
  IS_SPAM=yes
  :0fwh
  | formail -A "X-SpamCheck-Reason: X-Advertisement: header detected"
}

# Headers that shouldn't exist in "real" mail
#
# Might need to be a little more particular here;
# Philip Guenther <guenther(_at_)gac(_dot_)edu>: If a message comes into your
# mailbox that has the X-UIDL: header, and doesn't have your address in
# the header, then I would have strong doubts about its legitimacy.
#
# Edward J. Sabol <sabol(_at_)alderaan(_dot_)gsfc(_dot_)nasa(_dot_)gov>:
# E-mails with X-UIDL: headers are almost definitely spam unless they've
# been Resent-To: me by someone. Also, valid X-UIDL: headers have 32
# hexadecimal digits exactly.
:0
* ^X-UIDL:
* ! ^X-UIDL:[   ]*[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]\
                  [0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]\
                  [0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]\
                  [0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]\
                  [0-9a-f][0-9a-f][0-9a-f][0-9a-f][     ]*$
* ! ^Resent-To:
{
  IS_SPAM=yes
  :0fwh
  | formail -A "X-SpamCheck-Reason: Invalid X-UIDL: header detected"
}

# Check if From: = To:
#
# Extract Reply-To: or From: (try that order).  The negation
# is to pull a deMorgan's law trick and get OR like semantics
# with short circuiting.
:0
* ! ^Reply-To: *\/[^ ].*
* ! ^From: *\/[^ ].*
{
   # No Reply-To: or From: header was found.  What to do here
   # is your choice.  *Every* message should have a From: header,
   # and some MTAs (e.g., sendmail) will create one, so this
   # may very well be impossible, in which case anything you put
   # here will be ignored, except for comments which you'll continue
   # to read and ponder until you realize how silly they are.

   # I'll treat this as likely spam
   :0
   {
    IS_SPAM=yes
    :0fwh
    | formail -A "X-SpamCheck-Reason: Missing From: or Reply-To: header"
   }
}
# If the previous recipe failed its conditions, then a match was
# found.  Use the match as the target of a ^TO_ search.  ^TO_ was
# introduced in procmail 3.11pre4.  If you don't have at least that,
# just use ^TO
# We exclude anything with a Resent- header to avoid problems with
# lists that change the Reply-To: to point back to the list.
:0E
* $ ^TO$\MATCH\>
* ! ^Resent-
{
  IS_SPAM=yes
  :0fwh
  | formail -A "X-SpamCheck-Reason: Suspicious message headers"
}

# and From: = Reply-To:
# I've generated some messages like this myself :-), thus the added
# check against ^FROM_DAEMON
:0
* ! ^From: *\/[^ ].*
{
  # see comments above
  :0
  {
    IS_SPAM=yes
    :0fwh
    | formail -A "X-SpamCheck-Reason: No From: header"
  }
}
:0E
* !^FROM_DAEMON
* $ ^Reply-To:[         ]?$\MATCH\\>
{
  IS_SPAM=yes
  :0fwh
  | formail -A "X-SpamCheck-Reason: From: and Reply-To: headers are identical"
}


#####
##### Now deliver the mail
#####

:0
* IS_SPAM ?? yes
{
  :0h
  * IF_SPAM ?? discard
  /dev/null

  MATCH # unset it to start
  :0Efwh # if set to "subject" make it work if there is a subject or not
  * IF_SPAM ?? subject
  * 1^0 ^Subject\/:.*
  * 1^0
  | formail -I"Subject: SPAM$MATCH"

  # this is the "default" action, thus no "IF_SPAM ?? header" test.
  # Some mail system gateways (e.g. Notes' PostalUnion) will pitch
  # this, which is why the "subject" option above exists.  Of course,
  # do this after "discard"... :-)
  #
  # Only add this header for SPAM to make further procmail filtering
  # easy.  But empty header fields might get pitched...
  :0fwh
  | formail -A "X-SpamCheck-Disposition: this message is spam"
}
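
# A sketch of a recipe a user might add to a personal .procmailrc to
# file tagged mail, since the header above exists to make such
# filtering easy.  This is an illustration only; the folder name "spam"
# is a hypothetical choice.
#   :0:
#   * ^X-SpamCheck-Disposition:
#   spam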

# ... but if not detected as spam (or spam is not being discarded), we
# rely on $IF_KOSHER 

# save the original postmark...
:0fwh
* ^^From +\/[^ ]+
| formail -zA "X-Orig-Envelope-From: $MATCH"
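
# A sketch of how a .procmailrc at the forwarded address might use this
# header so that ^FROM_DAEMON does not match every forwarded message.
# It is an untested illustration, not part of the original setup; the
# folder name "daemon-mail" and the list of sender names are assumptions.
#   :0
#   * ^X-Orig-Envelope-From: *\/[^ ].*
#   {
#     :0:
#     * MATCH ?? ^^(daemon|mailer-daemon|postmaster|root)
#     daemon-mail
#   }
#   :0E:
#   * ^FROM_DAEMON
#   daemon-mail
# Mail carrying the saved postmark is judged by that postmark; only mail
# without it falls back to the stock ^FROM_DAEMON test.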

# ...and record that the message passed through here
:0fwh
| formail -A"X-SpamCheck: Dan's SPAM Detector" \
          -A"X-SpamCheck-Version: 0.1" \
          -A"X-SpamCheck-Destination: $IF_KOSHER"

:0 # local logname
* IF_KOSHER ?? ^^[^/@!]+^^
* ? grep ^$IF_KOSHER: /etc/passwd
| $PROCMAIL -d $IF_KOSHER
# Conditions on the above may be made more stringent to be sure of matching
# acceptable patterns for local lognames.  For example,
# * IF_KOSHER ?? ^^[a-z]\
#   [0-9a-z][0-9a-z]?[0-9a-z]?[0-9a-z]?[0-9a-z]?[0-9a-z]?[0-9a-z]?^^
# or if you don't mind the fork,
# * ? grep ^$IF_KOSHER: /etc/passwd
# In such a case, you could even put the local logname test first, before
# the handling of remote addresses.

# For a remote address the envelope sender really should point back to your
# system, as that is where it was readdressed.  If you don't want it to appear
# as "nobody" you can change the action line of
#   :0
#   * IF_KOSHER ?? [@!]
#   ! $IF_KOSHER
# to something like
#   ! -f postmaster $IF_KOSHER

# program name might contain an @ or !, but a remote address will not
# begin with a /
:0E # W perhaps? # to a program
* IF_KOSHER ?? ^^/
| "$@"
:0E # remote address
* IF_KOSHER ?? [@!]
! $IF_KOSHER


:0Efwh # IF_KOSHER is null or unset or improper.
| formail -A"X-Diagnostic: no handling specified for legitimate mail" 
 :0A # local lockfile if saved for, rather than forwarded to, the postmaster
 ! postmaster # or a folder for the postmaster writable by this process