Re: how do I filter spam for a whole domain?

Caine the wanderer writes on 10 September 1997 at 19:28:02

Have been seeing a lot of info on banning spam to a particular user, but
is there a way to ban it for a whole domain (say through sendmail or
something)?  Sorry this is a little off topic, just curious as we are


I've been playing around with a .rc file run from an /etc/aliases entry
for each user.  The biggest glitch is that the SMTP postmark changes, which
causes ^FROM_DAEMON to match when it didn't before; there are various
ways to address this, but I'm not completely satisfied with any of
them yet.

Thus, for now, I've reverted to doing things via an INCLUDERC
file, which is appended below.
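
For anyone wanting to try it, hooking the file into a per-user
~/.procmailrc might look roughly like the sketch below.  The directory
$HOME/.procmail and the PMDIR variable are only assumptions for
illustration (put the file wherever you like); SPAMCHECK_ACTION is the
one setting the file itself reads, and it has to be assigned before the
INCLUDERC line to have any effect.

```procmail
# Hypothetical ~/.procmailrc fragment; PMDIR and its path are assumptions.
PMDIR=$HOME/.procmail

# Optional: choose what spam.rc does with a hit.
# Recognized values are header (the default), subject, and discard.
SPAMCHECK_ACTION=subject

# Run the spam checks on every incoming message.
INCLUDERC=$PMDIR/spam.rc
```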

   Dan
------------------- message is author's opinion only ------------------
J. Daniel Smith <DanS(_at_)bristol(_dot_)com>        
http://www.bristol.com/~DanS
Bristol Technology B.V.                   +31 33 450 50 50, ...51 (FAX)
Amersfoort, The Netherlands               {info,jobs}(_at_)bristol(_dot_)com
-----
# 
# J. Daniel Smith
# 21 August 1997
#
# spam.rc
#
# Try to detect SPAM and take appropriate actions when found.
#

# Like procmail's ^TO, but for From: and CC: lines
# The extra outer layer of parentheses is so that one can use forms like
# ${FROM}* or ${FROM}+ or ${FROM}?.
CC=${CC:-"(^((Original-)?(Resent-)?(Cc|Bcc)):(.*[^a-zA-Z])?)"}
FROM=${FROM:-"(^((X-(Envelope-)?)?(Apparently-|Resent-)*(From|Reply-To|Sender):\
(.*[^-a-z0-9_])?|From ([^       ]*[-_@!.])?))"}

#
# Much of the following is compliments of David Tamkin
# <dattier(_at_)wwa(_dot_)com>
#
SPAMCHECK_ACTION=${SPAMCHECK_ACTION:-header} # subject, discard, or header

#####
##### Do SPAM detection
#####
##### These recipes could check for an existing X-SpamCheck-Reason: header
##### for improved efficiency, but for now it might be interesting to
##### see how many different heuristics catch a particular piece of SPAM.
#####
##### Each recipe should set SPAMCHECK_SPAM=yes and add an X-SpamCheck-Reason:
##### header
#####

SPAMCHECK_SPAM=no       # default

#####
##### Various header-based checks.
#####

# Invalid Message-Id:s are likely SPAM
:0
* ! ^Message-Id:[       ]*<[^   <>@]+@[^   <>@]+>[         ]*$
{
  SPAMCHECK_SPAM=yes
  :0fwh
  | formail -A "X-SpamCheck-Reason: Invalid Message-Id"
}

# required headers
:0h
* ^From:
* ^(Apparently-)?To:
* ^Date:
{ }
:E
{
  SPAMCHECK_SPAM=yes
  :0fwh
  | formail -A "X-SpamCheck-Reason: Insufficient message headers"
}

# bogus addresses
# $!(^TO|${FROM}).+@([-a-z0-9_]+\.)+\.[-a-z0-9_]+
# atext="[a-zA-Z0-9!#$%&'*+-=?^_`{|}~]"
# dotatom="[    ]*${atext}(\.${atext})?[        ]*"
# $!(^TO|${FROM})${dotatom}@${dotatom}
# don't accept all syntactically valid addresses - who's going to have
# a real email address of "foo_@-bar-.com"?
word="[a-z0-9][-a-z0-9_.+]*[a-z0-9]+"
tld="(com|gov|org|edu|net|[a-z][a-z])"
:0h
* $^TO${word}@(${word}\.)+${tld}
* $${FROM}${word}@(${word}\.)+${tld}
{ }
:E
{
  SPAMCHECK_SPAM=yes
  :0fwh
  | formail -A "X-SpamCheck-Reason: invalid Internet address"
}


# No large headers
:0
{
  MAX_COMMAS=45
  #
  # From David W. Tamkin <dattier(_at_)wwa(_dot_)com>  
  #
  :0h # H is implicit; this is h
  * ^Resent-(To|Cc):
  ADDRESSES=|formail -czxResent-To: -xResent-Cc:
  :0Eh
  ADDRESSES=|formail -czxTo: -xCc: -xApparently-To:

  # Now, the number of addressees should be the number of non-empty
  # lines (procmail always sees an extra empty line at the end of a
  # search area) plus the number of commas; this will still overcount
  # if someone has a comma inside a name comment (thus MAX_COMMAS
  # instead of MAX_ADDRESSES).
  :0
  * 1^1 ADDRESSES ?? ^.+$
  * 1^1 ADDRESSES ?? ,
  * $-${MAX_COMMAS}^0
  {
    SPAMCHECK_SPAM=yes
    :0fwh
    | formail -A "X-SpamCheck-Reason: Too many commas in addresses"
  }
}

# spam-like addresses - let friends@planetall.com fall through
:0
* $(${FROM}|^TO)(remove|delete|free|friend@)
{
  SPAMCHECK_SPAM=yes
  :0fwh
  | formail -A "X-SpamCheck-Reason: Suspicious addresses"
}

# Thanks to Pegasus mail, we have this:
:0
* ^X-Distribution:[     ]?(moderate|bulk|mass)
{
  SPAMCHECK_SPAM=yes
  :0fwh
  | formail -A "X-SpamCheck-Reason: Pegasus moderate/bulk/mass mailing"
}

# This is too easy :-)
:0
* ^X-(Adverti[sz](e)?ment|[0-9]):
{
  SPAMCHECK_SPAM=yes
  :0fwh
  | formail -A "X-SpamCheck-Reason: X-Advertisement: header detected"
}

# Headers that shouldn't exist in "real" mail
#
# Might need to be a little more particular here; 
# Philip Guenther <guenther(_at_)gac(_dot_)edu>: If a message comes into your
# mailbox that has the X-UIDL: header, and doesn't have your address in
# the header, then I would have strong doubts about its legitimacy.
#
# Edward J. Sabol <sabol(_at_)alderaan(_dot_)gsfc(_dot_)nasa(_dot_)gov>:
# E-mails with
# X-UIDL: headers are almost definitely spam unless they've been
# Resent-To: me by someone. Also, valid X-UIDL: headers have 32 hexadecimal
# digits exactly.
:0
* ^X-UIDL:
* !^X-UIDL:[    ]*[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]\
                  [0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]\
                  [0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]\
                  [0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]\
                  [0-9a-f][0-9a-f][0-9a-f][0-9a-f][     ]*$
* !^Resent-To:
{
  SPAMCHECK_SPAM=yes
  :0fwh
  | formail -A "X-SpamCheck-Reason: Invalid X-UIDL: header detected"
}

# Check if From: = To:
MATCH=${SENDER:-`formail -rtzx To:`}
# We exclude anything with a Resent- header to avoid problems with
# lists that change the Reply-To: to point back to the list.
:0
* $^TO$MATCH\>
* !^Resent-
{
  SPAMCHECK_SPAM=yes
  :0fwh
  | formail -A "X-SpamCheck-Reason: To: and From:/Reply-To: headers are identical"
}

# and From: = Reply-To:
# I've generated some messages like this myself :-), thus the added
# check against ^FROM_DAEMON
MATCH=`formail -IReply-To: -rtzx To:`
:0
* $!(^FROM_DAEMON|${FROM}majordomo)
* $^(Reply|Errors)-To:[         ]?$MATCH\>
{
  SPAMCHECK_SPAM=yes
  :0fwh
  | formail -A "X-SpamCheck-Reason: From: and Reply-To:/Errors-To: headers are identical"
}


#####
##### Look at the body...this starts getting trickier
#####
# this is going to need some beefing up...
:0BD
* FREE
{
  SPAMCHECK_SPAM=yes
  :0fwh
  | formail -A "X-SpamCheck-Reason: Text 'FREE' detected"
}

# raw HTML
:0BH
* !^(Mime-Version|Content-Type):
* \<(body.*|html)\>
{
  SPAMCHECK_SPAM=yes
  :0fwh
  | formail -A "X-SpamCheck-Reason: HTML w/o MIME headers"
}

#####
##### Now deliver the mail
#####

:0
* SPAMCHECK_SPAM ?? yes
{
  :0h
  * SPAMCHECK_ACTION ?? discard
  /dev/null

  MATCH # unset it to start
  :0Efwh # if set to "subject" make it work if there is a subject or not
  * SPAMCHECK_ACTION ?? subject
  * 1^0 ^Subject\/:.*
  * 1^0
  | formail -I"Subject: SPAM$MATCH"

  # this is the "default" action, thus no "SPAMCHECK_ACTION ?? header" test.
  # Some mail system gateways (e.g. Notes' PostalUnion) will pitch
  # this, which is why the "subject" option above exists.  Of course,
  # do this after "discard"... :-)
  #
  # Only add this header for SPAM to make further procmail filtering
  # easy.  But empty header fields might get pitched...
  :0fwh
  | formail -A "X-SpamCheck-Disposition: this message is spam"
}

# ...and record that the message passed through here
:0fwh
| formail -A"X-SpamCheck: Dan's SPAM Detector" \
          -A"X-SpamCheck-Version: 0.2"
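-----
Since only flagged messages carry the X-SpamCheck-Disposition: header,
later recipes in the including rc file can key on it.  A minimal sketch,
assuming SPAMCHECK_ACTION is header or subject (with discard the message
never gets this far) and a hypothetical mbox folder named "spam":

```procmail
# Place this after the INCLUDERC line that pulls in spam.rc.
# "spam" is a hypothetical folder name, relative to $MAILDIR.
# The second colon in ":0:" asks procmail to use a local lockfile
# while it appends the message to the mbox.
:0:
* ^X-SpamCheck-Disposition:
spam
```

Filing into a folder rather than /dev/null leaves room to review false
positives, and the X-SpamCheck-Reason: headers show which checks fired.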