
Spammer-slammer algorithm

1997-10-22 15:32:56
    I've seen a few lists on the web where people valiantly try
to come up with lists of all the weird domain names that the
spammers use.  It's a losing battle.  Just like slimy viruses
and germs, the spammer domain names mutate to get past our
defenses.  For $100/month "cost of doing business", they can
get new domain names every month from InterNIC, and any list
compiled by volunteers is bound to be out of date.  There has
to be a better way.
    There is.  Filter by the IP address of the computer that's
sending the spam.  Getting a new address block is more of a
hassle than registering a new domain name, so spammers can't
swap one in every month.  That's what my
algorithm relies on.  I haven't seen it before on the internet,
but if somebody already has come up with this idea, let's just
say that great minds think alike ;>)  And as for the volunteer
blacklist-keepers out there, you can earn netizens' gratitude
by keeping lists of spammers' IP address blocks, as well as
domain names.
    The first recipe is pretty much standard.  Reject email if
it isn't addressed to either me or one of the mailing lists I
subscribe to.  I keep a copy of the rejected headers for use in
building up a list of spammer domains, and also just in case I
do inadvertently filter out legitimate email.
    I've had very few spams addressed directly to me, but I
wouldn't be surprised if the spammers eventually surmount the
problem of customising "To:" addresses.  Let's assume that a
spam is addressed correctly to me, and that the spammer isn't
honest enough to put in "X-Advertisement:" or offer a removal
option.  So it gets past the first recipe, and past the first
3 conditions of the second recipe.
    That's where my "Spammer Slammer" algorithm comes in.
I've included 5 sample spammer domains, and show how they can
be filtered.  If you check the header of an email message to
you, the sending machine's IP address is in a "Received: from"
header.  Note that a message can be passed around via one or
two machines internal to your ISP before it gets to "/var/mail".
Reading down the headers from the top, you generally want the
first "Received: from" machine that is outside your ISP.  You
might follow further, but beware of forged headers.  Unless
your ISP gets spam from someone spoofing an address, the first
external "Received: from" should be reliable.
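
    The header walk described above can be sketched outside
procmail.  This is only an illustration in Python, not part of
my filter: the suffix "myisp.example", the helper name
first_external_ip and the sample headers are all made up, and a
real ISP may need more than a suffix test to tell its own
machines apart.

```python
import re
from email import message_from_string

# Hypothetical suffix identifying machines inside your own ISP.
INTERNAL_SUFFIX = "myisp.example"

# The host named right after "from", and the first bracketed IPv4 address.
FROM_HOST_RE = re.compile(r"\bfrom\s+(\S+)")
BRACKETED_IP_RE = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")

def first_external_ip(raw_message, internal_suffix=INTERNAL_SUFFIX):
    """Read Received headers from the top and return the bracketed IP of
    the first one whose sending host is outside the ISP, else None."""
    msg = message_from_string(raw_message)
    for value in msg.get_all("Received", []):
        host = FROM_HOST_RE.search(value)
        ip = BRACKETED_IP_RE.search(value)
        if host is None or ip is None:
            continue  # no "from host" or no [a.b.c.d] in this header
        if host.group(1).lower().endswith(internal_suffix):
            continue  # hand-off between the ISP's own machines
        return ip.group(1)
    return None

# Made-up headers: one internal hop on top, one external hop below it.
SAMPLE = (
    "Received: from relay.myisp.example (relay.myisp.example [10.0.0.5])\n"
    "\tby mail.myisp.example; Wed, 22 Oct 1997 15:00:02 -0400\n"
    "Received: from bulk.example.net (bulk.example.net [205.199.2.17])\n"
    "\tby relay.myisp.example; Wed, 22 Oct 1997 15:00:01 -0400\n"
    "To: someone@myisp.example\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

print(first_external_ip(SAMPLE))  # prints 205.199.2.17
```

    The sketch stops at the first external hop on purpose, since
anything further down could have been forged by the sender.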
    The address will look something like [123.456.789.012].
If it isn't someone abusing an honest ISP, and you do want to
filter out the entire domain, execute the command
"whois full net 123.456.789." from a unix prompt; or submit
"full net 123.456.789." to the web interface to InterNIC.
You'll get back a listing with, hopefully, a range of IP
addresses.  In some cases you might get AGIS as the top-level,
with CYBERPROMO as the sub-level.  You can decide who you want
to restrict.
    Because the 3 characters "[" and "." and "]" are special in
procmail regular expressions, they have to be escaped with a
backslash in recipes.  The easiest
filters are for IP blocks consisting of 256 values 0..255 in
the fourth part of the address.  NETBLK-PBI-CUSTNET-1208 and
NETBLK-CYBERPROMO-205-199 and NETBLK-CYBERPROMO-205-199B are
three examples.  For situations where partial blocks are owned,
filtering gets a bit uglier, but it can be handled.  The
NETBLK-CYBERPROMO1-COM and SOFTFACTS-BLK-205-254 addresses
are filtered in my example filter.  In the sample that follows,
the subdirectory .nospam is assumed to exist in your home
directory.  $ORGMAIL should resolve as /var/mail/<your logon ID>,
but you may want to check with your ISP.
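
    For a partial block it is easy to get the alternation for
the last number slightly wrong, so it may be worth testing it
before trusting it.  The following is a small sketch in Python,
whose "re" module is close to, but not the same as, procmail's
matcher; the helper name matched_last_octets is made up for the
example.  It tries every last-octet value 0..255 against the
pattern for 207.87.233.64 - 207.87.233.95.

```python
import re

# Same escaping as in a recipe condition: "[", "." and "]" are backslashed.
PARTIAL_BLOCK = re.compile(r"\[207\.87\.233\.(6[4-9]|[7-8][0-9]|9[0-5])\]")

def matched_last_octets(pattern, prefix="207.87.233."):
    """Return the last-octet values 0..255 for which '[prefix + n]' matches."""
    return [n for n in range(256) if pattern.search("[%s%d]" % (prefix, n))]

octets = matched_last_octets(PARTIAL_BLOCK)
print(octets[0], octets[-1], len(octets))  # prints 64 95 32
```

    The closing "\]" matters here: without it, a value such as
640 or 64 followed by further digits would not be ruled out by
the pattern itself.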
    If you have an email correspondent who uses a spam-ridden
provider like hotmail.com, you could enter their full email
address into a recipe before the first one, and deliver to
$ORGMAIL on a positive match.

############################ START OF SAMPLE
MAILDIR=$HOME/.nospam
LOGFILE=$MAILDIR/HEADERS
LOGABSTRACT=NO

:0Hi
* !^(From|Reply-To):.*(interlog.com|mapinfo|majordom|csl.sri.com|autoreply)
* !^(To|Cc|Bcc):.*(waltdnes|mapinfo|csl.sri.com)
* !^To:.*reform-online
* !^To:.*procmail.Informatik.RWTH.Aachen.DE
{

LOG=////////////////////////////////////////
       :0hi
       |grep . >> $LOGFILE
}

# Filter "honest spammers"; First 3 lines.
#
# Corporate Computer World (NETBLK-PBI-CUSTNET-1208)
# 207.212.65.0 - 207.212.65.255
#
# Cyber Promotions Inc (NETBLK-CYBERPROMO-205-199B)
# 205.199.2.0 - 205.199.2.255
#
# Cyber Promotions Inc (NETBLK-CYBERPROMO-205-199)
# 205.199.212.0 - 205.199.212.255
#
# Cyber Promotions (NETBLK-CYBERPROMO1-COM)
# 207.87.233.64 - 207.87.233.95  A bit uglier to filter :>(
#
# SOFTFACTS-BLK-205-254 (Stomping grounds of Nevwest/Lostvegas)
# 205.254.164.0 - 205.254.167.0

:0Hi
* !^X-Advertisement:
* !^X-[0-9]:.*(iemmc.com|remov)
* !^X-[0-9][0-9]:.*(iemmc.com|remov)
* !Received: from.*\[207\.212\.65\.
* !Received: from.*\[205\.199\.2\.
* !Received: from.*\[205\.199\.212\.
* !Received: from.*\[207\.87\.233\.(6[4-9]|[7-8][0-9]|9[0-5])\]
* !Received: from.*\[205\.254\.16([4-6]\.[0-9]+|7\.0)\]
$ORGMAIL

LOG=////////////////////////////////////////
       :0hi
       |grep . >> $LOGFILE

############################## END OF SAMPLE
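
    As a cross-check on the address conditions in the sample,
the same five blocks can be tested numerically instead of with
patterns.  This is a sketch in Python using the standard
"ipaddress" module, not something procmail can run; the names
BLOCKED_RANGES and is_blocked are made up, and the ranges are
copied from the comments in the sample.

```python
from ipaddress import ip_address

# Inclusive (first, last) addresses, as listed in the sample's comments.
BLOCKED_RANGES = [
    ("207.212.65.0", "207.212.65.255"),   # NETBLK-PBI-CUSTNET-1208
    ("205.199.2.0", "205.199.2.255"),     # NETBLK-CYBERPROMO-205-199B
    ("205.199.212.0", "205.199.212.255"), # NETBLK-CYBERPROMO-205-199
    ("207.87.233.64", "207.87.233.95"),   # NETBLK-CYBERPROMO1-COM
    ("205.254.164.0", "205.254.167.0"),   # SOFTFACTS-BLK-205-254
]

def is_blocked(address, ranges=BLOCKED_RANGES):
    """True if the dotted-quad address lies inside any listed range."""
    ip = ip_address(address)
    return any(ip_address(first) <= ip <= ip_address(last)
               for first, last in ranges)

print(is_blocked("207.87.233.64"), is_blocked("207.87.233.96"))  # prints True False
```

    Feeding it the edge addresses of each block is a quick way
to compare against what the recipe conditions accept.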

    A couple of questions while I'm at it.  The line...
 "|grep . >> $LOGFILE" does look a bit clunky.  Would
 "|cat >> $LOGFILE" copy the headers properly?

    Also, does the at-sign "@", have to be escaped with a
backslash, or is it not a special character?


 Walter Dnes
 <waltdnes(_at_)interlog(_dot_)com>