After the usual spam filtering (X-Advertisment: headers, Cyberpromo
blocking, known spammer domains, Friend@public.com, etc.), some spammers
with slightly above-average (for a spammer) intelligence were still getting
through my filters. Eventually I noticed a trend in a lot of that spam: the
To: header exactly matched the From: header. Unfortunately, some recent
spam seems to be moving away from this trend, but I still trap a fair
amount with this technique, so I thought I'd share it with you. When I
implemented it a few weeks ago, the amount of spam getting past my filters
went to zero. It also hasn't misfiled any e-mail, so it seems pretty
successful to me. Here it is:

# First, a few definitions (I got most of these from postings to the procmail
# mailing list over the years):
PRE_ADDR_SPAN='(.*[^-(.%@a-zA-Z0-9])?'
IN_ADDR_SPAN='([^,.> ]+\.)?'
FROMHDR="(^((((Resent|Apparently)-)?From|Sender|Reply-To|(X-)?Envelope-From):|>?From )$PRE_ADDR_SPAN)"
# Next, define a regexp that matches all of your valid e-mail addresses
# and another that matches your domain name(s)
MY_DOMAINS="(($IN_ADDR_SPAN)*your\.domain\.name|somewhere-else\.net)"
MY_NAMES="youruserid(@$MY_DOMAINS)?"
# Initialize variables...
TO_VALUE # Ensure that TO_VALUE is unset (a bare name unsets the variable).
FROM_VALUE # Ensure that FROM_VALUE is unset.
# E-mail whose To: and From: headers match, and that is neither addressed to
# me nor from me or anybody in my domain, is probably spam.
:0
* ^To:[ ]*\/[^ ].*
{
TO_VALUE = $MATCH
:0
* ^From:[ ]*\/[^ ].*
{
FROM_VALUE = $MATCH
:0:
* TO_VALUE ?? .
* FROM_VALUE ?? .
* $ ! ^TO($MY_NAMES)
* $ ! $FROMHDR($MY_NAMES|[^@]+@$MY_DOMAINS)
* $ FROM_VALUE ?? ^^$\TO_VALUE^^
mbox.spam
}
}

As always, each "[ ]" and "[^ ]" above contains a space and a tab.
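
To try the recipe without touching real mail, one option is a scratch
rcfile along the following lines. This is only a sketch under stated
assumptions: the file names sandbox.rc, tofrom.rc and sample.msg, the
spamtest directory, and mbox.ok are all hypothetical and not part of the
original posting. It assumes the definitions and the recipe above have been
saved to tofrom.rc.

```procmail
# sandbox.rc -- hypothetical test rcfile; every path and file name below is
# an assumption made for illustration.
# Run it by hand against a saved message, from the directory holding it:
#     procmail ./sandbox.rc < sample.msg
MAILDIR=$HOME/spamtest          # scratch directory; must already exist
DEFAULT=$MAILDIR/mbox.ok        # mail that no recipe claims lands here
LOGFILE=$MAILDIR/procmail.log   # keep diagnostics out of the real log
VERBOSE=yes                     # log each condition as it is evaluated

# The definitions and the recipe from this message, saved to their own file.
INCLUDERC=$MAILDIR/tofrom.rc
```

Because MAILDIR points at the scratch directory, the recipe's relative
mbox.spam should end up there too, and the log should show which condition
accepted or rejected the test message. Feeding it one message whose To: and
From: are identical and one ordinary message is a quick way to check both
outcomes.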
Later,
Ed