Era Eriksson posted a set of spam-killing recipes ...
| Here's what I use:
|
| SHELL=/bin/sh
|
| SPAM="!!!+|\$\$+|(,000)+|magazine| ... etcetera, make your own ;-)
|
| :1:
| $^Subject:.*($SPAM).*($SPAM).*($SPAM)
| $HOME/scratch/inbox/spam
|
| :2:
| $^Subject:.*($SPAM).*($SPAM)
| ^From:.*@[^ ]+\.com[ ]
| $HOME/scratch/inbox/spam
|
| :2:
| $^Subject: .*($SPAM|web)
| ^From: .*(earthlink\.net|spray\.com|spraynet\.com|spray\.net|pipeline\.com)
| $HOME/scratch/inbox/spam
and asked,
| This could be made a lot more straightforward with scoring (man
| procmailsc) but I have yet to see an implementation. I have asked on
| this list if somebody was using scoring to hunt for spams but no
| replies so far.
Well, OK. Era's procmail doesn't have scoring, nor apparently even asterisk
notation for conditions; but asterisk notation came first, so any version
that has scoring has asterisk notation as well.
| SPAM="!!!+|\$\$+|(,000)+|magazine| ... " # etcetera, make your own ;-)
Of course, if you're going to test for (,000)+ as an entire alternative
by itself, with nothing specific required to its left or right, you might
as well just test for ,000. Also, I personally would prefer to include the
outer parentheses at this point rather than in every use of the variable
below. So let's make it
SPAM="(!!!+|[$][$]+|,000|magazine| ... )" # etcetera, make your own
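As a quick check of that simplification, here is a sketch in Python rather
than procmail. It rests on assumptions: Python's re module stands in for
procmail's regexp engine, and the sample strings are invented. For a plain
yes/no test, any text that matches one pattern also matches the other:

```python
import re

REPEATED = re.compile(r"(,000)+")   # the alternative as originally written
SINGLE = re.compile(r",000")        # the simplified alternative

def same_verdict(text):
    """True when both patterns agree on whether text contains a match."""
    return bool(REPEATED.search(text)) == bool(SINGLE.search(text))

# Invented sample subjects, for illustration only.
samples = ["Make $1,000,000 fast", "Only 5,000 left", "10,00 is a typo", "no digits"]
print(all(same_verdict(s) for s in samples))
```

When matches are counted rather than merely detected, the two can differ,
at least in Python: ",000,000" is one match for the first pattern and two
for the second.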
Now, scoring looks for *non-overlapping* occurrences, so this:
* $ 1^1 ^Subject:.*$SPAM
would score only 1 no matter how many times $SPAM appears in the subject.
A message would need to have multiple subject lines, two or more of them
containing matches to $SPAM, to score more than one from that condition.
So we work around the overlap by saving the rest of the subject in a
variable:
:0
* ^Subject:\/.*
{ SUBJECT=$MATCH }
Now we can scan $SUBJECT for appearances of $SPAM and get an accurate count.
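The difference between the anchored condition and a count over the saved
subject can be sketched outside procmail. The following Python is only an
analogy, under these assumptions: Python's re engine in place of procmail's,
a shortened stand-in for the elided $SPAM list, and invented headers. It
counts non-overlapping matches both ways:

```python
import re

# Stand-in for $SPAM: only the alternatives spelled out above; the rest is elided.
SPAM = r"(!!!+|[$][$]+|,000|magazine)"
# procmail matches case-insensitively by default; MULTILINE lets ^ match each line.
FLAGS = re.IGNORECASE | re.MULTILINE

def count_anchored(header):
    """Count matches of ^Subject:.*$SPAM; at most one per Subject: line."""
    return sum(1 for _ in re.finditer(r"^Subject:.*" + SPAM, header, FLAGS))

def count_in_subject(header):
    """Save the text after 'Subject:' (like $MATCH), then count $SPAM inside it."""
    found = re.search(r"^Subject:(.*)", header, FLAGS)
    if found is None:
        return 0
    return sum(1 for _ in re.finditer(SPAM, found.group(1), FLAGS))

# Invented header, for illustration only.
header = "From: someone@example.com\nSubject: magazine!!! win $$ now\n"
print(count_anchored(header), count_in_subject(header))   # prints "1 3"
```

The anchored pattern can match only once per Subject: line, while the count
over the saved text sees each of the three hits.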
If "web" didn't have to count for mail from a 2-point (suspect) site but
not otherwise, the recipe could be as simple as the one below. Note that
spray.com, spraynet.com, and pipeline.com addresses get 1 point from the
third condition and 1 from the fourth, for 2 in all. The first condition is
unweighted (and thus absolute), giving a quick exit when the subject
contains nothing to score.
:0: # 2 points or fewer acceptable, but more than 2 points and you're out
* $ SUBJECT ?? web|$SPAM
* $ 1^1 SUBJECT ?? $SPAM
* 1^0 ^From:.*@[^ ]+\.com[ ]
* 1^0 ^From:.*(spray(net)?|pipeline)\.com
* 2^0 ^From: .*(earthlink|spray)\.net
* -2^0
$HOME/scratch/inbox/spam
But we do have that complication, so let's have at it. If your version of
procmail does not allow comments interleaved in the middle of a recipe,
move them to a safe place:
:0:
# If the subject is clean, escape unconditionally.
* $ SUBJECT ?? web|$SPAM
# Score .9 for each appearance ("^1") of a match to $SPAM in the subject:
* $ .9^1 SUBJECT ?? $SPAM
# Score .8 for any .com site:
* .8^0 ^From:.*@[^ ]+\.com[ ]
# Score 1.2 (total 2) for suspect .com sites:
* 1.2^0 ^From:.*(spray(net)?|pipeline)\.com
# Score 2 for suspect .net sites:
* 2^0 ^From: .*(earthlink|spray)\.net
# Score .1 if "web" is in the subject at least once:
* .1^0 SUBJECT ?? web
# Forgive scores of 2 or lower:
* -2^0
# If net score is still positive, shunt to spam folder:
$HOME/scratch/inbox/spam
It should work out like this:
Three appearances of $SPAM in the subject guarantee at least +.7 and will
cause rejection whether or not "web" is also in the subject and regardless
of the site of origin.
Mail from a non-suspect .com site with neither "web" nor $SPAM escapes.
Mail from a non-suspect .com site with "web" but no $SPAM scores -1.1.
Mail from a non-suspect .com site with one $SPAM but no "web" scores -.3.
Mail from a non-suspect .com site with "web" and one $SPAM scores -.2.
None of those result in rejection.
Mail from a non-suspect .com site with two $SPAMs scores +.6 without "web"
or +.7 with "web" and is rejected in either situation.
Coming from a suspect site is worth 2 points (.8+1.2 for those in .com and
simply 2 for those in .net); adding "web" or even one $SPAM is enough to get
a positive final score and to be rejected.
Mail from a suspect site with a clean subject escapes.
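The totals above can be double-checked with a small model of the commented
recipe. This is a sketch in Python, not procmail, and it makes assumptions:
the SPAM pattern is a shortened stand-in for the elided list, the sender is
passed as a bare address instead of a From: line, the addresses are invented,
and scores are kept in tenths to avoid floating-point noise.

```python
import re

# Stand-in for $SPAM: only the alternatives the post spells out; the rest is elided.
SPAM = re.compile(r"!!!+|[$][$]+|,000|magazine", re.IGNORECASE)
WEB = re.compile(r"web", re.IGNORECASE)

def score(subject, sender):
    """Net score of the commented recipe, or None if the first condition fails."""
    spam_hits = sum(1 for _ in SPAM.finditer(subject))
    has_web = WEB.search(subject) is not None
    if not (spam_hits or has_web):
        return None                       # clean subject: unconditional escape
    sender = sender.lower()
    tenths = 9 * spam_hits                # .9^1: every match counts
    if sender.endswith(".com"):
        tenths += 8                       # .8^0: any .com site
    if re.search(r"(spray(net)?|pipeline)\.com$", sender):
        tenths += 12                      # 1.2^0: suspect .com sites
    if re.search(r"(earthlink|spray)\.net$", sender):
        tenths += 20                      # 2^0: suspect .net sites
    if has_web:
        tenths += 1                       # .1^0: "web" at least once
    tenths -= 20                          # -2^0: forgive scores of 2 or lower
    return tenths / 10

# Invented senders and subjects, for illustration only.
print(score("web magazine", "joe@example.com"))   # -0.2
print(score("magazine!!!", "joe@example.com"))    # 0.6
print(score("web", "joe@earthlink.net"))          # 0.1
print(score("hello", "joe@earthlink.net"))        # None
```

In this model a message goes to the spam folder only when the result is not
None and is greater than zero.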