procmail

Re: Rejecting multiple names/subjects at once?

1996-06-13 22:34:41
[Mailed and mailed.]

On Thu, 13 Jun 1996 17:11:07 -0500 (CDT),
AIRWAVES MEDIA <rrb(_at_)clm(_dot_)aiss(_dot_)uiuc(_dot_)edu> wrote:
} To be honest - I doubt killing on specific subjects will be that
} effective.  These change with every mailing.  
There are strings that I want to use, not necessarily complete
subjects.  Strings like two dollar signs in a row, the words 'USA
Magazines' and others.

Here's what I use:

    SHELL=/bin/sh

    SPAM="!!!+|\$\$+|(,000)+|magazine| ... etcetera, make your own ;-)

    :1:
    $^Subject:.*($SPAM).*($SPAM).*($SPAM)
    $HOME/scratch/inbox/spam

    :2:
    $^Subject:.*($SPAM).*($SPAM)
    ^From:.*@[^ ]+\.com[ ]
    $HOME/scratch/inbox/spam

    :2:
    $^Subject: .*($SPAM|web)
    ^From: .*(earthlink\.net|spray\.com|spraynet\.com|spray\.net|pipeline\.com)
    $HOME/scratch/inbox/spam

That's the basic idea. I have different variations for different
situations, but you should get the picture. The variable SPAM is set
to a (largish; few dozen) set of words I hunt for. Only one match will
not do; if three of the words are matched, the message is fried. Then
as you can see I have lowered the threshold for some sites. 
  This could be made a lot more straightforward with scoring (man
procmailsc) but I have yet to see an implementation. I have asked on
this list if somebody was using scoring to hunt for spams but no
replies so far. 
  My site is still using an ancient version of Procmail and I haven't
felt the urge (and/or had the time) to set up my personal upgrade, so
I don't even currently have a Procmail with scoring. The above
emulates scoring, in a way, but it's very limited, of course (and a
bit of a drag on the maintenance side ...)
  For the record, I've had +very+ few matches on these, but then, I've
largely been spared from spams lately (knock on wood. Maybe the
spammers have started excluding non-American domains in a rush to, er,
get more focused :-). 
  Also, as Wotan and others have commented, it's extremely hard to
think up a scheme that works for all possible cases (with few or no
mismatches). The following have still slipped through:

    Subject: Unusual Promotion for 1st Time Users
    Subject: $800,000.00 Sweepstakes!
    Subject: May I please have your permission?
    Subject: 1000 Shares

The first one contains enough suspect words for this scheme to work,
in principle. For the "sweepstakes" case, I suppose a special rule for
^Subject: ($SPAM) ($SPAM)[!?.]?$ could do, or else you could just examine
the content a little bit for a high frequency of slime words. But then
you'd definitely want scoring.
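
For what it's worth, here is roughly what the first recipe above might
look like with scoring. This is an untested sketch that assumes a
Procmail with scoring support and the weight^exponent condition syntax
described in procmailsc; the four patterns are just the sample strings
from SPAM above, one per condition, so substitute your own list.

    # Sketch only; needs a Procmail built with scoring (see procmailsc).
    # The score starts at -2 and each condition that matches adds 1
    # (exponent 0 means a condition is counted at most once). The
    # recipe fires when the total ends up positive, that is, when
    # three or more of the conditions match.
    :0:
    * -2^0
    * 1^0 ^Subject:.*!!!
    * 1^0 ^Subject:.*\$\$
    * 1^0 ^Subject:.*,000
    * 1^0 ^Subject:.*magazine
    $HOME/scratch/inbox/spam

Unlike the $SPAM version, this counts distinct patterns, so the same
word showing up three times scores only once. Lowering the threshold
for particular sites should then just be a matter of starting from -1
instead of -2 in a recipe that also checks the From: line.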

Hope this helps,

/* era */

-- 
See <http://www.ling.helsinki.fi/~reriksso/> for mantra, disclaimer, etc.