procmail

Re: A better way

1998-05-29 00:08:53
On Thu, 28 May 1998 08:31:39 -0700 (PDT), Eric Hilding
<eric(_at_)hilding(_dot_)com> wrote:
As some new cleverly worded sales hype approach slips
through, I add to the recipe.  But I'm thinking there
must be a better way to perhaps keep all of the words
and phrases to filter on in a separate alphabetical
file to simplify making any future corrections.  Maybe
something like a "lookup" list? 

I don't see how that would simplify things very much, but it's
certainly doable, and prevents you from running up against Procmail's
LINEBUF limit. (Your grep might have a similar limit, though.)

    :0HB:
    * ? egrep -f file
    ugh.spam
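
For what it's worth, here is a rough sketch of what such a file might
look like and how to try it from the command line before trusting it
with real mail. The name hype.patterns and the three patterns are made
up for illustration; it stands in for "file" in the recipe above. Note
that egrep is case-sensitive unless you give it -i, whereas Procmail's
own regex conditions ignore case by default.

```shell
# Hypothetical pattern file: one extended regex per line, kept sorted
# so it is easy to find and correct entries later.
cat > hype.patterns <<'EOF'
make money fast
multi-?level marketing
this is not spam
EOF

# Feed a sample message on stdin, roughly the way Procmail would.
# grep -E is the current spelling of egrep; -i ignores case, and -q
# suppresses output so only the exit status matters (0 means a match,
# which is what makes the "?" condition succeed).
if printf 'Subject: hi\n\nThis is NOT spam, honest.\n' |
    grep -E -i -q -f hype.patterns
then echo match
else echo "no match"
fi
# prints "match"
```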

Would it be more appropriate to also use a lockfile
like:
:0 B:lockfile.crapper  

Why? 

If you're thinking you need to lock the "crapper" file in case you
would happen to be editing it while mail comes in, that largely
depends on your editor. You'd probably have to lock it by hand in that
case. 

  $ lockfile lockfile.crapper
  $ edit file  # ...
  $ rm -f lockfile.crapper

Procmail attempts to lock only when it actually writes to the folder,
so you need to instate a "regional lock file" with LOCKFILE= to lock
something already in the condition-matching phase:

    LOCKFILE=lockfile.crapper
    :0BH:
    * ? egrep -f file
    ugh.spam
    LOCKFILE  # We're done, give up the regional lock

The local lock on ugh.spam is not strictly necessary if this is the
only recipe which will ever attempt to write to it, but I'd leave it
in regardless. (Better safe than sorry.)

Can the body word and phrase filtering work on the header 
at the same time if I use:
:0 Bh:lockfile.crapper

That's BH. h says the +action+ line (as opposed to the conditions)
gets the header only. (The default is H for the conditions and hb for
the action, i.e. conditions look at only the headers, while the action
line gets the whole message.)

Hope this helps,

/* era */

Tangential remarks:

I hope you're not relying exclusively on hype word lists for
filtering. Something like 90-95% of all spam can be caught by looking
for some well-known stigmata which the spammer programs put in the
headers.

My own hype word lists use scoring, so one hype word match is not
enough to trash a message; some like "definately legitamate" [sic]
don't add much to the score, whereas something like "Paul Johnson,
Raleigh, NC" (this is from the standard Chris Erickson MMF spam)
contributes almost enough to trip the filter on its own. (I imagine
this message would get caught merely because I include these two
examples. Shit happens. Don't throw anything to /dev/null -- anyhow,
you should do your duty and complain about all the spam you receive.)

You could use scoring even with external egreps, but it will be rather
complicated.
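
To give an idea of the external half of that, here is a minimal sketch
of a shell function which adds up weights for every pattern found in a
message. Everything in it is an assumption for illustration: the file
name hype.weights, its "weight, space, pattern" layout, and the weights
themselves (picked so that a threshold of 100 would fit the description
above). It is not my actual setup, and wiring the result back into a
recipe is not shown.

```shell
# Hypothetical weights file: "<weight> <extended regex>" per line.
cat > hype.weights <<'EOF'
10 definately legitamate
90 Paul Johnson, Raleigh, NC
EOF

# Read a message on stdin and print the sum of the weights of all
# patterns which occur in it at least once (case-insensitively).
score_message() {
    msg=$(cat)
    total=0
    # read puts the first word in weight and the rest of the line,
    # spaces included, in pattern.
    while read -r weight pattern
    do
        if printf '%s\n' "$msg" | grep -E -i -q -e "$pattern"
        then total=$((total + weight))
        fi
    done < hype.weights
    echo "$total"
}

printf 'Subject: hi\n\nThis is definately legitamate.\n' | score_message
# prints 10
```

Each pattern counts once no matter how often it matches; counting
repeats would need grep -c or similar and some more arithmetic.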

Make sure you have SHELL=/bin/sh at the top of your .procmailrc if
your login shell is csh or tcsh.

-- 
 Paparazzi of the Net: No matter what you do to protect your privacy,
  they'll hunt you down and spam you. <http://www.iki.fi/~era/spam/>
