procmail

Re: scanning the body of a message

1997-05-19 10:37:00
Michael Fuhr <mfuhr(_at_)dimensional(_dot_)com> wrote:
>> Is there an option other than scanning the body like this:
>>
>> :0b
>> * http://www\.(siteone|sitetwo|sitethree)
>>
>> My list of sex domains is 70+ long, making the above method hard to read.  :(
>
> If your list gets too long, use the weighted scoring technique
> documented in procmailsc(5).  Here's an example:

Yes, but that will take (much) more CPU time.
As for solutions using external (f)greps, those are likely to be more
time-consuming as well.

Any time such a list becomes unwieldy to handle, why not have a small
sh, awk, sed or perl script construct the regexp for you?

In your .procmailrc:

INCLUDERC=sexspamurlrc

In a file called sexspamdomains:

siteone
sitetwo
sitethree

In a file called gensexspamurlrc:

#!/bin/sh

exec <sexspamdomains >sexspamurlrc.tmp

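# B makes procmail match the condition against the body
# (without it only the header is searched)
echo ':0 B'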

read domain

regexp="* http://www\\.($domain"

while read domain
do
   regexp="$regexp|$domain"
done

echo "$regexp)\\.com"
echo "/dev/null"

exec 1>&-

mv sexspamurlrc.tmp sexspamurlrc

exit 0
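
The loop above assumes every line of sexspamdomains is a bare name with no
dots and no blank lines.  If that does not hold for your list, the condition
line could be built along the following lines instead.  This is only a sketch:
build_condition is a made-up name, and the sample input is invented for the
demonstration.

```shell
#!/bin/sh

# Hypothetical helper: read a domain list on stdin and print the
# procmail condition line for it on stdout.
build_condition() {
   # Drop blank lines, escape literal dots, then join the entries with "|".
   alternatives=$(sed -e '/^[[:space:]]*$/d' -e 's/\./\\./g' | paste -s -d '|' -)
   printf '* http://www\\.(%s)\\.com\n' "$alternatives"
}

# Demonstration with an invented list containing a blank line and a dot.
printf '%s\n' siteone '' site.two sitethree | build_condition
# prints: * http://www\.(siteone|site\.two|sitethree)\.com
```

Inside gensexspamurlrc this would take the place of the read/while loop and
the echo of the regexp; the lines around it stay the same.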


Now you can conveniently edit sexspamdomains in a readable fashion.  Just
make sure you run gensexspamurlrc after making changes.
There is *no* runtime penalty for procmail, so you get convenience and speed.
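
If you tend to forget that step, a small wrapper can decide for you whether
the generated file is stale.  A minimal sketch, assuming your shell's test
supports the -nt (newer than) operator, which is common but not universal;
regen_if_stale is a made-up name.

```shell
#!/bin/sh

# Hypothetical helper: run a command only when the generated file is
# missing or older than the list it was built from.
#   $1 = list file, $2 = generated rc file, remaining args = command to run
regen_if_stale() {
   list=$1
   rc=$2
   shift 2
   if [ ! -e "$rc" ] || [ "$list" -nt "$rc" ]
   then
      "$@"
   fi
}

# Intended use, from the directory holding the files in this message:
#   regen_if_stale sexspamdomains sexspamurlrc ./gensexspamurlrc
```

Run from cron or from your editor's save hook, this still keeps the work out
of procmail itself: nothing extra happens when a mail arrives.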

BTW, this approach applies to *many* cases in the past where people were
trying to avoid coding "complex" expressions directly in procmail and
used external programs instead.  Used in this fashion, the external
program is run *only* when the list changes, and not every time a mail
arrives.
-- 
Sincerely,                                                          
srb(_at_)cuci(_dot_)nl
           Stephen R. van den Berg (AKA BuGless).

"Father's Day: Nine months before Mother's Day."