procmail

Re: simple questions

1998-12-10 05:50:16
On Thu, 10 Dec 1998 10:56:30 +0200 (EET), I wrote:
> <psalzman(_at_)landau(_dot_)ucdavis(_dot_)edu> wrote:
>> known spammers in a separate file?  does someone have a file like this?
>  3. Have Procmail construct the recipe on the fly each time. This is
>     probably even more expensive and there are a couple of tricky
>     conditions you have to look out for. This is the solution I
>     recommend the least and will therefore not cover any further.

Just for the record, this is something that Panix used to do in their
publicly available recipe file. They took it out. That probably
means something.

Here's a basic implementation:

    # List of spammers is in $HOME/spammers.txt
    SPAMMER_REGEX=`tr '\012' '|' <$HOME/spammers.txt`

    :0
    * $ ^Received:.*\<($SPAMMER_REGEX)\>
    | your-action-here
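If you want to see what procmail will actually splice into the condition (and how long it gets, which matters for LINEBUF), you can run the same tr pipeline in a plain shell. A minimal sketch, assuming a POSIX sh; the two host names are made-up sample data standing in for $HOME/spammers.txt:

```shell
#!/bin/sh
# Made-up sample entries standing in for $HOME/spammers.txt.
# Same tr pipeline as the recipe: every newline becomes a '|'.
SPAMMER_REGEX=`printf 'spam1.example\nspam2.example\n' | tr '\012' '|'`

# The condition as procmail sees it after $ expansion
# ("\\<" inside double quotes yields a literal \<).
condition="^Received:.*\\<($SPAMMER_REGEX)\\>"

printf '%s\n' "$condition"
printf 'length: %s\n' "${#condition}"
```

Note the trailing '|' left behind by the final newline of the list.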

And here are the caveats:

  * If the SPAMMER_REGEX (plus the Received: part) grows larger than
    your setting of LINEBUF, you are in trouble (Procmail could dump
    core -- that kind of trouble). Yes, you could set your LINEBUF
    insanely high, but that's inelegant and a further waste of
    resources. And if you just set it slightly higher, you +will+
    unknowingly hit that limit again sometime in the future when you
    add more spammers to spammers.txt.

  * If the spammers.txt file contains empty lines, you end up with a
    regex like "spammer 1||spammer 2", which means "spammer 1" or
    nothing at all or "spammer 2". The empty alternative matches
    practically anywhere, so you accidentally get a match on every
    message with a Received: header. Depending on the type of action
    you take on spam, the effects range from annoying to catastrophic.

    Note that a trailing newline on the last line of the spammers.txt
    file counts as an "empty line" in this context!

    Here's a slightly better, but more expensive, version of the
    assignment part of the recipe:

    SPAMMER_REGEX=`(egrep -v '^$|^[[:blank:]]*#' $HOME/spammers.txt ;
        echo -n nonesvch:::nonesvch:::nonesvch) | tr '\012' '|'`

    # Assuming echo -n is understood by your echo to mean no newline

    This even allows you to have empty lines, and comments (any line
    whose first nonblank character is #), in your spammers.txt file.
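Both failure modes are easy to check before a bad pattern ever reaches procmail. A minimal sketch, assuming a POSIX sh with grep -E (the modern spelling of egrep) and printf '%s' in place of echo -n; the sample list and the has_empty_alt helper are hypothetical. It builds the pattern both ways and flags any empty alternative (a leading or trailing '|', or '||'):

```shell
#!/bin/sh
# Hypothetical spammers.txt contents: a comment, a blank line and the
# usual trailing newline after the last entry.
sample='# known offenders
spam1.example

spam2.example
'

# Naive version, as in the basic recipe: every newline becomes a '|'.
naive=`printf '%s' "$sample" | tr '\012' '|'`

# Filtered version: drop blank and comment lines, then append a dummy
# alternative without a newline so the pattern cannot end in '|'.
better=`(printf '%s' "$sample" | grep -Ev '^$|^[[:blank:]]*#'
         printf '%s' 'nonesvch:::nonesvch:::nonesvch') | tr '\012' '|'`

# Succeeds if the alternation has an empty branch:
# a leading '|', a trailing '|', or '||' anywhere.
has_empty_alt() {
  case $1 in
    '|'* | *'|' | *'||'*) return 0 ;;
    *) return 1 ;;
  esac
}

for value in "$naive" "$better"; do
  if has_empty_alt "$value"; then verdict='EMPTY ALTERNATIVE'; else verdict='ok'; fi
  printf '%s  -> %s\n' "$value" "$verdict"
done
```

With this sample the naive pattern is flagged (it even swallows the comment line as an alternative), while the filtered one, which ends in the dummy entry, passes.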

As I mentioned above, this is a bit of a waste of resources. I
haven't measured the cost of the egrep version against the plain tr
one; in fact the egrep method probably wastes more cycles than the
original tr. And in principle egrep, too, is somewhat prone to the
empty lines problem (for instance, a line containing only blanks
still gets through), but at least a trailing newline on the last
line is not problematic.
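If you go this route anyway, the remaining holes (blank-only lines, unescaped dots in host names, and the LINEBUF ceiling) can be narrowed in a small helper. This is a sketch under several assumptions, not a tested recipe: the function name build_spammer_regex and the default limit of 1000 characters are made up (pick a limit comfortably below your own LINEBUF), only '.' is escaped, and it relies on POSIX grep -E, sed and paste. On failure it prints the dummy alternative instead of nothing, because an empty SPAMMER_REGEX would reproduce the empty-alternative problem.

```shell
#!/bin/sh
# build_spammer_regex FILE [MAXLEN]  (hypothetical helper)
# Prints an alternation built from FILE, one entry per line:
#   - skips empty, blank-only and comment lines (first nonblank char '#')
#   - trims surrounding blanks and escapes '.' so it matches literally
#   - joins with '|' without a trailing separator
# If the result is empty or longer than MAXLEN (default 1000, an arbitrary
# stand-in for "safely below LINEBUF"), it warns on stderr, prints a dummy
# alternative that should never match, and returns 1.
build_spammer_regex() {
  file=$1
  maxlen=${2:-1000}
  regex=$(grep -Ev '^[[:blank:]]*(#|$)' "$file" |
    sed -e 's/^[[:blank:]]*//' -e 's/[[:blank:]]*$//' -e 's/\./\\./g' |
    paste -sd '|' -)
  if [ -z "$regex" ] || [ "${#regex}" -gt "$maxlen" ]; then
    echo "build_spammer_regex: pattern empty or longer than $maxlen" >&2
    printf '%s\n' 'nonesvch:::nonesvch:::nonesvch'
    return 1
  fi
  printf '%s\n' "$regex"
}

# Example with a throwaway file.
list=$(mktemp)
printf '# comment\n\n   \nspam1.example\n  spam2.example  \n' > "$list"
build_spammer_regex "$list"        # spam1\.example|spam2\.example
rm -f "$list"
```

From the rcfile you would call it through a wrapper script in backquotes, e.g. SPAMMER_REGEX=`build-spammer-regex $HOME/spammers.txt`, where build-spammer-regex is a hypothetical script that runs this function on its argument.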

/* era */

-- 
.obBotBait: It shouldn't even matter whether    <http://www.iki.fi/~era/>
I am a resident of the state of Washington. <http://members.xoom.com/procmail/>
