Re: Scoring Recipe for repeating addresses?

2002-06-18 16:37:59
At 14:29 2002-06-18 -0700, procmail(_at_)deliberate(_dot_)net did say:
Thanks, but your suggestion would grab *every* email with a To/Cc addressed to some postmaster@

Agreed. Martin's post (as well as his point about your recipe "condition") should be considerably more workable for you.

Something else you could try would be to pipe the recipients out to a perl script that would extract the recipient addresses, chop off domain portions, and sort the list, then reject it in the event that certain criteria were matched (more than 'x' duplicates, more than 'x' recipients perhaps all starting with the same one or two characters, etc).
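As a rough sketch of that script idea (written in Python rather than Perl, purely for illustration): it pulls the addresses out of the recipient headers, chops off the domain portions, and flags the message when any one local part shows up too often. The names `local_parts`, `looks_like_lump` and `max_dupes` are hypothetical, the threshold of 4 is only an assumption, and a counter stands in for the sort-and-compare step.

```python
from collections import Counter
from email.utils import getaddresses


def local_parts(header_values):
    """Return the lowercased local part of every address in the given headers."""
    parts = []
    for _name, addr in getaddresses(header_values):
        if "@" in addr:
            # Chop off the domain portion, keeping only what precedes the last "@".
            parts.append(addr.rsplit("@", 1)[0].lower())
    return parts


def looks_like_lump(header_values, max_dupes=4):
    """True when any single local part appears more than max_dupes times."""
    counts = Counter(local_parts(header_values))
    return any(n > max_dupes for n in counts.values())


if __name__ == "__main__":
    # Five postmasters at different (made-up) domains in one To: header.
    to_header = ", ".join(f"postmaster@site{i}.example" for i in range(5))
    print(looks_like_lump([to_header]))
    print(looks_like_lump(["alice@a.example, bob@b.example"]))
```

A filter would have to feed the script the To/Cc header contents and act on its result. How that is wired up is left open here.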

Barring that, see the recipe I present below.

I still want to be able to receive postmaster traffic, and just discard those messages that are sent to 4 or 5 postmasters in a lump, which are clearly (or so I assume) spam.

Well, they could be notifications about spam, or some other mail problem. I have a nimda notification processor on my servers which might end up addressing multiple postmasters (at different locations) if the whois records use a postmaster address.


At 14:48 2002-06-18 -0700, procmail(_at_)deliberate(_dot_)net did say:

        Just noticed that I had an extra "." before the TLD ...
oops!  I'd try to fix this and try again but I'm a bit shy of
diverting all my email again (for 5 hours last time). I hate
breaking something and then not knowing why it broke!

Check the URL in my .sig, and read up on the "sandbox" testing method. You can easily toss _saved_ (or constructed) mail at your filter without having to make it live in your mail system. Once you test your filters in a sandbox, procmail becomes a LOT easier to work with, and you can play "what if" and "would this work" scenarios without endangering your real email.

Would that actually count each occurrence in the To/Cc headers?
Seems too easy ...

And thus it was made so. The above won't match each occurrence, though, due to a syntax issue with how the headers will be matched (go ahead and test it in a sandbox -- you'll get no more than ONE hit per header) -- but the idea is sound and has been used by others. The filter says that if you have MORE THAN FOUR such recipients, the score will be positive, and poof, it's a match.
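The arithmetic behind that scoring can be sketched in Python. This is only an illustration of the counting: start at -4, add 1 per matching recipient, and succeed when the total is positive. Python's `re` module is not procmail's regex engine, and the function names and the `.example` addresses are made up.

```python
import re


def score(recipients, pattern, base=-4, weight=1):
    """Start at `base` and add `weight` for each non-overlapping match."""
    hits = len(re.findall(pattern, recipients, flags=re.IGNORECASE))
    return base + weight * hits


def is_match(recipients, pattern):
    """A scored filter succeeds only when the final score is positive."""
    return score(recipients, pattern) > 0


# Rough Python stand-in for the (post|web)master condition; \b is only an
# approximation of procmail's \< word-boundary token.
PATTERN = r"\b(?:post|web)master@[-a-z0-9_]+\."

if __name__ == "__main__":
    for n in (4, 5):
        recips = ", ".join(f"postmaster@host{i}.example" for i in range(n))
        print(n, score(recips, PATTERN), is_match(recips, PATTERN))
```

With four matching recipients the total is exactly zero, which is not positive, so it takes a fifth to trip the filter.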

However, that matches only on one specific address, not on any address generically. Now perhaps you're cool with "(web|post)master", but if you want something a bit more generic, the following might be workable:

# extract the FIRST address at our domain that the message was sent to (so,
# if there are multiple recipients at your domain, the logic here applies to
# identifying the FIRST one - which might not work for you, but that's the
# logic here)
:0
* ^(To|Cc):.*\/[-+\._a-z0-9]*@mydomain\.tld
{
        # we have the match from above, but that is username@domain --
        # we just want the username portion.
        :0
        * MATCH ?? ^\/[^@]+
        {
                KEYADDR=$MATCH
        }
}

# Extract To/Cc into a variable (necessary so that the scoring condition
# further down can count every occurrence, rather than one hit per header).
:0
* ^To:\/.*
{
        RECIP=$MATCH
}

:0
* ^Cc:\/.*
{
        RECIP="$RECIP, $MATCH"
}

# Only if the key address isn't blank...
:0:
* ! KEYADDR ?? ^^^^
* -4^0
* $ 1^1 RECIP ?? (\<)${KEYADDR}@[-a-z0-9_]+\.
spam.multiaddr.mbx


I just threw a junk mailbox at the above, and it catches a number of messages quite successfully. You could grab the To/Cc headers, then check for multiple matches, as well as run a separate check for multiple post/webmaster addresses (if needed), by just following the above with:

# if the determined key address was postmaster or webmaster, no need to repeat
# the effort here (unless of course, you have different threshold limits
# between this recipe and the one above).
:0:
* ! KEYADDR ?? ^(post|web)master$
* -4^0
* $ 1^1 RECIP ?? (\<)(post|web)master@[-a-z0-9_]+\.
spam.multiaddr.mbx


I think that this should do a reasonable job of accomplishing something similar to what you were requesting.

For spam subject scoring, I match against a SUBJECT variable, rather than the header - it allows me to compound matches on related keywords, which wouldn't work if I were matching directly from the header.

---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.
