Re: Scoring Recipe for repeating addresses?

At 14:29 2002-06-18 -0700, procmail(_at_)deliberate(_dot_)net did say:

        Thanks but your suggestion would grab *every* email with
a To/Cc addressed to some postmaster@

Agreed. Martin's post (as well as his point about your recipe "condition") should be considerably more workable for you.

Something else you could try would be to pipe the recipients out to a perl script that would extract the recipient addresses, chop off domain portions, and sort the list, then reject it in the event that certain criteria were matched (more than 'x' duplicates, more than 'x' recipients perhaps all starting with the same one or two characters, etc).
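
As a rough illustration of that idea, here is a minimal sketch, written in Python rather than perl. The function names, the sample message and the threshold of 4 are all hypothetical choices made for the example, not anything taken from a real filter. It only covers the "more than 'x' duplicates" criterion.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses

# Hypothetical sample message, used only to exercise the functions below.
SAMPLE = (
    "From: someone@example.com\n"
    "To: postmaster@a.example, postmaster@b.example, postmaster@c.example\n"
    "Cc: postmaster@d.example, postmaster@e.example, alice@f.example\n"
    "Subject: test\n"
    "\n"
    "body\n"
)


def local_part_counts(raw_message):
    """Count how often each username (domain chopped off) appears in To/Cc."""
    msg = message_from_string(raw_message)
    headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    counts = Counter()
    for _name, addr in getaddresses(headers):
        if "@" in addr:
            counts[addr.split("@", 1)[0].lower()] += 1
    return counts


def looks_like_lump(raw_message, max_dupes=4):
    """True if any single username occurs more than max_dupes times."""
    return any(n > max_dupes for n in local_part_counts(raw_message).values())


print(sorted(local_part_counts(SAMPLE).items()))
print(looks_like_lump(SAMPLE))
```

To use something like this from a recipe, the script would read the message on stdin and turn the result of looks_like_lump() into its exit status, which a procmail "?" condition can then test.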


Barring that, see the recipe I present below.

I still want to be able to receive postmaster traffic, just discard those messages that are sent to 4 or 5 postmasters in a lump, those which are
clearly (and I assume to be) spam.

Well, they could be notifications about spam, or some other mail problem. I have a nimda notification processor on my servers which might end up addressing multiple postmasters (at different locations) if the whois records use a postmaster address.



At 14:48 2002-06-18 -0700, procmail(_at_)deliberate(_dot_)net did say:

        Just noticed that I had an extra "." before the TLD ...
oops!  I'd try to fix this and try again but I'm a bit shy of
diverting all my email again (for 5 hours last time). I hate
breaking something and then not knowing why it broke!

Check the URL in my .sig, and read up on the "sandbox" testing method. You can easily toss _saved_ (or constructed) mail at your filter without having to make it live in your mail system. Once you use a sandbox to test filters with, procmail will become a LOT easier to work with, and you'll be able to easily play "what if" and "would this work" scenarios without endangering your real email.

Would that actually count each occurrence in the To/Cc headers?
Seems too easy ...

And thus it was made so, though the above won't match each occurrence due to a syntax issue with how the headers will be matched (go ahead and test it in a sandbox -- you'll get no more than ONE hit per header) -- but the idea is sound and has been used by others. The filter says if you have MORE THAN FOUR such recipients, the score will be positive, and poof, it's a match.
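
To make the counting concrete, here is a small sketch in Python. It only mimics the arithmetic: Python's re module is not procmail's regex engine, and the header text, patterns and score() helper are made up for the illustration. The point is that a pattern anchored on the header name gets one hit per header line, while an unanchored pattern run against just the recipient text gets one hit per address.

```python
import re

# Hypothetical header with five postmaster recipients.
HEADER = (
    "To: postmaster@a.example, postmaster@b.example, "
    "postmaster@c.example, postmaster@d.example, postmaster@e.example"
)

# Anchored on the header name: the ".*" runs across the whole line,
# so a single header line can only ever produce one hit.
anchored = re.findall(r"^(?:To|Cc):.*postmaster@", HEADER, re.I | re.M)

# Unanchored, against the recipient text alone (the role the RECIP
# variable plays in a recipe): every address is a separate hit.
recip = HEADER.split(":", 1)[1]
unanchored = re.findall(r"\bpostmaster@[-a-z0-9_]+\.", recip, re.I)


def score(hits, base=-4, weight=1):
    """Start at -4 and add 1 per hit; only a positive total is a match."""
    return base + weight * hits


print(len(anchored), score(len(anchored)))
print(len(unanchored), score(len(unanchored)))
```

With the anchored pattern the total stays negative no matter how many postmasters share the line; with the unanchored one, five recipients push the total above zero, and four leave it at exactly zero, which is not a match.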

However, that matches only on one specific address, not on any address generically. Now perhaps you're cool with "(web|post)master", but if you want something a bit more generic, the following might be workable:


# extract the FIRST address sent to our domain (so, if there are multiple
# recipients at your domain, the logic here applies to identifying the FIRST
# one - which might not work for you, but that's the logic here)
:0
* ^(To|Cc):.*\/[-+\._a-z0-9]*@mydomain\.tld
{
        # we have the match from above, but that is username@domain --
        # we just want the username portion.
        :0
        * MATCH ?? ^\/[^@]+
        {
                KEYADDR=$MATCH
        }
}

# Extract To/Cc into a variable (necessary for the scoring matching the way
# that we use it).
:0
* ^To:\/.*
{
        RECIP=$MATCH
}

:0
* ^Cc:\/.*
{
        RECIP="$RECIP, $MATCH"
}

# Only if the key address isn't blank...
:0:
* ! KEYADDR ?? ^^^^
* -4^0
* $ 1^1 RECIP ?? (\<)${KEYADDR}@[-a-z0-9_]+\.
spam.multiaddr.mbx

I just threw a junk mailbox at the above, and it catches a number of messages quite successfully. You could grab the To/Cc headers, then check for multiple matches, as well as a separate check for multiple post/webmaster (if needed), by just following the above with:


# if the determined key address was postmaster or webmaster, no need to repeat
# the effort here (unless of course, you have different threshold limits
# between this recipe and the one above).
:0:
* ! KEYADDR ?? ^(post|web)master$
* -4^0
* $ 1^1 RECIP ?? (\<)(post|web)master@[-a-z0-9_]+\.
spam.multiaddr.mbx

I think that this should do a reasonable job of accomplishing something similar to what you were requesting.

For spam subject scoring, I match against a SUBJECT variable, rather than the header - it allows me to compound matches on related keywords, which wouldn't work if I were matching directly from the header.


---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.
