Re: Scoring Recipe for repeating addresses?

On Tue, 18 Jun 2002 15:59:41 -0700, PSE-L(_at_)mail(_dot_)professional(_dot_)org
(Professional Software Engineering) wrote:
=> Agreed.  Martin's post (as well as his point about your recipe "condition") 
=> should be considerably more workable for you.

        I liked Martin's approach and was toying with the use of
$VARIABLE to grab the number of occurrences of my target address
part in one header (Cc) and then, if a multiple, increment the
next scoring by that amount in the other (To) header afterwards
etc ... it seems like the Cc header is almost always the
populated one for spam.

I never realized that one can [perhaps???] use scoring like:

# --- condition lines are not real code
:0 c
* 1^1 _some_test_here_
/dev/null
TESTCOUNT = $=
# pass in previous testcount to start new count
:0 c
*$ $TESTCOUNT^0
* 1^1 _some_other_test_here_
/dev/null
UPDATEDCOUNT = $=
# --- and then use that counter variable to test on
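
        To check the arithmetic of that carry-over idea outside of
procmail, here is a minimal Python sketch. It is not procmail code:
add_score and the sample addresses are made up for illustration, and
it assumes that a 1^1 condition adds its weight once per match and
that the running total plays the role of $=.

```python
import re

def add_score(total, weight, pattern, text):
    """Add `weight` once per non-overlapping match of `pattern` in `text`."""
    return total + weight * len(re.findall(pattern, text, re.IGNORECASE))

# Made-up header values for illustration only.
cc_header = "postmaster@a.example, postmaster@b.example, postmaster@c.example"
to_header = "postmaster@d.example, someone@e.example"

# First recipe: count matches in Cc, starting from zero.
testcount = add_score(0, 1, r"postmaster@", cc_header)
# Second recipe: seed the count with the previous total, then count in To.
updatedcount = add_score(testcount, 1, r"postmaster@", to_header)

print(testcount, updatedcount)  # 3 4
```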

=> Something else you could try would be to pipe the recipients out to a perl 
=> script that would extract the recipient addresses, 
        <snip>

        I can't even spell pearl <grin> much less code in it, but
it might be an option for me with PHP ... later ...

=> Well, they could be notifications about spam, or some other mail 
=> problem.  I have a nimda notification processor on my servers which might 
=> end up addressing multiple postmasters (at different locations) if the 
=> whois records use a postmaster address.

        I actually get very little legitimate traffic to the
various postmaster@mydomains.tld addresses, but the spam to them
seems to be increasing exponentially. I suspect that the unfortunate
result is that many folks will quit having a deliverable
postmaster address. <sigh>

=> Check the URL in my .sig, and read up on the "sandbox" testing method.  You 
=> can easily toss _saved_ (or constructed) mail at your filter without having 
=> to make it live in your mail system.  Once you use a sandbox to test 
=> filters with, procmail will become a LOT easier to work with, and you'll be 
=> able to easily play "what if" and "would this work" scenarios without 
=> endangering your real email.

        I've seen your many previous recommendations on this
"sandbox" and for me up until now it's been like being told to
floss my teeth, I know it's good for me but I just can't bring
myself to actually do it (sorry for the analogy, a bit long in
the tooth I guess).

        I finally realize that you've been absolutely right all
along and I should sit down and learn to do this and set it up
and quit complaining ... thanks for continuing to press on this.

=> However, that matches only on one specific address, not on any address 
=> generically.  Now perhaps you're cool with "(web|post)master", but if you 
=> want something a bit more generic, the following might be workable:

        I *always* like more generalized and elegant approaches!
Life seems too full of down and dirty, just get by stuff.

=> # extract the FIRST address sent to at our domain (so, if there are multiple
=> # recipients at your domain, the logic here applies to identifying the FIRST
=> # one - which might not work for you, but that's the logic here)
=> :0
=> * ^(To|Cc):.*\/[-+\._a-z0-9]*@mydomain\.tld
=> {
=>          :0
=>          * MATCH ?? ^\/[^@]+
=>          {
=>                  KEYADDR=$MATCH
=>          }
=> }

        Very slick. Would I mess it up by using $DOMAINLIST
instead of mydomain\.tld?
* $ ^(To|Cc):.*\/[-+\._a-z0-9]*@$DOMAINLIST

        BTW, I have no problem with the stricter definition of an
address to not include goofy special characters.  But, would
using the "word" construct something like that below be OK as
well?
* $ ^(To|Cc):.*\<\/.*@$DOMAINLIST

Guess I just want to get that last CPU cycle in there ...
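
        To see how the two variants might differ, here is a rough
Python sketch. It only imitates the recipes: the lazy ".*?" stands in
for procmail matching as little as possible to the left of \/ (as I
understand the procmailrc man page), "\W" is my stand-in for \<, and
DOMAINLIST, key_address and the sample headers are all hypothetical.

```python
import re

# Hypothetical stand-in for $DOMAINLIST: an alternation of local domains.
DOMAINLIST = r"(?:mydomain\.tld|myotherdomain\.tld)"

# Strict variant: only address-like characters may precede the "@".
STRICT = re.compile(r"^(?:To|Cc):.*?([-+._a-z0-9]*@" + DOMAINLIST + ")", re.I | re.M)
# Loose variant: one non-word character, then anything up to "@" plus a local domain.
LOOSE = re.compile(r"^(?:To|Cc):.*?\W(.*@" + DOMAINLIST + ")", re.I | re.M)

def key_address(pattern, headers):
    """Return the text before the first "@" of the captured part, or None."""
    m = pattern.search(headers)
    return m.group(1).split("@", 1)[0] if m else None

# Made-up headers: the first recipient is NOT at a local domain.
headers = (
    "From: someone@elsewhere.example\n"
    "To: bob@other.tld, postmaster@mydomain.tld\n"
    "Cc: webmaster@mydomain.tld\n"
)

print(key_address(STRICT, headers))  # postmaster
print(key_address(LOOSE, headers))   # bob
```

If procmail behaves like this sketch, the ".*" version can start its
capture at an earlier, unrelated recipient, so the stricter character
class looks like the safer choice even if it costs a cycle or two.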
 
=> # Extract To/Cc into a variable (necessary for the scoring matching the way
=> # that we use it).
=> :0
=> * ^To:\/.*
=> {
=>          RECIP=$MATCH
=> }
=> 
=> :0
=> * ^Cc:\/.*
=> {
=>          RECIP="$RECIP, $MATCH"
=> }

        You can do that???

=> # Only if the key address isn't blank...
=> :0:
=> * ! KEYADDR ?? ^^^^
=> * -4^0
=> * $ 1^1 RECIP ?? (\<)${KEYADDR}@[-a-z0-9_]+\.
=> spam.multiaddr.mbx

        Very very nice!
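
        To pin down where the threshold falls, here is a small
Python sketch of the arithmetic as I read it: start at -4, add 1 for
every occurrence of the key address at any domain in the combined
To/Cc string, and treat the recipe as matching only if the total is
positive. multiaddr_score and the sample addresses are hypothetical,
and the lookbehind is only an approximation of \<.

```python
import re

def multiaddr_score(to_value, cc_value, keyaddr, offset=-4):
    """Return offset plus the number of keyaddr@domain hits in To and Cc."""
    # Mirror the RECIP variable: To first, then ", " and Cc when present.
    recip = to_value if not cc_value else f"{to_value}, {cc_value}"
    # keyaddr must start a word and be followed by "@", a domain label and a dot.
    pattern = r"(?<!\w)" + re.escape(keyaddr) + r"@[-a-z0-9_]+\."
    return offset + len(re.findall(pattern, recip, re.IGNORECASE))

# Made-up recipient lists for illustration only.
to_value = "postmaster@a.example, postmaster@b.example"
cc_four = "postmaster@c.example, postmaster@d.example"
cc_five = cc_four + ", postmaster@e.example"

print(multiaddr_score(to_value, cc_four, "postmaster"))  # 0
print(multiaddr_score(to_value, cc_five, "postmaster"))  # 1
```

Under those assumptions, four occurrences leave the score at zero and
it takes a fifth to push it positive.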

=> # if the determined key address was postmaster or webmaster, no need to
=> # repeat the effort here (unless of course, you have different threshold
=> # limits between this recipe and the one above).

        I think the same 4 or 5 occurrences of any address will be
just fine, no need (for me) for a different threshold but it's a
nice touch.

=> I think that this should do a reasonable job of accomplishing something 
=> similar to what you were requesting.

        Indeed. *Very* reasonable!
 
=> For spam subject scoring, I match against a SUBJECT variable, rather than 
=> the header - it allows me to compound matches on related keywords, which 
=> wouldn't work if I were matching directly from the header.

        Understanding how/why that approach works really opens up
a whole new level of thinking about mucking about with procmail.
I like it.
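
        My reading of why the variable helps, sketched in Python
rather than procmail: a pattern anchored at the start of the Subject
line can match only once per header, while an unanchored pattern run
against just the subject text can match once per keyword. The
keywords and the sample subject are made up, and this is only my
guess at the reasoning.

```python
import re

# Made-up header and keyword list for illustration only.
header = "Subject: FREE money, make money fast"
subject = header.split(":", 1)[1].strip()
keywords = r"free|money|fast"

# Anchored at the header name: at most one match per Subject line.
anchored = len(re.findall(r"^Subject:.*(?:" + keywords + ")", header, re.I | re.M))
# Against the extracted subject text: one match per keyword occurrence.
unanchored = len(re.findall(keywords, subject, re.I))

print(anchored, unanchored)  # 1 4
```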

        Thank you very much,

        - Don
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail