procmail

Re: Embedded comments

2003-05-26 02:00:12
Here is a matching snippet I use. It catches a curious new stealth spam
trick that (the dastardly cowards ;-) doesn't even use the regulation HTML
comment opener, bang-dash-dash "!--". (Is there no honour left?) Instead
it breaks words up like this: fash<t24ko2j5jklk>ion. I've put an example
here (plain text, 4K):
http://chatwin.f2o.org/Robert/2003/Spam/embedded_spam_ex

The matching snippet is: 
        ${a}<(!-- ?)?${w}${w}+( *--)?>$a
where the Perl-style shorthand variables are (inherited from JARI AALTO :-)
http://info.ccone.at/INFO/Tips+Co/pm-tips.html
        a       = "[a-zA-Z]"            # a single letter
        w       = "[0-9a-z_A-Z]"        # a single word character
        
This will NOT catch "normal" HTML comments surrounded by whitespace, since
the pattern needs a letter right before the "<" and right after the ">".
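For anyone who wants to try the pattern on its own, here is a minimal
sketch of it as a standalone procmailrc recipe. It assumes you simply want
to file matching mail rather than score it; the folder name "stealth" is
hypothetical, not from the post. The "$" at the start of the condition
makes procmail substitute ${a} and ${w} before the regex is used.

```procmail
# Character-class shorthands from the post, as procmailrc assignments
a="[a-zA-Z]"        # a single letter
w="[0-9a-z_A-Z]"    # a single word character

# B: search the body only; the trailing ":" requests a local lockfile.
# "$" expands ${a} and ${w} before procmail compiles the regex.
# "stealth" is a hypothetical mbox folder name.
:0 B:
* $ ${a}<(!-- ?)?${w}${w}+( *--)?>${a}
stealth
```

One caveat that follows from the regex itself: ordinary tags of two or
more word characters squeezed between letters, as in "Hello<br>World",
match too, so it is safest applied, as in the post, only to mail already
judged likely spam.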
        
In use, it goes in a body search that runs once everything else is out of
the way and only likely spam is left:

* $$BODYSCORE^0

# HTML-comment stealth is really a giveaway: an arbitrary max 60K score
#  to catch: Fan<t24ko2j5jklk>tast<!-- j324kjhk3j534has -->ic etc.
* 12000^0.80 $${a}<(!-- ?)?${w}${w}+( *--)?>$a
* .......
* .......
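The "arbitrary max 60K" in the comment follows from procmail's weighted
scoring, assuming the documented w^x semantics: the first match adds the
weight w, and each further match adds x times the previous increment. For
w = 12000 and x = 0.80 that is a geometric series:

```latex
S(n) = \sum_{k=0}^{n-1} w\,x^{k} = w\,\frac{1 - x^{n}}{1 - x},
\qquad
\lim_{n \to \infty} S(n) = \frac{w}{1 - x} = \frac{12000}{1 - 0.80} = 60000
```

So three stealth-broken words already add 12000 + 9600 + 7680 = 29280,
and no number of them can push this one condition past 60000.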

My scoring is personal of course. This is just part of my adaptation
of JOHN CONOVER's "stochastic.UCE.detection" recipes.
http://www.johncon.com/john/StochasticUCEDetection/
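A self-contained sketch of how the scoring fragment above could be
completed into a working recipe. It assumes BODYSCORE is a score computed
by earlier recipes (defaulted to 0 here so the condition still parses if
it is unset), leaves out the elided "* ......." conditions, and files
mail with a positive total into a hypothetical "quarantine" folder.

```procmail
a="[a-zA-Z]"        # a single letter
w="[0-9a-z_A-Z]"    # a single word character

# Assumption: earlier recipes may have set BODYSCORE; default it to 0
BODYSCORE=${BODYSCORE:-0}

# Scored body search: the action runs only if the summed score is > 0.
# Condition 1 seeds the total: "$BODYSCORE^0" expands to e.g. "-500^0",
#   a weight with an empty regex, which always matches exactly once.
# Condition 2 adds 12000, 9600, 7680, ... per stealth-broken word.
# "quarantine" is a hypothetical mbox folder name.
:0 B:
* $ $BODYSCORE^0
* 12000^0.80 $ ${a}<(!-- ?)?${w}${w}+( *--)?>${a}
quarantine
```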

OT: This is my first post to the list after more than two years of
being subscribed. Thanks to everyone! Given what I have actually used,
special thanks go to NANCY MCGOUGH for the basic philosophy and intro at
http://www.ii.com/internet/robots/procmail/qs/ and to JOHN and JARI for
the actual recipes I have tweaked.

Every question I've had in these two years, I've found already answered
there.

Procmail works so well that I am almost pleased when a new trick like that
curious Fan<t24ko2j5jklk>tast... one drops into my "quarantine" box to be
looked at. Straightforward statistical techniques like John Conover's
(sometimes called Bayesian, perhaps unnecessarily) now need so little
maintenance to keep spam out of the inboxes that I really am looking
forward to the next little challenge from the baddies. Makes me feel a
bit mean really ...

Best of luck to all.

Robert

--  
R A Chatwin (Badajoz, Spain). Please do not reply OFF LIST as well.  
Personal mail is welcome at: procm at chatwin.f2o.org (R A Chatwin)
http://chatwin.f2o.org  - - - - - - - - - - - - "Future Compatible"             
              

_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
