procmail

HTML float style

2009-12-01 20:13:37

I had some aliases forwarding mail to other users of a domain hosted on one of my servers, and one of those users recently complained about how much spew was being forwarded (despite a few DNSBLs). Since the messages were forwarded directly from the MTA rather than delivered locally, they weren't subject to filtering; my extreme procmail filtering is on my own host, not on the one I run friends' mail through.

Anyway, I changed the aliases to pipe through procmail before forwarding (with an appropriate envelope change), and in the process set things up so I could examine some of the spew before adding a few choice filters.
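
A minimal sketch of what such a setup could look like, not the actual configuration. The alias name, rcfile path, log and sample file locations, and both addresses are hypothetical, and it assumes a sendmail-compatible MTA that lets the delivering user set the envelope sender with -f:

```procmail
# Hypothetical alias entry (sendmail-style aliases file), shown as a comment:
#   friend: "|/usr/bin/procmail -m /etc/procmailrcs/friend.rc"

# /etc/procmailrcs/friend.rc (hypothetical path)
SHELL=/bin/sh
LOGFILE=/var/log/procmail-friend.log

# Keep a copy of everything passing through, for later examination.
# The 'c' flag delivers a copy and lets processing continue.
:0 c:
/var/tmp/friend-forward.sample

# ... filtering recipes go here ...

# Whatever survives is forwarded with a rewritten envelope sender
# (both addresses are placeholders).
:0 w
| $SENDMAIL -oi -f forward-bounces@example.org friend@example.net
```

Piping to $SENDMAIL explicitly, rather than using the `!` forwarding action, is what leaves room to pass -f for the envelope change.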

I noted that a fair number of the HTML-based spams use span tags along with a float style, intended to split commonly filtered pharma words so that you can't match them easily. I don't see this technique applied to legitimate messages. To protect against accidentally flagging a legit message that happens to use FLOAT for its intended purpose, I give the recipe an initial negative score, so it only fires after more than ten matches -- all the spams have a *LOT* of these floats, while a legitimate message that happens to use float in a span probably won't use it a lot.

Within my own corpus, the spams tend to have enough other indicators that they've been classified as spam without this, but this is a pretty good test by itself - enough so that this and a couple of other tests tacked into the procmailrc for the forwarded messages seem to be catching most of the stuff that gets through the DNSBLs (though one of the tests is a "does the relay appear on more than x of these secondary DNSBLs?" <g>). It's a bit heavy because of the body scan, but that could be mitigated by employing a message size condition.

Generally, I don't like to have to dip into the message body, but increasingly that's necessary to get all of the goop out.

:0
* -10^0
* 1^1 B ?? ()<span[^>]*style="[^"]*FLOAT:
{
        # HTML spam breaking filtered words up using float to move text around
}
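
The size mitigation mentioned above might look something like the following sketch. The 100000-byte threshold and the X-Float-Spam header name are arbitrary assumptions for illustration, and tagging via formail is just one possible action:

```procmail
# Only body-scan messages smaller than an (arbitrary) 100000 bytes.
# The size test comes first, so when it fails the recipe is skipped
# without the weighted body scan being run.
:0
* < 100000
* -10^0
* 1^1 B ?? ()<span[^>]*style="[^"]*FLOAT:
{
        # Hypothetical action: tag the message instead of discarding it
        :0 fhw
        | formail -A "X-Float-Spam: yes"
}
```

The scoring itself is unchanged: the score starts at -10 and each match in the body adds 1, so the recipe still needs more than ten floats before it triggers.
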
---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.

____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)de
http://mailman.rwth-aachen.de/mailman/listinfo/procmail
