I had some aliases forwarding mail to other users of a domain hosted on
one of my servers, and one of those users recently complained about how
much spew was being forwarded (despite a few DNSBLs). Since the messages
were forwarded directly by the MTA, and not locally delivered, they
weren't subject to filtering, and my extreme procmail filtering is on my
own host, not on the one I run friends' mail through.

Anyway, I changed the aliases to pipe through procmail before forwarding
(with an appropriate envelope change) and in the process set things up so I
could examine some of the spew before adding a few choice filters.
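
For illustration, a forwarding rcfile along those lines might look
something like the sketch below. This is not the rcfile actually in use
here: the alias entry in the comment, the maildir path, and both addresses
are placeholders, and passing -f to sendmail is only one assumption about
what an "appropriate envelope change" could be (whether the MTA honours -f
depends on how it treats the invoking user).

```procmail
# Hypothetical rcfile, assumed to be run from an alias entry such as:
#   friend: "|/usr/bin/procmail -m /etc/procmailrcs/friend.rc"

# Keep a copy of everything in a maildir so samples can be examined later.
# (A mailbox name ending in / is treated as a maildir.)
:0 c
/var/spool/forward-samples/

# (filtering recipes would go here)

# Forward whatever is left, setting the envelope sender explicitly.
:0 w
| $SENDMAIL -oi -f forward-bounces@example.com friend@example.com
```

The copy recipe comes first so that the samples include messages the later
filters would have caught.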

I noted that a fair number of the HTML-based spams use span tags along
with a float style, intended to split commonly filtered pharma words so
that you can't match them easily. I don't see this technique applied to
legitimate messages. To protect against accidentally flagging a
legit message that might happen to use FLOAT for its intended purpose, I
give the recipe an initial negative score -- all the spams have a *LOT* of
these floats, while a legitimate message that happens to use float in a
span probably won't use it a lot.

Within my own corpus, the spams tend to have enough other indicators that
they've been classified as spam without this, but this is a pretty good
test by itself - enough so that this and a couple of other tests tacked
into the procmailrc for the forwarded messages seem to be catching most of
the stuff that gets through the DNSBLs (though one of the tests is a "does
the relay appear on more than x of these secondary DNSBLs?" <g>). It's a
bit heavy because of the body scan, but that could be mitigated by
employing a message size condition.

Generally, I don't like having to dip into the message body, but
increasingly that's necessary to get all of the goop out.
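
# Start the score at -10 and add 1 for every floated span found in the
# body; the recipe only fires once the total is positive, i.e. when the
# message contains more than ten such spans.
:0
* -10^0
* 1^1 B ?? ()<span[^>]*style="[^"]*FLOAT:
{
	# HTML spam breaking filtered words up using float to move text around
}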
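
As a sketch of how the size condition mentioned above could be bolted on,
and of one possible action for the block: an ordinary size test placed
ahead of the scored conditions keeps large messages away from the body
scan, and the block records the score in a header. The 100000-byte cutoff,
the FLOATSCORE variable, and the X-Float-Spans header name are all made up
for the example, not values taken from the rcfile in use.

```procmail
:0
* < 100000
* -10^0
* 1^1 B ?? ()<span[^>]*style="[^"]*FLOAT:
{
	# $= holds the score of the recipe that just matched
	FLOATSCORE = $=

	# Tag the message rather than dropping it; later recipes can act on it
	:0 fhw
	| formail -A "X-Float-Spans: $FLOATSCORE"
}
```

Tagging instead of discarding makes it easy to check for false positives
before committing to anything harsher.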
---
Sean B. Straw / Professional Software Engineering
Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
Please DO NOT carbon me on list replies. I'll get my copy from the list.
____________________________________________________________
procmail mailing list Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)de
http://mailman.rwth-aachen.de/mailman/listinfo/procmail