Reverse engineering spamassassin rules

I started using procmail for filtering a couple of years ago, but have recently supplemented it with bayes (bogofilter) and spamassassin, trying to find a good middle ground. I like spamassassin for its blend of capabilities, but there's no denying it imposes performance overhead.

There are a handful of spamassassin rules that hit significant numbers of incoming spams. Converting some of these to procmail recipes would allow a "coarse screen" to be put in place with procmail, avoiding the need to process obvious spam through other tools altogether (part of my beloved layered defenses).

An example rule that is detecting many of the random-word, bayes-poison spams:

body PT_WORDLIST_30 /(?:\b(?!(?:from|that|have|this|were|with)\b)[a-z]{4,12}\s+){30}/
describe PT_WORDLIST_30 string of 30+ random words
score  PT_WORDLIST_30   10.0
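
To see what that pattern actually catches, here is a minimal Python sketch. It assumes Python's re module handles this particular pattern the same way perl does (the constructs used, \b, (?!...) and {m,n}, exist in both). The function name and the sample strings are invented for illustration and are not taken from real mail.

```python
import re

# The pattern from the PT_WORDLIST_30 rule, unchanged.
# It wants 30 consecutive lowercase words of 4-12 letters, each followed by
# whitespace, none of them being one of the six listed common words.
WORDLIST_30 = re.compile(
    r"(?:\b(?!(?:from|that|have|this|were|with)\b)[a-z]{4,12}\s+){30}"
)

def hits_wordlist_30(text):
    """Return True if the text contains a run the pattern would match."""
    return WORDLIST_30.search(text) is not None

# Invented samples (assumptions, not real spam or ham).
random_words = (
    "table river cloud stone green apple chair light music paper " * 3 + "end"
)
normal_prose = "I went to the store and it was closed, so I came back home at 5."

print(hits_wordlist_30(random_words))
print(hits_wordlist_30(normal_prose))
```

Ordinary prose breaks the run almost immediately, since any short word ("to", "it"), capitalized word, or punctuation mark attached to a word ends the sequence.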

I know the regexp used is perl syntax, and not all features ({}) are available to procmail. But from reading through the procmail howtos, I believe a scoring rule that scores the ratio of articles and prepositions (and punctuation) to "other" words can achieve much the same result.
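
The ratio idea can be tried out on saved mail before writing any recipes. The following is a rough Python prototype of the heuristic only, not a procmail recipe. The word list, the function names, the 30-token minimum and the 0.10 threshold are all assumptions picked for illustration and would need tuning against real spam and ham.

```python
import re

# Hypothetical list of articles, prepositions and other "glue" words.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "to", "in", "on",
    "at", "by", "for", "from", "with", "is", "it", "that", "this",
}
PUNCTUATION = set(".,;:!?")
TOKEN = re.compile(r"[A-Za-z']+|[.,;:!?]")

def glue_ratio(text):
    """Fraction of tokens that are function words or punctuation marks."""
    tokens = TOKEN.findall(text)
    if not tokens:
        return 0.0
    glue = sum(
        1 for t in tokens
        if t.lower() in FUNCTION_WORDS or t in PUNCTUATION
    )
    return glue / len(tokens)

def looks_like_word_salad(text, min_tokens=30, threshold=0.10):
    """Flag text that is long enough and has almost no glue words.

    min_tokens and threshold are guesses, not measured values.
    """
    tokens = TOKEN.findall(text)
    return len(tokens) >= min_tokens and glue_ratio(text) < threshold

# Invented samples for a quick check.
salad = "table river cloud stone green apple chair light music paper " * 3
prose = "The cat sat on the mat, and it was happy. "

print(looks_like_word_salad(salad))
print(looks_like_word_salad(prose * 5))
```

Running something like this over a corpus of known spam and known good mail would show whether the two ratios separate cleanly enough to be worth expressing as procmail scoring weights.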

Before I meander too far down this path though, I wanted to see if there are any good collections of such recipes already available. I've seen some basic rules, but many of the trickier spams seem to get past those. I'm out to match characteristics rather than specific phrases.


Any thoughts appreciated.

- Bob




_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail