procmail

Re: Keep getting stubborn spam with random words

2004-03-10 11:42:53
Gary Funck wrote:
[...]
(I've tried dspam; though it required some training, it did work well
in the tests that I ran. I didn't try a head-to-head comparison, and
haven't deployed it.)

I'm glad to have Skip's input on this thread... though we may be OT for procmail, using Bayes to supplement procmail is certainly a good tool.

I'm currently running four Bayesian implementations: bogofilter, ifile and spamprobe, as well as SpamAssassin's own Bayes implementation. I've set up rules to cross-check them. I trained all four on a common corpus of ~2000 spam messages and a similar number of ham messages.

I haven't done any formal testing, but on my (admittedly modest) setup with ~500-700 messages daily, it's ONLY Bayes that is catching all of the "word salad" messages intended to poison Bayesian filters. I'll defer to Skip on the intricacies of the method, but in the discussions I've seen, it's generally agreed that training on these messages does NOT diminish the effectiveness of Bayes, and in fact IMPROVES the odds of such messages getting flagged. (Emphasis: this depends on training!) An unknown word has no effect, so the other indicators (headers, content) carry the weight. The check amounts to: "I don't know about THESE (salad) words, but THESE (the spammer's actual message) sure sound like spam," even when the words are obfuscated (e.g. v1agra). There are only so many ways to misspell viagra and still have it register with a reader as viagra.
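To illustrate why an unknown word "has no effect," here is a minimal naive-Bayes sketch. It is a generic log-odds combiner written for this example, not the actual algorithm used by bogofilter, ifile, spamprobe or SpamAssassin (those use their own variants); the class name `TinyBayes`, the tokenizer and the 0.01/0.99 clamp are all assumptions for illustration. Tokens never seen in training are simply skipped, so padding a message with salad words leaves the score unchanged.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lower-case word tokens only; a real filter would also look at headers.
    return re.findall(r"[a-z0-9']+", text.lower())


class TinyBayes:
    """Hypothetical minimal naive-Bayes spam scorer (illustrative only)."""

    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text, is_spam):
        # Count each token at most once per message.
        tokens = set(tokenize(text))
        if is_spam:
            self.spam_counts.update(tokens)
            self.n_spam += 1
        else:
            self.ham_counts.update(tokens)
            self.n_ham += 1

    def token_prob(self, token):
        """P(spam | token), or None for a token never seen in training."""
        s = self.spam_counts[token]
        h = self.ham_counts[token]
        if s + h == 0:
            return None  # unknown word: no evidence either way
        spam_freq = s / max(self.n_spam, 1)
        ham_freq = h / max(self.n_ham, 1)
        p = spam_freq / (spam_freq + ham_freq)
        # Clamp so no single token is ever treated as absolute proof.
        return min(max(p, 0.01), 0.99)

    def score(self, text):
        """Sum log-odds of known tokens only; 0.5 means 'no opinion'."""
        log_odds = 0.0
        for token in set(tokenize(text)):
            p = self.token_prob(token)
            if p is not None:
                log_odds += math.log(p / (1 - p))
        return 1 / (1 + math.exp(-log_odds))


bayes = TinyBayes()
bayes.train("cheap viagra buy now", is_spam=True)
bayes.train("procmail recipe question about formail", is_spam=False)
plain = bayes.score("buy viagra now")
salad = bayes.score("buy viagra now quixotic marmalade umbrella")
print(plain == salad)  # True: the untrained salad words are ignored
```

Once salad words do show up in trained spam, they start counting toward the spam side, which matches the point that training on these messages helps rather than hurts.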

spamprobe uses word pairs. ifile can categorize messages according to user-definable categories (e.g. spam, ham, hobby, procmail), as can crm114 (as yet untested by me), which makes it possible to "autoclassify" inbound messages. I'm only using spam/ham. bogofilter is (purportedly) very fast and offers tri-state ham/spam/unsure scoring, which requires more training.
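The two ideas in the paragraph above, scoring word pairs and sorting a score into ham/unsure/spam, can be sketched roughly as follows. This is a simplified illustration, not spamprobe's or bogofilter's real code; the function names are hypothetical, and the cutoff values are made-up examples rather than any filter's defaults.

```python
import re


def words_and_pairs(text):
    """Single words plus adjacent two-word phrases (a simplified, spamprobe-like idea)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs


def tri_state(score, ham_cutoff=0.2, spam_cutoff=0.9):
    """Map a spam probability to 'ham', 'unsure' or 'spam'.

    The default cutoffs are hypothetical example values.
    """
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    return "unsure"


print(words_and_pairs("Buy cheap meds"))
# ['buy', 'cheap', 'meds', 'buy cheap', 'cheap meds']
print(tri_state(0.5))  # unsure
```

Pairs let a filter learn that "cheap meds" is spammy even when "cheap" and "meds" alone are ambiguous; the middle "unsure" band is what turns up messages worth training on.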

ALL FOUR are doing very well for me. I make a point of training them all on the same messages, and they're generally in near-perfect agreement. These days, bogofilter's "unsure" verdict flags most of the messages I end up training on.
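A cross-check like the one described above might look like this sketch. It assumes each filter's verdict has already been collected as the string "spam", "ham" or "unsure" (how to invoke each tool is left out, since their command-line options differ); the function name and the majority-vote rule are assumptions for illustration. Any disagreement, or any "unsure", marks the message as a training candidate.

```python
from collections import Counter


def cross_check(verdicts):
    """Combine per-filter verdicts into (overall verdict, needs_training).

    `verdicts` maps a filter name to 'spam', 'ham' or 'unsure'
    (hypothetical input format).
    """
    counts = Counter(verdicts.values())
    decided = {v for v in verdicts.values() if v != "unsure"}
    # Disagreement or any 'unsure' means the message is worth training on.
    needs_training = len(decided) != 1 or counts["unsure"] > 0
    if counts["spam"] > counts["ham"]:
        overall = "spam"
    elif counts["ham"] > counts["spam"]:
        overall = "ham"
    else:
        overall = "unsure"
    return overall, needs_training


print(cross_check({
    "bogofilter": "unsure",
    "ifile": "spam",
    "spamprobe": "spam",
    "spamassassin": "spam",
}))  # ('spam', True)
```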

After all of the static rule tests have run, if the Bayes tests indicate spam, I can be pretty sure it is. As part of a layered defense against spam, Bayes is very useful as a "last line." All four filters can be tweaked and tuned to varying degrees as well.

So, 1st line is procmail, 2nd (or more) is bayes. It works very well for me. YMMV.
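The layering above (static rules first, Bayes as the last line) can be sketched as a tiny pipeline. The regexes below are hypothetical stand-ins for procmail recipes, not real rules from this setup, and `bayes_score` is any callable returning a spam probability (for example, a trained filter's scorer); the 0.9 cutoff is an assumed example value.

```python
import re
from typing import Callable

# Hypothetical static rules standing in for procmail recipes (illustrative only).
STATIC_RULES = [
    re.compile(r"^Subject:.*\bv[i1]agra\b", re.IGNORECASE | re.MULTILINE),
    re.compile(r"^X-Mailer:.*\bbulk\b", re.IGNORECASE | re.MULTILINE),
]


def layered_verdict(message: str,
                    bayes_score: Callable[[str], float],
                    spam_cutoff: float = 0.9) -> str:
    """First line: static rules. Last line: a Bayesian score."""
    for rule in STATIC_RULES:
        if rule.search(message):
            return "spam (static rule)"
    if bayes_score(message) >= spam_cutoff:
        return "spam (bayes)"
    return "ham"


print(layered_verdict("Subject: cheap V1agra\n\nhi", lambda m: 0.0))
# spam (static rule)
print(layered_verdict("Subject: hello\n\nword salad", lambda m: 0.95))
# spam (bayes)
```

Running the cheap, deterministic checks first means the Bayesian step only has to judge whatever slips past them.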

- Bob

_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail