Re: Keep getting stubborn spam with random words

Gary Funck wrote:

[...]
(I've tried dspam; though it required some training, it did work well
in the tests that I ran. I didn't try a head-to-head comparison, and
haven't deployed it.)

I'm glad to have Skip's input on this thread... though we may be OT for procmail, using bayes to supplement procmail is certainly a good tool.

I'm currently running four bayes implementations: bogofilter, ifile and spamprobe, as well as spamassassin's implementation. I've set up rules to cross-check them. I trained all on a common corpus of ~2000 spam, and a similar number of ham messages.

I haven't done any formal training, but on my (admittedly modest) setup with ~500-700 messages daily, it's ONLY bayes that is catching all of the "word salad" messages intended to poison bayes. I'll defer to Skip on the intricacies of the method, but in the discussions I've seen, it's generally agreed that training on these messages does NOT diminish the effectiveness of bayes, and in fact IMPROVES the odds of such messages getting flagged. (Emphasis: this is dependent on training!) An unknown word has no effect, so other indicators (headers, content) are more important. The check is more "I don't know about THESE (salad) words, but THESE (the spammer's message) sure sound like spam" (even if obfuscated, e.g. v1agra). There are only so many ways to misspell viagra and have it register to a reader as viagra.
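
To make the "unknown word has no effect" point concrete, here is a toy sketch in Python. It is NOT how bogofilter, ifile, spamprobe or spamassassin actually compute their scores; the function names, the 0.01/0.99 clamp and the tiny corpus are all made up for illustration. It only shows the general idea: words never seen in training are skipped, so padding a message with salad does not move the score, and training on the salad can only push it further toward spam.

```python
import math
from collections import Counter

def train(messages):
    """Count, per word, how many training messages contain it."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts

def word_spam_prob(word, spam_counts, ham_counts, n_spam, n_ham):
    """Per-word spam probability; None for a word never seen in training."""
    s = spam_counts.get(word, 0)
    h = ham_counts.get(word, 0)
    if s + h == 0:
        return None  # unknown word: contributes nothing
    ps, ph = s / n_spam, h / n_ham
    # Clamp (illustrative values) so no single word is treated as certain.
    return min(0.99, max(0.01, ps / (ps + ph)))

def spam_score(text, spam_counts, ham_counts, n_spam, n_ham):
    """Combine known-word probabilities in log space; 0.5 if nothing is known."""
    log_spam = log_ham = 0.0
    for word in set(text.lower().split()):
        p = word_spam_prob(word, spam_counts, ham_counts, n_spam, n_ham)
        if p is None:
            continue
        log_spam += math.log(p)
        log_ham += math.log(1.0 - p)
    # Equivalent to prod(p) / (prod(p) + prod(1 - p)).
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))

# Hypothetical miniature corpus, two messages of each kind.
SPAM = ["buy v1agra now", "cheap v1agra pills"]
HAM = ["procmail recipe question", "meeting notes attached"]
spam_counts, ham_counts = train(SPAM), train(HAM)

salad = "v1agra zebra quixotic umbrella"
before = spam_score(salad, spam_counts, ham_counts, len(SPAM), len(HAM))

# Train on the salad message as spam and score it again.
spam_counts_2 = train(SPAM + [salad])
after = spam_score(salad, spam_counts_2, ham_counts, len(SPAM) + 1, len(HAM))
```

In this sketch `before` is the same score that "v1agra" gets on its own, and `after` is at least as high as `before`.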

spamprobe uses word pairs. ifile can categorize according to user-definable categories (e.g. spam, ham, hobby, procmail), as can crm114 (as-yet untested by me) -- something to "autoclassify" inbound messages. I'm only using spam/ham. bogofilter is (purportedly) very fast, and offers tri-state ham/spam/unsure scoring, which requires more training.
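
As a rough illustration of those two ideas (word pairs as tokens, and a three-way ham/spam/unsure verdict), a minimal Python sketch follows. The tokenizer is only my guess at what "word pairs" could look like, not spamprobe's real tokenizer, and the cutoff values are hypothetical, not bogofilter's defaults.

```python
def word_pair_tokens(text):
    """Return single words plus adjacent word pairs as tokens."""
    words = text.lower().split()
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs

def tri_state(score, ham_cutoff=0.2, spam_cutoff=0.9):
    """Map a 0..1 spam score to 'ham', 'spam' or 'unsure'.

    The cutoffs are illustrative assumptions; anything between
    them is left for a human to look at and train on.
    """
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    return "unsure"

tokens = word_pair_tokens("Buy cheap pills")
# tokens == ['buy', 'cheap', 'pills', 'buy cheap', 'cheap pills']
```

The wider the gap between the two cutoffs, the more messages land in "unsure", which is the trade-off behind needing more training.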

ALL FOUR are doing very well for me. I make a point of training on the same messages, and they're generally in near-perfect agreement. bogofilter's "unsure" is responsible for flagging most for training these days.
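
The cross-checking can be pictured along these lines. This is a Python sketch of the logic only, not my actual procmail rules: `cross_check` and the verdict strings are hypothetical names, and it assumes each filter's result has already been reduced to "spam", "ham" or "unsure".

```python
def cross_check(verdicts):
    """Combine per-filter verdicts into (label, needs_training).

    verdicts: dict mapping filter name -> 'spam' | 'ham' | 'unsure'.
    A message is flagged for training whenever the filters are not
    unanimous, which includes any 'unsure' result.
    """
    distinct = set(verdicts.values())
    if distinct == {"spam"}:
        return "spam", False
    if distinct == {"ham"}:
        return "ham", False
    spam_votes = sum(v == "spam" for v in verdicts.values())
    ham_votes = sum(v == "ham" for v in verdicts.values())
    if spam_votes > ham_votes:
        label = "spam"
    elif ham_votes > spam_votes:
        label = "ham"
    else:
        label = "unsure"
    return label, True

result = cross_check({
    "bogofilter": "unsure",
    "ifile": "spam",
    "spamprobe": "spam",
    "spamassassin": "spam",
})
# result == ('spam', True): treated as spam, but queued for training
```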

After doing all of the static rules tests, if the bayes tests indicate spam, I can be pretty sure it is. As part of a layered defense against spam, it's very useful as a "last line." All can be tweaked and tuned to varying degrees as well.

So, 1st line is procmail, 2nd (or more) is bayes. It works very well for me. YMMV.
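
For anyone who wants the layering spelled out, here is a small Python sketch of the ordering: static rules first, bayes as the last line. In practice the first layer would be procmail recipes; the rule list, the `classify` function, the 0.9 cutoff and the `bayes_score` callable are all stand-ins I made up for illustration.

```python
import re

# Hypothetical static rules; in a real setup these would be procmail recipes.
STATIC_RULES = [
    ("subject-viagra", re.compile(r"^Subject:.*\bv[i1]agra\b", re.I | re.M)),
]

def classify(message, bayes_score, spam_cutoff=0.9):
    """Run static rules first; fall back to the bayes score as the last line.

    bayes_score is any callable returning a 0..1 spam score for the
    message (a stand-in for whichever bayes filter is in use).
    """
    for name, pattern in STATIC_RULES:
        if pattern.search(message):
            return f"rule:{name}"
    if bayes_score(message) >= spam_cutoff:
        return "bayes"
    return "inbox"

caught_by_rule = classify("Subject: cheap V1AGRA\n\nbody", lambda m: 0.0)
caught_by_bayes = classify("Subject: hello\n\nzebra umbrella", lambda m: 0.95)
delivered = classify("Subject: hello\n\nlunch?", lambda m: 0.1)
```

Here the first message never reaches the bayes layer, the second is caught only by it, and the third is delivered.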


- Bob





