Re: Keep getting stubborn spam with random words
2004-03-10 11:42:53
Gary Funck wrote:
[...]
(I've tried dspam; though it required some training, it did work well
in the tests that I ran. I didn't try a head-to-head comparison, and
haven't deployed it.)
I'm glad to have Skip's input on this thread... though we may be OT for
procmail, bayes is certainly a good tool to supplement procmail.
I'm currently running four bayes implementations: bogofilter, ifile and
spamprobe, as well as spamassassin's implementation. I've set up rules
to cross-check them. I trained all on a common corpus of ~2000 spam, and
a similar number of ham messages.
I haven't done any formal testing, but on my (admittedly modest) setup
with ~500-700 messages daily, it's ONLY bayes that is catching all of
the "word salad" messages intended to poison bayes. I'll defer to Skip
on the intricacies of the method, but in the discussions I've seen, it's
generally agreed that training on these messages does NOT diminish the
effectiveness of bayes, and in fact IMPROVES the odds of such messages
getting flagged. (Emphasis: this depends on training!) An unknown word
has no effect, so other indicators (headers, content) are more
important. The check is more like: "I don't know about THESE (salad)
words, but THESE (the spammer's message) sure sound like spam," even if
obfuscated (e.g. v1agra). There are only so many ways to misspell
viagra and still have it register to a reader as viagra.
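To illustrate the point about unknown words, here is a minimal
naive-Bayes-style sketch in Python. It is not the exact math used by
bogofilter, ifile, spamprobe or spamassassin; the function names, the
0.01/0.99 clamp and the toy counts are all assumptions made up for the
example. Tokens never seen in training are simply skipped, so padding a
message with "salad" words leaves its score unchanged.

```python
import math

# Toy training counts (hypothetical): number of spam/ham messages containing each token.
SPAM_COUNTS = {"v1agra": 40, "pills": 30, "meeting": 1}
HAM_COUNTS = {"meeting": 50, "procmail": 60, "pills": 1}
N_SPAM = 100
N_HAM = 100

def token_spam_prob(token):
    """Return an estimate of P(spam | token), or None for an unknown token."""
    s = SPAM_COUNTS.get(token, 0)
    h = HAM_COUNTS.get(token, 0)
    if s + h == 0:
        return None  # never seen in training: contributes no evidence
    ps = s / N_SPAM
    ph = h / N_HAM
    p = ps / (ps + ph)
    # Clamp so a single token can never be treated as absolute proof.
    return min(0.99, max(0.01, p))

def message_spam_prob(tokens):
    """Combine per-token evidence as summed log-odds; unknown tokens are skipped."""
    log_odds = 0.0
    for token in set(tokens):
        p = token_spam_prob(token)
        if p is None:
            continue
        log_odds += math.log(p) - math.log(1.0 - p)
    return 1.0 / (1.0 + math.exp(-log_odds))

pitch = ["v1agra", "pills"]
salad = ["zephyr", "quixotic", "ombudsman"]  # words absent from the training data
print(message_spam_prob(pitch))
print(message_spam_prob(pitch + salad))
```

Both printed scores are identical, because the salad words carry no
weight. Once such messages are trained as spam, the salad words stop
being unknown and start counting against the sender.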
spamprobe uses word pairs. ifile can categorize according to
user-definable categories (e.g. spam, ham, hobby, procmail), as can
crm114 (as yet untested by me) -- something to "autoclassify" inbound
messages. I'm only using spam/ham. bogofilter is (purportedly) very
fast, and offers tri-state ham/spam/unsure scoring, which requires more
training.
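The two ideas above, word-pair tokens and tri-state scoring, can be
sketched in a few lines of Python. This is only an illustration: the
tokenizer is not spamprobe's actual one, and the cutoff values are
hypothetical placeholders, not bogofilter's defaults.

```python
def word_pair_tokens(text):
    """Return single words plus adjacent word pairs as tokens."""
    words = text.lower().split()
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs

def tri_state(score, ham_cutoff=0.2, spam_cutoff=0.9):
    """Map a spam probability to 'ham', 'spam' or 'unsure'.

    The cutoffs are assumptions for this example only.
    """
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    return "unsure"

print(word_pair_tokens("Buy cheap pills"))
print(tri_state(0.5))
```

Pairs let a filter learn that "cheap pills" is spammy even when "cheap"
and "pills" each also show up in ham. The "unsure" band is what makes
extra training necessary: messages landing there need a human verdict.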
ALL FOUR are doing very well for me. I make a point of training on the
same messages, and they're generally in near-perfect agreement.
bogofilter's "unsure" verdict is what flags most of the messages I
train on these days.
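One possible way to cross-check several filters and pick out messages
for training is sketched below in Python. The actual rules used here
are not described in this message, so the voting scheme, the function
name and the filter verdicts are all assumptions for illustration.

```python
def cross_check(verdicts):
    """Combine per-filter verdicts ('spam', 'ham' or 'unsure').

    Returns (label, needs_training). A message needs training whenever
    the filters are not unanimous or any filter is unsure.
    """
    values = list(verdicts.values())
    if values and all(v == "spam" for v in values):
        return "spam", False
    if values and all(v == "ham" for v in values):
        return "ham", False
    spam_votes = sum(v == "spam" for v in values)
    ham_votes = sum(v == "ham" for v in values)
    if spam_votes > ham_votes:
        label = "spam"
    elif ham_votes > spam_votes:
        label = "ham"
    else:
        label = "unsure"
    return label, True

# Hypothetical verdicts for one message.
print(cross_check({
    "bogofilter": "unsure",
    "ifile": "spam",
    "spamprobe": "spam",
    "spamassassin": "spam",
}))
```

Training every filter on the same flagged message keeps them in step
with one another.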
After doing all of the static rules tests, if the bayes tests indicate
spam, I can be pretty sure it is. As part of a layered defense against
spam, it's very useful as a "last line." All can be tweaked and tuned to
varying degrees as well.
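The layering can be pictured with a short Python sketch. In practice
the static tests would be procmail recipes and the bayes score would
come from piping the message through one of the filters; the rules,
names and cutoff below are hypothetical stand-ins.

```python
import re

# Hypothetical static rules, checked in order before any bayes test.
STATIC_RULES = [
    ("subject-all-caps", lambda m: m["subject"].isupper()),
    ("obfuscated-drug", lambda m: re.search(r"v[i1!]agra", m["body"], re.I) is not None),
]

def classify(msg, bayes_score, spam_cutoff=0.9):
    """Run static rules first; fall back to the bayes score as the last line.

    bayes_score is any callable that returns a spam probability for msg.
    Returns (label, name of the layer that decided, or None).
    """
    for name, rule in STATIC_RULES:
        if rule(msg):
            return "spam", name
    if bayes_score(msg) >= spam_cutoff:
        return "spam", "bayes"
    return "ham", None

msg = {"subject": "hello", "body": "zephyr quixotic ombudsman cheap pills"}
print(classify(msg, lambda m: 0.97))  # stand-in score; no static rule matches
```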
So, 1st line is procmail, 2nd (or more) is bayes. It works very well for
me. YMMV.
- Bob
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail