From: Skip Montanaro
Sent: Wednesday, March 10, 2004 8:46 AM
[...]
As I pointed out in an earlier message, by default Spambayes ignores all
tokens with a spam probability between 0.4 and 0.6. Tokens which it has
never seen get a spamprob of 0.5 and are thus ignored. Ignoring such unsure
tokens was decided on after a lot of testing. Other statistical filters
may have arrived at a different decision about how to treat unrecognized
tokens. I don't think Graham's initial code threw them out, so people
naively implementing the scheme he outlined in "A Plan for Spam" would
probably see their classifiers overwhelmed by nonsense words.
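
The "ignore unsure tokens" rule above can be sketched in a few lines of
Python. This is only an illustration, not Spambayes's actual code: the
names UNKNOWN_PROB, MIN_STRENGTH and significant_tokens are hypothetical,
and the values simply mirror the numbers given above (unknown tokens get
0.5, and anything within roughly 0.1 of 0.5 is discarded).

```python
# Hypothetical sketch of the "ignore unsure tokens" rule; not Spambayes code.
UNKNOWN_PROB = 0.5   # spamprob assumed for tokens never seen in training
MIN_STRENGTH = 0.1   # half-width of the ignored band around 0.5 (~0.4..0.6)

def significant_tokens(tokens, spamprobs):
    """Return (token, prob) pairs that lie outside the 'unsure' band.

    spamprobs maps token -> learned spam probability; tokens missing
    from it are treated as unknown and get UNKNOWN_PROB.
    """
    result = []
    for tok in tokens:
        prob = spamprobs.get(tok, UNKNOWN_PROB)
        # Probabilities near 0.5 carry little evidence either way,
        # so such tokens are dropped before scoring.
        if abs(prob - 0.5) >= MIN_STRENGTH:
            result.append((tok, prob))
    return result

if __name__ == "__main__":
    probs = {"viagra": 0.99, "meeting": 0.05, "the": 0.52}
    # "the" is too close to 0.5 and "xqzzyv" is unknown, so both are ignored.
    print(significant_tokens(["the", "viagra", "meeting", "xqzzyv"], probs))
    # prints [('viagra', 0.99), ('meeting', 0.05)]
```

Because unknown tokens default to exactly 0.5, a message stuffed with
nonsense words contributes nothing extra to the score under this rule.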

A relatively new entrant in the Bayes classifier category is DSPAM.
It uses a "Dolby" technique, which essentially eliminates the "quiet spots"
that offer little or no value in determining whether a message is spam:
http://www.nuclearelephant.com/projects/dspam/
The 'noise reduction' algorithm is summarized here:
http://www.nuclearelephant.com/projects/dspam/bnr.html
and the white paper is here:
http://www.nuclearelephant.com/projects/dspam/BNR%20LNCS.pdf
(I've tried dspam; though it required some training, it did work well
in the tests that I ran. I didn't run a head-to-head comparison and
haven't deployed it.)
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail