RE: Keep getting stubborn spam with random words

From: Skip Montanaro
Sent: Wednesday, March 10, 2004 8:46 AM

[...]


As I pointed out in an earlier message, by default Spambayes ignores all
tokens with a spam probability between 0.4 and 0.6.  Tokens which it has
never seen get a spamprob of 0.5 and are thus ignored.  Ignoring such
unsure tokens was decided on after a lot of testing.  Other statistical
filters may have arrived at a different decision about how to treat
unrecognized tokens.  I don't think Graham's initial code threw them out,
so people naively implementing the scheme he outlined in "A Plan for Spam"
would probably see their classifiers overwhelmed by nonsense words.
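The rule quoted above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the names `UNKNOWN_WORD_PROB`, `MIN_PROB_STRENGTH`, `spamprob` and `significant_tokens` are hypothetical, not the Spambayes API, and the real classifier goes on to combine the surviving probabilities into a score, a step this sketch leaves out.

```python
# Hypothetical sketch of "ignore unsure tokens"; names are illustrative,
# not Spambayes's actual API.
from typing import Dict, List, Tuple

UNKNOWN_WORD_PROB = 0.5   # spamprob assigned to tokens never seen in training
MIN_PROB_STRENGTH = 0.1   # tokens within 0.1 of 0.5 (the 0.4..0.6 band) are ignored


def spamprob(token: str, probs: Dict[str, float]) -> float:
    """Look up a token's spam probability; unseen tokens get 0.5."""
    return probs.get(token, UNKNOWN_WORD_PROB)


def significant_tokens(tokens: List[str],
                       probs: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (token, prob) pairs whose prob lies outside the unsure band."""
    kept = []
    for tok in set(tokens):  # each distinct token counts once
        p = spamprob(tok, probs)
        if abs(p - 0.5) >= MIN_PROB_STRENGTH:
            kept.append((tok, p))
    return sorted(kept)


if __name__ == "__main__":
    probs = {"viagra": 0.99, "meeting": 0.02, "the": 0.55}
    msg = ["viagra", "the", "meeting", "xqzzyplotch", "flurble"]
    print(significant_tokens(msg, probs))
```

With the sample table, the random words `xqzzyplotch` and `flurble` fall back to 0.5 and drop out along with the weak token `the`, so only `meeting` and `viagra` reach the scoring stage. Padding a spam with nonsense words therefore adds nothing for the classifier to weigh.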


A relatively new entrant in the Bayesian classifier category is DSPAM.
It uses a "Dolby" technique, which essentially eliminates the "quiet spots"
that offer little or no value in determining whether a message is spam:
http://www.nuclearelephant.com/projects/dspam/
The 'noise reduction' algorithm is summarized here:
http://www.nuclearelephant.com/projects/dspam/bnr.html
and the white paper is here:
http://www.nuclearelephant.com/projects/dspam/BNR%20LNCS.pdf
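The general idea of context-based noise reduction can be illustrated with a deliberately simplified sketch. Take the token probabilities in message order, look at each token's two neighbours, and, where the neighbours point strongly toward spam or ham, drop a token that points the other way. The band width, both thresholds, and the function names below are assumptions chosen for illustration. This is not DSPAM's code, and the algorithm described in the white paper above differs in detail.

```python
# Simplified, hypothetical illustration of neighbour-context noise reduction.
# Not DSPAM's implementation; BAND and both thresholds are assumed values.
from typing import List

BAND = 0.05               # assumed quantization step for token probabilities
CONTEXT_THRESHOLD = 0.25  # assumed: context must be this far from 0.5 to count
EXCLUSION_RADIUS = 0.30   # assumed: tokens this far from the context are noise


def quantize(p: float) -> float:
    """Round a probability to the nearest BAND step."""
    return round(round(p / BAND) * BAND, 2)


def noise_reduce(token_probs: List[float]) -> List[int]:
    """Return indices of tokens kept after dropping out-of-context tokens.

    token_probs must be in message order: context comes from neighbours.
    """
    bands = [quantize(p) for p in token_probs]
    keep = []
    for i, b in enumerate(bands):
        if 0 < i < len(bands) - 1:
            context = (bands[i - 1] + bands[i + 1]) / 2
            strong = abs(context - 0.5) >= CONTEXT_THRESHOLD
            if strong and abs(b - context) > EXCLUSION_RADIUS:
                continue  # disagrees with a strongly spammy/hammy context: drop
        keep.append(i)  # first/last tokens and in-context tokens survive
    return keep


if __name__ == "__main__":
    # A lone "hammy" word planted between two spammy ones is discarded.
    print(noise_reduce([0.99, 0.01, 0.99, 0.97, 0.5]))
```

In the sample call, the 0.01 token at index 1 sits between two tokens near 1.0 and is dropped, while the rest survive. That mirrors the "quiet spot" idea: a word inserted only to dilute the message contributes nothing once its surroundings are taken into account.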

(I've tried dspam; though it required some training, it did work well
in the tests that I ran. I didn't try a head-to-head comparison, and
haven't deployed it.)




_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
