Stephane Bortzmeyer [bortzmeyer(_at_)nic(_dot_)fr] wrote:
Julian Mehnle <bulk(_at_)mehnle(_dot_)net> wrote:
You don't want to apply Bayesian filtering to raw e-mail headers,
On the contrary, you must do it (see
http://www.paulgraham.com/better.html):
You misread my statement. It said "_raw_ e-mail headers", as opposed to
"processed e-mail headers". You cannot expect "batilda.nic.fr" to be a
meaningful token for Bayesian filtering, just like you cannot expect
"softfail" to be a meaningful token. Very much unlike
"Subject*batilda.nic.fr" and "ReceivedSPF*softfail". :-)
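
To make the distinction concrete, here is a minimal sketch in Python of the two tokenizing approaches. It is only an illustration under assumptions: the function names (raw_tokens, context_tokens), the token regular expression, and the sample message are all made up for this example, and stripping the hyphen from the header name is just one way to arrive at a token like "ReceivedSPF*softfail". It is not taken from any particular filter.

```python
import re
from email import message_from_string

# Hypothetical token pattern: letters, digits and a few punctuation characters.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'!.\-]+")

# Made-up sample message, used only for illustration.
SAMPLE = (
    "Received-SPF: softfail (example.org: domain of transitioning)\n"
    "Subject: batilda.nic.fr is down\n"
    "\n"
    "body\n"
)


def _split(value):
    """Split a header value into tokens, dropping stray leading/trailing dots."""
    return [t.strip(".") for t in TOKEN_RE.findall(value) if t.strip(".")]


def raw_tokens(raw_message):
    """Context-ignorant: tokens lose track of the header they came from."""
    msg = message_from_string(raw_message)
    tokens = []
    for _name, value in msg.items():
        tokens.extend(_split(value))
    return tokens


def context_tokens(raw_message):
    """Context-aware: every token is prefixed with its header name."""
    msg = message_from_string(raw_message)
    tokens = []
    for name, value in msg.items():
        prefix = name.replace("-", "")  # assumption: "Received-SPF" -> "ReceivedSPF"
        tokens.extend(f"{prefix}*{t}" for t in _split(value))
    return tokens


if __name__ == "__main__":
    print(raw_tokens(SAMPLE))
    print(context_tokens(SAMPLE))
```

With the first function, "softfail" in a Received-SPF header and "softfail" in a Subject line are the same token to the filter. With the second, they are counted separately, so the filter can learn different probabilities for each.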
There is a lesson here for filter writers: don't ignore data. You'd
think this lesson would be too obvious to mention, but I've had to learn
it several times.
What you probably wanted to say is: don't ignore information.
But that is exactly what you are doing when you apply a context-ignorant
Bayesian filter to raw message headers.