RE: Suggest New Mechanism Prefix NUMBER to Accelerate SPF Adoption

2004-08-25 18:08:53

My last post on this.

SPF result  probability of spam
NEUTRAL             0.898679

If you want to test whether I was correct or incorrect, also try feeding the 
following 3 tokens (as words) into your Bayesian filter:


Then you will see that the usual combined result of two pieces of evidence in 
a Bayesian filter, P(a @ b) (see the equation in my first post in this thread), 
does not match the result for the combined token:

P( SPF_result @ Domain ) != SPF_result+Domain

This is because the two pieces of evidence are cross-correlated.  Thus you will 
know that the Bayesian calculation you were doing was not correct.
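
A minimal sketch of the inequality above, on made-up counts.  Everything here 
is an assumption for illustration: the message counts are invented, the names 
(counts, p_spam_given, combine) are hypothetical, and I am assuming "@" means 
the usual two-token combining rule pq / (pq + (1-p)(1-q)), which treats the 
tokens as independent.  It is not the data or the filter discussed in this 
thread.

```python
from collections import Counter

# Synthetic counts, invented purely for illustration (not real mail data).
# Key: (has_spf_token, has_domain_token, is_spam) -> number of messages.
# The two tokens are deliberately correlated: in spam they mostly co-occur.
counts = Counter({
    (True,  True,  True):  80,
    (True,  False, True):  10,
    (False, True,  True):  10,
    (False, False, True):  100,
    (True,  True,  False): 5,
    (True,  False, False): 45,
    (False, True,  False): 45,
    (False, False, False): 105,
})


def p_spam_given(pred):
    """Fraction of messages matching pred(spf, domain) that are spam."""
    spam = sum(n for (a, b, s), n in counts.items() if pred(a, b) and s)
    total = sum(n for (a, b, s), n in counts.items() if pred(a, b))
    return spam / total


def combine(p, q):
    """Assumed meaning of P(a @ b): combine two per-token spam
    probabilities as if the tokens were independent."""
    return (p * q) / (p * q + (1 - p) * (1 - q))


p_spf = p_spam_given(lambda a, b: a)         # P(spam | SPF_result)
p_dom = p_spam_given(lambda a, b: b)         # P(spam | Domain)
naive = combine(p_spf, p_dom)                # P(SPF_result @ Domain)
joint = p_spam_given(lambda a, b: a and b)   # P(spam | SPF_result+Domain)

print(f"P(spam | SPF_result)        = {p_spf:.4f}")
print(f"P(spam | Domain)            = {p_dom:.4f}")
print(f"P(SPF_result @ Domain)      = {naive:.4f}")
print(f"P(spam | SPF_result+Domain) = {joint:.4f}")
```

With correlated tokens the last two numbers differ; if you change the counts 
so the tokens occur independently within spam and within ham, they move 
closer together.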

Now if you include SPF_result+Domain as a token in your Bayesian filter, then 
you are more correct.  But will you see these tokens often enough to add any 
evidence to most results?  You can test and see how frequently the correct 
combined token appears in the top 15 words.

I bet you will find that you need some a priori data.

In probability, nothing is absolute; only relative correlation matters.  If the 
owner has some estimate of reasonably correlated data to offer, e.g. some idea 
of what range of SMTP auth compliance he has, then this can help your 
statistical analysis of what is spam.

Anyway, I agree: never mind.