My last post on this.
SPF result    Probability of spam
NEUTRAL       0.898679
If you want to test whether I was correct or not, also try feeding the
following three tokens (as words) into your Bayesian filter:
SPF_result
Domain
SPF_result+Domain
Then you will see that the usual Bayesian combination of two pieces of evidence,
P(a @ b) (see the equation in my first post in this thread), does not give the
probability learned for the combined token:
P(SPF_result @ Domain) != P(SPF_result+Domain)
This is because the two pieces of evidence are correlated, while the combination
rule treats them as independent. So you will know that the Bayesian combination
you were doing was not correct.
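A small sketch of the test described above, under stated assumptions. The equation for P(a @ b) is in my first post and not repeated here, so this assumes the common naive-Bayes (Graham-style) rule pa*pb / (pa*pb + (1-pa)(1-pb)). The counts and the domain "example.com" are hypothetical: the SPF token and the domain token always occur in the same messages, i.e. they are perfectly correlated.

```python
def token_spam_prob(spam_hits: int, ham_hits: int, n_spam: int, n_ham: int) -> float:
    """P(spam | token) from message counts, assuming equal class priors."""
    # Cross-multiplied form of (spam_hits/n_spam) / (spam_hits/n_spam + ham_hits/n_ham)
    s = spam_hits * n_ham
    h = ham_hits * n_spam
    return s / (s + h)


def combine(pa: float, pb: float) -> float:
    """Naive combination of two token probabilities (ASSUMES independence)."""
    num = pa * pb
    return num / (num + (1.0 - pa) * (1.0 - pb))


# Hypothetical corpus: 100 spam and 100 ham messages. "NEUTRAL" and the
# hypothetical domain "example.com" always appear together (full correlation).
N_SPAM, N_HAM = 100, 100
p_spf = token_spam_prob(60, 20, N_SPAM, N_HAM)     # SPF_result token
p_domain = token_spam_prob(60, 20, N_SPAM, N_HAM)  # Domain token
p_joint = token_spam_prob(60, 20, N_SPAM, N_HAM)   # SPF_result+Domain token

naive = combine(p_spf, p_domain)
print(f"naive P(a @ b)  = {naive:.3f}")    # 0.900
print(f"P(a+b) measured = {p_joint:.3f}")  # 0.750
```

Because the two tokens carry the same information, the naive rule counts it twice and comes out overconfident (0.900 instead of 0.750).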
Now if you include SPF_result+Domain as a token in your Bayesian filter, your
results will be more correct. But will you see these combined tokens often
enough to add any evidence to most results? You can test this by checking how
often the combined token appears among the top 15 tokens used to score a message.
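A rough way to run that frequency check, as a sketch. It assumes the filter keeps the 15 tokens whose spam probability is farthest from 0.5 (the usual "most interesting tokens" rule); the function names and the per-message dictionaries below are hypothetical.

```python
from typing import Dict, Iterable, List

TOP_N = 15  # number of "most interesting" tokens used to score a message


def top_tokens(token_probs: Dict[str, float], n: int = TOP_N) -> List[str]:
    """Tokens ranked by distance of their spam probability from 0.5."""
    ranked = sorted(token_probs, key=lambda t: abs(token_probs[t] - 0.5), reverse=True)
    return ranked[:n]


def combined_token_rate(messages: Iterable[Dict[str, float]], token: str,
                        n: int = TOP_N) -> float:
    """Fraction of messages in which `token` makes it into the top-n list."""
    msgs = list(messages)
    if not msgs:
        return 0.0
    hits = sum(1 for m in msgs if token in top_tokens(m, n))
    return hits / len(msgs)


# Hypothetical messages: in m1 twenty strong tokens crowd out the combined
# token; in m2 there are few tokens, so it makes the list.
m1 = {f"w{i}": 0.99 for i in range(20)}
m1["NEUTRAL+example.com"] = 0.75
m2 = {"hello": 0.6, "NEUTRAL+example.com": 0.75}

rate = combined_token_rate([m1, m2], "NEUTRAL+example.com")
print(rate)  # 0.5
```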
I bet you will find that you need some a priori data.
In probability nothing is absolute; only relative correlation matters.
If the owner can offer some estimate of reasonably correlated data, e.g. some
idea of the range of SMTP AUTH compliance he has, this can help your
statistical analysis of what is spam.
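One concrete way to fold a priori data into a token probability is Gary Robinson's smoothing, which blends the observed probability with an assumed prior x weighted by a strength s. This is a swapped-in technique for illustration, not necessarily what anyone here uses; the values of x and s, and the idea of deriving x from the owner's estimate, are hypothetical.

```python
def smoothed_prob(p_observed: float, n: int, x: float = 0.5, s: float = 1.0) -> float:
    """Blend an observed token spam probability with a prior belief.

    p_observed: spam probability estimated from the corpus
    n:          number of training messages containing the token
    x:          assumed prior spam probability for the token (hypothetical,
                e.g. derived from the owner's own estimate)
    s:          strength of the prior, in "equivalent messages"
    """
    return (s * x + n * p_observed) / (s + n)


# A rare combined token, seen in only 2 messages (both spam): the prior dominates.
rare = smoothed_prob(1.0, n=2, x=0.4, s=3.0)
# A common token, seen in 200 messages: the data dominates.
common = smoothed_prob(0.9, n=200, x=0.4, s=3.0)
print(f"{rare:.2f}")    # 0.64
print(f"{common:.3f}")  # 0.893
```

The point is that a rarely seen token like SPF_result+Domain leans on the prior until enough data arrives, which is where the owner's estimate would help.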
Anyway, I agree; never mind.