spf-discuss

RE: Suggest New Mechanism Prefix NUMBER to Accelerate SPF Adoption

2004-08-25 13:35:25
On Thu, 26 Aug 2004, AccuSpam wrote:

Here are the current stats:

SPF result       probability of spam
--------------   -------------------
NEUTRAL          0.898679
NEUTRAL(guess)   0.926437
PASS             0.101463
PASS(guess)      0.257572
SOFTFAIL         0.910824
NONE(guessed)    0.658428
UNKNOWN          0.580007
ERROR            too rare to measure


That is a mathematical disaster.

You are assigning the same probabilities for all domains.  But not all
domains will have the same probability.

BINGO!  And not all receivers have the same probabilities for a given
domain either.  And actually, the Bayesian filter automatically tracks the
probability for each domain as well (since the domain appears in the
Received-SPF header), but that table would be way too big to post.

Unless you have the algorithm (it is possible) and the huge volume of data to
feed the algorithm (you will need a significant % of internet email), you
cannot reliably determine the probabilities for EACH domain mathematically.

Wrong.  That is how Bayesian filters work.  An empirical measurement is
a lot more meaningful than some number made up on the spot.  All
you have to do is ensure that the filter gets fed meaningful tokens, and the
Bayesian algorithm does the rest automatically - actually measuring the
statistics instead of making them up.  Numbers supplied by the sender
would be worthless as tokens, unless quantized into a handful of
ranges (called, just for example, NONE, PASS, FAIL, SOFTFAIL, NEUTRAL, UNKNOWN).
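
To make the point concrete, here is a minimal sketch of how a filter could
measure these numbers from its own mail.  It is not the code of any
particular filter: the class name, the token format ("spf:RESULT" and
"spf:RESULT:domain"), the frequency-ratio formula and the training counts
are all assumptions chosen for illustration.

```python
from collections import defaultdict


def spf_tokens(result, domain):
    """Turn an SPF result into tokens (hypothetical token format).

    One token covers the result for all domains, the other is per-domain.
    """
    return [f"spf:{result}", f"spf:{result}:{domain}"]


class TokenStats:
    """Counts how often each token is seen in spam and in non-spam."""

    def __init__(self):
        self.spam = defaultdict(int)
        self.ham = defaultdict(int)
        self.n_spam = 0
        self.n_ham = 0

    def train(self, tokens, is_spam):
        # Count each distinct token once per message.
        counts = self.spam if is_spam else self.ham
        for tok in set(tokens):
            counts[tok] += 1
        if is_spam:
            self.n_spam += 1
        else:
            self.n_ham += 1

    def spam_probability(self, token):
        """Empirical spam probability of a token, or None if unmeasurable."""
        if self.n_spam == 0 or self.n_ham == 0:
            return None
        spam_freq = self.spam[token] / self.n_spam
        ham_freq = self.ham[token] / self.n_ham
        if spam_freq + ham_freq == 0:
            return None  # token never seen: nothing to measure
        return spam_freq / (spam_freq + ham_freq)


# Made-up training data, only to exercise the code.
stats = TokenStats()
for _ in range(9):
    stats.train(spf_tokens("PASS", "good.example"), is_spam=False)
    stats.train(spf_tokens("SOFTFAIL", "bad.example"), is_spam=True)
stats.train(spf_tokens("PASS", "good.example"), is_spam=True)
stats.train(spf_tokens("SOFTFAIL", "bad.example"), is_spam=False)

for tok in ("spf:PASS", "spf:SOFTFAIL", "spf:PASS:good.example"):
    print(tok, stats.spam_probability(tok))
```

With these made-up counts, PASS comes out near 0.1 and SOFTFAIL near 0.9,
both overall and per domain.  Nothing is supplied by the sender: each
receiver ends up with its own numbers, measured on its own mail.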

-- 
              Stuart D. Gathman <stuart(_at_)bmsi(_dot_)com>
    Business Management Systems Inc.  Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.