RE: [spf-discuss] Perils of reputation

paddy(_at_)panici(_dot_)net wrote on Wednesday, February 07, 2007 6:30 AM -0600:

I would be looking in the direction of statistics, eg: mean and sd, so
instead of a count of ham/spam scoring you get a distribution.


This is very interesting, as it addresses what a reputation score means
for different compositions of message flows, and a similar question just
came up for the Spambayes classifier.  I imagine that Stuart converts
message spam scores into sender reputation the same way everyone else
does:

1) compute a real-valued spam score for each received message

2) "quantize" the score into a binary result (ham/spam)

3) count how many results are in each class

4) use those two counts to compute an overall reputation score

5) make a decision
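
The five steps above can be sketched roughly as follows.  This is only an
illustration, not Stuart's actual code: the function name, the [0, 1] score
range, both 0.5 thresholds, and the choice of "fraction of messages classed
as spam" as the reputation score in step 4 are all assumptions.

```python
def count_based_reputation(scores, msg_threshold=0.5, sender_threshold=0.5):
    """Steps 2-5 for one sender, given per-message spam scores (step 1).

    Scores are assumed to lie in [0, 1], higher meaning more spam-like.
    Both thresholds are hypothetical defaults.
    """
    if not scores:
        raise ValueError("need at least one message score")

    # Step 2: quantize each real-valued score into a binary result.
    is_spam = [s >= msg_threshold for s in scores]

    # Step 3: count how many results fall in each class.
    spam_count = sum(is_spam)
    ham_count = len(is_spam) - spam_count

    # Step 4: turn the two counts into one reputation score
    # (assumed here to be the fraction of messages classed as spam).
    reputation = spam_count / (spam_count + ham_count)

    # Step 5: the final, unavoidable quantization into a decision.
    treat_as_spammer = reputation >= sender_threshold
    return reputation, treat_as_spammer


# Example: four clear ham messages and one clear spam message.
scores = [0.05, 0.10, 0.95, 0.02, 0.08]
reputation, treat_as_spammer = count_based_reputation(scores)
print(reputation, treat_as_spammer)
```

With definitive scores like these, the counts in step 3 lose very little
compared with the raw scores.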


You can see that there are two decision steps where real-valued numbers
are quantized into a binary result, steps 2 and 5.  The final
quantization in step 5 is unavoidable, but the first one in step 2 is
done out of convenience and for historical reasons.  Each time you do
this you add "quantization noise", exactly analogous to that in an A/D
converter.

If the messages classify definitively as ham or spam, quantizing
individual message scores into a binary result and counting them
preserves the important information.  However, with Stuart's cageliner
material, the individual message spam scores are not definitive.
Quantizing these scores into two classes in step 2 introduces a lot of
noise that the subsequent averaging in steps 3 and 4 may not adequately remove,
and the resulting overall reputation score may be biased.  Since the
expected value for the overall score is somewhere near the decision
threshold for the cageliner class of material, quantization noise can
have a large impact on the final decision.
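
To make the effect concrete, here is a small sketch comparing the
count-based score with the plain mean of the raw scores for a borderline
sender.  The five scores and the 0.5 threshold are made up for
illustration, and "fraction classed as spam" is an assumed form of the
count-based reputation.

```python
from statistics import mean

THRESHOLD = 0.5  # hypothetical per-message ham/spam cutoff

def spam_fraction(scores, threshold=THRESHOLD):
    """Quantize each score first, then average the binary results."""
    return sum(s >= threshold for s in scores) / len(scores)

# Invented "cageliner" scores: every message sits close to the threshold.
borderline = [0.51, 0.52, 0.55, 0.53, 0.49]

quantized = spam_fraction(borderline)   # 4 of 5 messages land on the spam side
unquantized = mean(borderline)          # about 0.52, barely above the threshold

print(f"quantized reputation:   {quantized:.2f}")
print(f"mean of the raw scores: {unquantized:.2f}")
```

The quantized figure makes this sender look strongly spammy, while the raw
scores say only that the mail is marginally on the spam side.  Nudging a
few scores by a few hundredths could flip the count-based result to the
opposite extreme, which is the sensitivity described above.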

I suspect that Paddy's suggestion will result in better categorization
of cageliner material without causing problems for strongly classifying
ham/spam.  There are a couple of ways to go about this.  One way is to
replace steps 2, 3 and 4 above with a computation of statistics, i.e.
mean and variance, as Paddy suggests.  In step 5, you compare the mean
to a threshold to produce a binary result (make a decision), and if you
care to assume a distribution shape, use the variance to compute a
real-valued confidence indicator.  Another approach is to use a
combining algorithm, e.g. Fisher combining, to produce a real-valued
result that you then compare to a threshold.
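
Both approaches can be sketched in a few lines.  Everything named here is
an assumption made for illustration: the function names, the 0.5
threshold, the normal approximation used for the confidence figure, the
treatment of each score as a probability in (0, 1), and the
(S - H + 1) / 2 way of folding the two Fisher statistics into one
indicator.

```python
import math
from statistics import mean, stdev

def mean_reputation(scores, threshold=0.5):
    """Mean score plus a confidence that the true mean exceeds threshold.

    Assumes the sample mean is roughly normally distributed; needs at
    least two scores so that the sample standard deviation is defined.
    """
    m = mean(scores)
    std_err = stdev(scores) / math.sqrt(len(scores))
    if std_err == 0:
        # All scores identical: no spread to base a confidence on.
        confidence = 0.5 if m == threshold else float(m > threshold)
    else:
        z = (m - threshold) / std_err
        confidence = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return m, confidence

def chi2_sf_even(x, dof):
    """Chi-squared survival function, valid for even degrees of freedom."""
    half = x / 2.0
    term = total = math.exp(-half)
    for i in range(1, dof // 2):
        term *= half / i
        total += term
    return min(total, 1.0)

def fisher_indicator(scores, eps=1e-9):
    """Combine scores with Fisher's method; result is in [0, 1].

    -2 * sum(ln p) is chi-squared with 2n degrees of freedom under the
    null hypothesis.  It is applied once to the scores and once to their
    complements, and the two results are folded into one indicator.
    """
    clamped = [min(max(s, eps), 1.0 - eps) for s in scores]  # avoid log(0)
    dof = 2 * len(clamped)
    spam_side = 1.0 - chi2_sf_even(-2.0 * sum(math.log(1.0 - p) for p in clamped), dof)
    ham_side = 1.0 - chi2_sf_even(-2.0 * sum(math.log(p) for p in clamped), dof)
    return (spam_side - ham_side + 1.0) / 2.0

borderline = [0.51, 0.52, 0.55, 0.53, 0.49]   # invented cageliner-like scores
clear_spam = [0.9, 0.9, 0.9, 0.9, 0.9]

m, confidence = mean_reputation(borderline)
print(f"mean={m:.3f} confidence={confidence:.3f}")
print(f"fisher borderline={fisher_indicator(borderline):.3f}")
print(f"fisher clear spam={fisher_indicator(clear_spam):.3f}")
```

In either case the only quantization left is the final comparison against
a threshold in step 5.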

--
Seth Goodman
