Daniel Feenberg writes:
> On Thu, 30 Mar 2006, Justin Mason wrote:
> > Michael Thomas writes:
> > > Does there not exist something like, oh say, BLreports that
> > > judges you on false positive/false negative, coverage, timeliness,
> > > etc?
> > http://wiki.apache.org/spamassassin/DnsblAccuracy082005
> Can you explain how to read the chart? As I understand it, of the messages
> identified by the XBL as from a spam source, 100% were classified by
> SpamAssassin as spam, and of the messages SpamAssassin thought were OK, 1%
> were on the XBL list. But I get a fair amount of spam with SpamAssassin
> scores of 4 or less, so it wouldn't be right to call it an error rate of
> 1%, would it? And what are the other 4 numbers?

Hi Daniel --
No, these are not figures based on SpamAssassin classifications, so
SpamAssassin's error rate is irrelevant.
These are hand-classified messages, sorted by a human being. So messages
classed as "spam" really are spam, for sure, and vice-versa. We're almost
[*] certain of it ;)
To explain the fields:
OVERALL%  SPAM%    HAM%    S/O    RANK  SCORE  NAME
17.449    24.9285  0.0113  1.000  0.97  3.90   RCVD_IN_XBL
That means that 24.9285% of the incoming spam was hit by XBL, and 0.0113%
of the incoming ham was hit (false positives).
That gives an "S/O ratio" of 1.000. S/O is similar to Bayesian
probability, or positive predictive value in medicine. A 1.0 S/O is a
"perfect" score, meaning no false positives. (Unfortunately the XBL
doesn't have a *real* 1.0 S/O here -- it's nearly there at 0.9995469, and
the 1.0 listing is due to rounding.)
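
As a rough sketch of the arithmetic: assuming S/O is computed as
SPAM% / (SPAM% + HAM%), the unrounded figure can be reproduced from the
row above. The formula is an assumption here (it does match the
0.9995469 value), and the function name so_ratio is made up for
illustration; it is not part of SpamAssassin.

```python
def so_ratio(spam_pct: float, ham_pct: float) -> float:
    """Share of a rule's hits that are spam, from per-class hit percentages.

    Assumed formula: SPAM% / (SPAM% + HAM%). Using percentages rather than
    raw counts keeps the result independent of how much spam vs. ham is in
    the corpus.
    """
    total = spam_pct + ham_pct
    if total == 0:
        raise ValueError("rule hit no messages, S/O is undefined")
    return spam_pct / total


# SPAM% and HAM% taken from the RCVD_IN_XBL row above.
xbl = so_ratio(24.9285, 0.0113)
print(f"{xbl:.7f}")  # unrounded value: 0.9995469
print(f"{xbl:.3f}")  # as displayed in the table: 1.000
```

A rule that hit spam and ham at the same rate would come out at 0.5 under
this formula, i.e. it would tell you nothing either way.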
As the page notes, http://wiki.apache.org/spamassassin/HitFrequencies has
lots more info on this accuracy-measurement format.
[*: of course, as a fair bit of the academic research recently has
noted, it's very difficult to be 100% sure, even with a human looking
at every single message.]
--j.
_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg