ietf-asrg

Re: [Asrg] Comments on draft-church-dnsbl-harmful-01.txt

2006-03-31 06:02:15

Daniel Feenberg writes:
On Thu, 30 Mar 2006, Justin Mason wrote:
Michael Thomas writes:
Does there not exist something like, oh say, BLreports that
judges you on false positive/false negative, coverage, timeliness,
etc?

http://wiki.apache.org/spamassassin/DnsblAccuracy082005

Can you explain how to read the chart? As I understand it, of the messages 
identified by the XBL as from a spam source, 100% were classified by 
SpamAssassin as spam, and of the messages SpamAssassin thought were OK, 1% 
were on the XBL list. But I get a fair amount of spam with SpamAssassin 
scores of 4 or less, so it wouldn't be right to call it an error rate of 
1%, would it? And what are the other 4 numbers?

Hi Daniel -- 

No, these are not figures based on SpamAssassin classifications, so
SpamAssassin's error rate is irrelevant.

These are hand-classified messages, sorted by a human being.  So messages
classed as "spam" really are spam, for sure, and vice-versa.  We're almost
[*] certain of it ;)

To explain the fields:

OVERALL%    SPAM%    HAM%    S/O  RANK  SCORE  NAME
  17.449  24.9285  0.0113  1.000  0.97   3.90  RCVD_IN_XBL

That means that 24.9285% of the incoming spam was hit by XBL, and 0.0113%
of the incoming ham was hit (false positives).

That gives an "S/O ratio" of 1.000.  S/O is similar to Bayesian
probability, or positive predictive value in medicine.  A 1.0 S/O is a
"perfect" score, meaning no false positives.  (Unfortunately the XBL
doesn't have a *real* 1.0 S/O here -- it's nearly there at 0.9995469, and
the 1.0 listing is due to rounding.)
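
As a rough sketch of the arithmetic: assuming S/O is simply
SPAM% / (SPAM% + HAM%), the two percentages in the table row reproduce
the 0.9995469 figure.  The formula is an assumption that happens to match
the numbers quoted here, and the function name s_o_ratio is hypothetical,
not part of any SpamAssassin tool.

```python
def s_o_ratio(spam_pct: float, ham_pct: float) -> float:
    """Share of a rule's combined hit rate that comes from spam.

    Assumed formula: SPAM% / (SPAM% + HAM%).  The name and signature
    are illustrative only.
    """
    total = spam_pct + ham_pct
    if total == 0:
        raise ValueError("rule hit no messages, S/O is undefined")
    return spam_pct / total


# Values taken from the RCVD_IN_XBL row above.
xbl = s_o_ratio(24.9285, 0.0113)
print(f"{xbl:.7f}")  # 0.9995469
print(f"{xbl:.3f}")  # 1.000, which is how the table ends up showing 1.000
```

Working from the percentages rather than raw message counts means the
ratio does not depend on how much spam versus ham the corpus contains.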

As the page notes, http://wiki.apache.org/spamassassin/HitFrequencies has
lots more info on this accuracy-measurement format.


[*: of course, as a fair bit of the academic research recently has
noted, it's very difficult to be 100% sure, even with a human looking
at every single message.]

--j.

_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg