ietf-asrg

Re: [Asrg] spamstones

2003-04-01 11:49:40

Dave Crocker said:

> hmmm. it occurs to me that a technical research group on spam might
> want to consider agreeing on a standard methodology for deriving false
> negatives and false positives. this would allow everyone to compare
> mechanisms in an equivalent way.
>
> given the nature of spam, and the nature of most technologies for
> detecting it, the determination of FNs and FPs is not automatically
> obvious.  that makes it a fertile opportunity for standardization.

May I suggest using Ion Androutsopoulos' metrics?   He wrote some of the
seminal papers on the effectiveness of naive Bayesian classification as a
spam filter, and proposed a very helpful metric -- TCR (Total Cost Ratio)
-- which we in SpamAssassin have consequently used for a couple of years.
It's a nice way to distill an idea of effectiveness into one number.

TCR takes into account a concept of "wasting the user's time" with FPs and
FNs.  In summary, FPs are much more inconvenient for the user, so should
be penalised much more heavily.

Too many FPs, and you've created so much work for the user that they would
be better off without your spam filter (a TCR of < 1.0) -- for example,
the MAPS RBL blacklists were reportedly this bad, according to
effectiveness rates posted by an analyst firm 1.5 years ago. ;)
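As a rough sketch of how TCR works: it's the baseline cost of running no
filter at all (every spam gets through) divided by the weighted cost of
the filter's mistakes, with each FP counted lambda times as expensive as
an FN. The lambda values below follow the ones used in Androutsopoulos'
papers; the specific numbers in the example are made up for illustration.

```python
def tcr(n_spam, false_positives, false_negatives, lam=9):
    """Total Cost Ratio: cost of no filter (all n_spam spams pass)
    divided by the weighted error cost of the filter.  lam weights
    each FP as lam times an FN; the papers use lam in {1, 9, 999}.
    TCR > 1.0 means the filter is better than no filter at all."""
    return n_spam / (lam * false_positives + false_negatives)

# Hypothetical run: 2 FPs and 50 FNs over 1000 spams, lam=9:
print(tcr(1000, 2, 50))          # 1000 / (9*2 + 50) ~= 14.7

# Pile on the FPs and TCR drops below 1.0 -- worse than no filter:
print(tcr(1000, 150, 50) < 1.0)  # True
```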

He also uses recall and precision, which as far as I can see are the
traditional two-number metrics used in classification research.
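For concreteness, the standard definitions applied to spam filtering
look like this (the counts in the example are invented, not measured
figures from any filter):

```python
def spam_recall(true_positives, false_negatives):
    """Fraction of all spam that the filter actually caught."""
    return true_positives / (true_positives + false_negatives)

def spam_precision(true_positives, false_positives):
    """Fraction of mail flagged as spam that really was spam."""
    return true_positives / (true_positives + false_positives)

# Hypothetical corpus: 950 spams caught, 50 missed, 2 legit mails flagged.
print(spam_recall(950, 50))    # 0.95
print(spam_precision(950, 2))  # ~0.998
```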

A CiteSeer, or even Google, search for his name will turn up these papers
quickly.

PS: one issue we ran into is that the current crop of Bayesian filters
don't make binary classifications; instead they typically classify mail
into the set { ham, unsure, spam } -- a three-way classification.
Dealing with "unsures" under the traditional metrics is hard...
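The three-way scheme usually falls out of putting two thresholds on the
filter's spam probability; a minimal sketch (the threshold values here
are illustrative, not taken from any particular filter):

```python
def classify(spam_prob, ham_max=0.2, spam_min=0.8):
    """Map a Bayesian spam probability to { ham, unsure, spam }.
    Anything between the two (hypothetical) thresholds is 'unsure',
    which is exactly the region the binary FP/FN metrics don't cover."""
    if spam_prob <= ham_max:
        return "ham"
    if spam_prob >= spam_min:
        return "spam"
    return "unsure"

print(classify(0.05))  # ham
print(classify(0.95))  # spam
print(classify(0.5))   # unsure
```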

--j.
_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg


