Clifton Royston <cliftonr(_at_)lava(_dot_)net> wrote:
<much cut>
The tentative definition for "dSpam" is:
10 * ( -log10(FP) - log10(FN) + log10(1/4) )
where the log10(1/4) addition is a normalizing factor. (This is
equivalent to -10*log10(FP*FN*4), etc.)
<more cut>
Here are some problems with this metric, before everyone else points
them out:
1) The trivial systems which classify all mail as spam or all mail as
non-spam get arbitrarily high scores: such a system has FN = 0 or FP = 0
exactly, so the corresponding -log10 term (and hence dSpam) diverges.
2) Generally, a system which drives either false positives or false
negatives to an extreme gets rated better than it "should", since the
metric depends only on the product FP*FN; e.g. dSpam for 0.05% false
negatives and 50% false positives equals dSpam for a system with 0.5%
false positives and 5% false negatives (both products are 0.00025, so
both score 30), though the latter would probably be much more
desirable. (A numeric check follows after this list.)
3) The metric isn't biased against false positives, and should not be
used as the sole metric for a system.
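To make issues 1 and 2 concrete, here is a minimal Python sketch of the
original product-based metric (the function name dspam_original is
mine, purely illustrative):

    import math

    def dspam_original(fp, fn):
        """Original tentative metric: -10 * log10(FP * FN * 4)."""
        return -10 * math.log10(fp * fn * 4)

    # Issue 2: these two systems tie (their FP*FN products are equal),
    # though the second is probably far preferable in practice.
    print(dspam_original(fp=0.50, fn=0.0005))   # 30.0
    print(dspam_original(fp=0.005, fn=0.05))    # 30.0

    # Issue 1: a filter flagging everything as spam has FN = 0, so the
    # score is unbounded (math.log10(0.0) raises ValueError; the limit
    # of the metric as FN -> 0 is +infinity).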
There's a fairly straightforward modification which improves the metric in
these respects:
change the definition of dSpam to
dSpam = -10 * log10(FP + FN)
This fixes issues 1 and 2, and still seems to retain desirable properties.
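A quick sketch along the same lines (again, the name is just
illustrative), showing that the trivial "all spam" filter now scores
zero and the two systems from issue 2 are no longer tied:

    import math

    def dspam_mod(fp, fn):
        """Modified metric: -10 * log10(FP + FN)."""
        return -10 * math.log10(fp + fn)

    print(dspam_mod(fp=0.50, fn=0.0005))   # ~3.0  - 50% FP now hurts badly
    print(dspam_mod(fp=0.005, fn=0.05))    # ~12.6 - the balanced system wins
    print(dspam_mod(fp=1.0, fn=0.0))       # 0.0   - trivial "all spam" filter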
To fix issue 3, it's easy enough to introduce a bias:
dSpam(b) = -10 * log10( 2*(b*FP + FN)/(b+1) )
Then you have a family of measures indexed by a bias value (positive real
number - limits as bias parameter approaches 0 or infinity are measures
based solely on FN or solely on FP respectively).
I think this still retains the desirable properties of the original
measure - for example, dSpam for a coin-toss filter is zero for all
biases, and for any fixed bias dSpam increases whenever either FN or FP
decreases with the other held constant, ...
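A quick numeric check of these claims, as a minimal Python sketch (the
function name dspam_b is just illustrative):

    import math

    def dspam_b(fp, fn, b):
        """Biased family: -10 * log10( 2*(b*FP + FN)/(b+1) )."""
        return -10 * math.log10(2 * (b * fp + fn) / (b + 1))

    # Coin-toss filter (FP = FN = 0.5): zero for every bias.
    for b in (0.1, 1.0, 10.0):
        print(dspam_b(0.5, 0.5, b))        # 0.0 each time

    # b = 1 recovers the unbiased metric exactly:
    print(dspam_b(0.005, 0.05, b=1))       # ~12.6, same as -10*log10(FP+FN)

    # Large b approaches the FP-only measure -10*log10(2*FP), i.e. 20 here:
    print(dspam_b(0.005, 0.05, b=100))     # ~19.6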
Tom