Re: [Asrg] 2.0 Metrics

On Wed, Apr 02, 2003 at 03:20:54PM -0700, Vernon Schryver wrote:

From: Clifton Royston <cliftonr(_at_)lava(_dot_)net>
  This does not take into account the differential cost of false
positives and false negatives; I do want to look at the Androutsopoulos
paper for this TCR measure.  However, I think it is potentially useful.
...


Because the differential costs of false positives and negatives differ
by orders of magnitude, any spam defense metric that consists of a
single number will have very limited utility.  Some people value not
receiving spam 100 times more than not receiving legitimate mail and
so are happy with 10% false positive rates.  Other organizations will
not tolerate 0.1% false positive rates no matter how many clerks must
be hired to manually filter spam.  A metric that says one system has
a value of 10 and second has a value of 100 cannot tell you which
system (if either) is usable, not to mention which is better according
to your lights.


  Thank you.  I do appreciate your taking the time to give me some
feedback.

  I don't fundamentally disagree with your argument; no one number can
tell you everything about a system.  You can't evaluate a car knowing
only its mpg, especially without knowing its intended use.  One
alternative is to always quote comparably computed FP *and* FN rates,
but it does seem that people have a hard time comparing those in the
common case where one pairwise comparison is greater and the other
smaller.

  However, it's also true that "Almost anything can be measured in some
way that is better than not measuring it at all." (I forget what
software engineering book I stole this line from.)

  The standard truism in this area is that achieving a higher filter
rate (meaning a lower false negative rate) means accepting a higher
false positive rate; my motivation for suggesting dSpam is in part for
it to measure how far a particular system gets past that trade-off.

  The main merit to my mind of having one number is that it allows you
to make some meaningful comparisons of similar systems, where certain
factors are bounded or held constant in the comparison.  In the case
you mention above, if the organization demands FP < 0.1%, and has to
choose between tuning a system to give them 0.02% FP and 16% FN, or
0.025% FP and 2% FN, dSpam might give them an idea which to prefer.

  Liudvikas Bukys suggested to me off-list that the measure could be
parameterized to reflect the user's perceived cost of FP relative to
FN.  For example, dSpam[1000] might represent your organization which
values one lost mail more than 1000 extra spams, and dSpam[.01] might
reflect your user who cares about blocking spam 100 times more than
they care about losing mail.  This is a very attractive idea, and
though it requires some thought, it might make the measure much more
useful.
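
  To make the parameterization idea concrete, here is a rough sketch.
It is not the dSpam formula, which this message does not define.  It
assumes a purely hypothetical linear cost, fp_weight * FP + FN, where
fp_weight plays the role of the bracketed parameter, and it ignores
the ratio of spam to legitimate mail.  The names weighted_cost,
preferred, and break_even_weight are invented for illustration.  The
two tunings are the ones from the example above.

```python
# Hypothetical linear cost model; NOT the dSpam definition.
def weighted_cost(fp_rate, fn_rate, fp_weight):
    """Cost where one false positive counts as fp_weight false negatives."""
    return fp_weight * fp_rate + fn_rate


# The two tunings from the example above, expressed as fractions.
TUNING_A = {"fp": 0.0002, "fn": 0.16}   # 0.02% FP, 16% FN
TUNING_B = {"fp": 0.00025, "fn": 0.02}  # 0.025% FP, 2% FN


def preferred(fp_weight):
    """Return the name of the tuning with the lower weighted cost."""
    cost_a = weighted_cost(TUNING_A["fp"], TUNING_A["fn"], fp_weight)
    cost_b = weighted_cost(TUNING_B["fp"], TUNING_B["fn"], fp_weight)
    return "A" if cost_a < cost_b else "B"


def break_even_weight(a, b):
    """Weight at which both tunings cost the same under the linear model."""
    return (a["fn"] - b["fn"]) / (b["fp"] - a["fp"])


if __name__ == "__main__":
    for weight in (0.01, 1000, 10000):
        print(weight, preferred(weight))
    print("break-even weight:", break_even_weight(TUNING_A, TUNING_B))
```

  Under this assumed model, the second tuning is preferred at both
weights mentioned above (1000 and .01); the first tuning only wins
once a false positive is weighted at more than about 2800 false
negatives.  A different cost model could easily change that answer.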

  -- Clifton

-- 
     Clifton Royston  --  LavaNet Systems Architect  --  cliftonr(_at_)lava(_dot_)net

  "If you ride fast enough, the Specialist can't catch you."
  "What's the Specialist?" Samantha says. 
  "The Specialist wears a hat," says the babysitter. "The hat makes noises."
  She doesn't say anything else.  
                      Kelly Link, _The Specialist's Hat_