
Re: [Asrg] 2.0 Metrics (Was Re: [Asrg] spamstones)

2003-04-02 13:54:12
  Arrgh!  Following up to correct a couple critical typos below which I
didn't catch when proofreading before sending it.  Sorry, I plead
Murphy's law.

On Wed, Apr 02, 2003 at 10:38:23AM -1000, Clifton Royston wrote:
  I define FP and FN with the provision that they are not allowed to
be = 0, but otherwise in the standard way:  

  measure     spam category       ham category 
  -------     ---------------     --------------
  flagged     N(flagged-spam)     N(flagged-ham)
  unflagged   N(unflagged-spam)   N(unflagged-ham)
  -------     ---------------     --------------
  total       N(spam)             N(ham)

  Then FP = max(N(flagged-ham), 0.5) / N(ham)
       FN = max(N(unflagged-spam), 0.5) / N(spam)

  Using the minimum of 0.5 (not 1.5, as I first typed) for the numerator
avoids undefined values in the log computation, and also deliberately
penalizes results claiming "zero false positives" or "zero false
negatives" if they use a small sample size.
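
  To make the small-sample penalty concrete, here is a minimal Python
sketch of the floored rates defined above.  The function name
floored_rates and the corpus sizes are made up for illustration; only
the 0.5 floor on the numerator comes from the definition itself.

```python
def floored_rates(flagged_ham, n_ham, unflagged_spam, n_spam, floor=0.5):
    """Return (FP, FN), flooring each numerator at `floor` (0.5 per the text)."""
    if n_ham <= 0 or n_spam <= 0:
        raise ValueError("need at least one ham and one spam message")
    fp = max(flagged_ham, floor) / n_ham        # false positive rate
    fn = max(unflagged_spam, floor) / n_spam    # false negative rate
    return fp, fn

# Hypothetical corpora: both report zero errors of either kind.
small = floored_rates(0, 50, 0, 50)        # 50 ham, 50 spam
large = floored_rates(0, 5000, 0, 5000)    # 5000 ham, 5000 spam
print(small, large)
```

  With zero observed errors, the rates come out as 0.5 / N rather than 0,
so the logs stay defined and a "perfect" result on 50 messages is
credited with a much worse rate than the same result on 5000.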

  The tentative definition for "dSpam" is:

   10 * ( -log10(FP) - log10(FN) + log(1/4) )

   i.e. it includes the log of both the false positive rate and the
false negative rate, not FP twice as I originally typed.
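
  For anyone who wants to try the corrected formula (the one using both
FP and FN), a rough Python sketch follows.  It assumes that log(1/4)
is meant base 10 like the other two terms; that reading is the one
under which a filter with FP = FN = 0.5 scores zero.  The function name
dspam and the sample rates are illustrative only.

```python
import math

def dspam(fp, fn):
    """Tentative dSpam score: 10 * (-log10(FP) - log10(FN) + log10(1/4)).

    Assumption: the constant term log(1/4) is base 10.
    """
    if not (0 < fp <= 1 and 0 < fn <= 1):
        raise ValueError("FP and FN must be rates in (0, 1]")
    return 10 * (-math.log10(fp) - math.log10(fn) + math.log10(1 / 4))

coin_flip = dspam(0.5, 0.5)    # half of everything misclassified
decent = dspam(0.01, 0.1)      # 1% false positives, 10% false negatives
print(round(coin_flip, 2), round(decent, 2))
```

  Under that assumption the coin-flip case lands at zero (up to floating
point rounding) and the second case at roughly 24, so each factor-of-ten
improvement in either rate adds 10 to the score.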

  -- Clifton

-- 
     Clifton Royston  --  LavaNet Systems Architect --  
cliftonr(_at_)lava(_dot_)net

  "If you ride fast enough, the Specialist can't catch you."
  "What's the Specialist?" Samantha says. 
  "The Specialist wears a hat," says the babysitter. "The hat makes noises."
  She doesn't say anything else.  
                      Kelly Link, _The Specialist's Hat_
_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg


