I love statistics, but could not-spam be called "not spam" rather than
"ham" in the research-and-report context? The word "spam" creates
enough difficulty on its own without adding another "zany techie word."
At 10:45 -1000 4/2/03, Clifton Royston wrote:
On Tue, Apr 01, 2003 at 02:18:34PM -0500, Liudvikas Bukys wrote:
 I slightly prefer the terms accept/reject over negative/positive.
 PROPORTION DEFINITION:
 I think that a 2x2 matrix is the most straightforward presentation:
        {accept,reject} x {ham,spam}
        accept  reject  total
 ham    TA      FR      NHAM = TA+FR
 spam   FA      TR      NSPAM = FA+TR
 total  TA+FA   FR+TR   NTOTAL
 and the most helpful intuitive proportions (I think) would be
 FRp = FR/NHAM and FAp = FA/NSPAM.
 Numbers will ALWAYS be dependent on a particular corpus.
 However, I think that the definitions above will be stable
 (measuring classifier quality, not corpus composition) over
 a wide range of ham-spam ratios.  FR/NTOTAL or FA/NTOTAL, by
 contrast, will be dominated by the ham-spam ratio of the test set,
 obscuring the performance of the classifier and making results
 unnecessarily corpus-specific.
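
To make the point above concrete, here is a minimal sketch in Python. It reads TA/FR/FA/TR as true accept, false reject, false accept, and true reject, which is an interpretation of the abbreviations in the matrix. The function name `rates` and the two corpora are made up for illustration: the same hypothetical classifier (rejects 1% of ham, accepts 5% of spam) is scored on a 1:1 and a 1:9 ham-spam mix.

```python
def rates(ta, fr, fa, tr):
    """Return (FRp, FAp, FR/NTOTAL, FA/NTOTAL) for one 2x2 matrix.

    ta, fr: ham messages accepted / rejected
    fa, tr: spam messages accepted / rejected
    """
    nham = ta + fr
    nspam = fa + tr
    ntotal = nham + nspam
    return fr / nham, fa / nspam, fr / ntotal, fa / ntotal


# Illustrative counts only, not measurements from any real corpus.
corpus_a = rates(ta=990, fr=10, fa=50, tr=950)    # 1000 ham, 1000 spam
corpus_b = rates(ta=990, fr=10, fa=450, tr=8550)  # 1000 ham, 9000 spam

for name, r in (("A", corpus_a), ("B", corpus_b)):
    print(f"corpus {name}: FRp={r[0]:.3f} FAp={r[1]:.3f} "
          f"FR/NTOTAL={r[2]:.4f} FA/NTOTAL={r[3]:.4f}")
```

Under these assumed counts, FRp and FAp come out the same for both corpora, while FR/NTOTAL and FA/NTOTAL shift with the ham-spam ratio even though the classifier's behavior has not changed.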
  This is exactly correct, IMHO.
  Your matrix is a flipped version of the matrix I just presented in
proposing the FP/FN calculations to be used for the "dSpam" measure.
  -- Clifton
--
     Clifton Royston  --  LavaNet Systems Architect
     cliftonr(_at_)lava(_dot_)net
  "If you ride fast enough, the Specialist can't catch you."
  "What's the Specialist?" Samantha says.
  "The Specialist wears a hat," says the babysitter. "The hat makes noises."
  She doesn't say anything else. 
                      Kelly Link, _The Specialist's Hat_
_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg