On Tue, Apr 01, 2003 at 02:18:34PM -0500, Liudvikas Bukys wrote:
I slightly prefer the terms accept/reject over negative/positive.
PROPORTION DEFINITION:
I think that a 2x2 matrix is the most straightforward presentation:
{accept,reject} x {ham,spam}
         accept   reject   total
ham      TA       FR       NHAM  = TA+FR
spam     FA       TR       NSPAM = FA+TR
total    TA+FA    FR+TR    NTOTAL
and the most helpful intuitive proportions (I think) would be
FRp = FR/NHAM and FAp = FA/NSPAM.
Numbers will ALWAYS be dependent on a particular corpus.
However, I think that the definitions above will be stable
(measuring classifier quality, not corpus composition) over
a wide range of ham-spam ratios. Proportions computed as FR/NTOTAL or
FA/NTOTAL will be dominated by the ham-spam ratio of the test set,
obscuring the performance of the classifier and making results
unnecessarily corpus-specific.
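To make the point above concrete, here is a minimal sketch in Python. The function name error_rates and all the counts are hypothetical, chosen only for illustration: an imagined classifier that rejects 1% of ham and accepts 10% of spam, scored on two test sets with different ham-spam ratios.

```python
def error_rates(ta, fr, fa, tr):
    """Compute error proportions from the 2x2 accept/reject matrix.

    ta, fr: ham messages accepted / rejected
    fa, tr: spam messages accepted / rejected
    """
    nham = ta + fr
    nspam = fa + tr
    ntotal = nham + nspam
    return {
        "FRp": fr / nham,            # false rejects as a share of ham
        "FAp": fa / nspam,           # false accepts as a share of spam
        "FR/NTOTAL": fr / ntotal,    # same errors as a share of all mail
        "FA/NTOTAL": fa / ntotal,
    }


# Hypothetical counts: same classifier behaviour, different corpus mix.
balanced = error_rates(ta=990, fr=10, fa=100, tr=900)     # 1000 ham, 1000 spam
spam_heavy = error_rates(ta=99, fr=1, fa=190, tr=1710)    # 100 ham, 1900 spam

for name, rates in (("balanced", balanced), ("spam_heavy", spam_heavy)):
    print(name, rates)
```

With these made-up counts, FRp and FAp come out the same for both corpora, while FR/NTOTAL and FA/NTOTAL shift with the ham-spam ratio even though the classifier's behaviour has not changed.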
This is exactly correct, IMHO.
Your matrix is a flipped version of the matrix I just presented in
proposing the FP/FN calculations to be used for the "dSpam" measure.
-- Clifton
--
Clifton Royston -- LavaNet Systems Architect --
cliftonr(_at_)lava(_dot_)net
"If you ride fast enough, the Specialist can't catch you."
"What's the Specialist?" Samantha says.
"The Specialist wears a hat," says the babysitter. "The hat makes noises."
She doesn't say anything else.
Kelly Link, _The Specialist's Hat_