Michael Thomas wrote:
Does there not exist something like, oh say, BLreports, that judges you
on false positives/false negatives, coverage, timeliness, etc.?
In a general sense, something large enough to be statistically significant,
with the sort of accuracy required for traditional research rigor?
No. Unfortunately. There are too many variables (e.g., what actually
_is_ an FP in a given environment), and correlating what is _REALLY_ HAM
vs. SPAM in an environment large enough to be significant is impractical.
["Spam" and "ham" corpora more than a day or two old are entirely
useless, no matter how big they are. Yesterday's post-facto analysis
(e.g., training on spam) says _nothing_ about how well a particular
technique is going to work in real time on tomorrow's spam.

Vendor: "In tests performed yesterday, our uber-fantastic spam filter
caught 99% of spam [from a data set it was trained on several years
ago]."

Me: "Something must be broken if it couldn't catch 100% of the spam
you've already trained it on."]
The best you can do is make approximations that rely on various assumptions.
For example, in our environment of approximately a million emails per
day, approximately 75% of all spam is caught by a single DNSBL, with a
0.01% "would-be FP if we didn't whitelist" rate.
But that carries various assumptions with it, e.g., what the volume of
spam really is (which we infer from another complex metric), and that
our FP-handling process catches most of the FPs.
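
To make the arithmetic behind those figures concrete, here is a minimal
Python sketch of the same back-of-the-envelope estimate. Every input is
an assumption: the inferred spam fraction, the DNSBL hit count, and the
whitelist "save" count are hypothetical numbers chosen only so the output
reproduces the ~75% catch rate and 0.01% would-be-FP rate quoted above,
not measurements from any real system.

    # Illustrative sketch only; all inputs are assumed, not measured.
    daily_mail = 1_000_000        # ~1M messages/day in this environment
    est_spam_fraction = 0.80      # ASSUMPTION: spam share inferred from another metric
    dnsbl_hits = 600_000          # ASSUMPTION: messages flagged by the one DNSBL
    whitelist_saves = 60          # ASSUMPTION: "would-be FPs" rescued only by the whitelist

    est_spam = daily_mail * est_spam_fraction

    # Catch rate: fraction of the (estimated) spam the DNSBL accounts for.
    catch_rate = dnsbl_hits / est_spam

    # Would-be FP rate: legitimate mail the DNSBL alone would have blocked,
    # as a fraction of its hits, surviving only because of the whitelist.
    would_be_fp_rate = whitelist_saves / dnsbl_hits

    print(f"estimated catch rate: {catch_rate:.1%}")      # ~75.0%
    print(f"would-be FP rate:     {would_be_fp_rate:.3%}") # ~0.010%

The point of the sketch is that the headline percentages are only as good
as the assumed spam fraction and the whitelist process feeding the
would-be-FP count.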