Dave CROCKER wrote:
Chris Lewis wrote:
Each of the numbers is
scaled differently in a computation something like this:
if (((complaints * cf + contentblocked * bf + trap * tf) / non-blocked) > 1) {
    go block the IP
}
[Notice that we're not factoring in blocked IP. Specifically to avoid
the thresholder locking up thru positive feedback ;-). They're blocked
anyway, so it doesn't matter.]
Where cf, bf, and tf are chosen thru experience and experimentation.
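As a minimal sketch of the rule above, here it is in Python. The function name and the weight values are hypothetical, picked only for illustration; real values for cf, bf and tf would come from experience and experimentation. Treating an IP with no passed-through mail as blockable once it has any weighted "bad" count is also an assumption of this sketch.

```python
def should_block(complaints, contentblocked, trap, passed, cf, bf, tf):
    """Return True when the weighted "bad" count exceeds the passed-through count."""
    weighted_bad = complaints * cf + contentblocked * bf + trap * tf
    # Comparing against `passed` directly is the same test as
    # weighted_bad / passed > 1 when passed > 0, and it avoids dividing by zero.
    # Assumption: with passed == 0, any weighted bad count triggers a block.
    return weighted_bad > passed


# Hypothetical weights, for illustration only.
CF, BF, TF = 20, 1, 50

# 2*20 + 10*1 + 1*50 = 100 weighted "bads"
print(should_block(2, 10, 1, passed=100, cf=CF, bf=BF, tf=TF))  # False (100 > 100 fails)
print(should_block(2, 10, 1, passed=99, cf=CF, bf=BF, tf=TF))   # True  (100 > 99)
```

Mail that was already blocked by IP is deliberately not an input, matching the note about avoiding positive feedback.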
Relative to my question, what you've said is that you have a scaling factor
specifically for the TIS hits. That lets you tune its specific confidence
level.
But you don't cite any of the "good" attributes, never mind scaling factors,
per your reference to good/bad.
Not quite - the counts in the numerator are "bads" and in the
denominator the "goods". Each of the counts has a scaling factor
(normalized so that the scaling factor for "passed through" is 1).
I don't look at them as confidence indicators, but rather as scaling
factors to derive some sort of notion of "what's the probability of a
given email from this IP being spam?".
I form that equation that way strictly thru force of habit, because
given the headroom I want it to have, pedantic accuracy is irrelevant.
A different equation for "block this IP?" would be something like this:
(complaints * cf + contentblocked * bf + trap * tf)
--------------------------------------------------- > maxratio
(complaints + contentblocked + trap + passthru)
Where, if cf, bf and tf were all 1, it is a simple ratio of spam/total.
It's probably easier to comprehend, but it might not have the
characteristics I want. I'll play with it.
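A rough Python sketch of this alternative form, for comparison. The function names, the example counts, and the maxratio value are hypothetical; returning 0.0 for an IP with no traffic at all is an assumption of the sketch, not something stated in the thread.

```python
def weighted_share(complaints, contentblocked, trap, passthru,
                   cf=1.0, bf=1.0, tf=1.0):
    """Weighted "bad" count divided by the total unweighted message count."""
    total = complaints + contentblocked + trap + passthru
    if total == 0:
        return 0.0  # assumption: no traffic seen means nothing to score
    weighted_bad = complaints * cf + contentblocked * bf + trap * tf
    return weighted_bad / total


def should_block(complaints, contentblocked, trap, passthru,
                 cf, bf, tf, maxratio):
    """Block when the weighted share exceeds maxratio."""
    return weighted_share(complaints, contentblocked, trap, passthru,
                          cf, bf, tf) > maxratio


# With every weight left at 1 this is plain spam/total: 5 of 20 messages.
print(weighted_share(1, 3, 1, 15))  # 0.25

# Hypothetical weight and threshold: one heavily weighted complaint
# pushes the value past 1, so it is no longer a literal proportion.
print(should_block(1, 3, 1, 15, cf=20, bf=1, tf=1, maxratio=0.5))  # True
```

Once any weight is above 1 the result can exceed 1, so maxratio would have to be tuned alongside cf, bf and tf rather than read as a percentage.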
I'm being pedantic because it was the differential nature of positive vs.
negative measurements that intrigued me, in terms of practical impact.
_______________________________________________
Asrg mailing list
Asrg(_at_)irtf(_dot_)org
https://www.irtf.org/mailman/listinfo/asrg