ietf-asrg

Re: [Asrg] differential confidence

2008-12-04 13:16:01
Dave CROCKER wrote:
> Chris Lewis wrote:
>> Do we block an IP on one TIS hit?  No.  We compute good/bad ratios and
>> have heuristics on when it's high enough to do something about.
>
> The "bad" number is affirmative.  People hit TIS.  As a measure, the bad
> number therefore has a 100% confidence level of accuracy (as long as we
> are careful
>
> But where do you get the 'good' number from and is it really equally forceful?

...

> So, how do we factor in differential confidence levels in the final
> assessment?

Ironically, I'm in the process of rebuilding this code at the moment ;-)

When you read this keep in mind that this is in _addition_ to all the
other filtering (including DNSBLs, both 3rd party and local) that we use.

Basically what we do is generate a score based on the number of
non-blocked emails, contentblocked emails (not IP-blocked), trap volumes
and complaints, and pick a threshold score.  Each of the numbers is
scaled differently in a computation something like this:

if (((complaints * cf + contentblocked * bf + trap * tf) / non-blocked) > 1) {
        go block the IP
}

[Notice that we're not factoring in blocked IP.  Specifically to avoid
the thresholder locking up thru positive feedback ;-).  They're blocked
anyway, so it doesn't matter.]

Where cf, bf, and tf are chosen thru experience and experimentation.
There's also some gunk in there to deal with when the numbers are too
small to be significant (especially non-blocked == 0 ;-).  When the
non-blocked numbers are low, it doesn't matter very much whether you
block it or not anyway.
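
As a rough illustration only, the threshold test described above might be sketched in Python as follows. This is not the production code: the names `IpMetrics`, `should_block` and `min_non_blocked` are made up for the example, the comparison is read as "greater than 1", and the small-sample guard is just one hypothetical way to handle the "numbers too small to be significant" case. The factors in the usage lines are the past values of cf, bf and tf quoted in this message (50, 2 and .01); the counts are invented.

```python
from dataclasses import dataclass


@dataclass
class IpMetrics:
    """Per-IP counts over the measurement window (field names are illustrative)."""
    non_blocked: int      # emails that were not blocked
    content_blocked: int  # emails blocked on content (not IP-blocked)
    trap: int             # trap volume
    complaints: int       # user complaints (TIS hits)


def should_block(m: IpMetrics, cf: float, bf: float, tf: float,
                 min_non_blocked: int = 10) -> bool:
    """Return True when the weighted bad/good ratio exceeds 1.

    min_non_blocked is a hypothetical guard: below it the sample is treated
    as too small to act on, which also avoids dividing by zero.
    """
    if m.non_blocked < min_non_blocked:
        return False
    bad = m.complaints * cf + m.content_blocked * bf + m.trap * tf
    return bad / m.non_blocked > 1


# Invented counts, scored with the past factors cf=50, bf=2, tf=0.01.
quiet = IpMetrics(non_blocked=1000, content_blocked=20, trap=100, complaints=2)
noisy = IpMetrics(non_blocked=1000, content_blocked=300, trap=5000, complaints=10)
print(should_block(quiet, cf=50, bf=2, tf=0.01))  # (100 + 40 + 1) / 1000 = 0.141
print(should_block(noisy, cf=50, bf=2, tf=0.01))  # (500 + 600 + 50) / 1000 = 1.15
```

Blocked-by-IP volume is deliberately absent from the inputs, matching the note about avoiding positive feedback.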

Note that there's also an implicit factor of how long a period the metrics
cover.  In the past it was 2 days.  Now it's probably going to 7 days,
potentially factoring in abrupt volume increases.

All of the metrics numbers have "100% confidence".  The scaling factors
are a confidence factor for each number.

They're somewhat predictable.

Eg: Our "TIS hit per spam" compliance factor is (currently) about 1 in
50.  Ignoring other factors, assuming smooth distribution, a cf of 25
will cause the IP to block when the output is 50% spam.  We put lots of
headroom in to allow for uneven distribution.
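
To see how cf relates to a blocking point, here is a small sketch under one simplified reading: only complaints contribute, spam is a fraction of the non-blocked mail, and reports are spread evenly. The function name and its parameters are hypothetical, and this accounting is an assumption. It need not reproduce the figures quoted above, which may rest on details not given here.

```python
def break_even_spam_fraction(spam_per_complaint: float, cf: float) -> float:
    """Spam fraction at which complaints alone bring the ratio to 1.

    Assumes complaints = spam / spam_per_complaint and
    spam = fraction * non_blocked, with no content-block or trap term.
    Then ratio = fraction * cf / spam_per_complaint, which equals 1 at
    fraction = spam_per_complaint / cf.
    """
    if cf <= 0:
        raise ValueError("cf must be positive")
    return spam_per_complaint / cf


# One TIS hit per 50 spam, tried against a few values of cf.
for cf in (25, 50, 100):
    print(cf, break_even_spam_fraction(50, cf))
```

A result above 1 means that, under this reading, complaints alone cannot trip the threshold and the other terms have to contribute.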

In the past we were using something like 50, 2 and .01 respectively for
cf, bf and tf.
_______________________________________________
Asrg mailing list
Asrg(_at_)irtf(_dot_)org
https://www.irtf.org/mailman/listinfo/asrg
