Re: [Asrg] A paper/project worth considering (found it!)

On Sun, Dec 14, 2008 at 05:30:24PM -0500, Chris Lewis wrote:

I don't understand this.  I tried to explain this phenomenon before.
Didn't you take statistics somewhere?


Given that most of my graduate work was in statistical pattern recognition,
I think that's safe to presume. ;-)

Let's say, for the sake of argument, that AOL's users have a 5% error rate:
5% of what they report via TIS isn't spam.  That means, on average, 95 out
of 100 reports are accurate and are spam.

You have a FBL.  But you don't send any spam, right?  You only get your
share of the error rate, and none of the accurate ones - because you
don't send any spam.

So, from your perspective, the TIS button is 100% wrong.  For _you_ it
is.  But it's NOT reflective of TIS hits against a network that sends spam.
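To put numbers on that, here is a minimal sketch under assumed figures. The per-message report probabilities and the two hypothetical networks ("spammy-net" and "clean-net") are illustrative, not measured. Pooled over all senders, 5% of reports are wrong. Split per sender, the network that sends no spam receives nothing but wrong reports.

```python
# Illustrative model: a message is reported via TIS with one probability if it
# is spam and a much smaller one if it is not. All numbers are assumptions.
P_REPORT_SPAM = 0.1     # hypothetical: chance a spam message gets reported
P_REPORT_HAM = 0.001    # hypothetical: chance a legitimate message gets reported

# Hypothetical senders: (spam messages sent, legitimate messages sent)
SENDERS = {
    "spammy-net": (950_000, 0),
    "clean-net": (0, 5_000_000),
}

def report_counts(spam, ham):
    """Expected (accurate, erroneous) TIS reports for one sender."""
    return spam * P_REPORT_SPAM, ham * P_REPORT_HAM

def false_fraction(accurate, erroneous):
    """Share of reports that are not actually spam."""
    total = accurate + erroneous
    return erroneous / total if total else 0.0

# Pooled over every sender: the error rate the reporting ISP would observe.
acc_total = err_total = 0.0
for spam, ham in SENDERS.values():
    acc, err = report_counts(spam, ham)
    acc_total += acc
    err_total += err
aggregate = false_fraction(acc_total, err_total)
print(f"aggregate: {aggregate:.0%} of reports are wrong")

# Per sender: what each network sees in its own FBL.
per_sender = {name: false_fraction(*report_counts(spam, ham))
              for name, (spam, ham) in SENDERS.items()}
for name, frac in per_sender.items():
    print(f"{name}: {frac:.0%} of its reports are wrong")
```

The pooled 5% hides how unevenly the errors fall: in this toy model every erroneous report lands on the sender that sends no spam at all.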


Of course not: you're correct.  But it is reasonable to presume that a
user population which has generated a 100% error rate on the FP side has
also generated a substantial error rate on the FN side.  (That is, there's
no reason to think they're any more accurate one direction or the other.)

Are you contending that Comcast's or Yahoo's FBLs are yielding correct
TIS hits?  Or do you have FBLs with them at all?


Some of the operations I manage or consult for do.  I haven't completed
analysis of all those yet, so I'm quasi-reserving judgment.  But so far,
out of the data I *have* analyzed: 100% FPs.  It'll be months before
I'm done, I expect, because that's what it took to go through the AOL
results with a mix of automated/manual processes, and to cross-check
against logs, and so on.

What the TIS button does is help highlight situations where the
anti-spam filters aren't working.


Perhaps.  But I think all such instances need to be passed by a clueful,
experienced human for manual review.  That is, I think aggregating the
data and presenting it to a person with a note that says "there may be
a problem here" is reasonable, but automated action based on end-user
reports alone is a bad idea.
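A minimal sketch of that aggregate-then-review step, assuming each end-user report has already been reduced to the source IP it complains about. The function name build_review_queue and the threshold of 10 are hypothetical. Note that it takes no action on its own; it only produces a list for a human to look at.

```python
from collections import Counter

def build_review_queue(reported_ips, threshold=10):
    """Group end-user reports by source IP and return candidates for
    manual review. Nothing here blocks or throttles anything."""
    counts = Counter(reported_ips)
    return [
        (ip, n, "there may be a problem here")
        for ip, n in counts.most_common()
        if n >= threshold
    ]

# Hypothetical reports: one source draws 12 complaints, another only 3.
reports = ["192.0.2.10"] * 12 + ["198.51.100.7"] * 3
for ip, n, note in build_review_queue(reports):
    print(f"{ip}: {n} reports: {note}")
```

The output is deliberately just a queue; the decision about what, if anything, to do stays with the reviewer.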

I also think a much better approach -- which allows a higher degree
of automation because it removes users from the equation -- is to
run a large number of local and remote spamtraps.  After all, if spammer X
targets A local "real" users, then it seems reasonable to guess that X
will also target B local spamtraps and perhaps C remote spamtraps.
Correlation of data between all these makes it possible to identify at
least some spammers before users ever get a chance to use the TIS button.
(Yes, this is a methodology I use, and I use it based on connecting IP
address alone -- that is, I ignore everything else.  Any IP address
connecting to a sufficient number of sufficiently-diverse MXs and
attempting delivery to a sufficient number of spamtraps is up to no good
and is treated as such.)
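A rough sketch of that IP-only correlation, assuming delivery attempts seen at the local and remote MXs have been collected into one stream. The record layout, the function name flag_ips, and the thresholds (3 distinct MXs, 5 distinct spamtraps) are hypothetical stand-ins for "sufficient". "Diverse" is simplified here to "distinct MX hostnames"; a real deployment would probably want diversity across networks or sites.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Attempt:
    ip: str        # connecting IP address: the only attribute used to judge
    mx: str        # which local or remote MX received the connection
    rcpt: str      # envelope recipient the client tried to deliver to
    is_trap: bool  # True if rcpt is one of our spamtrap addresses

def flag_ips(attempts, min_mxs=3, min_traps=5):
    """Return IPs that reached enough distinct MXs AND tried enough
    distinct spamtraps. Headers, HELO, sender, and content are ignored."""
    mxs_seen = defaultdict(set)
    traps_hit = defaultdict(set)
    for a in attempts:
        mxs_seen[a.ip].add(a.mx)
        if a.is_trap:
            traps_hit[a.ip].add(a.rcpt)
    return {
        ip for ip, mxs in mxs_seen.items()
        if len(mxs) >= min_mxs and len(traps_hit[ip]) >= min_traps
    }

# Hypothetical data: one client sprays traps across three MXs,
# another makes an ordinary delivery to a real user.
attempts = [
    Attempt("203.0.113.5", f"mx{i % 3}.example.net", f"trap{i}@example.net", True)
    for i in range(6)
] + [Attempt("192.0.2.20", "mx0.example.net", "alice@example.net", False)]
print(flag_ips(attempts))
```

Requiring both conditions is what keeps a single misaddressed message, or one busy legitimate relay, from being flagged.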

---Rsk
_______________________________________________
Asrg mailing list
Asrg(_at_)irtf(_dot_)org
https://www.irtf.org/mailman/listinfo/asrg