ietf-asrg

Re: [Asrg] Lets Fix Mailing Lists

2003-03-10 09:20:05
At 2:42 AM -0500 3/10/03, Valdis(_dot_)Kletnieks(_at_)vt(_dot_)edu wrote:
On Mon, 10 Mar 2003 01:37:02 EST, Kee Hinckley said:
 We filter spam based solely on forged headers and private blacklists
 plus user preferences.  The number of those messages that have
 something faked is well over 50%.

OK.. it's been 2 decades since I took statistics, and I've *mostly*
recovered from being a math/physics major, but...

All this says is that if you didn't filter on blacklists and user prefs,
100% of the mail that you tagged as having forged headers has forged
headers.  Addition of the other 2 criteria waters it down to "over 50%".

I was trying not to go into the grody details. Our false negative rate is <1%. Every time a user adds someone to a blacklist, or reports a problem, that gets recorded as spam we missed. I'm counting that in the numbers.

Problem 1)  There's no indication of what percent of mail with forged
headers was actually spam - and without your definition of what counts
as "forged", it's hard to tell.  I know that *this* laptop has sent some
rather squirrelly headers when I'm travelling and borrowing some net access.

We aren't deleting any of the mail. The users scan the list and mark the false positives. My numbers were based on the results *after* user confirmation. As I said elsewhere, users do get sloppy, but not to an extent that's going to significantly change these numbers.

Problem 2) There's no indication of what percent of your total mail volume
got tagged as "forged headers".  If it's 1% of your total or 40% of your
total is important, as they have different implications for what solutions
have to be able to do...

We are currently running 26% spam across all users.
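Problems 1 and 2 ask for two separate ratios: how much of the mail tagged as forged was confirmed as spam, and how much of the total volume got that tag at all. A minimal sketch of the arithmetic follows; the function name and the counts are hypothetical illustrations, not figures from either filtering service.

```python
def forged_header_stats(total_mail, tagged_forged, tagged_and_confirmed_spam):
    """Return (precision, share) for the forged-header check.

    precision: fraction of forged-tagged mail that users confirmed as spam
    share:     fraction of all mail that was tagged as forged
    """
    precision = tagged_and_confirmed_spam / tagged_forged if tagged_forged else 0.0
    share = tagged_forged / total_mail if total_mail else 0.0
    return precision, share


# Hypothetical counts, chosen only to show the calculation.
precision, share = forged_header_stats(
    total_mail=10_000,
    tagged_forged=1_500,
    tagged_and_confirmed_spam=1_350,
)
print(f"precision={precision:.2%} share={share:.2%}")
```

A check with high precision but a tiny share is reliable yet catches little; the two numbers have to be read together.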

Problem 3) There's no indication of what sorts of overlaps there are between
the forged, blacklist, and preferences.  If most of the mail that's forged
is *also* being hit by blacklist and/or preferences, then maybe checking for
forged headers isn't productive.  If it's catching lots of spam that your
blacklists and preferences aren't catching, maybe that's indicative of problems
with those two schemes...

Right. As I said, we used to log that data, but we simply didn't have time to go over all of it. Once things settle down a bit we'll be doing that sort of analysis to tune our filters.
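The overlap question in Problem 3 comes down to counting, for each confirmed-spam message, exactly which criteria fired. A rough sketch of that tally is below, assuming each log record is a dict with boolean fields named `forged`, `blacklist`, and `prefs`; those field names and the sample records are assumptions for illustration, not the actual log format.

```python
from collections import Counter

CRITERIA = ("forged", "blacklist", "prefs")  # hypothetical field names


def overlap_counts(messages):
    """Count messages by the exact set of criteria that fired on each one."""
    counts = Counter()
    for msg in messages:
        fired = frozenset(c for c in CRITERIA if msg.get(c))
        counts[fired] += 1
    return counts


def unique_catches(counts, criterion):
    """Number of messages caught by `criterion` and nothing else."""
    return counts[frozenset([criterion])]


# Hypothetical sample of confirmed-spam records.
sample = [
    {"forged": True, "blacklist": True, "prefs": False},
    {"forged": True, "blacklist": False, "prefs": False},
    {"forged": False, "blacklist": True, "prefs": True},
    {"forged": True, "blacklist": False, "prefs": False},
]
counts = overlap_counts(sample)
for fired, n in counts.items():
    print(sorted(fired), n)
```

If `unique_catches(counts, "forged")` is close to zero, the forged-header check adds little beyond the other two; if it is large, the blacklists and preferences are missing spam that it catches.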

(And no, Kee, I'm not picking on you in particular - I didn't accept Vernon's
DCC numbers at face value either.  Nobody here uses the same methodology,
and it's hard to avoid comparing apples to oranges without asking what
sort of seeds were planted in both cases....)

No offence taken. Talking about spam with people is very much like talking about user interfaces. Everyone thinks they are an expert, because everyone deals with it every day. In fact it's much more like the blind men and the elephant. I don't claim to be immune to any of that.

I have two data samples. One is somewhere.com, which is clearly an anomalous situation. Like striker, it's a harbinger of things to come, but its sample information is wildly biased by mailbombs, forged mailing list subscriptions and viruses. My other sample is the PureMessaging spam filtering service (soon to be Messagefire... don't ask). It's closer to an end-user analysis, but the sample size is still small and we simply don't have time to turn it into a research project right now.
--
Kee Hinckley
http://www.puremessaging.com/        Junk-Free Email Filtering
http://commons.somewhere.com/buzz/   Writings on Technology and Society

I'm not sure which upsets me more: that people are so unwilling to accept
responsibility for their own actions, or that they are so eager to regulate
everyone else's.
_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg