ietf-asrg

Re: [Asrg] Opt-out lists and legislation

2003-03-11 00:50:22
On Mon, 10 Mar 2003 18:39:57 EST, Kee Hinckley said:

I currently have a sample database of 22,000 confirmed spam messages 
sent to roughly 200 real email accounts.

40% blocked by the country restriction.
  4% blocked due to obvious viruses.
14% blocked due to system blacklist.
<1% blocked by user blacklists.

There's less than three percent overlap between those factors.  The

Actually, there's a hidden assumption here that means that there's
a lot MORE than 3% overlap. Your 14% system blacklist refers to a
blacklist that was tailored thinking "and this list doesn't include
anything from .XY because we country-restrict them already".

What's *really* there is a system blacklist that accounts for 54%
of catches, where 70% of the rules are country-based and the other 30%
are rules to catch stuff the country rules don't...
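
As a rough check on that arithmetic, here is a minimal sketch that treats the
country restriction and the system blacklist as one combined rule set with no
overlap (which is exactly the assumption in question). The function name is
made up for illustration; the 40 and 14 are the figures quoted above.

```python
def combined_blacklist_split(country_pct, system_pct):
    """Treat both checks as one rule set; return (combined, country share, other share)."""
    combined = country_pct + system_pct
    return combined, country_pct / combined, system_pct / combined


# Figures from the quoted breakdown: 40% country restriction, 14% system blacklist.
combined, country_share, other_share = combined_blacklist_split(40, 14)
print(f"combined: {combined}%  country: {country_share:.0%}  other: {other_share:.0%}")
```

Measured by catches, the split comes out near 74/26, which is in the same
neighborhood as the 70/30 figure above.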

Pick a country .XY and analyze it carefully - it's fairly likely that
if you didn't filter the country, you'd blacklist 3-4 spamhauses that
are 95% of the problem in that country.

The important question of course becomes whether or not the *rest* of
that country's population will start using e-mail enough to increase the
risk of false positives and skew your stats... ;)

rest are blocked solely on problems we saw with the headers.  There's 
certainly overlap between that and the other factors, but we don't 
currently log it specifically, so I don't know how much.
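
One way the overlap could be logged, as a sketch only: record the set of checks
each message fails and count how often each pair of checks fires together. The
check names and the four sample messages below are invented for illustration
and are not taken from the real database.

```python
from collections import Counter
from itertools import combinations


def overlap_counts(results):
    """Count, for each pair of checks, how many messages failed both.

    `results` is an iterable of sets, one per message, holding the names
    of the checks that message failed.
    """
    pair_counts = Counter()
    for failed in results:
        # sorted() gives each pair a stable (alphabetical) key order
        for pair in combinations(sorted(failed), 2):
            pair_counts[pair] += 1
    return pair_counts


# Hypothetical per-message results, not real data.
sample = [
    {"country", "headers"},
    {"system_blacklist"},
    {"country", "headers", "system_blacklist"},
    {"headers"},
]
counts = overlap_counts(sample)
for pair, n in sorted(counts.items()):
    print(pair, n)
```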

It would be interesting and informative to have some other numbers.

What percent of mail was tagged with the country restriction but *NOT*
tagged as spam by users? (For instance, it would be quite easy to flag
all mail from .CN as spam - and although my users would probably tag back
100% of the spam from .CN, they'd not tag 100% of the mail from .CN, as
many have relatives there.) The fact that 40% of spam fails the country
test is not at all a reliable predictor unless there is a near-zero rate
of non-spam that fails the country test.
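
To put numbers on that, a minimal sketch: the spam count comes from the sample
database quoted above (40% of 22,000), while the non-spam counts are purely
hypothetical, since nobody has reported how much legitimate mail fails the
country test. The function name is also just illustrative.

```python
def country_filter_precision(spam_flagged, ham_flagged):
    """Fraction of mail failing the country test that is actually spam."""
    total = spam_flagged + ham_flagged
    return spam_flagged / total if total else 0.0


spam_total = 22_000                     # size of the quoted spam sample
spam_flagged = spam_total * 40 // 100   # 40% failed the country test

# Hypothetical volumes of legitimate mail that also fail the country test.
for ham_flagged in (0, 500, 5_000):
    precision = country_filter_precision(spam_flagged, ham_flagged)
    print(f"non-spam flagged: {ham_flagged:>5}  precision: {precision:.1%}")
```

The 40% figure stays the same in every row; only the unmeasured non-spam count
decides whether the test is usable.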

Is the "user blacklist" number the percentage caught by pre-established
user filters, or is that saying that your other checks were 99% effective
in identifying spam and only 1% got through to users for them to report?
Do you have any guesstimates of how much *unreported* spam got through
to the 200 accounts?

Or to turn up the satire, and point out the problem with the analysis:

40% of spammers drank milk at breakfast the day they spammed

I saw an amusing statistic once that 99.97% of all felonies are committed
while breathing air.... ;)
