
Re: [Asrg] Opt-out lists and legislation

2003-03-11 10:04:15
At 2:49 AM -0500 3/11/03, Valdis.Kletnieks@vt.edu wrote:
On Mon, 10 Mar 2003 18:39:57 EST, Kee Hinckley said:

 I currently have a sample database of 22,000 confirmed spam messages
 sent to roughly 200 real email accounts.

 40% blocked by the country restriction.
   4% blocked due to obvious viruses.
 14% blocked due to system blacklist.
 <1% blocked by user blacklists.

 There's less than three percent overlap between those factors.  The

Actually, there's a hidden assumption here that means that there's
a lot MORE than 3% overlap. Your 14% system blacklist refers to a
blacklist that was tailored thinking "and this list doesn't include
anything from .XY because we country-restrict them already".

Yes and no. Country restrictions are per-user, so the blacklist needs to be multi-national. On the other hand, it's generated primarily by spotting spam that gets through from fixed sources, and that comes from user feedback. So there is a bias toward countries that people usually receive email from.
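The overlap figure from the stats above can be computed directly from the per-message tags. Here's a minimal sketch with hypothetical message IDs standing in for the real 22,000-message sample (the category names and counts are made up for illustration):

```python
from collections import Counter

# Hypothetical message IDs tagged by each blocking category.
country   = {1, 2, 3, 4}   # country restriction
viruses   = {5}            # obvious viruses
system_bl = {4, 6, 7}      # system blacklist
user_bl   = {7}            # user blacklists

blocked = country | viruses | system_bl | user_bl
hits = Counter()
for category in (country, viruses, system_bl, user_bl):
    hits.update(category)

# A message "overlaps" when more than one check would have caught it.
overlap = {msg for msg, n in hits.items() if n > 1}
overlap_pct = 100.0 * len(overlap) / len(blocked)
```

With real data, a low `overlap_pct` supports the claim that the categories are catching largely disjoint sets of spam.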

What percent of mail was tagged with the country restriction but *NOT*
tagged as spam by users? (For instance, it would be quite easy to flag

Initially the false-positive rate is a little high, until the user tunes their filters by specifying which countries they regularly get email from. They can also approve a sender even if we are blocking that sender system-wide--user rules win.
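That precedence ("user rules win") can be sketched as a simple ordered check. This is an illustration of the ordering described above, not the real API; the function name and fields are assumptions:

```python
from types import SimpleNamespace

def check_message(sender, sender_country, user, system_blacklist):
    # Hypothetical check ordering: per-user rules beat system-wide rules.
    if sender in user.approved:              # approving a sender overrides everything
        return "deliver"
    if sender in user.blacklist:
        return "hold"
    if sender in system_blacklist:           # shared, multi-national blacklist
        return "hold"
    if sender_country not in user.countries: # per-user country restriction
        return "hold"
    return "deliver"

# A user who approved one 163.net correspondent and only accepts US mail.
user = SimpleNamespace(approved={"friend@163.net"}, blacklist=set(),
                       countries={"US"})
```

Note that an approved sender is delivered even if their ISP is on the system blacklist, which is the 163.net case mentioned below.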

all mail from .CN as spam - and although my users would probably tag back
100% of the spam from .CN, they'd not tag 100% of the mail from .CN, as

They'd okay mail from China, and then possibly have to okay senders from some Chinese ISPs on a per-user basis. (E.g., I'm not sure, but I suspect that 163.net is on our blacklist, even though it is a legitimate ISP.)

Is the "user blacklist" number the percentage caught by pre-established
user filters, or is that saying that your other checks were 99% effective
in identifying spam and only 1% got through to users for them to report?

Although users can pre-establish a blacklist, they tend not to. Instead we let them blacklist a sender at the time they report a false negative (spam that got through). The 1% is the number of subsequent messages blocked by those blacklists.

I've just spent a week or so changing the interface to that part of the system. The previous interface pretty much made blocking the sender the default when you reported spam. Now it's a secondary choice, because the fact of the matter is that blacklisting the sending address almost never works. The next email from the spammer uses a different address, so there's no point in filling your blacklist with fake addresses. (Or is it "might have existed but don't exist now" addresses? :-) Instead we want to reserve blacklists for blocking email from real people (or domains) that you *really* don't want. So the primary action now is to report the problem, so that we can look at it and figure out why we didn't block it.

We've also added "unsubscribe" detection. If the message looks clean, but you think it's spam, we'll try unsubscribing you from it. A lot of non-technical folks can't tell a commercial list they accidentally subscribed to from a spam message pretending to be a list. If we can get them off the list, we've done everyone a favor. We'll track responses to those, of course--if they keep getting mail from the mailing list, then we can put it on the blacklist.
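The revised report flow described above can be sketched as follows. All names here are hypothetical; this just captures the logic: reporting is the primary action, blacklisting is opt-in, clean-looking messages trigger an unsubscribe attempt, and lists that keep sending after that get blacklisted:

```python
from types import SimpleNamespace

def handle_report(msg, user, blacklist_sender=False):
    actions = ["forward_to_staff"]            # primary: figure out why it slipped through
    if blacklist_sender:                      # secondary: rarely useful, addresses rotate
        user.blacklist.add(msg["sender"])
        actions.append("blacklist_sender")
    if msg.get("looks_clean"):                # probably a real list, not spam
        user.pending_unsubs.add(msg["list_id"])
        actions.append("attempt_unsubscribe")
    return actions

def on_later_message(msg, user, system_blacklist):
    # Track unsubscribe attempts: still sending means onto the blacklist.
    if msg.get("list_id") in user.pending_unsubs:
        system_blacklist.add(msg["list_id"])

user = SimpleNamespace(blacklist=set(), pending_unsubs=set())
system_bl = set()
```

The key design point is that the default action generates information for the operators rather than a throwaway blacklist entry.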

Do you have any guesstimates of how much *unreported* spam got through
to the 200 accounts?

This gets back to the problem I mentioned earlier on the list. You can't trust users to check the email sitting in their blocked queue.

Because we are in (public) beta, our customers have been pretty good about trying to report everything, and we try to make reporting really easy (click on a URL in the message header). My (non-scientific) observation is that if they let all mail go through to their MUA and filter there, they are better at reporting false negatives (spam in their inbox) than false positives (good mail in their spam box). If they leave the spam on the server, then we get the false-positive report automatically, because they can either "send" or "approve" a held message. "Sending" it doesn't notify us, and it doesn't change their rules. "Approving" whitelists the sender and sends all currently held messages from that sender. *That* we get notification of.
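The "send"/"approve" distinction can be sketched like this (function and field names are illustrative, not the real interface):

```python
from types import SimpleNamespace

def release(action, msg, user):
    """Two release paths for a held message:
    'send' releases one message quietly; 'approve' whitelists the
    sender, releases everything held from them, and notifies us."""
    if action == "send":
        user.held.remove(msg)
        return [msg], False                   # no rule change, no notification
    if action == "approve":
        user.approved.add(msg["sender"])
        released = [m for m in user.held if m["sender"] == msg["sender"]]
        user.held = [m for m in user.held if m["sender"] != msg["sender"]]
        return released, True                 # counts as a false-positive report

m1 = {"sender": "list@shop.example", "id": 1}
m2 = {"sender": "list@shop.example", "id": 2}
user = SimpleNamespace(held=[m1, m2], approved=set())
```

The notification flag on the "approve" path is what makes false-positive reporting automatic for users who leave spam on the server.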

Basically, I feel pretty good about our current stats because we know most of the beta testers and they know that they are providing us with useful information in exchange for free spam blocking. As we get into real users I expect that we'll see the accuracy drop off, especially after they've used the system for a while.
--
Kee Hinckley
http://www.puremessaging.com/        Junk-Free Email Filtering
http://commons.somewhere.com/buzz/   Writings on Technology and Society

I'm not sure which upsets me more: that people are so unwilling to accept
responsibility for their own actions, or that they are so eager to regulate
everyone else's.
_______________________________________________
Asrg mailing list
Asrg@ietf.org
https://www1.ietf.org/mailman/listinfo/asrg