Re: [Asrg] A method to eliminate spam

2003-03-19 12:58:43
Kee Hinckley wrote:
At 7:21 AM -0500 3/19/03, Daniel Feenberg wrote:

overloaded, even the most recalcitrant owners eventually close them. In
the end (the "Nash equilibrium") many sites subscribe to a black hole
list, nearly all open relays are closed, and there is no need for
universal agreement to get to that end. It may take a while though.

Why do you think it hasn't happened already? Those lists have been around for years. Have open relays significantly decreased?

As a percentage of spam?  Absolutely.

Total? Well, given that spam itself is increasing exponentially, I'm not sure whether we can measure that, especially since open proxy/socks has become the technique du jour. But the numbers I'm going to show below suggest that open relay is nowhere near the problem it once was.

Your message prompted me to do something I should have done a while ago: wire individual blacklist effectiveness into our metrics.

And here are the numbers for the past week - these are based on recipient counts, not message counts.

The first table talks exclusively about the results of our spamtrap, and shows relative effectiveness of the blacklists on a "pure spam" feed.

The second table talks exclusively about the results of the mail addressed to our real users.

The individual lists are annotated when they first appear.

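Numbers are counts for the corresponding entry, followed by that count as a percentage of total email received. A sender can be on several lists at once, so the per-list rows overlap and do not add up to the TOTAL BLOCK row.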

Blacklist effectiveness spamtrap only:

BOPM            3666774  50.73 (open proxy/socks)
Flonetwork          233   0.00 (Flowgo/dartmail/doubleclick static list)
IP, NOT BL       101140   1.40 (local "hard" manual blacklist,
                                being phased out)
MONKEYPROXY     4579195  63.36 (open proxy/socks)
NTblack          905852  12.53 (local automated proxy/socks/relay [+])
NTmanual         326783   4.52 (manual blacklist, new version)
OBproxies       1459108  20.19 (proxies/socks)
OBrelays         462877   6.40 (relays)
OK                   42   0.00 (whitelist)
OSinputs         836741  11.58 (Osirus relays)
OSproxy          136594   1.89 (Osirus proxies)
OSsocks         1798424  24.88 (Osirus socks)
SBL              562940   7.79 (SpamHaus spamsource BL)
TOTAL           7227413 100.00
TOTAL BLOCK     6063477  83.59 (total would-be blocked by blacklists)


Blacklist effectiveness on real email:

BOPM             100635   5.34
CONTENT           54802   2.91 (non-IP based filters, not used
                                on spamtrap)
Flonetwork         6096   0.32
IP, NOT BL        34946   1.85
MONKEYPROXY      135285   7.17
NTblack           38608   2.05
NTmanual          30370   1.61
OBproxies         46420   2.46
OBrelays          17419   0.92
OK                 5330   0.28
OSinputs          31922   1.69
OSproxy            2121   0.11
OSsocks           54144   2.87
SBL               51825   2.75
TOTAL           1885655 100.00
TOTAL BLOCK      316567  16.79 (total blocked)

As you can see, relays are quite low. Notice how MONKEYPROXY and BOPM each trap more than 50% of all inbound spam (to the spamtrap, which is by definition 100% spam - bounces and viruses are already stripped out).

Notice how the blacklists catch 84% of _all_ spam. Pretty darn good actually. But not perfect. That's why we do content-based too.
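
The percentage column in the tables is each count divided by the TOTAL row. A minimal sketch of that arithmetic, using a few rows copied from the real-email table; the function and variable names are made up for illustration and are not part of our metrics code:

```python
# Recipient counts copied from the "real email" table.
REAL_EMAIL_TOTAL = 1885655
REAL_EMAIL_COUNTS = {
    "BOPM": 100635,
    "MONKEYPROXY": 135285,
    "OBrelays": 17419,
    "OSinputs": 31922,
    "TOTAL BLOCK": 316567,
}


def percent_of_total(count, total):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100.0 * count / total, 2)


def report(counts, total):
    """Format one 'name count percent' line per entry, in insertion order."""
    return [
        f"{name:<15} {count:>8} {percent_of_total(count, total):>6.2f}"
        for name, count in counts.items()
    ]


for line in report(REAL_EMAIL_COUNTS, REAL_EMAIL_TOTAL):
    print(line)
```

Because one sender can be listed on several blacklists, the per-list percentages overlap; only TOTAL BLOCK counts each blocked recipient once.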

My guess is that too many people are reluctant to use them. As has been discussed here, black hole lists have a reputation for lack of accountability.

They have a reputation for that, but that's largely false. BOPM, OB* (these two are private lists, but you'd know who it was and how to contact them if you ever hit an OB* blacklist block), MONKEYS[*], OSIRUS and SBL have _excellent_ reputations, and good accountability/contactability.

If automated they have a serious problem with false positives.

This is what the reputation is, but it's pure nonsense. While it is true that "open relay" blacklists have a higher percentage of false positives than the others, the numbers are still _extremely_ low. Secondly, the automated testers are the most accessible ones for fixing of false positives. ORDB is probably the very best of the group - instant delist with subsequent retest and relist if necessary.

[We can't use ORDB, because we have to do zone transfers, and ORDB doesn't permit that.]

And I can show that from the above tables.

First a comment on the "OK" entry. Our procedure for a false positive on a blacklist of any kind is to immediately enter a whitelist entry and queue up a retest request to each of the blacklists (where appropriate). The process is automated almost down to a single keystroke.

[We immediately whitelist, because our DNSBL implementation is by zone-transfer and DNS zone file build. The average latency for a 3rd party delist via these mechanisms can be well in excess of 24 hours.]
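
A rough sketch of that false-positive procedure, under assumptions: the function name, the in-memory whitelist set, the retest queue, and the `retestable` set (which lists accept a retest request from a third party) are all hypothetical and only illustrate the order of operations, whitelist first, then queue retests where appropriate.

```python
from collections import deque


def handle_false_positive(ip, listed_on, whitelist, retest_queue, retestable):
    """Whitelist ip at once, then queue retests on lists that accept them.

    listed_on    -- names of the blacklists that currently list ip
    whitelist    -- set of whitelisted IPs (mutated in place)
    retest_queue -- deque of (list_name, ip) requests (mutated in place)
    retestable   -- names of lists assumed to accept third-party retests
    Returns the names of the lists a retest was queued for.
    """
    # The whitelist entry takes effect locally right away, without waiting
    # for a third-party delist to propagate through zone transfers.
    whitelist.add(ip)

    queued = []
    for name in listed_on:
        if name in retestable:
            retest_queue.append((name, ip))
            queued.append(name)
    return queued


# Example with a documentation-range address (192.0.2.0/24).
whitelist = set()
retest_queue = deque()
queued = handle_false_positive(
    "192.0.2.10",
    listed_on=["BOPM", "MONKEYPROXY"],
    whitelist=whitelist,
    retest_queue=retest_queue,
    retestable={"BOPM"},  # assumption for the example only
)
print(queued, sorted(whitelist))
```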

Furthermore, many of these whitelist entries are for whole ranges we list in our local blacklist (like 200.148/16 and 200.158/16), where we've just opened up a hole for the _only_ legit mailer in the whole block. [%]

What we don't have right at the moment is a mechanism for stripping out whitelist entries once the original blacklist entry disappears. I'm working on it, I'm working on it ;-)
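
One way such a cleanup pass could look, as a sketch only: it assumes the whitelist is a list of single IP strings and that the combined blacklist data is available as IP or CIDR strings in full dotted form (so 200.148.0.0/16 rather than the 200.148/16 shorthand). The function name is hypothetical; the parsing uses Python's standard `ipaddress` module.

```python
import ipaddress


def prune_whitelist(whitelist, blacklist_entries):
    """Split whitelist into (keep, stale).

    A whitelist entry is kept only while some current blacklist entry
    (single address or CIDR range) still covers it; otherwise it is stale.
    """
    networks = [
        ipaddress.ip_network(entry, strict=False) for entry in blacklist_entries
    ]
    keep, stale = [], []
    for entry in whitelist:
        addr = ipaddress.ip_address(entry)
        if any(addr in net for net in networks):
            keep.append(entry)
        else:
            stale.append(entry)
    return keep, stale


# The hole punched in a listed /16 stays; the entry whose listing is gone drops.
keep, stale = prune_whitelist(
    ["200.148.1.25", "192.0.2.10"],
    ["200.148.0.0/16", "198.51.100.7"],
)
print(keep, stale)
```

Entries that open a hole in a locally listed range survive for as long as the range itself is listed, which matches the intent described above.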

So, the "OK" entries are _every_ mail server we've ever whitelisted, even though the original blacklist entry has probably long since disappeared. The "OK" counts are therefore considerably _higher_ than what our blacklists would actually block today. Further, many of them are not from third party blacklists, but rather from our local listings. That's only 42 for the spamtrap, and 0.28% for the production mail. If I were presently able to remove the whitelist entries for the machines no longer open, the numbers would probably be under 0.01% for our production systems too.

We get less than 5 false positive reports on average per day.

Spot checks show that at least 95% of all whitelist/retests we've issued have taken effect on the corresponding 3rd party blacklist. Except MONKEYS[*].

But again, it's true that open relay blacklists have higher false positive rates. Despite being responsible for perhaps 3-4% of all of our IP-based blocks, somewhat more than half of our IP-based false positives are with open relay blacklists. And most of those are with OBrelays.

Why is that?  Simple:

1) machines that were open relays are more likely to have been intended to send email than a simple open proxy or socks server, so, "legit" users are more likely to hit a blacklist entry. Most open proxy or socks hits are _not_ mail servers and were never intended to be. So nobody notices. Nobody cares either (except the spammer, but they don't notice).

2) Lesser used blacklists have higher FP rates, because fewer legit senders hit them. OBrelays is only used by two sites: us, and its maintainer. Despite being _large_ (OB is > 30 million mail addresses), it's still small compared to the coverage of the other lists, hence the relatively higher FP percentage.

3) Most of the open relay FPs are servers that are no longer open but didn't have enough BL coverage to notice. Most of the open proxy/socks hits are servers that are still open.

What does this all mean?

Well, what Joe said - perhaps our "filtering BCP" should _explicitly_ state that all mail filtering systems should be using well known and reputable open relay and open proxy/socks blacklists.

In this way we encourage much greater coverage, so that (a) site owners find out much quicker they have a problem and (b) stale entries are cleaned up much faster. In other words, list accuracy is vastly improved, and broken servers are fixed much faster. Open proxy/socks blacklist usage is already "best practise" with IRC servers. See the BOPM web site.

If manual they cost money. While individuals may have some degree of tolerance for false positives, most companies and ISPs are not so tolerant--all it takes is one bad instance and you're all over the press (college admissions notifications blocked, Mac.com blocking domain renewal emails...).

Look at the above numbers, and remember who we are. Obviously, we're VERY intolerant of false positives. We're doing fine.

[*] I have an issue with MONKEYSPROXY because the criterion for removal isn't "just fix the open socks or proxy and ask for retest": asking for the retest has other extraneous requirements. In effect, a MONKEYSPROXY entry either means you have an open proxy/socks, OR you may simply not have been able to formulate a retest request that MONKEYS would accept. We can't do third-party retest requests with MONKEYS, for example.

This does not seem to cause _us_ much trouble in practise (since we whitelist), but if you're high volume like us and not actively whitelisting like us, it may make you think twice about using it, despite how good it is. I'd rather it followed the BOPM or ORDB model here. Still and all, I think we've gotten 5 false positive reports for Monkeys in 3 months.

[+] automated testing is triggered by at least 3 spam-in-hands in a day hitting our spamtrap, one week minimum testing interval. 3 week no repeat expiration. Ignored/not tested/listed if IP already blacklisted elsewhere. Allows us to automatically detect "new" open relays/proxies/socks hitting the spamtrap and publishing blacklist entries to production servers. Experimental. May be decommissioned.
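
The [+] policy can be sketched as two small predicates. This is one reading of the terse notes above, not the production logic: "3 week no repeat expiration" is taken to mean an entry expires once three weeks pass without a repeat hit, and all names and constants are chosen for illustration.

```python
from datetime import datetime, timedelta

MIN_HITS_PER_DAY = 3                      # spam-in-hands needed in one day
MIN_TEST_INTERVAL = timedelta(weeks=1)    # do not retest more often than this
EXPIRE_AFTER = timedelta(weeks=3)         # assumed: no repeat hit for 3 weeks


def should_test(hits_today, last_tested, listed_elsewhere, now):
    """Decide whether an IP seen at the spamtrap should be tested now."""
    if listed_elsewhere:
        # Already covered by another blacklist: ignored, not tested or listed.
        return False
    if hits_today < MIN_HITS_PER_DAY:
        return False
    if last_tested is not None and now - last_tested < MIN_TEST_INTERVAL:
        return False
    return True


def is_expired(last_hit, now):
    """Return True once an entry has gone EXPIRE_AFTER without a repeat hit."""
    return now - last_hit >= EXPIRE_AFTER


now = datetime(2003, 3, 19, 12, 0)
print(should_test(3, None, False, now))
print(is_expired(now - timedelta(weeks=4), now))
```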

[%] 1000+ IPs spewing email at us from a /16, and 98%+ of them are already listed as open relays/socks/proxies. The rest of them are behaving as if they are. Sigh.

_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg