
Re: [Asrg] Nucleus of a draft BCP on filtering

2003-03-08 10:36:11
Vernon Schryver wrote:
> From: "Chris Lewis" <clewis(_at_)nortelnetworks(_dot_)com>
>> Alan wrote:
>>> One glaring difference in behaviour is volume.  But the net isn't
>>> currently set up to even be *aware* of volume.
>
> What about things like the DCC?  The fundamental purpose and idea of
> the DCC is to measure the volume of mail messages.

What about things like the DCC?

No, seriously, I think it's instructive to look at the experience with Usenet spam, because it's "farther along the learning curve". There are fundamental differences, which I'll get into in a moment, but much of what I'm going to talk about mirrors the experience with email exactly - only more so.

And with all due respect, DCC isn't ubiquitous enough to really warrant much specific attention from the ratware authors.
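For anyone who hasn't looked at it, the DCC idea boils down to a clearinghouse that counts sightings of message checksums, so "bulk" becomes a measurable number. A minimal sketch of just that concept, with an exact MD5 standing in for the real DCC's fuzzy checksums and distributed client/server protocol, and an invented threshold:

import hashlib
from collections import Counter

class ChecksumClearinghouse:
    """Count sightings of message checksums; a high count means bulk mail."""

    def __init__(self, bulk_threshold=50):
        self.counts = Counter()
        self.bulk_threshold = bulk_threshold

    def report(self, body):
        """Record one sighting of a message body; return total sightings so far."""
        digest = hashlib.md5(body.encode()).hexdigest()
        self.counts[digest] += 1
        return self.counts[digest]

    def is_bulk(self, body):
        return self.report(body) >= self.bulk_threshold

Note that nothing in there cares who the sender is or what the body says; volume alone is the signal.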

In the beginning, there were rate detectors, such as the one I operated starting in 1994, which simply detected the same From: appearing in more than X postings in Y minutes.

A human being checks: yup, it's the same stuff, take remedial action. Few false detections, and never any FPs acted on, because a human vetted every hit.

That worked well for quite a while (a couple of years). I never automated that tool to take remedial action on its own - because there _were_ FPs among the raw detections.
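For concreteness, that 1994-style detector is about this much code (Python, with thresholds invented for illustration):

import time
from collections import defaultdict, deque

class RateDetector:
    """Flag any sender exceeding max_posts postings within window_secs."""

    def __init__(self, max_posts=20, window_secs=600):
        self.max_posts = max_posts
        self.window_secs = window_secs
        self.timestamps = defaultdict(deque)  # From: header -> recent post times

    def post(self, from_header, now=None):
        """Record one posting; return True if this sender is over the limit."""
        now = time.time() if now is None else now
        q = self.timestamps[from_header]
        q.append(now)
        while q and now - q[0] > self.window_secs:
            q.popleft()  # forget postings that fell outside the window
        return len(q) > self.max_posts

The same structure works keyed on NNTP-Posting-Host instead of From:, which matters in a moment.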

Then spammers got smart and started minimally mutating From: lines and subjects; yet the bodies tended to remain pretty similar.

Then we started doing things like matching NNTP-Posting-Hosts (sounds like the peer-IP versus bounce-rate proposal, perhaps? ;-)

Then, Usenet's "DCC" came on the scene, in the guise of something called Cosmo Roadkill. CR does virtually the same thing that DCC does: it computes MD5 checksums of "dehashed" article bodies (bodies normalized so that trivial mutations don't change the checksum). The only difference was that there was only one of it - it wasn't distributed, but it didn't need to be, because of the nature of Usenet.
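A guess at the shape of that, in Python - the normalization rules below are illustrative stand-ins, not Cosmo Roadkill's actual ones:

import hashlib
import re

def dehash(body):
    """Normalize an article body so trivial mutations don't change its checksum."""
    body = body.lower()
    body = re.sub(r"\d+", "", body)   # drop digits (serial numbers, etc.)
    body = re.sub(r"\s+", " ", body)  # collapse whitespace
    return body.strip()

def article_checksum(body):
    return hashlib.md5(dehash(body).encode()).hexdigest()

# Two minimally mutated copies collapse to the same checksum:
assert article_checksum("BUY NOW!!! offer #4471") == \
       article_checksum("buy   now!!!  offer #9023")

Count identical checksums across the whole feed and you have a volume detector that From:/Subject: mutation can't dodge.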

My detector quickly became obsolete. CR did a vastly better job: overnight it outstripped my stuff by an order of magnitude, all unattended, with no FPs. So I essentially switched roles to covering the small amount of stuff CR didn't handle.

Then came the crackerbuster emulators, such as Hipcrime/NewsAgent, the pheromone spammer, and suchlike. Hashbusting galore: as much as hundreds of lines of random words or sentences, specifically designed to screw around with "volume detectors". You should have seen some of the gyrations the manual filter designers went through to cope (e.g., extremely complicated regexp analysis of Message-ID headers, looking for artifacts of random number generators).
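The effect on checksum-based detection is easy to demonstrate - toy code, not anyone's actual ratware:

import hashlib
import random

WORDS = ["azure", "pylon", "grommet", "tundra", "lattice", "ferric"]

def hashbust(body, lines=3):
    """Append random-word lines so every copy checksums differently."""
    noise = "\n".join(" ".join(random.sample(WORDS, 4)) for _ in range(lines))
    return body + "\n\n" + noise

copies = [hashbust("MAKE MONEY FAST") for _ in range(5)]
digests = {hashlib.md5(c.encode()).hexdigest() for c in copies}
print(len(digests))  # almost certainly 5: one unique checksum per copy

Once the noise also defeats whatever normalization the detector applies, the volume signal simply disappears.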

The changes were dramatic. One day, I was it. The next day, CR was it. A little while later, CR's effectiveness dropped to 20%, and Usenet anti-spam relied (and still relies) on a small number of singularly _stubborn_ people eyeballing what was going on - which became less and less useful, because Hipcrime was also attacking the "curative measures" themselves.

Not only was there hashbusting, there were people searching for open NNTP relays (open SMTP relays, anyone?), and later open proxies. Hipcrime may well have pioneered the technology that's now used for open socks/http proxying - Hipcrime had his roots in writing some moderately ineffectual (by today's standards) email spamming tools. "Fake" ISPs. The works.

So after 10 years, where is Usenet now? Well, luckily, not in bad shape (in the mainstream groups). Is that because of the effectiveness of the tools? For a time it was. But now even the tools don't work that effectively, for a variety of reasons. The tools were a stopgap. Nowadays, most ISPs capable of running Usenet in a serious fashion have learned how to secure their systems, so abuse is much harder, and a lot more people see every incident.

That's the difference: Usenet is a broadcast medium, and people behaving badly on it are a lot more obvious. The tools bought time.

But rogues still occur (see David Ritz's proposed Usenet Death Penalty (UDP) of Telstra).

The main reason I say all this is that, like CR, DCC will have a distinct lifetime during which it's useful.

Ask yourself: how well would DCC fare if someone with the experience of you or me started writing ratware with what we know? The thought is appalling.

> The lower layers have since practically the beginning been set up to
> measure rates and some ISPs have since the 1980's been set up to
> "specify and enforce rate limiting."  Recall the rates that said "buy
> a T1 and pay for a 56K unless and until you use more."

IIRC, that wasn't so much a matter of detecting rates as of simply not installing a full set of line cards. At least, that's what it was with our partial T3. That would be hard to automate ;-)

> You also said "in such a way as to seriously affect spam".  The fact
> that all email including spam is a small part of the total bandwidth
> used kills the idea of doing much about spam by counting raw bits.
>
> Not even striker can do much, even with 99.9999% of their bandwidth
> being connection attempt rejects...

That's what I thought (and said). I'm hoping someone will prove the both of us wrong. But I'm not holding my breath.

_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg