
Re: [Asrg] 2. Analysis (specifically, trap addresses)

2003-10-01 21:09:42
Terry Sullivan wrote:

This particular message actually crosses a couple of different area boundaries. (I assume the message will be recharacterized if/however the chairs see fit.)

A little while back, I posted a message soliciting ideas for a kind of "shakedown run"--a "pilot project" that could be run in a short period of time, and I promised to summarize the results back to the list.

Well, I received a total of two responses. One proposed an analysis-and-characterization of "passive" (i.e., archival) data (an excellent proposal, but not quite suitable for "shakedown" purposes). The second suggested studying greylisting along the lines of Evan Harris' previous work. I've had a brief flurry of off-list correspondence with folks regarding greylisting, and any pilot project would, among other things, need to establish its own baseline data.

Again, the whole idea behind doing a pilot study was to discover what special requirements or problems anti-spam research studies might face. And I think we've discovered the first: trap addresses and their traffic.

Data from multiple independent sources (including Liam Meany, Scott Nelson, and myself) indicate that "otherwise identical" trap addresses receive *vastly* different amounts of spam traffic. This unexplained variance in spam traffic is *very* large, and all of it is "statistically 'bad' variance." That "bad variance" at least potentially compromises any ASRG effort that might use trap addresses. (Without some sort of "balancing" in trap address traffic, experimental effects might need to be as large as an order of magnitude in order to achieve "statistical significance.")
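
To make the "bad variance" point a bit more concrete, here is a rough sketch of how the spread in per-address traffic drives the smallest effect a study could detect. It is only an illustration, not an analysis of anyone's actual trap data: it assumes a simple two-group comparison with equal group sizes and a normal approximation, and the function name and parameters are made up for this example.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_relative_effect(cv, n_per_group, alpha=0.05, power=0.8):
    """Smallest difference in mean traffic, as a fraction of the mean,
    that a two-sided two-sample z-test could detect.

    cv          -- coefficient of variation of per-address traffic
                   (standard deviation divided by mean); hypothetical input
    n_per_group -- number of trap addresses in each experimental group
    alpha       -- two-sided significance level
    power       -- desired probability of detecting a real effect

    Assumption: normal approximation, equal variance in both groups.
    """
    if cv < 0 or n_per_group < 1:
        raise ValueError("cv must be >= 0 and n_per_group must be >= 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Rearranged sample-size formula: delta = (z_a + z_b) * sigma * sqrt(2/n),
    # divided through by the mean to express it as a relative effect.
    return (z_alpha + z_beta) * cv * sqrt(2.0 / n_per_group)
```

Under these assumptions, with ten addresses per group and a standard deviation three times the mean (a made-up figure), the detectable difference comes out near 3.8 times the mean. Cutting the spread, for instance by balancing traffic across addresses, shrinks that threshold in direct proportion.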

I can imagine a couple of possible solutions (though there may also be others I haven't thought of). One might be for ASRG to "borrow" trap addresses from large organizations/ISPs who were (or at least might be?) willing to share. The other is for ASRG to have an independent trap address maintenance effort. I floated this latter idea a while back, and it generated almost no interest at all.

Obviously, a trap-address maintenance effort would be privy to extremely sensitive data, and would probably need to be a handful of people (at most), whose efforts were "closed" to the larger group. Ultimately, the need/desired outcome would be on-demand availability of an appropriate number of comparably-trafficked trap addresses for research purposes, said addresses to be selected randomly from a larger pool.
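
As a minimal sketch of what "comparably-trafficked, selected randomly from a larger pool" could look like in practice: keep only the addresses whose observed traffic falls within a band around the pool median, then draw at random from those. Everything here is hypothetical, including the function name, the messages-per-day measure, the 25% tolerance, and the use of the median as the reference point. A real maintenance effort might well balance addresses some other way.

```python
import random


def pick_comparable_traps(pool, k, tolerance=0.25, seed=None):
    """Randomly pick k trap addresses with similar traffic levels.

    pool      -- dict mapping address -> observed messages per day
                 (hypothetical measure of traffic)
    k         -- number of addresses wanted
    tolerance -- allowed relative deviation from the pool median
    seed      -- optional seed so a selection can be reproduced

    Raises ValueError if fewer than k addresses fall inside the band.
    """
    if not pool:
        raise ValueError("pool is empty")
    rates = sorted(pool.values())
    median = rates[len(rates) // 2]  # upper median for even-sized pools
    low, high = median * (1 - tolerance), median * (1 + tolerance)
    # Sort candidates so that a given seed always yields the same sample.
    candidates = sorted(a for a, r in pool.items() if low <= r <= high)
    if len(candidates) < k:
        raise ValueError(
            f"only {len(candidates)} comparable addresses, need {k}"
        )
    return random.Random(seed).sample(candidates, k)
```

Addresses with wildly atypical traffic are never drawn, so any groups built from the result start from roughly the same baseline. The price is that the study only speaks for "typical" trap addresses, not the outliers.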

Ultimately, it seems to me that there is a need for some sort of active (though not necessarily particularly time-consuming) trap address maintenance effort. Otherwise, the only real opportunity for *research* (as opposed to debate) lies in analysis/characterization of archival data.

What is the bottom line? Can you elaborate on your desired steps and what resources you require?

As for archival data, SpamArchive and others provide lots of it. The FTC also maintains a spam archive and they might be open to the idea of running something against it if we ask them.
