Re: [Asrg] 2. Analysis (specifically, trap addresses)
2003-10-01 21:09:42
Terry Sullivan wrote:
This particular message actually crosses a couple of different area
boundaries. (I assume the message will be recharacterized if/however
the chairs see fit.)
A little while back, I posted a message soliciting ideas for a kind of
"shakedown run"--a "pilot project" that could be run in a short
period of time, and I promised to summarize the results back to the
list.
Well, I received a total of two responses. One proposed an
analysis-and-characterization of "passive" (i.e., archival) data (an
excellent proposal, but not quite suitable for "shakedown" purposes).
The second suggested studying greylisting along the lines of Evan
Harris' previous work. I've had a brief flurry of off-list
correspondence with folks regarding greylisting, and any pilot
project would, among other things, need to establish its own baseline
data.
Again, the whole idea behind doing a pilot study was to discover what
special requirements or problems anti-spam research studies might
face. And I think we've discovered the first: trap addresses and
their traffic.
Data from multiple independent sources (including Liam Meany, Scott
Nelson, and myself) indicate that "otherwise identical" trap
addresses receive *vastly* different amounts of spam traffic. This
unexplained variance in spam traffic is *very* large and all of it is
"statistically 'bad' variance." That "bad variance" at least
potentially compromises any ASRG effort that might use trap
addresses. (Without some sort of "balancing" in trap address
traffic, experimental effects might need to be as large as an
order-of-magnitude in order to achieve "statistical significance.")
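To put a rough number on that parenthetical, here is a minimal sketch,
assuming a plain two-group comparison and a normal approximation (which
heavy-tailed spam counts will satisfy only loosely). The function name and
the example values are hypothetical; they are not measurements from any of
the trap data mentioned above. It estimates how large a relative difference
between two groups of trap addresses must be before it is detectable, given
the spread in traffic across addresses.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_ratio(cv, n_per_group, alpha=0.05, power=0.8):
    """Smallest detectable difference between two group means,
    expressed as a fraction of the mean traffic per address.

    cv          -- coefficient of variation (std dev / mean) of per-address
                   spam volume; an assumed input, not a measured value
    n_per_group -- number of trap addresses in each experimental group
    alpha       -- two-sided significance level
    power       -- desired probability of detecting a real effect
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    # Standard two-sample approximation: delta = (z_a + z_b) * sigma * sqrt(2/n),
    # divided through by the mean so that sigma becomes the CV.
    return (z_alpha + z_beta) * cv * sqrt(2 / n_per_group)
```

Under these assumptions, ten addresses per group with a CV of 3.0 gives a
value of about 3.8, meaning the treatment would have to shift traffic by
nearly four times the mean to register. The required effect scales linearly
with the CV, so "balancing" the addresses pays off directly.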
I can imagine a couple of possible solutions (though there may also
be others I haven't thought of). One might be for ASRG to "borrow"
trap addresses from large organizations/ISPs who were (or at least
might be?) willing to share. The other is for ASRG to have an
independent trap address maintenance effort. I floated this latter
idea a while back, and it generated almost no interest at all.
Obviously, a trap-address maintenance effort would be privy to
extremely sensitive data, and would probably need to be a handful of
people (at most), whose efforts were "closed" to the larger group.
Ultimately, the need/desired outcome would be on-demand availability
of an appropriate number of comparably-trafficked trap addresses for
research purposes, said addresses to be selected randomly from a
larger pool.
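As a rough illustration of that "on-demand" selection, here is a minimal
sketch. Everything in it is an assumption made for the example: the function
name, the idea of defining "comparably-trafficked" as falling within a
tolerance band around the pool's median daily volume, and the default
tolerance itself. A real maintenance effort might well choose a different
criterion.

```python
import random
from statistics import median


def draw_comparable_traps(traffic, k, tolerance=0.25, seed=None):
    """Randomly pick k trap addresses with similar traffic levels.

    traffic   -- dict mapping trap address -> average messages per day
                 (hypothetical bookkeeping kept by the maintainers)
    k         -- number of addresses a study is asking for
    tolerance -- half-width of the accepted band, as a fraction of the
                 pool median (assumed definition of "comparable")
    seed      -- optional seed so that a draw can be reproduced
    """
    target = median(traffic.values())
    low, high = target * (1 - tolerance), target * (1 + tolerance)
    # Sort so that the same seed always yields the same draw.
    eligible = sorted(addr for addr, vol in traffic.items() if low <= vol <= high)
    if len(eligible) < k:
        raise ValueError(
            f"only {len(eligible)} comparable addresses available, {k} requested"
        )
    return random.Random(seed).sample(eligible, k)
```

Drawing at random from the eligible set, rather than handing out the first k
matches, keeps the choice of addresses from being confounded with whatever
order the pool happens to be stored in.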
Ultimately, it seems to me that there is a need for some sort of
active (though not necessarily particularly time-consuming) trap
address maintenance effort. Otherwise, the only real opportunity for
*research* (as opposed to debate) lies in analysis/characterization
of archival data.
What is the bottom line? Can you elaborate on your desired steps and
what resources you require?
As for archival data, SpamArchive and others provide lots of it. The FTC
also maintains a spam archive and they might be open to the idea of
running something against it if we ask them.
_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg