Re: [Asrg] 2. Analysis (specifically, trap addresses)

Terry Sullivan wrote:

This particular message actually crosses a couple of different area boundaries. (I assume the message will be recharacterized if/however the chairs see fit.)
A little while back, I posted a message soliciting ideas for a kind of "shakedown run"--a "pilot project" that could be run in a short period of time, and I promised to summarize the results back to the list.
Well, I received a total of two responses. One proposed an analysis-and-characterization of "passive" (i.e., archival) data (an excellent proposal, but not quite suitable for "shakedown" purposes). The second suggested studying greylisting along the lines of Evan Harris' previous work. I've had a brief flurry of off-list correspondence with folks regarding greylisting, and any pilot project would, among other things, need to establish its own baseline data.
Again, the whole idea behind doing a pilot study was to discover what special requirements or problems anti-spam research studies might face. And I think we've discovered the first: trap addresses and their traffic.
Data from multiple independent sources (including Liam Meany, Scott Nelson, and myself) indicate that "otherwise identical" trap addresses receive *vastly* different amounts of spam traffic. This unexplained variance in spam traffic is *very* large and all of it is "statistically 'bad' variance." That "bad variance" at least potentially compromises any ASRG effort that might use trap addresses. (Without some sort of "balancing" in trap address traffic, experimental effects might need to be as large as an order-of-magnitude in order to achieve "statistical significance.")
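To make the variance point concrete, here is a rough sketch (not anyone's actual analysis) of how between-address spread inflates the smallest effect a simple two-group comparison could detect. It uses a standard normal-approximation power formula; the function name, the 5%/80% defaults, and both sets of counts are invented purely for illustration and are not data from the list.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def min_detectable_ratio(counts, n_per_group, alpha=0.05, power=0.8):
    """Approximate smallest relative change in mean traffic that a
    two-sided, two-sample z-test could detect.

    counts       -- per-address message counts used to estimate spread
    n_per_group  -- number of trap addresses in each experimental group
    Assumes equal group sizes and a normal approximation (an assumption,
    and a generous one for heavy-tailed spam counts).
    """
    z = NormalDist()
    z_total = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    cv = stdev(counts) / mean(counts)  # coefficient of variation
    return z_total * cv * sqrt(2 / n_per_group)


# Hypothetical counts: "otherwise identical" addresses with wildly
# different traffic, versus a balanced set with similar traffic.
unbalanced = [2, 5, 40, 300, 1200, 8]
balanced = [90, 100, 110, 95, 105, 100]

for label, counts in (("unbalanced", unbalanced), ("balanced", balanced)):
    ratio = min_detectable_ratio(counts, n_per_group=10)
    print(f"{label}: detectable relative change ~ {ratio:.2f}")
```

The point of the sketch is only the comparison: for a fixed number of addresses, the detectable effect scales with the coefficient of variation, so unbalanced traps demand far larger effects (or far more addresses) than balanced ones.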
I can imagine a couple of possible solutions (though there may also be others I haven't thought of). One might be for ASRG to "borrow" trap addresses from large organizations/ISPs who were (or at least might be?) willing to share. The other is for ASRG to have an independent trap address maintenance effort. I floated this latter idea a while back, and it generated almost no interest at all.
Obviously, a trap-address maintenance effort would be privy to extremely sensitive data, and would probably need to be a handful of people (at most), whose efforts were "closed" to the larger group. Ultimately, the need/desired outcome would be on-demand availability of an appropriate number of comparably-trafficked trap addresses for research purposes, said addresses to be selected randomly from a larger pool.
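As a minimal sketch of what "comparably-trafficked ... selected randomly from a larger pool" could look like in practice: the snippet below keeps only addresses whose observed traffic falls within a band around the pool median, then samples from that subset. Everything here is an assumption for illustration, including the function name, the 25% tolerance, the use of the median as the reference point, and the example.org addresses and rates.

```python
import random
from statistics import median


def draw_comparable(pool, k, tolerance=0.25, seed=None):
    """Randomly pick k trap addresses with similar traffic.

    pool       -- dict mapping address -> mean daily message count
    tolerance  -- allowed fractional distance from the pool median
                  (hypothetical default; tune to the study's needs)
    seed       -- optional seed so a draw can be reproduced
    """
    center = median(pool.values())
    low, high = center * (1 - tolerance), center * (1 + tolerance)
    # Sort so that a given seed yields the same draw on every run.
    eligible = sorted(a for a, rate in pool.items() if low <= rate <= high)
    if len(eligible) < k:
        raise ValueError(
            f"only {len(eligible)} comparable addresses available, need {k}"
        )
    return random.Random(seed).sample(eligible, k)


# Hypothetical pool: address -> mean daily spam count.
pool = {
    "trap01@example.org": 100,
    "trap02@example.org": 110,
    "trap03@example.org": 95,
    "trap04@example.org": 5,
    "trap05@example.org": 2000,
    "trap06@example.org": 105,
}

chosen = draw_comparable(pool, k=3, seed=1)
print(chosen)
```

Filtering before sampling is one simple way to get the "balancing" discussed above; stratifying by traffic level would be an alternative when the study needs addresses at several traffic levels.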
Ultimately, it seems to me that there is a need for some sort of active (though not necessarily particularly time-consuming) trap address maintenance effort. Otherwise, the only real opportunity for *research* (as opposed to debate) lies in analysis/characterization of archival data.

What is the bottom line? Can you elaborate on your desired steps and what resources you require?

As for archival data, SpamArchive and others provide lots of it. The FTC also maintains a spam archive and they might be open to the idea of running something against it if we ask them.
_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg