[Asrg] 2. Analysis (specifically, trap addresses)

2003-10-01 10:21:30
This particular message actually crosses a couple of different area 
boundaries.  (I assume the message will be recharacterized if/however 
the chairs see fit.)

A little while back, I posted a message soliciting ideas for a kind of 
"shakedown run"--a "pilot project" that could be run in a short 
period of time, and I promised to summarize the results back to the 

Well, I received a total of two responses.  One proposed an 
analysis-and-characterization of "passive" (i.e., archival) data (an 
excellent proposal, but not quite suitable for "shakedown" purposes).  
The second suggested studying greylisting along the lines of Evan 
Harris' previous work.  I've had a brief flurry of off-list 
correspondence with folks regarding greylisting, and any pilot 
project would, among other things, need to establish its own baseline 

Again, the whole idea behind doing a pilot study was to discover what 
special requirements or problems anti-spam research studies might 
face.  And I think we've discovered the first: trap addresses and 
their traffic.

Data from multiple independent sources (including Liam Meany, Scott 
Nelson, and myself) indicate that "otherwise identical" trap 
addresses receive *vastly* different amounts of spam traffic.  This 
unexplained variance in spam traffic is *very* large and all of it is 
"statistically 'bad' variance."  That "bad variance" at least 
potentially compromises any ASRG effort that might use trap 
addresses.  (Without some sort of "balancing" in trap address 
traffic, experimental effects might need to be as large as an 
order-of-magnitude in order to achieve "statistical significance.")
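
As a rough illustration of why that variance matters (a sketch under 
assumptions, not an analysis of anyone's actual data): suppose 
per-address spam volume is roughly log-normal, and two groups of trap 
addresses are compared on mean log10 volume using the usual two-sample 
normal approximation.  The function name, the default alpha and power, 
and the spread values mentioned afterwards are all hypothetical choices 
made for the example.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_ratio(sd_log10, n_per_group, alpha=0.05, power=0.8):
    """Smallest traffic ratio between two groups of trap addresses that a
    two-sided test at level `alpha` would detect with probability `power`.

    Assumptions (not from the original post): log10 of per-address
    traffic is roughly normal with standard deviation `sd_log10`, and
    both groups contain `n_per_group` addresses.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    # Detectable difference in mean log10 traffic between the groups.
    delta_log10 = (z_alpha + z_beta) * sd_log10 * sqrt(2 / n_per_group)
    # Convert the log10 difference back to a multiplicative ratio.
    return 10 ** delta_log10
```

Under these assumptions, a spread of one order of magnitude 
(sd_log10 = 1.0) with ten addresses per group puts the smallest 
detectable effect above a factor of ten, while a spread of 0.1 brings 
it well under a factor of two.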

I can imagine a couple of possible solutions (though there may also 
be others I haven't thought of).  One might be for ASRG to "borrow" 
trap addresses from large organizations/ISPs who were (or at least 
might be?) willing to share.  The other is for ASRG to have an 
independent trap address maintenance effort.  I floated this latter 
idea a while back, and it generated almost no interest at all.

Obviously, a trap-address maintenance effort would be privy to 
extremely sensitive data, and would probably need to be a handful of 
people (at most), whose efforts were "closed" to the larger group.  
Ultimately, the need/desired outcome would be on-demand availability 
of an appropriate number of comparably-trafficked trap addresses for 
research purposes, said addresses to be selected randomly from a 
larger pool.
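
The "select randomly from a comparably-trafficked pool" idea can be 
sketched as follows.  This is only one possible reading: "comparable" 
is taken to mean "within a fixed factor of the pool median," and the 
function name, the tolerance parameter, and the addresses and rates in 
EXAMPLE_POOL are invented for illustration.

```python
import random
from statistics import median

# Hypothetical pool: trap address -> observed spam messages per day.
EXAMPLE_POOL = {
    "trap1@example.invalid": 5,
    "trap2@example.invalid": 40,
    "trap3@example.invalid": 55,
    "trap4@example.invalid": 60,
    "trap5@example.invalid": 900,
    "trap6@example.invalid": 48,
}


def pick_comparable_traps(pool, k, tolerance=2.0, seed=None):
    """Randomly pick `k` addresses whose traffic lies within a factor of
    `tolerance` of the pool's median traffic.

    `tolerance` is an assumed definition of "comparably trafficked";
    the original post does not specify one.
    """
    center = median(pool.values())
    low, high = center / tolerance, center * tolerance
    # Sort so that a given seed always yields the same selection.
    candidates = sorted(a for a, rate in pool.items() if low <= rate <= high)
    if len(candidates) < k:
        raise ValueError("not enough comparably-trafficked addresses")
    return random.Random(seed).sample(candidates, k)
```

Calling pick_comparable_traps(EXAMPLE_POOL, 3) would draw three 
addresses from the four mid-range ones and never return the very quiet 
or very busy outliers.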

Ultimately, it seems to me that there is a need for some sort of 
active (though not necessarily particularly time-consuming) trap 
address maintenance effort.  Otherwise, the only real opportunity for 
*research* (as opposed to debate) lies in analysis/characterization 
of archival data.

- Terry
