[Asrg] 2. Update from the Analysis & Characterization Group

Here's a brief update from the small but highly motivated A&C 
subgroup.

There are multiple experiments underway, at various stages of 
completion.  A&C members are looking at where spam comes from, 
comparing specific characterization/filtering technologies, and in 
general trying to substitute empirical data for opinion on almost 
anything related to spam.

Two recent experiments have now borne fruit.  One's interesting 
enough, though not exactly earth-shattering; the other is much more 
portentous.

1) We undertook an experiment to address the 
   seemingly commonplace belief that "different 
   people get different spam."  For purposes of
   this experiment, we analyzed two large 
   samples of spam (several thousand relatively recent
   messages, gathered independently), as well as 
   multiple smaller samples.

   Result: Within nominal limits, everyone gets 
   pretty much the same spam.  While there *may*
   be such a thing as "targeted" spam (which 
   might result from, say, subscribing to a 
   particular list or ordering a particular 
   product), the volume of that spam is dwarfed 
   by the "shotgun" spam that everyone gets.

2) We undertook an experiment to address the 
   equally commonplace belief that "spam is 
   volatile," and that spam and spammer tactics
   change rapidly.  This experiment was based on
   analysis of about 2,500 spams accumulated 
   over a period of 2.5 years.

   Result: "Glacial" maybe, but "volatile"?? 
   The closest we've been able to come to 
   identifying "volatility" is a kind of 
   "punctuated equilibrium" model.  Spam does 
   change over time, but *very* slowly.  

   We hope to present the first set of results 
   from this experiment at the MIT spam 
   conference in January.  We're also looking at
   conducting a follow-on experiment, and, if 
   the results cooperate, envision a submission 
   to CACM early next year.

3) Though not an experiment _per se_, we're 
   currently trying to knock down testable
   theories about why any two addresses that are
   *seemingly* nearly identical in terms of 
   visibility get gobsmackingly different amounts
   of spam.  Several obvious possibilities were
   refuted very quickly, while others have not yet
   been refuted.  (On behalf of A&C, I'd like to 
   thank Scott Nelson for recently taking the time 
   to help us refute one candidate theory.)

   (This latter effort may not sound like much, 
   but unless we can eliminate that "pre-existing" 
   variance, or at least learn enough about it to 
   factor it out, any sort of fruitful/meaningful 
   real-time volumetric study is probably 
   impossible.)
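
For anyone who wants to try something like item 1 on their own 
mail, here is a minimal sketch of one way to measure how much two 
independently gathered spam samples overlap.  It is not the 
procedure A&C used (this update does not describe that); the 
normalization rules, the use of SHA-1 fingerprints, the Jaccard 
measure, and the function names are all assumptions made purely 
for illustration.

```python
import hashlib
import re


def fingerprint(message: str) -> str:
    """Hash a message body after crude normalization.

    Assumed normalization (hypothetical, not from the A&C work):
    lowercase, replace digit runs with "0", collapse whitespace.
    """
    text = message.lower()
    text = re.sub(r"\d+", "0", text)
    text = re.sub(r"\s+", " ", text).strip()
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def jaccard_overlap(sample_a, sample_b) -> float:
    """Return |A & B| / |A | B| over the sets of message fingerprints."""
    a = {fingerprint(m) for m in sample_a}
    b = {fingerprint(m) for m in sample_b}
    if not a and not b:
        return 0.0  # two empty samples: define the overlap as zero
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Toy data only; real samples would be thousands of message bodies.
    mine = ["Buy NOW  123", "Cheap meds"]
    yours = ["buy now 456", "Refinance today"]
    print(jaccard_overlap(mine, yours))
```

A value near 1 would be consistent with "everyone gets pretty much 
the same spam"; a value near 0 would point the other way.  Exact 
hashing is brittle against per-recipient randomization, so a real 
comparison would probably need a fuzzier fingerprint.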

(We now return to regularly scheduled ASRG content.)



_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg