ietf-asrg

RE: [Asrg] 2.a.1 Analysis of Actual Spam Data - Experimental Design

2003-08-18 06:32:34
On Mon, 18 Aug 2003 10:51:46 +0100, Tom Thomson wrote:

I would take issue with one of your bullet points (the rest look
pretty good).

- In a perfect universe, all addresses
  in both groups are served from one
  and only one mail server.  That way,
  "server status" affects all address
  pairs in both groups identically.

If you do that, all you've demonstrated is that at one particular mail
server the null hypothesis appears valid or invalid.  It's perfectly
possible that spammer behaviour varies according to destination domain
(top-level domain, anyway) - it certainly appears to me that it does in
practice so vary, since I've not seen French language spam at any of my
.uk addresses, but have seen it at .fr and .be addresses.  Maybe paying
attention to 550s is something that varies in the same way.

I feel pretty confident that one box can respond to requests sent to 
multiple IP addresses, and therefore can serve as home to an 
arbitrarily large number of different domains.  If these email 
addresses "live" on 60 different machines, then there will be an 
additional mechanical step of "synching" the data from each machine.  
Then too, keeping one machine up for the experimental period strikes
me as less "overhead" than keeping 60 machines going.  As far as I can
see, using multiple boxes only introduces a potential confound,
because server availability affects spam volume in a systematic way.
If one machine (or worse, two) goes "hard down" for a week or two,
the results of the larger experiment are placed at risk.  Addresses
served by the failed machine(s) will have a lower spam volume, of
course, but not because of the independent variable.  But, as I
said, this feature qualifies as nice-to-have, not required.
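
To make the confound concrete, here is a minimal sketch in Python.  It
assumes the null hypothesis holds (every address attracts spam at the
same rate), and every number and name in it (10 messages per day, a
60-day period, 14-day outages, the "mx" server names, the function
name) is made up for illustration, not taken from any real data.

```python
def expected_spam(addresses, downtime_days, period_days=60, rate_per_day=10):
    """Return expected spam totals per group under the null hypothesis.

    addresses: list of (group, server) tuples, one per email address.
    downtime_days: dict mapping server name -> days that server was down.
    A server that is down receives no spam for those days.
    """
    totals = {}
    for group, server in addresses:
        up_days = period_days - downtime_days.get(server, 0)
        totals[group] = totals.get(group, 0) + rate_per_day * up_days
    return totals


# One box serves all 60 addresses: an outage hits both groups identically.
single = [("control", "mx0")] * 30 + [("treatment", "mx0")] * 30
single_totals = expected_spam(single, {"mx0": 14})

# 60 boxes, one address each: two boxes that happen to serve
# treatment addresses go hard down for two weeks.
multi = [("control", f"mx{i}") for i in range(30)]
multi += [("treatment", f"mx{i}") for i in range(30, 60)]
multi_totals = expected_spam(multi, {"mx30": 14, "mx31": 14})

print(single_totals)  # both groups lose the same number of days
print(multi_totals)   # treatment looks lower, though nothing differs
```

In the single-box case the two groups stay equal despite the outage; in
the 60-box case the treatment group shows a lower total purely because
of which machines failed.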

However, to the extent that there is some reasonable basis for
believing that spammers respond differentially to 550s from different
TLDs, that imposes an additional requirement: keep the number of
TLDs small (say, 3: .com/.org/.net), or use a LOT more addresses.
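
The trade-off between TLD count and address count is simple
arithmetic, sketched below.  The figures are assumptions chosen for
illustration (60 addresses in total, two groups, a target of 10
addresses per group per TLD), and both function names are hypothetical.

```python
def addresses_per_cell(total_addresses, n_tlds, n_groups=2):
    """Addresses available for each (group, TLD) combination."""
    return total_addresses // (n_tlds * n_groups)


def addresses_needed(per_cell, n_tlds, n_groups=2):
    """Total addresses required to keep per_cell in every combination."""
    return per_cell * n_tlds * n_groups


few_tlds = addresses_per_cell(60, 3)    # .com/.org/.net
many_tlds = addresses_per_cell(60, 30)  # spread across 30 TLDs
needed = addresses_needed(10, 30)       # to match the 3-TLD cell size

print(few_tlds, many_tlds, needed)
```

With the same 60 addresses, moving from 3 to 30 TLDs cuts each cell
from 10 addresses to 1, and restoring 10 per cell takes 600 addresses.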

- Terry



_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg