
RE: [Asrg] 2.a.1 Analysis of Actual Spam Data - Experimental Design

2003-08-18 02:53:53
I would take issue with one of your bullet points (the rest look pretty
good).

- In a perfect universe, all addresses
  in both groups are served from one
  and only one mail server.  That way,
  "server status" affects all address
  pairs in both groups identically.

If you do that, all you've demonstrated is that the null hypothesis appears
valid or invalid at one particular mail server.  It's perfectly possible
that spammer behaviour varies according to destination domain (top-level
domain, anyway).  It certainly appears to me that it does vary in practice:
I've not seen French-language spam at any of my .uk addresses, but have
seen it at .fr and .be addresses.  Maybe paying attention to 550s is
something that varies in the same way.

On the question of measurement, I think it's perfectly reasonable to measure
daily volumes, then look at the data and see what period is reasonable to
use for noise reduction.  Your point about measuring the right thing, rather
than doing noise reduction after measuring the wrong thing, is not valid when
both (a) you do not know a priori what the right thing is and (b) you can
take a measurement which contains all the information that your guessed
"right thing" would have supplied anyway.
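
To illustrate point (b), here is a minimal Python sketch.  The daily counts
are made up and the helper name `aggregate` is hypothetical; it only shows
that totals for any longer period can be recomputed from daily figures
after the fact, whereas daily figures cannot be recovered from, say, weekly
totals.

```python
def aggregate(daily_counts, period):
    """Sum consecutive daily counts into blocks of `period` days.

    A trailing partial block (fewer than `period` days) is dropped so
    that every returned total covers the same number of days.
    """
    usable = len(daily_counts) - len(daily_counts) % period
    return [sum(daily_counts[i:i + period]) for i in range(0, usable, period)]


# Made-up daily spam counts for one address over 14 days (illustration only).
daily = [12, 15, 9, 14, 11, 3, 4, 13, 16, 10, 12, 14, 2, 5]

# Try several candidate smoothing periods on the same raw measurements.
for period in (1, 2, 7):
    print(period, aggregate(daily, period))
```

Having kept the daily numbers, one can compare candidate periods on the same
data and pick whichever smooths the noise acceptably.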

Tom


_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg