On Sun, 10 Aug 2003 12:00:05 -0400,
Scott Nelson <scott(_at_)spamwolf(_dot_)com> wrote:
TS>>But my thoughts keep coming back to the sheer amount
TS>>of statistical noise in something as volatile as spam
TS>>volume. That level of noise will make meaningful
TS>>analysis extremely difficult.
SN>No offense, but how do you know there's a large amount of noise
SN>in the data until after you attempt to measure it?
SN>I mean, sure we all /expect/ there will be a lot of noise,
SN>but has anybody actually tried to measure how much noise there is?
Well, first I used the time-honored (if admittedly crude) method of
direct inspection; short-term fluctuations of 100% and more can be
spotted just by visually scanning down a list of frequencies.
A somewhat more rigorous (and conveniently scale-invariant)
"back-of-the-envelope" measurement of the amount of noise in a
dataset can be obtained from Fisher's coefficient of variation
(C.V. = standard deviation / mean). The C.V.s for the two (recent)
longitudinal samples I happen to have handy (from two totally
independent sources) are both right about 0.30. So the typical
fluctuation is about 30% of the mean level -- and that variation is
essentially pure noise (and thence "unavailable" for inferential
purposes).
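For concreteness, the C.V. calculation is just this -- a minimal sketch
using Python's standard library, with made-up daily spam counts (these
are NOT the actual samples mentioned above):

```python
import statistics

def coefficient_of_variation(samples):
    """Fisher's C.V.: sample standard deviation divided by the mean."""
    return statistics.stdev(samples) / statistics.mean(samples)

# Hypothetical daily spam counts for one week; a C.V. near 0.3 means
# the day-to-day swings are typically about 30% of the average volume.
daily_counts = [980, 1450, 700, 1210, 1600, 890, 1320]
print(round(coefficient_of_variation(daily_counts), 2))
```

Note that because the C.V. divides out the mean, it gives comparable
noise figures whether you're counting spam per hour or per month.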
To appreciate the impact of a C.V. of 0.3, remember the "research
question" here: does _A_ cause a reduction in _B_? It'd be almost
exactly equivalent to trying to tell how well (or even IF) your new
diet is working, when the only scale you have to weigh yourself on
reads 120 pounds one day and 230 the next, when your "actual" weight
is about 170.
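The bathroom-scale analogy can be made quantitative with a quick
simulation (illustrative assumptions, not data from the post): a
"true" weight of 170 measured with a C.V. of 0.3 implies a standard
deviation of 0.3 * 170 = 51 pounds, so single readings of 120 or 230
are each only about one sigma out:

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

true_weight = 170
sigma = 0.3 * true_weight  # C.V. of 0.3 => sd of 51 pounds

# A week of simulated scale readings: individually they swing wildly,
# though their average drifts back toward the true value.
readings = [random.gauss(true_weight, sigma) for _ in range(7)]
print([round(r) for r in readings])
print(round(statistics.mean(readings)))
```

Detecting a modest diet effect under that kind of noise would take
either a huge sample or, as argued below, an enormous effect.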
All of which may help to explain why I've tried (up 'til now) to
raise a cautionary flag: to have even the tiniest hope of being
detectable, this is gonna hafta be the "800-pound gorilla" of
effects. But having said my piece, I hereby resign as chairman of
the Committee to Try to Save Other Folks' Time/Effort, and return to
lurker mode.
- Terry
_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg