ietf-asrg

RE: [Asrg] RE: 2.a.1 Analysis of Actual Spam Data - Titan Key reduces spam attacks

2003-08-04 09:43:29
I don't see what's controversial in your message below. I had to go to
the dictionary at least once, but what I was able to make out of the rest
made sense to me.  And the statistical points are much appreciated.

-----Original Message-----
From: Terry Sullivan [mailto:terry(_at_)pantos(_dot_)org] 
Sent: Friday, August 01, 2003 10:32 AM
To: asrg(_at_)ietf(_dot_)org
Subject: [Asrg] RE: 2.a.1 Analysis of Actual Spam Data - 
Titan Key reduces spam attacks


(Oh boy, for my first posting to the list, I get to help stir up 
controversy.)

Hello all...

For my money, there are two disjoint issues here that have sorta 
gotten co-mingled in the discussion:

- statistical power

- absence of control condition

Statistically pedantic point: the number of observations in a sample 
influences the magnitude of the effect that can be detected 
analytically--nothing more, nothing less.  Statistical power 
increases monotonically with sample size.  More data are often handy, 
but a relatively small sample can be used successfully to detect very 
large statistical effects.  (Tangential statistically pedantic point: 
with the exception of 0.0, the slope of a regression line is utterly 
uninformative.)
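
To make the sample-size point concrete, here is a minimal sketch of how 
power depends on both effect size and sample size.  It assumes a 
two-sided, two-sample z-test with a standardized effect size, which is 
an illustrative choice and not anything taken from the data under 
discussion; the function name approx_power is hypothetical.

```python
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the standardized mean difference (assumed known),
    n_per_group is the number of observations in each group.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative.
    shift = abs(effect_size) * (n_per_group / 2) ** 0.5
    # Probability of landing in either rejection tail.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    for d in (0.2, 2.0):
        for n in (10, 100, 1000):
            print(f"effect={d:<4} n={n:<5} power={approx_power(d, n):.3f}")
```

Under these assumptions, ten observations per group are already enough 
to detect a very large effect with high probability, while a small 
effect needs a far larger sample to reach the same power.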

The thing that precludes any legitimate causal inference with these 
data is the absence of a control condition.  But that's an issue of 
logic, not statistical power.  Observing a correlation is a great way 
to generate testable hypotheses; but a tenable claim of causality 
requires actually testing those hypotheses.

I confess that I've not seen these data.  (The .xls file hoses my 
non-Microsoft spreadsheet, and a platform-neutral format is 
unavailable.)  But seeing/not seeing these data doesn't make the 
causal claim viable.  In point of fact, my logs show a measurable 
downward trend in total spam received since I installed my latest new 
filtering widget.  I am absolutely confident, however, that my 
filtering widget did not *cause* that decline.  It's just a happy 
coincidence.  

On a more philosophical note, I can't help but suspect that 
daily/weekly spam volume is simply way too "noisy" to serve as a 
meaningful standalone measure of anything.  There are just too many 
uncontrolled variables at play.  I've seen short-term longitudinal 
fluctuations in spam volume of 100% and more, and cross-sectional 
differences in excess of 50%.  Any "measure" with that much noise is 
too unreliable (in a statistical sense) to support meaningful 
analysis.
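
A rough simulation can illustrate how often pure noise looks like an 
improvement.  The sketch below assumes weekly volume varies uniformly 
between 50% and 150% of a fixed baseline, with no real change at any 
point; that noise model, the 20% threshold, and the function name are 
all hypothetical and chosen only for illustration.

```python
import random


def fraction_with_apparent_drop(trials=10_000, weeks=4, drop=0.20, seed=0):
    """Fraction of no-effect simulations in which total volume over the
    'after' window is at least `drop` lower than over the 'before' window."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Same noise model before and after: nothing actually changes.
        before = [rng.uniform(0.5, 1.5) for _ in range(weeks)]
        after = [rng.uniform(0.5, 1.5) for _ in range(weeks)]
        if sum(after) <= (1 - drop) * sum(before):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for weeks in (4, 12, 52):
        share = fraction_with_apparent_drop(weeks=weeks)
        print(f"{weeks:>2} weeks per window: apparent drop in {share:.1%} of runs")
```

With short observation windows, a noticeable share of runs shows an 
apparent decline even though nothing changed, which is why a 
before/after comparison without a control condition says little about 
cause.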

- Terry



_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg





