On Sun, 10 Aug 2003 12:00:05 -0400,
Scott Nelson <scott(_at_)spamwolf(_dot_)com> wrote:
TS>>But my thoughts keep coming back to the sheer amount
TS>>of statistical noise in something as volatile as spam
TS>>volume. That level of noise will make meaningful
TS>>analysis extremely difficult.
SN>No offense, but how do you know there's a large amount of noise
SN>in the data until after you attempt to measure it?
SN>I mean, sure we all /expect/ there will be a lot of noise,
SN>but has anybody actually tried to measure how much noise there is?
Well, first I used the time-honored (if admittedly crude) method of
direct inspection; short-term fluctuations of 100% and more can be
spotted just by visually scanning down a list of frequencies.
A somewhat more rigorous (and conveniently scale-invariant)
"back-of-the-envelope" measurement of the amount of noise in a
dataset can be obtained from Fisher's coefficient of variation
(C.V. = standard deviation / mean). The C.V.s for the two (recent)
longitudinal samples I happen to have handy (from two totally
independent sources) are both right about 0.30. So the typical
fluctuation is about 30% of the mean level -- and that variation is
essentially pure noise (and thence "unavailable" for inferential
purposes).
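For concreteness, the C.V. calculation is just this -- a minimal sketch
using Python's standard library, with made-up daily spam counts (these
are NOT the actual samples mentioned above):

```python
import statistics

def coefficient_of_variation(samples):
    """Fisher's C.V.: sample standard deviation divided by the mean."""
    return statistics.stdev(samples) / statistics.mean(samples)

# Hypothetical daily spam counts for one week; a C.V. near 0.3 means
# the day-to-day swings are typically about 30% of the average volume.
daily_counts = [980, 1450, 700, 1210, 1600, 890, 1320]
print(round(coefficient_of_variation(daily_counts), 2))
```

Note that because the C.V. divides out the mean, it gives comparable
noise figures whether you're counting spam per hour or per month.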
To appreciate the impact of a C.V. of 0.3, remember the "research
question" here: does _A_ cause a reduction in _B_? It'd be almost
exactly equivalent to trying to tell how well (or even IF) your new
diet is working, when the only scale you have to weigh yourself on
reads 120 pounds one day and 230 the next, when your "actual" weight
is about 170.
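The bathroom-scale analogy can be made quantitative with a quick
simulation (illustrative assumptions, not data from the post): a
"true" weight of 170 measured with a C.V. of 0.3 implies a standard
deviation of 0.3 * 170 = 51 pounds, so single readings of 120 or 230
are each only about one sigma out:

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

true_weight = 170
sigma = 0.3 * true_weight  # C.V. of 0.3 => sd of 51 pounds

# A week of simulated scale readings: individually they swing wildly,
# though their average drifts back toward the true value.
readings = [random.gauss(true_weight, sigma) for _ in range(7)]
print([round(r) for r in readings])
print(round(statistics.mean(readings)))
```

Detecting a modest diet effect under that kind of noise would take
either a huge sample or, as argued below, an enormous effect.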
All of which may help to explain why I've tried (up 'til now) to
raise a cautionary flag: to have even the tiniest hope of being
detectable, this is gonna hafta be the "800-pound gorilla" of
effects. But having said my piece, I hereby resign as chairman of
the Committee to Try to Save Other Folks' Time/Effort, and return to
lurker mode.
- Terry
_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg