ietf-asrg

Re: [Asrg] Requirements for gathering statistics

2003-03-24 16:27:33
From: "Alan DeKok" <aland(_at_)freeradius(_dot_)org>

] We have two pieces of information that I think are most significant:
]
] 1. Spam is growing at an exponential rate (10% a month by the most
] conservative estimates).
]
]   I would like additional data justifying that opinion.  While it's
] true anecdotally for most people, I think we need a larger sample to
] state it decisively. ...

10%/month compounds to about 3.1X per year.  Spam is increasing, but 3X/year
is "most conservative" only if you also believe "SPEWS blocked all of UUNET."
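
As a quick check of the compounding arithmetic, here is a minimal Python sketch (the function names are hypothetical).  It shows that 10% a month compounded over 12 months gives roughly 3.1X, that 2.6X is what 10 months of 10% growth gives, and that 2.6X over a full year corresponds to only about 8.3% a month.

```python
# Compounding a constant monthly growth rate (illustrative helper names).

def yearly_factor(monthly_rate: float, months: int = 12) -> float:
    """Multiplicative growth after `months` months at `monthly_rate`."""
    return (1.0 + monthly_rate) ** months


def monthly_rate_for(factor: float, months: int = 12) -> float:
    """Constant monthly rate that yields `factor` growth over `months`."""
    return factor ** (1.0 / months) - 1.0


if __name__ == "__main__":
    print(f"10%/month over 12 months: {yearly_factor(0.10):.2f}x")
    print(f"10%/month over 10 months: {yearly_factor(0.10, 10):.2f}x")
    print(f"2.6x/year needs {monthly_rate_for(2.6):.1%} per month")
```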

The number of checksums of spam in the DCC network is increasing faster
than that.  However, I think much of that increase is due to the appearance
of <!--HTML--> hacks and certain other tricks, combined with DCC clients
that still have not installed the countermeasures and so see many
different checksums for a few spews.
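
The checksum inflation described above can be sketched with a toy model.  This is NOT the DCC's actual checksum algorithm: it assumes one client that hashes the raw body and another that strips HTML comments before hashing (standing in, hypothetically, for the countermeasures), and the message bodies are made up.

```python
import hashlib
import re

# Toy model only; the DCC computes its checksums differently.
# Random HTML comments make every copy of one spew hash differently.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def naive_checksum(body: str) -> str:
    """Hash the raw body, comments and all."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def normalized_checksum(body: str) -> str:
    """Strip HTML comments first (a hypothetical countermeasure)."""
    stripped = HTML_COMMENT.sub("", body)
    return hashlib.sha256(stripped.encode("utf-8")).hexdigest()


# Three copies of one made-up spew, each with different random comments.
copies = [
    "Buy <!--a8f3-->now<!--x1-->!",
    "Buy <!--77qk-->now<!--zz9-->!",
    "Buy <!--m02p-->now<!--4rt-->!",
]

if __name__ == "__main__":
    print("raw checksums:", len({naive_checksum(c) for c in copies}))
    print("normalized checksums:", len({normalized_checksum(c) for c in copies}))
```

A client without the countermeasure reports three checksums for what is really one spew, which inflates checksum counts relative to the actual growth in spam.
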

It's also important to pick durations.  In some weeks spam has increased
by more than 10%, but in other weeks (e.g. Thanksgiving and Christmas/New
Year's) it has decreased, again judging from the number of checksums in
the DCC network.
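
To illustrate why the measurement window matters, the sketch below computes the geometric-mean weekly growth over different windows of a weekly count series.  The counts are invented purely for illustration (they are not DCC data), and the function names are hypothetical.

```python
from typing import Sequence


def window_growth(counts: Sequence[float], start: int, end: int) -> float:
    """Ratio of the count in week `end` to the count in week `start`."""
    if not 0 <= start < end < len(counts):
        raise ValueError("need 0 <= start < end < len(counts)")
    return counts[end] / counts[start]


def mean_weekly_rate(counts: Sequence[float], start: int, end: int) -> float:
    """Geometric-mean weekly growth rate between weeks `start` and `end`."""
    return window_growth(counts, start, end) ** (1.0 / (end - start)) - 1.0


# Made-up weekly checksum counts (NOT DCC data) with a holiday dip in weeks 4-5.
weekly = [100, 112, 125, 138, 110, 105, 140, 155]

if __name__ == "__main__":
    for start, end in [(0, 3), (3, 5), (0, 7)]:
        rate = mean_weekly_rate(weekly, start, end)
        print(f"weeks {start}-{end}: {rate:+.1%} per week")
```

Depending on which weeks are chosen, the same series shows fast growth, a decline, or moderate growth.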


]   I would like ASRG to be able to say "we looked at the problem, and
] some thousands of MTA administrators reported X amount of
] spam.  Statistical validity is <foo>.  Data analysis follows.  Therefore,
] to address these problems, we have potential solutions A, B, and C.
] The cost of these solutions is predicted to be D, E, and F.  The
] effectiveness of these solutions is predicted to be G, H, and I."
] ...

That's going to be hard because so many commentators assume the
inevitability of miraculous deployment.  Even those who agree that
replacing everything is unlikely sometimes slip into claiming that
dividing the deployment problem into stages makes it easy
(counterexamples: IPv6 and DNSSEC).  You can't really talk about the
quantitative effectiveness of solutions that are about as likely to be
deployed and to work in the real world as spammers are to get religion
and stop on their own.


] From: Dave Crocker <dhc(_at_)dcrocker(_dot_)net>

] I would appreciate your clarifying your comments.  My confusion is
] between surveying attitudes vs. surveying behaviors.
]
] Survey research is excellent for assessing people's attitudes ...

] Survey research is very nearly useless for assessing people's actual
] behaviors. ...

Surveying for measurements of relatively abstract numbers like
"total spam" is even more problematic.

Finding representative samples is hard, because spam is so idiosyncratic.
The numbers at http://www.dcc-servers.net/dcc/graphs/comp-rates
show a 2X variation in spam load for some types of organizations, or at
least for some particular outfits.
Note that the names of those outfits are intentionally unstated.  Most
organizations seem to be unenthusiastic about publishing such numbers.


I like seeing numbers that are based on millions of mail messages,
but even they must be viewed as special cases of the global situation.
The more common numbers of the "my personal mailbox has seen X" variety
are rarely interesting.


Vernon Schryver    vjs(_at_)rhyolite(_dot_)com
_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg