ietf-asrg

Re: [Asrg] Requirements for gathering statistics

2003-03-24 15:02:27
Dave Crocker <dhc(_at_)dcrocker(_dot_)net> wrote:
I would appreciate your clarifying your comments.  My confusion is
between surveying attitudes vs. surveying behaviors.

  I would say that spam is about personal attitudes and network
behaviours.  Personal attitudes are difficult to measure.  Network
behaviour is not.

Survey research is excellent for assessing people's attitudes.  In this
context, "bias" is about preferences. (Even with this we have remarkable
sensitivity to survey question formulation, making valid and meaningful
survey construction something of a black art.)

Survey research is very nearly useless for assessing people's actual
behaviors, past, present or future. This is because people do not track
and record the behaviors objectively, so their "self-report" is one of
their subjective sense of things.

  I agree, so far as your comments apply to individuals.  Individuals
do not objectively record their personal behaviours.  e.g. How many
times did you take a shower last year?  That information is not
generally recorded, and is subject to personal recollection.

  Individuals do, however, record some kinds of information about their
public interactions with others (networking).  e.g. What was your
credit card bill for last January?  If you don't know, odds are you
can call the credit card company, and they'll tell you.  e.g. How much
money did you spend last year on your credit card buying flowers for
your wife?  That data is readily available.


  For spam, I was trying to achieve self-reporting of behaviours which
are currently recorded, or which are easy to record.  Measuring how
much email an MTA currently gets is trivial.  The data exists, and
it's easy to summarize.  It involves no personal recollection, or
personal bias.
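
  As a rough sketch of how little work such a summary is: the log
format below is invented for illustration and is not the format of any
particular MTA, so the parsing would need adapting to real logs.  The
names SAMPLE_LOG and summarize are hypothetical.

```python
from collections import Counter

# Hypothetical, simplified log lines: "<date> <time> <event> <detail>".
# Real MTA logs differ; adapt the parsing to your own format.
SAMPLE_LOG = """\
2003-03-23 09:15:02 connect from=192.0.2.10
2003-03-23 09:15:03 accept msgid=a1
2003-03-23 09:20:41 connect from=192.0.2.77
2003-03-23 09:20:42 reject reason=relay-denied
2003-03-24 11:02:10 connect from=198.51.100.5
2003-03-24 11:02:11 accept msgid=b7
2003-03-24 11:05:30 connect from=198.51.100.9
2003-03-24 11:05:31 accept msgid=b8
"""


def summarize(log_text):
    """Count log events per (date, event) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        date, _time, event = fields[:3]
        counts[(date, event)] += 1
    return counts


counts = summarize(SAMPLE_LOG)
for (date, event), n in sorted(counts.items()):
    print(date, event, n)
```

  Nothing here depends on anyone's recollection; the counts come
straight from what the server already wrote down.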

  Deciding *which* emails are spam involves personal bias.  But once
you've decided upon a flavour you like, it's again easy to measure how
you are implementing that bias.  Sufficiently large populations will
result in a bias which can be modelled, and thus accounted for.
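
  One very simple way to picture "modelling the bias", offered only as
a sketch: each site reports its totals under its own definition of
spam, and we look at the average and spread of the per-site fractions.
The sites and numbers below are made up for illustration, and the
mean-and-deviation model is an assumption, not a proposal for the
actual survey.

```python
from statistics import mean, pstdev

# Invented example reports, one per site:
# (site name, total messages received, messages that site classified as spam).
# Each site applies its own definition of spam.
REPORTS = [
    ("site-a", 100, 40),
    ("site-b", 200, 100),
    ("site-c", 50, 30),
]


def spam_fractions(reports):
    """Return each site's self-reported spam fraction."""
    return {site: spam / total for site, total, spam in reports if total > 0}


fractions = spam_fractions(REPORTS)
values = list(fractions.values())

average = mean(values)   # typical self-reported fraction across sites
spread = pstdev(values)  # how much sites differ in what they call spam

# Pooled fraction: weights every message equally rather than every site.
pooled = sum(spam for _, _, spam in REPORTS) / sum(total for _, total, _ in REPORTS)

print(f"average={average:.3f} spread={spread:.3f} pooled={pooled:.3f}")
```

  With only three sites the spread says little; the point is that with
many reporters the differences in definition become a measurable
quantity rather than an unknown.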

For very simple assessments of very simple behaviors (did you wake up
yesterday, or which of two candidates will you vote for today) survey
technology can do pretty well.  Not for anything more complicated.

  Simple behaviours include:

- how many packets were forwarded by your network?
- how many packets were dropped?
- how many SMTP connections did you receive?

  These questions are easy to ask, and easy to answer.  Collecting
such data will allow us to make better statements about how bad the
spam problem really is.  It should also give us some insight as to why
certain behaviours are happening.

  e.g. 10,000 SYNs a second from spammers is a good explanation for
why an ISP is firewalling port 25 and using a whitelist.


  The point of gathering these statistics is to gain confidence in
our understanding of the scope of the problem, and of our
understanding of the current methods people use to deal with it.

  Alan DeKok.