
[Asrg] Requirements for gathering statistics

2003-03-23 16:31:08
tedgavin(_at_)newsguy(_dot_)com wrote:
On behalf of SpamCon Foundation, we'd be happy to commission a project
to develop exactly the type of data ASRG needs to get things going.

  That sounds like an excellent start (and potentially less work for
me.)

First, can this group agree on what type of report *would* help the
group fulfil its mission? Alan, you seem to have started that
list. Some more detail behind the desire (such as exactly what stats for
spam emanating from certain points) would be nice.

- Size?
- Content?
- Routes/Paths through the network?
- Sender/Recipient/Ultimate beneficiaries?

  The stats I'm most interested in are listed below.  Some ISPs may
be concerned about releasing "private" information (total traffic,
etc.), so the questions may have to be normalized as percentages,
rather than absolute numbers.

  Having historical numbers would also be good, i.e. not just the
current figures, but figures for several previous years.

  I believe that gathering statistics on spam content or origin will
not currently be useful.  I would like to focus on simple & easily
gathered numbers, which will make the analysis less subject to
personal interpretation.  Similarly, the work (and ambiguity) involved
in analysing the network routing of spam & its senders/recipients is
too great to be helpful right now.

  I would also like to focus the stats on domains reporting more than
10,000 spam messages a day.  That will limit the initial size of the
survey, and will reduce the noise due to systems at the low-end.

  Alan DeKok.

--------  Questionnaire on Spam for Mail System Administrators ----

I) Size of the problem
 1. Volume
  a) Total spam volume (MB/day)
  b) Total number of spam messages
  c) Total number of "do-nothing" SMTP connections
     (e.g. connect, EHLO, disconnect)

 2. Relative amount of spam
  a) spam as a percentage of total email (spam messages / total messages)
  b) spam as a percentage of *non-spam* email (spam messages / accepted
     messages)

 3. Count of IPs
  a) number of IPs sending spam
  b) number of IPs sending "valid" email (may overlap with 3a)

 4. Costs associated with dealing with spam
  a) administrative (network support solely due to spam)
  b) technical (end-user) support
  c) additional infrastructure/machine costs
  (costs to end users should be initially avoided, as they're more
   difficult to quantify)

II) Current Solutions which address the problem
 1) blacklists
   a) SMTP conversations blocked due to the blacklist, as a percentage
      of the total
      (blocked conversations / total conversations)
   b) SMTP conversations blocked, as a percentage of non-spam email
      (blocked conversations / allowed conversations)

 2) whitelists
   a) spam from whitelisted sources, as a percentage of the total
      number of SMTP conversations
   b) spam from whitelisted sources, as a percentage of the
      non-spam SMTP conversations

 3) content filtering
   a) percentage of non-spam marked as spam (false positive)
   b) percentage of spam missed by the filter (false negative)
   c) percentage of spam caught by the filter
-----
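
As a rough illustration of how the percentage questions above (I.2 and
II.3) could be computed from raw daily counts, here is a minimal Python
sketch. The class, field names and sample figures are hypothetical, and
it assumes that "accepted messages" means total messages minus spam; a
site that counts accepted mail differently would need to adjust the
denominators.

```python
from dataclasses import dataclass


@dataclass
class DailyCounts:
    """Hypothetical one-day tallies taken from a mail server's logs."""
    total_messages: int    # every message offered to the server
    spam_messages: int     # messages judged to be spam, by whatever means
    flagged_spam: int      # spam that the content filter marked as spam
    flagged_non_spam: int  # non-spam that the filter wrongly marked as spam


def pct(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage, or 0.0 if undefined."""
    return 100.0 * numerator / denominator if denominator else 0.0


def summarize(c: DailyCounts) -> dict:
    # Assumption: "accepted" (non-spam) mail is everything that is not spam.
    non_spam = c.total_messages - c.spam_messages
    missed_spam = c.spam_messages - c.flagged_spam
    return {
        "spam_pct_of_total": pct(c.spam_messages, c.total_messages),  # I.2a
        "spam_pct_of_non_spam": pct(c.spam_messages, non_spam),       # I.2b
        "false_positive_pct": pct(c.flagged_non_spam, non_spam),      # II.3a
        "false_negative_pct": pct(missed_spam, c.spam_messages),      # II.3b
        "spam_caught_pct": pct(c.flagged_spam, c.spam_messages),      # II.3c
    }


# Made-up figures for a domain above the 10,000 spam/day threshold.
example = DailyCounts(
    total_messages=40_000,
    spam_messages=15_000,
    flagged_spam=12_000,
    flagged_non_spam=250,
)
report = summarize(example)
```

Reporting only the values in `report`, rather than the raw counts, is
one way a site could answer the survey without disclosing its absolute
traffic.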
_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg