
Re: [Asrg] Requirements for gathering statistics

2003-03-23 16:55:05
I'm curious about your ideas on how to do this in terms of more scientific
research. The statistics given previously were very "broad". What I think
has to be done first is to gather statistics on how much email goes to each
type of mail server: the amount of email at larger ISPs (AOL, Earthlink,
MSN), as compared to smaller ISPs, free email providers (Hotmail, Yahoo),
enterprises, web-hosting companies, and small individual machines (hosted
on DSL, cable, etc.). After we have these statistics (how?), a sample can
be chosen for each of the groups and statistics gathered from it. Just
sampling at AOL or at an enterprise will not provide accurate statistics!
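
To make the sampling idea concrete, here is a rough sketch of how per-group
samples could be combined once each group's share of total email volume is
known. It assumes those shares are available, which is exactly the open
"(how?)" question. The group names, shares, and counts are made-up
placeholders, not measurements, and `stratified_spam_rate` is a hypothetical
helper.

```python
# Hypothetical sketch: combine per-group samples into one overall estimate,
# weighting each group by its share of total email volume.
# All numbers below are illustrative placeholders, not measurements.

def stratified_spam_rate(strata):
    """strata maps group name -> (volume_share, spam_count, sample_size).

    The volume_share values are expected to sum to 1.0.
    """
    total_share = sum(share for share, _, _ in strata.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("volume shares must sum to 1.0")
    estimate = 0.0
    for share, spam, sampled in strata.values():
        if sampled <= 0:
            raise ValueError("every group needs a non-empty sample")
        # Each group's observed spam rate counts in proportion to its volume.
        estimate += share * (spam / sampled)
    return estimate


example = {
    "large_isp": (0.40, 500, 1000),
    "small_isp": (0.10, 300, 1000),
    "free_mail": (0.25, 600, 1000),
    "enterprise": (0.15, 200, 1000),
    "web_hosting": (0.05, 400, 1000),
    "individual": (0.05, 100, 1000),
}
overall = stratified_spam_rate(example)
# Rate you would report if you only sampled the large-ISP group.
naive = example["large_isp"][1] / example["large_isp"][2]
```

With these placeholder numbers the single-group figure and the weighted
figure differ, which is the point: a sample from one kind of site does not
stand in for the whole.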

--------  Questionnaire on Spam for Mail System Administrators ----

I) Size of the problem
 1. Volume
Add here first: the total number of email messages and the total amount of
email received (i.e. traffic statistics for the SMTP protocol).

  a) Total spam volume (Mb/day)
  b) total number of spam messages 
  c) total number of "do-nothing" SMTP connections
     (e.g. connect, EHLO, disconnect)

 2. Relative amount of spam
  a) spam as a percentage of total email (spam messages / total messages)
  b) spam as a percentage of *non-spam* email (spam messages / accepted
     messages)
Would also like to see: size of spam messages / total size of email messages
                        size of regular emails / total size of spam
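
A small sketch of the ratios in item 2, by message count and by size. It
assumes "accepted messages" means the non-spam messages, and the function
and key names are hypothetical. The input counts are placeholders.

```python
# Hypothetical sketch of the ratios in item 2; inputs are placeholders.

def _ratio(numerator, denominator):
    # Return None rather than fail when a site has nothing to divide by.
    return numerator / denominator if denominator else None


def spam_ratios(spam_msgs, ham_msgs, spam_bytes, ham_bytes):
    """ham_* means non-spam (assumed here to equal the accepted email)."""
    return {
        # 2a) spam messages / total messages
        "spam_of_total_msgs": _ratio(spam_msgs, spam_msgs + ham_msgs),
        # 2b) spam messages / accepted (non-spam) messages
        "spam_of_accepted_msgs": _ratio(spam_msgs, ham_msgs),
        # size of spam messages / total size of email messages
        "spam_of_total_bytes": _ratio(spam_bytes, spam_bytes + ham_bytes),
        # size of regular emails / total size of spam
        "ham_to_spam_bytes": _ratio(ham_bytes, spam_bytes),
    }


ratios = spam_ratios(spam_msgs=400, ham_msgs=600,
                     spam_bytes=2_000_000, ham_bytes=6_000_000)
```

Note that 2a and 2b carry the same information in different forms: if p is
the spam share of total mail, then 2b is p / (1 - p). Respondents need to be
clear about which one they are reporting.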

 3. Count of IP's
  a) number of IP's sending spam
Please break this down by geographical area. When reverse DNS is present,
by country (e.g. .cn) and by larger ISP network (e.g. DSL at Verizon, etc.).

  b) number of IP's sending "valid" email (may overlap with 3a)
And it's interesting to know how many overlap!
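
A sketch of how the counts in item 3 could be computed, assuming a site
already has a list of source IPs for spam, a list for valid email, and a
reverse DNS name (or none) per IP. The function names are hypothetical, and
the addresses and host names are documentation placeholders.

```python
from collections import Counter

# Hypothetical sketch for item 3; all inputs are placeholders.

def ip_overlap(spam_ips, valid_ips):
    """Count distinct IPs sending spam, valid email, and both."""
    spam, valid = set(spam_ips), set(valid_ips)
    return {"spam": len(spam), "valid": len(valid), "both": len(spam & valid)}


def count_by_tld(ip_to_rdns):
    """Group IPs by the last label of their reverse DNS name.

    Only country-code labels (e.g. "cn") say anything about geography;
    generic ones (e.g. "net") do not.
    """
    counts = Counter()
    for name in ip_to_rdns.values():
        if not name:
            counts["(no reverse DNS)"] += 1
        else:
            counts[name.rstrip(".").rsplit(".", 1)[-1].lower()] += 1
    return counts


overlap = ip_overlap(["192.0.2.1", "192.0.2.2", "192.0.2.2"],
                     ["192.0.2.2", "198.51.100.7"])
by_tld = count_by_tld({
    "192.0.2.1": "host1.example.cn",
    "192.0.2.2": "dsl-1.example.net.",
    "192.0.2.3": None,
})
```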

 4. Costs associated with dealing with spam
  a) administrative (network support solely due to spam)
Difficult to quantify ...
  b) technical (end-user) support
  c) additional infrastructure/machine costs
  (costs to end users should be initially avoided, as they're more
   difficult to quantify)

II) Current Solutions which address the problem
 1) blacklists
   a) SMTP conversations blocked due to the blacklist, as a percentage
      of the total
      (blocked conversations / total conversations)
What kind of blacklists?

Knowing this would be useful for additional statistics on the particular
lists respondents use, especially if we really do this on a wide basis.

   b) SMTP conversations blocked, as a percentage of non-spam email
      (blocked conversations / allowed conversations)

 2) whitelists
   a) spam which is in the whitelists, as a percentage of the total
      number of SMTP conversations
   b) spam which is in the whitelists, as a percentage of the
      non-spam SMTP conversations

 3) content filtering
   a) percentage of non-spam marked as spam (false positive)
This one is really hard to measure. Most non-spam marked as spam is never
seen by readers, so the percentage actually reported would be very small
and would understate the real rate.

   b) percentage of spam missed by the filter (false negative)
   c) percentage of spam caught by the filter
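
For reference, a sketch of how the three percentages in item 3 relate,
assuming a respondent can count messages in each of the four outcomes (which,
as noted above, is the hard part for false positives). The function name and
the counts are hypothetical placeholders.

```python
# Hypothetical sketch for content-filter rates; counts are placeholders.

def filter_rates(spam_caught, spam_missed, ham_flagged, ham_passed):
    """Return the rates asked for in 3a, 3b and 3c as fractions."""
    spam_total = spam_caught + spam_missed
    ham_total = ham_flagged + ham_passed
    if spam_total == 0 or ham_total == 0:
        raise ValueError("need at least one spam and one non-spam message")
    return {
        # 3a) non-spam marked as spam / all non-spam
        "false_positive": ham_flagged / ham_total,
        # 3b) spam missed by the filter / all spam
        "false_negative": spam_missed / spam_total,
        # 3c) spam caught by the filter / all spam
        "caught": spam_caught / spam_total,
    }


rates = filter_rates(spam_caught=900, spam_missed=100,
                     ham_flagged=5, ham_passed=995)
```

Under these definitions 3b and 3c always sum to 100%, so only one of them
needs to be collected.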
-----
_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg