ietf-asrg

RE: [Asrg] Requirements for gathering statistics

2003-03-25 10:04:19
"Sauer, Damon" <Damon(_dot_)Sauer(_at_)bellsouth(_dot_)com> wrote:
 The reason that I do not release my numbers is that, if I tell all of you
how good or how bad my numbers are, I may become a target of a spammer who
either a) wants to take me on as a personal conquest, or b) accepts the
percentage of spam that I let through as "acceptable".

  I wasn't proposing to make the names public.  A trusted source can
reasonably be expected to gather the statistics and analyse them,
without compromising privacy.

---- Ok. Who? My own mother is just barely above suspect. (just kidding, my
mom puts me on spam lists all the time.. thanks for the internet greeting
card and joke-of-the-day mom!)

 I am VERY familiar with the spam problem. I am not sure what statistics
gathering is going to accomplish, other than justify the continued use of
spam as a marketing tool.

  <sigh> That's why the statistics I was asking for were at the MTA
end, and were focussed on the scope of the current problem, and the
cost and effectiveness of their current solutions.

  There was NOTHING in what I wrote which could be construed as
measuring the effectiveness of spam as a marketing tool.

---- Regardless of how the information is "meant" to be used, any spammer
can point at these numbers and say: 50% of our junk is getting through <best
Homer voice> Whoo Hoo!
 Don't get me wrong, I understand the intent, but the moment you publish the
numbers, every news service in the world is going to pick them up and run
with them, especially if the data is comprehensive. I would like to know
exactly how we plan to compile and distribute the information.

Will it help us obtain our goal? I don't think so because it
contains no differentiating research and no control sets.

  My goal in collecting statistics would be to get a quantitative
indication of WHERE in the network spam is a problem, and HOW MUCH of
a problem it currently is.  Without those statistics, we will have NO
BASIS for measuring the effectiveness of any solution.  Therefore, we
will also have no basis for comparing the solutions, or for knowing
when we've "solved" the problem.

  As for "control sets", they're irrelevant to the question of "how
bad is the problem".  We wouldn't ask 10 ISPs for their input on
spam, and then not ask another 10 as a "control set".  That
methodology is nonsensical.

---- I believe that you are not looking at control sets in the correct
light. In ANY scientific study you MUST have control sets or the information
is invalid. I submit to you that a control set would be anyone that has no
anti-spam functionality.
 So, your numbers would be compared to the control set. For example, an
entire domain with no users that accepts all email (a honeypot) would be
sufficient, I think.

 Control Set = 100% spam getting through.
 Filtering Alone = blocks X%
 Heuristics = blocks Z%
 and so on....

 Now, if you base your numbers only on what anti-spam systems are doing, but
you have no idea whether the actual flow has diminished, it would make it
look as if the anti-spam component did a wonderful job that day.
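
 A minimal sketch of that comparison, assuming hypothetical daily counts
from a honeypot control domain and from a filtered domain, scaled to the
same number of mailboxes. The function name and the figures are
illustrative only; they are not real measurements.

```python
def block_rate(control_spam: int, delivered_spam: int) -> float:
    """Fraction of spam blocked, measured against an unfiltered control set.

    control_spam:   spam received by the honeypot (100% gets through).
    delivered_spam: spam that reached users behind the anti-spam system.
    """
    if control_spam <= 0:
        raise ValueError("the control set must have received some spam")
    return 1.0 - delivered_spam / control_spam


# Hypothetical counts: on day 2 the overall flow of spam halves.
days = {
    "day 1": {"control": 1000, "delivered": 200},
    "day 2": {"control": 500, "delivered": 100},
}

for name, counts in days.items():
    rate = block_rate(counts["control"], counts["delivered"])
    print(f"{name}: delivered={counts['delivered']}, blocked={rate:.0%}")
```

 In this made-up case the delivered spam drops by half on day 2, yet the
block rate relative to the control set is unchanged, so the drop reflects
a smaller flow rather than a better filter.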


Regards,
Damon Sauer







  Alan DeKok.
_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg

