Re: [Asrg] Comments on draft-church-dnsbl-harmful-01.txt

2006-04-02 14:00:55

On Apr 2, 2006, at 12:20 PM, Justin Mason wrote:


Chris Lewis writes:
Justin Mason wrote:
Chris said:
Okay, suggest how a spam/ham collection can be used to measure the
effectiveness of the following techniques that are, or can be, used in
an anti-spam solution:

1) grey listing
2) sender/sender domain verification
3) Challenge/response
4) SPF and DKIM
5) PKI
6) CSV
7) Non-existent users
8) DCC or other distributed checksumming methodologies.

Similarly DNSBLs.

The effectiveness of all of these, except possibly for greylisting and
C/R, can be measured accurately. We are doing it in SpamAssassin ;) --
here's how.

First, you accept every message, and record the "real-time" data
points regarding how the message was listed against those services,
and/or how it *would* have been rejected at SMTP transaction time (if
at all). However, you don't reject, you accept everything.

Not rejecting the mail alters the behavior of the sender. That skews
the data from this approach. Doesn't mean it's not a useful metric, but
it's not a real measurement, and can't be used to give good estimates
of real-world performance (at least, not without quite a lot of additional
data, anyway).

It's still a good approach, as long as you remember that different
categories of email delivery will respond differently to changes in
recipient behaviour. Often dramatically so.


Then, later, provide a way for hand-sorting to take place, and compare
the results of the hand-sorting with what the various other techniques
would have done with those messages.

Hey presto, you've now got a way to compare accuracy and effectiveness
of those techniques!  Simple as that.
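
A minimal sketch of the bookkeeping described above, in Python. It is
not SpamAssassin's actual code: the LoggedMessage record, the technique
names and the sample data are all hypothetical, chosen only to show the
shape of the comparison. Each accepted message carries the verdict each
technique *would* have given at SMTP time, plus a label added later by
hand-sorting.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class LoggedMessage:
    """One accepted message (hypothetical record layout)."""
    msg_id: str
    # technique name -> would it have rejected this message at SMTP time?
    would_reject: Dict[str, bool]
    # "spam" or "ham", filled in later by hand-sorting; None = not yet sorted
    hand_label: Optional[str] = None


def score_techniques(
    messages: List[LoggedMessage],
) -> Dict[str, Tuple[Optional[float], Optional[float]]]:
    """Return {technique: (spam hit rate, ham false-positive rate)}."""
    counts: Dict[str, Dict[str, int]] = {}
    for m in messages:
        if m.hand_label not in ("spam", "ham"):
            continue  # skip messages nobody has hand-sorted yet
        for tech, rejected in m.would_reject.items():
            c = counts.setdefault(
                tech, {"spam": 0, "ham": 0, "spam_hit": 0, "ham_hit": 0}
            )
            c[m.hand_label] += 1
            if rejected:
                c[m.hand_label + "_hit"] += 1

    results = {}
    for tech, c in counts.items():
        hit_rate = c["spam_hit"] / c["spam"] if c["spam"] else None
        fp_rate = c["ham_hit"] / c["ham"] if c["ham"] else None
        results[tech] = (hit_rate, fp_rate)
    return results


# Made-up sample log: every message was accepted, none rejected.
log = [
    LoggedMessage("m1", {"dnsbl": True, "spf": False}, "spam"),
    LoggedMessage("m2", {"dnsbl": True, "spf": True}, "spam"),
    LoggedMessage("m3", {"dnsbl": False, "spf": False}, "ham"),
    LoggedMessage("m4", {"dnsbl": True, "spf": False}, "ham"),
    LoggedMessage("m5", {"dnsbl": False, "spf": False}),  # not yet sorted
]

for tech, (hit, fp) in sorted(score_techniques(log).items()):
    print(tech, "spam hit rate:", hit, "ham false-positive rate:", fp)
```

Because nothing is actually rejected here, the rates describe what the
techniques would have flagged in an accept-everything mail stream, not
how senders would have reacted to real rejections.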

Do you see any provision for that with _any_ of the spam/ham
collections? Can you do that with a pre-existing spam/ham collection if
the technique you were trying to test _wasn't_ being collected at the
time the spam/ham collection was being made?

This points up my very point: to do a proper sampling of the
effectiveness of a technique, you have to do it in real time, at the time
the emails were sent.  In other words, on real mail streams.

Ah, if you're talking specifically about *pre-existing* spam/ham
collections, then it's correct to say that they won't help, no.   To
measure network test efficacy, you need to create the spam/ham collection
in real-time, during the test.

And to measure actual efficacy you need to actually reject mail,
if what you're measuring is the behaviour when you reject mail. (If the
behaviour you're trying to model is "what happens if we bulk folder
or devnull mail detected by a filter?" then that's not a problem,
but one of the usually mentioned advantages of DNSBLs is that
you don't need to accept the mail).

Cheers,
  Steve


_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg