
Re: [Asrg] Comments on draft-church-dnsbl-harmful-01.txt

2006-04-03 05:34:45

On Apr 2, 2006, at 4:46 PM, Laird Breyer wrote:

On Apr 02 2006, Steve Atkins wrote:

Chris Lewis writes:
Justin Mason wrote:

First, you accept every message, and record the "real-time" data
points regarding how the message was listed against those services,
and/or how it *would* have been rejected at SMTP transaction time
(if at all). However, you don't reject anything; you accept
everything.
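
(For concreteness, a minimal sketch of that accept-and-log check in
Python. The reversed-octet lookup against a DNSBL zone is the
standard mechanism; the zone names and function name here are only
illustrative, and nothing below actually rejects anything.)

    import socket

    # Example zones only; substitute whichever lists are under test.
    ZONES = ["zen.spamhaus.org", "bl.spamcop.net"]

    def would_have_rejected(client_ip):
        """Return the zones listing client_ip, without rejecting it.

        DNSBL convention: reverse the IPv4 octets, prepend them to
        the zone, and do an A lookup. A listed address resolves
        (usually to 127.0.0.x); an unlisted one gets NXDOMAIN.
        """
        rev = ".".join(reversed(client_ip.split(".")))
        hits = []
        for zone in ZONES:
            try:
                socket.gethostbyname(f"{rev}.{zone}")
                hits.append(zone)
            except socket.gaierror:
                pass  # not listed in this zone
        return hits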

Not rejecting the mail alters the behavior of the sender, and that
skews the data from this approach. That doesn't mean it's not a
useful metric, but it's not a real measurement, and it can't be used
to give good estimates of real-world performance (at least, not
without quite a lot of additional data).

During the SMTP transaction, the server might reject the data from
the client, and would do so on _some_ grounds (e.g. the IP address).
Strictly speaking, it is _these_ grounds that need to be logged and
verified by a human somewhere down the line.

As such, it's not necessary to accept the message at all, and no
sender skewing takes place. In a formal test, a human adjudicator
may well decide that a single logged IP address plus DNSBL response,
without an accepted message body, is insufficient grounds for
summary rejection, and count it as an FP. That's still a valid test,
though, because the human observer takes the final responsibility
and can explain his reasoning if necessary.
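
(Again purely as a sketch: logging those grounds at SMTP time needs
nothing beyond connection data. The lookup convention is standard,
but the function name, log format and zone argument below are
illustrative, not a prescription.)

    import json
    import socket
    import time

    def log_rejection_grounds(client_ip, zone, logfile="grounds.log"):
        """Record connection-time grounds for a would-be rejection.

        Only SMTP-transaction facts are kept (client IP, the zone
        consulted, the A record it returned); no message body is
        needed. A human adjudicator can review the log later.
        """
        rev = ".".join(reversed(client_ip.split(".")))
        try:
            answer = socket.gethostbyname(f"{rev}.{zone}")
        except socket.gaierror:
            return  # not listed; nothing to adjudicate
        record = {
            "time": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
            "client_ip": client_ip,
            "zone": zone,
            "answer": answer,  # e.g. 127.0.0.2 encodes the listing type
        }
        with open(logfile, "a") as f:
            f.write(json.dumps(record) + "\n")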

If you reject the message during testing where you would have
accepted it in operation, then you may change the delivery policy of
the sender to that recipient for all future messages. That
invalidates any data that includes future delivery attempts from the
same sender (and that sender might well be a spammer coming from any
of thousands of open proxies, sending hundreds of different
messages).

That means that any test where your rejection behaviour differs from
your operational rejection behaviour will see different delivery
policies, and hence different numbers and origins of delivery
attempts. In other words, you _cannot_ use this approach of testing
multiple rejection criteria against a single delivery stream and get
entirely valid data (except in the case where you accept all the
mail, both in testing and in operation).

That doesn't mean it can't provide useful data, but it will not give the same
answers as if you really rejected the mail in real time as you would in
practice.


For example, the criterion might be consent (as in CAN-SPAM) and the
adjudicator might be a lawyer, and a single IP address listed on a
DNSBL might not constitute proof of consent, or of its absence,
under the regulations.


And to measure actual efficacy you need to actually reject mail, if
what you're measuring is behaviour when you reject mail. (If the
behaviour you're trying to model is "what happens if we bulk-folder
or devnull mail detected by a filter?" then that's not a problem,
but one of the usually mentioned advantages of DNSBLs is that you
don't need to accept the mail in the first place.)

True, but in that case it is doubly important not to amalgamate
accuracy testing with efficacy testing. These aspects should be
clearly separated in arguments intended for non-experts.

Neither accuracy nor efficacy is accurately measurable against
multiple different filtering policies in parallel in this sort of
test. By correcting for sender behaviour (which I've never seen
anyone claiming these sorts of numbers do[1]) you could certainly
improve the quality of the results, but they'd still be skewed
relative to using the same approach during real mail delivery.

It's a metric, and not a bad one, but don't mistake it for an
accurate measure of results even on the one set of mail tested.

Cheers,
  Steve


_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg