Re: [Asrg] Comments on draft-church-dnsbl-harmful-01.txt

On 2006-04-03 22:01:13 -0400, Chris Lewis wrote:

Laird Breyer wrote:

On Apr 03 2006, Seth Breidbart wrote:

Describe how you'd test greylisting without perturbing the system.


I already have outlined it several times now. The greylisting system
logs all the events which enter into the final decision. When a mail
is rejected, the record of evidence which triggered that rejection 
(call it R) is kept. 

Later, during QA, some volunteer users are sent a fraction of
rejection records R for mail messages that were intended for them, to
be approved. When the definition is "spam is what I say is spam",
these volunteers are the final arbiters of mail intended for them,
whether other people think they could do a better job or not.

Finally, you measure the number of records R which the volunteers say
are incorrectly rejected by the greylisting system.


Sorry, that notion is nonsensical with greylisting.  Are you aware of
how greylisting works?

[...]

It never permanently rejects.  The only notion of "incorrectly rejected"
there could be is if the implementation is broken and not implementing
greylisting.  In which case, you're not testing greylisting, you're
testing something else.


I am testing whether greylisting has the intended effect. Sure, the MTA
implementing greylisting never issues a 5xx response. But the effect of
one or more 4xx responses may be the same - either because the client
MTA is broken (doesn't retry) or because the client's queuing setup
and the server's greylisting parameters don't match.

So if I have records from the greylisting MTA, I can in theory look at
every delivery attempt and classify it as:

1) Delivery was accepted and message was not spam (TN)

2) Delivery was accepted and message was spam (FN)

3) Delivery was temporarily rejected, but a later delivery of the same
   message was accepted. This puts it into one of the two categories
   above, but adds extra info (number of delivery attempts, delay from
   first delivery attempt to acceptance).

4) Delivery was temporarily rejected and no later delivery attempt was
   successful. Message was spam (TP).

5) Delivery was temporarily rejected and no later delivery attempt was
   successful. Message was not spam (FP).
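
As a rough illustration of the categories above, here is a minimal
Python sketch. The MessageRecord type and its field names are
hypothetical (no real greylisting log has this shape); it assumes that
delivery attempts have already been grouped per message and that a
human has supplied the spam/not-spam judgement.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    TN = "accepted, not spam"
    FN = "accepted, spam"
    TP = "never accepted, spam"
    FP = "never accepted, not spam"


@dataclass
class MessageRecord:
    # Hypothetical per-message record, built by hand from the MTA logs.
    attempts: int                          # delivery attempts seen
    accepted: bool                         # was any attempt accepted?
    is_spam: bool                          # human judgement, not the MTA's
    delay_seconds: Optional[float] = None  # first attempt -> acceptance


def classify(rec: MessageRecord) -> Outcome:
    """Map a record to categories 1, 2, 4 or 5."""
    if rec.accepted:
        return Outcome.FN if rec.is_spam else Outcome.TN
    return Outcome.TP if rec.is_spam else Outcome.FP


def was_delayed(rec: MessageRecord) -> bool:
    """Category 3: accepted, but only after at least one 4xx response."""
    return rec.accepted and rec.attempts > 1
```

Category 3 is kept as a separate flag rather than a fifth outcome,
since such a message still ends up as TN or FN; the attempt count and
delay are just extra information about it.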

In practice this is difficult. Normally, greylisting is done at the RCPT
command, so there isn't much information available (EHLO, MAIL, RCPT,
IP-address of client, maybe size of the message). So it is even
difficult to determine whether two delivery attempts are for the same
message, and even more difficult to determine whether the message was
spam. Still, for small samples, where a human can go over the records
and apply real-life knowledge, it should be possible to get a (somewhat
fuzzy) categorization which is better than that determined by
greylisting itself.
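
One possible first pass at matching delivery attempts, before a human
looks at them, is sketched below. It assumes (and this is only a
heuristic, not something the logs can prove) that attempts with the
same client IP, MAIL FROM and RCPT TO within some time window belong
to the same message. The tuple layout and the one-day window are made
up for the example.

```python
from collections import defaultdict


def group_attempts(attempts, window=86400):
    """Group attempts that are probably retries of the same message.

    attempts: iterable of (timestamp, client_ip, mail_from, rcpt_to, accepted)
    Returns a list of (key, [(timestamp, accepted), ...]) groups.
    """
    by_key = defaultdict(list)
    for ts, ip, sender, rcpt, accepted in sorted(attempts):
        by_key[(ip, sender.lower(), rcpt.lower())].append((ts, accepted))

    groups = []
    for key, events in by_key.items():
        current = [events[0]]
        for ev in events[1:]:
            # Start a new group once the previous attempt was accepted,
            # or when the retry comes too long after the first attempt.
            if current[-1][1] or ev[0] - current[0][0] > window:
                groups.append((key, current))
                current = [ev]
            else:
                current.append(ev)
        groups.append((key, current))
    return groups
```

A group whose last entry was never accepted is then a candidate for
category 4 or 5 and still needs a human to decide which.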

(As an aside, we provide our users with a web-based view of the log files
for their own mail addresses, so they can do similar checks - some of
them do and report FPs due to greylisting to us.)

Greylisting could be delayed until after the message is received. This
would provide more information for auditing, but it alters the
algorithm enough that it cannot be considered the same algorithm any
more (so you cannot do this "just for testing", you have to do it in
production, too).

        hp

-- 
   _  | Peter J. Holzer    | Ich sehe nun ein, dass Computer wenig
|_|_) | Sysadmin WSR       | geeignet sind, um sich was zu merken.
| |   | hjp(_at_)hjp(_dot_)at         |
__/   | http://www.hjp.at/ |    -- Holger Lembke in dan-am


_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg