Re: [Asrg] Comments on draft-church-dnsbl-harmful-01.txt


On Mar 30, 2006, at 2:27 PM, Daniel Feenberg wrote:

A better study of false positives would require a large corpus of known good mail for a diverse set of destinations, with connecting MTA IP addresses. One could query the DNSBLs for those IP addresses, and calculate the probability that a legitimate message would be blocked. But I haven't found a corpus of known good mail. One source would be email confirmations of mailing-list signups, if anyone would like to share that with me. The saved mail file of an individual isn't very representative even if it is large.

Mailing list signups alone won't necessarily be a representative sample, because mailing lists tend to be clustered and may run on different servers from general email.

Tracking replies based on References headers and linking those to the record of the received mail should give a better overall picture of your users' good mail sources. You would of course need to filter out replies to abuse desks, vacation auto-responders and forwarding accounts. There may be other anomalies in your mail pattern, so some random sampling should be done to look for anything that might skew the results.
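
As a rough illustration of the reply tracking described above, here is a minimal Python sketch. It assumes a hypothetical received-mail log that maps each inbound Message-ID to the connecting MTA's IP address, and it takes the Auto-Submitted header (RFC 3834) as the signal for vacation auto-responders. The function name, the log format and the abuse-desk address list are all made up for the example; forwarding accounts would need separate handling that is not shown.

```python
from email import message_from_string
from email.utils import getaddresses


def good_source_ips(received_log, sent_messages, skip_recipients=()):
    """Collect connecting IPs of inbound mail that local users replied to.

    received_log:    dict of Message-ID -> connecting MTA IP (hypothetical format)
    sent_messages:   iterable of raw outbound messages as strings
    skip_recipients: lowercase addresses to ignore, e.g. abuse desks
    """
    skip = set(skip_recipients)
    good = set()
    for raw in sent_messages:
        msg = message_from_string(raw)

        # Drop vacation notices and other automatic replies (RFC 3834).
        if msg.get("Auto-Submitted", "no").strip().lower() != "no":
            continue

        # Drop replies addressed to abuse desks and similar role accounts.
        recipients = {addr.lower() for _, addr in getaddresses(msg.get_all("To", []))}
        if recipients & skip:
            continue

        # Every Message-ID the reply points at is a candidate good message.
        ids = (msg.get("References", "") + " " + msg.get("In-Reply-To", "")).split()
        for message_id in ids:
            ip = received_log.get(message_id)
            if ip is not None:
                good.add(ip)
    return good
```

The result is only as good as the filtering, so spot-checking a random sample of the returned addresses by hand is still worthwhile.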

The overlap between the good mail sources and blacklisted sources is where most of the false positives are going to be. The magnitude of this overlap will be good enough for a first-order estimate of the false positive potential.
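
To make the overlap estimate concrete, a small Python sketch follows. It builds the usual DNSBL query name (IPv4 octets reversed, then the list's zone) and reports the fraction of known-good source addresses that a list returns as listed. The zone name dnsbl.example is a placeholder and the function names are invented for the example. Counting distinct IP addresses rather than messages is a simplification; weighting by message volume would be closer to a per-message probability.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Return the DNS name to look up for an IPv4 address in a DNSBL zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed_via_dns(ip, zone="dnsbl.example"):
    """True if the zone returns an A record for the address (needs network)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # No such name: the address is not listed (or the lookup failed).
        return False


def false_positive_estimate(good_ips, is_listed):
    """Return (overlap set, fraction of good IPs that are listed)."""
    good_ips = set(good_ips)
    if not good_ips:
        return set(), 0.0
    overlap = {ip for ip in good_ips if is_listed(ip)}
    return overlap, len(overlap) / len(good_ips)
```

Passing the lookup in as a function keeps the estimate testable offline, for example against a saved snapshot of a list instead of live DNS.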

The set of mail sources that are not identified as good in the above tracking and not identified as bad in the blacklists would be something to investigate further.
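
That remaining set is a plain set difference. A short Python sketch, with a hypothetical function name and inputs assumed to be collections of IP address strings:

```python
def unclassified_sources(all_ips, good_ips, listed_ips):
    """Sources seen in the received-mail record that are neither
    known good (from reply tracking) nor blacklisted."""
    return set(all_ips) - set(good_ips) - set(listed_ips)
```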




_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg