On Mar 30, 2006, at 2:27 PM, Daniel Feenberg wrote:
> A better study of false positives would require a large corpus of
> known good mail for a diverse set of destinations, with connecting
> MTA IP addresses. One could query the DNSBLs for those IP
> addresses, and calculate the probability that a legitimate message
> would be blocked. But I haven't found a corpus of known good mail.
> One source would be email confirmations of mailing-list signups, if
> anyone would like to share that with me. The saved mail file of an
> individual isn't very representative even if it is large.

Looking only at mailing list signups won't necessarily be a
representative sample because mailing lists will tend to be clustered
and possibly run on different servers from general email.
Tracking replies based on References headers and linking those to the
record of the received mail should give a better overall picture of
your users' good mail sources. You would of course need to filter out
replies to abuse desks, vacation auto-responders, and forwarding
accounts. There may be other anomalies in your mail pattern, so some
random sampling should be done to look for anything that might skew
the results.
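
As a rough illustration of the reply tracking described above, here is a
minimal Python sketch. It assumes you already have a log mapping each
received Message-ID to the connecting IP (received_log) and the raw text
of outbound messages. Those names, the excluded_senders set, and the use
of the Auto-Submitted header (RFC 3834) to spot auto-responders are my
assumptions, not something a particular MTA hands you in this form.

```python
from email import message_from_string
from email.utils import parseaddr


def good_sources(received_log, outbound_messages, excluded_senders=frozenset()):
    """Return connecting IPs whose mail drew a genuine reply.

    received_log: dict mapping Message-ID -> connecting IP (hypothetical log format).
    outbound_messages: iterable of raw RFC 822 message strings sent by local users.
    excluded_senders: local addresses to ignore, e.g. abuse desk or forwarding accounts.
    """
    good = set()
    for raw in outbound_messages:
        msg = message_from_string(raw)
        # Skip auto-generated mail such as vacation responders.
        if msg.get("Auto-Submitted", "no").strip().lower() != "no":
            continue
        sender = parseaddr(msg.get("From", ""))[1].lower()
        if sender in excluded_senders:
            continue
        # Any Message-ID named in these headers links back to a received message.
        for header in ("References", "In-Reply-To"):
            for msg_id in msg.get(header, "").split():
                ip = received_log.get(msg_id)
                if ip is not None:
                    good.add(ip)
    return good
```

Auto-responders that do not set Auto-Submitted will slip through this
filter, which is one more reason to spot-check a random sample by hand.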
The overlap between the good mail sources and blacklisted sources is
where most of the false positives are going to be. The magnitude of
this overlap should be good enough for a first-order estimate of the
false positive potential.
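
The overlap estimate could be computed along these lines. This is only a
sketch: the zone name passed to is_listed is whatever DNSBL you use, the
helper names are made up for illustration, and it handles IPv4 only. It
relies on the usual DNSBL convention of reversing the octets and looking
up an A record under the list's zone.

```python
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNSBL lookup name: reversed IPv4 octets followed by the zone."""
    return ".".join(reversed(ip.split("."))) + "." + zone


def is_listed(ip, zone):
    """True if the DNSBL zone returns an A record for this IP (needs network access)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


def overlap_fraction(good_ips, listed):
    """Fraction of known-good source IPs for which listed(ip) is true."""
    good_ips = set(good_ips)
    if not good_ips:
        return 0.0
    hits = {ip for ip in good_ips if listed(ip)}
    return len(hits) / len(good_ips)
```

Passing the lookup in as a function keeps the arithmetic separate from
the DNS queries, so the same calculation works against a cached snapshot
of the list. Note that this counts sources, not messages; weighting by
message volume would be a separate refinement.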
The set of mail sources that are not identified as good in the above
tracking and not identified as bad in the blacklists would be
something to investigate further.
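
Picking out that leftover set is a plain set difference. A small sketch,
assuming you have all three collections of IPs in hand (the function and
argument names are just for illustration):

```python
def unclassified_sources(all_sources, good, blacklisted):
    """Sources seen in the mail log that are neither known-good nor blacklisted."""
    return set(all_sources) - set(good) - set(blacklisted)
```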
_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg