ietf-asrg

Re: [Asrg] Comments on draft-church-dnsbl-harmful-01.txt

2006-04-01 14:11:58
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Laird Breyer wrote:
On Mar 31 2006, Chris Lewis wrote:


whatever the incentive. So claiming incentive as an argument for the
success of dnsbls won't (and really shouldn't) sway standards committees.

You're missing the intent.  The intent is to show that our FP statistics
have pretty strong validity, full stop.  In that, it should sway those
wanting such statistics as proof/disproof of the success of DNSBLs.


But in terms of arguments, a generic success is canceled out by a generic
failure.  You need to qualify your success claim with a domain of applicability

I'm a lot closer to showing a generic success than Church is to showing
even a limited-scope failure.

He's reporting for a spam filter responsible for handling at most a
handful of users.  I have 65,000 users, and handle 500 times as much
email per _day_ as is in his (I believe) multi-day dataset.  Sites
handling 1000 times as much email as I do are ALSO using DNSBLs.
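
To illustrate why the size of the dataset matters for FP statistics, here is a minimal sketch of a Wilson score interval for a false-positive rate. It is not taken from either dataset discussed in this thread: the function name and the message counts are hypothetical, chosen only to show how the interval narrows as the number of legitimate messages examined grows.

```python
import math


def wilson_interval(false_positives, legit_messages, z=1.96):
    """Approximate 95% Wilson score interval for a false-positive rate.

    false_positives: legitimate messages wrongly blocked (hypothetical count)
    legit_messages:  total legitimate messages examined (hypothetical count)
    """
    if legit_messages <= 0:
        raise ValueError("need at least one legitimate message")
    n = legit_messages
    p = false_positives / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical volumes: the same observed FP rate (1 in 1,000) measured
# on a small sample and on a sample a thousand times larger.
small = wilson_interval(1, 1_000)
large = wilson_interval(1_000, 1_000_000)
print("small sample interval:", small)
print("large sample interval:", large)
```

With the same observed rate, the interval from the larger sample is far narrower, which is the sense in which a small dataset says little either way.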

I also have far more stringent requirements regarding FPs than most,
because a single lost email could cost my employer millions.

We do this for a living.  I've been living in this dataset for over 10
years.  As have some of the bigger environments, who've also found
DNSBLs to be successful.

The fact that we even have to answer such nonsense, instead of
summarily throwing it in the trash, is an insult to professionals.

But we do it anyway, because we're professionals.

which covers your users but is narrow enough to not cover cases where
others might legitimately claim failure.

I don't presume to cover vanity or hobby domains, which is pretty
clearly what Church is talking about.

The author of the draft isn't the only one who claims to have been
bitten by dnsbls, there are plenty of rants on the net, e.g.
http://paulgraham.com/spamhausblacklist.html

Paul chose to deliberately misrepresent the situation for political
ends.  As did Moveon.  As did Gilmore.  And at least _two_ of those
listings were legitimate spam blocks. There are similarly plenty of
stories about other types of filtering goofing up.

Come on guys, _no_ filter is perfect.  You can't simply pick the horror
story about the filtering technique you don't like, and ignore the rest.

While that doesn't make their claims of failure statistically
significant, it means that your counterclaim of strong success of
dnsbls will necessarily be treated very skeptically, unless you can
explain where in the email usage pattern your results apply, ie how
can your claim be true and also the church claim be true simultaneously?

Church has a personal axe to grind, his usage patterns/user community
are unknown, his "preferred" filter is described only in terms of
unverifiable handwaving, with no numbers, and his installation is of a
size that can best be described as statistically irrelevant.

That should be enough.

What's a plausible distinguishing feature? Quality of blacklist, 
type of user, ... ?

Yes.  And others.

It simply isn't possible to generate statistical accuracy in this field.
Spam/Ham collections large enough to be useful can't be generated
quickly enough to give accurate measures of real-time reputation systems.

I'm not convinced it can't be. 

Okay, suggest how a spam/ham collection can be used to measure the
effectiveness of the following techniques that are, or can be, used in
an anti-spam solution:

1) grey listing
2) sender/sender domain verification
3) Challenge/response
4) SPF and DKIM
5) PKI
6) CSV
7) Non-existent users
8) DCC or other distributed checksumming methodologies.

Similarly DNSBLs.

The NIST TREC people
(http://plg.uwaterloo.ca/~gvcormac/spam/) are interested in this very
kind of problem, so it's at least "on the radar", whether it will
ultimately be successful or not.

They've been at that for rather a long time.  They should be doing a
reasonably good job for a specific class of filtering technology.

But it's not _spam_ filtering technology.

_Anything_ that uses "adjudicators" to determine whether an email is
spam or ham is clearly missing an important clue: spam isn't about
content, it's about _consent_.  It is a behaviour (sending without
consent), NOT what is in the email.

DNSBLs that can, for example, detect compromised machines or illegal
methods being used to send the email have a better handle on intent
than an "adjudicator" could possibly have.
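
For readers unfamiliar with the mechanics, a DNSBL check looks only at the connecting IP address, not at the message content. A minimal sketch follows, assuming the common convention of reversing the IPv4 octets and appending the list's zone; the zone name `dnsbl.example.org` and the function names are hypothetical, and `is_listed` needs a working resolver to be of any use.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone.strip('.')}"


def is_listed(ip, zone):
    """Return True if the query name resolves to an A record.

    Assumption: any resolution failure is treated as "not listed", so a
    resolver outage is indistinguishable from a clean address here.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


# 192.0.2.0/24 is a documentation range; the zone is a placeholder.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

The lookup depends on the sender's current behaviour as seen by the list operator, which is why a stored spam/ham corpus says little about it after the fact.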

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3-nr1 (Windows 2000)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iQCVAwUBRC7oDJ3FmCyJjHfhAQIb5wP/aknRrWADXG0eC27TfP2Z2Mz/+Fxy5GLV
y21XYvu6hQoSSPY3vdKfDmumcLHMRKCG+xMT/VmJgb9jy1WtES9M3LDmQMws+jcz
8sdM9Ef/cUC3ZrtIplBV2ugtfNEbWvq5ZzMbjDwR7R+aSlJxKqInrTeYvlniP7CE
BM7/6XQjJ2Q=
=FdiU
-----END PGP SIGNATURE-----

_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg