-----Original Message-----
From: ietf-dkim-bounces(_at_)mipassoc(_dot_)org
[mailto:ietf-dkim-bounces(_at_)mipassoc(_dot_)org] On Behalf Of Rolf E.
Sonneveld
Sent: Friday, October 01, 2010 2:24 PM
To: ietf-dkim(_at_)mipassoc(_dot_)org
Subject: Re: [ietf-dkim] Updated implementation report
> Remark: I'd suggest transforming the AOL data to also show
> percentages, just like the results of the other (OpenDKIM statistics)
> project. That will make it easier to compare the results of the two
> projects, and in general percentages give a better view of what was
> measured.
OK, I'll look into that for the next version.
> If this is the #1 reason that verifications fail, would there be room
> for a new canonicalization scheme to improve verification rates? I
> know there are MTAs implementing the principle of 'garbage in, garbage
> out', just as there are MTAs implementing the principle of 'be liberal
> in what you accept, be strict in what you emit'; the latter may add a
> missing Date field, correct a syntactically invalid Date field, modify
> To fields to match RFC 5322, etc. This has been discussed before, and
> it is impossible to come up with a canonicalization scheme that
> addresses 100% of these modifications, but if we can address the top 5
> or top 10 types of modifications (and hence reasons for verification
> failure), we might be able to further improve the verification score,
> mightn't we? Murray, do we have any figures on the total percentage of
> DKIM signatures that were invalidated by header modifications, and a
> complete list of which headers were modified? Are we talking about 5%,
> 1%, 0.1%, 0.01%?
I don't think the data support any conclusions about possible canonicalization
improvements yet. I also can't (after admittedly spending only about 90
seconds on it) imagine what such a scheme might look like, short of something
like "relaxed plus no punctuation outside angle brackets".
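For reference, RFC 4871's "relaxed" header canonicalization already absorbs whitespace-only rewrites; any new scheme would presumably extend this idea. A minimal Python sketch of the relaxed algorithm (the function name and structure are mine, not from any particular implementation):

```python
import re

def relaxed_header_canon(name, value):
    """RFC 4871 section 3.4.2 'relaxed' header canonicalization:
    lowercase the field name, unfold the value, collapse runs of
    whitespace to a single space, and trim surrounding whitespace."""
    name = name.lower()
    value = re.sub(r"\r\n[ \t]+", " ", value)  # unfold continuation lines
    value = re.sub(r"[ \t]+", " ", value)      # collapse internal WSP
    return name + ":" + value.strip() + "\r\n"

# A whitespace-only rewrite in transit survives verification:
relaxed_header_canon("Date", "Fri,  1 Oct 2010   14:24:00 -0700")
# -> "date:Fri, 1 Oct 2010 14:24:00 -0700\r\n"
```

A hypothetical scheme along the lines above would add further normalization steps (e.g. ignoring punctuation outside angle brackets) before hashing.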
The OpenDKIM statistics collected so far show that of 135549 signatures
received, 121821 passed (meaning 13728 failed). There were also 6194 where the
header hashes lined up but the body hash didn't. So we see that failures
caused by header changes outnumber those caused by body changes by just over
two to one.
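Expressed as percentages, per the suggestion above (simple arithmetic on the counts just given):

```python
total, passed, body_fail = 135549, 121821, 6194
failed = total - passed  # 13728

print(f"passed:              {passed / total:.1%}")     # 89.9%
print(f"failed:              {failed / total:.1%}")     # 10.1%
print(f"body-hash mismatches: {body_fail / total:.1%}") # 4.6%
```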
What we don't currently collect is a list of signed fields that were deleted
in transit, nor a list of fields that were signed despite being absent at
signing time (to prevent their later addition). We also don't record exactly
what the various in-transit modifications were. Our approximate matching is
limited as well: if a field changes a great deal, we don't count it, in order
to avoid false positives (e.g. two different Received: header fields being
compared and wrongly reported as a modification). And since the contents of
"z=" are assumed, for the purposes of this study, to be an accurate reflection
of the signed header fields, we rely on people to add them and to do so
correctly. So it's not a precise study, and we certainly don't collect enough
information to say precisely which changes cause failures, but it still
reveals some interesting things.
> This raises another question: a DKIM verification failure is not in
> itself a problem. A spammer signing with an incorrect signature, or
> replaying old (DKIM) message headers with new spam content, will cause
> verification failures, which is exactly as it should be. However,
> another category of DKIM verification failures may stem from header
> modifications by downstream MTAs that invalidate DKIM signatures. The
> question is: how can we gather statistics about these two (and maybe
> more) categories of verification failures, or how can we differentiate
> between them?
I suspect the best you can do, short of getting everyone along the way to save
copies of all signed traffic for a while, is to encourage wider deployment of
"z=" by both signers and verifiers. That will be hard to do, though, since it
doesn't actually help anyone achieve anything, other than those of us
interested in advancing the protocol, and it just makes headers bigger, which
sometimes runs into processing limits. Several implementations probably didn't
even bother to add support for it.
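To illustrate why wider z= deployment would help differentiate failure categories: a verifier can diff the copied header fields in z= against what actually arrived. A simplified sketch (real z= values are DKIM-Quoted-Printable encoded per RFC 4871 section 2.6, which this ignores; the example data are hypothetical):

```python
def diff_z_tag(z_value, received_headers):
    """Compare the z= copied header fields from a DKIM-Signature
    against the headers actually received, and report which signed
    fields were modified in transit."""
    changed = []
    for copied in z_value.split("|"):
        name, _, value = copied.partition(":")
        if received_headers.get(name.lower()) != value:
            changed.append(name)
    return changed

# Hypothetical case: an MLM rewrote Subject but left From intact.
diff_z_tag("From:alice@example.com|Subject:hello",
           {"from": "alice@example.com", "subject": "[list] hello"})
# -> ["Subject"]
```

With data like this collected at verifiers, the in-transit-modification category could be separated from signatures that were simply invalid when sent.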
I think it's rare to find anyone validating signatures anywhere except at the
sender and the receiver, so any mid-stream rewrites happen silently.
Another thing that might be interesting to collect is a study of which MTAs
make which signature-breaking changes, or which signature-breaking MLM actions
are the most common. Such studies would be a lot harder to conduct, however.
_______________________________________________
NOTE WELL: This list operates according to
http://mipassoc.org/dkim/ietf-list-rules.html