Very interesting data.
Too bad the domains with a 100% failure rate are *all* hashed.
I'm surprised that failed(body) is zero. I would have expected body
failures to be more common, given the modifications mailing lists make.
Some possible enhancements:
*) It would be interesting to see some of this data graphed against time.
*) Of the signatures using l=, how many used l=0 vs. some other value?
*) Can syntax errors in the DNS key record be distinguished from syntax
errors in the signature?
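For the l= question, here is a minimal Python sketch of how signatures could be bucketed. It assumes the collector has the raw DKIM-Signature header value. `parse_tag_list`, `classify_l_tag` and `syntax_error_source` are hypothetical helpers, not OpenDKIM functions. DKIM key records use the same tag=value list syntax as signatures, so one simplified parser can also report which of the two failed to parse.

```python
from collections import Counter
from typing import Dict, Optional


def parse_tag_list(value: str) -> Dict[str, str]:
    """Parse a DKIM tag=value list ("v=1; a=rsa-sha256; l=0; ...").

    Raises ValueError on a malformed or duplicated tag. Whitespace inside
    values is dropped, a simplification of the real grammar.
    """
    tags: Dict[str, str] = {}
    for spec in value.split(";"):
        spec = spec.strip()
        if not spec:
            continue  # an empty entry after a trailing ";" is allowed
        name, sep, val = spec.partition("=")
        name = name.strip()
        if not sep or not name:
            raise ValueError(f"malformed tag: {spec!r}")
        if name in tags:
            raise ValueError(f"duplicate tag: {name!r}")
        tags[name] = "".join(val.split())
    return tags


def classify_l_tag(signature_value: str) -> str:
    """Bucket a signature as having no l=, l=0, or some other l= value."""
    tags = parse_tag_list(signature_value)
    if "l" not in tags:
        return "no l="
    return "l=0" if int(tags["l"]) == 0 else "l=other"


def syntax_error_source(signature_value: str, key_record: str) -> Optional[str]:
    """Report which record fails to parse: "signature", "key", or None."""
    for source, text in (("signature", signature_value), ("key", key_record)):
        try:
            parse_tag_list(text)
        except ValueError:
            return source
    return None


if __name__ == "__main__":
    signatures = [
        "v=1; a=rsa-sha256; d=example.com; s=sel; l=0; bh=AAA=; b=BBB=",
        "v=1; a=rsa-sha256; d=example.com; s=sel; l=1234; bh=AAA=; b=BBB=",
        "v=1; a=rsa-sha256; d=example.com; s=sel; bh=AAA=; b=BBB=",
    ]
    print(Counter(classify_l_tag(s) for s in signatures))
    print(syntax_error_source("v=1; a=rsa-sha256", "v=DKIM1; k=rsa; p"))
```

Attributing a syntax error to its source only requires parsing the signature and the key record separately, before any cryptographic check.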
Tony Hansen
On 8/4/2010 2:00 PM, Murray S. Kucherawy wrote:
We've started gathering data from a few of our installations that have
chosen to submit it to us. With only four sources reporting, we can
already see some interesting pieces of information.
A report based on our accumulated data is generated every half hour at
http://www.opendkim.org/stats/report.html.
First, some explanation, as the reports are currently somewhat crude:
- Each record in the database represents a single received message.
- In the signature algorithm table, "0" is rsa-sha1, "1" is rsa-sha256.
- In the two canonicalization tables, "0" is simple, "1" is relaxed.
- In the pass/fail rate tables, "failed(body)" indicates a message
whose body hash, as computed by the verifier, did not match the "bh"
value recorded by the signer, i.e., the body changed in transit.
- Data submitters are given the option to anonymize their data. This
is done by replacing the From: domain and the submitting IP address
with their MD5 hashes. Data from a common source can still be
aggregated, but reverse-engineering is limited. This is why some
domain names in the reports are hashes rather than real names.
- Mailing list traffic is detected by identifying List-* header fields
or a "Precedence: list" header field. If anyone can suggest additional
ways to identify list traffic, please let me know.
- ADSP "passed" currently includes messages with valid author domain
signatures, for which ADSP is not actually checked. This will be
broken out in our next release.
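Here is a rough Python sketch of the per-message steps described in the list above. It decodes the numeric codes, anonymizes a domain or IP address with MD5, and flags list traffic by its header fields. The function names are hypothetical. Stripping and lowercasing the value before hashing is an assumption of this sketch, not a description of what OpenDKIM actually does.

```python
import hashlib
from email import message_from_string
from email.message import Message

# Numeric codes used in the report tables.
SIGNATURE_ALGORITHMS = {0: "rsa-sha1", 1: "rsa-sha256"}
CANONICALIZATIONS = {0: "simple", 1: "relaxed"}


def describe(alg: int, header_canon: int, body_canon: int) -> str:
    """Render codes as e.g. "rsa-sha256, relaxed/simple" (header/body)."""
    return (f"{SIGNATURE_ALGORITHMS[alg]}, "
            f"{CANONICALIZATIONS[header_canon]}/{CANONICALIZATIONS[body_canon]}")


def anonymize(value: str) -> str:
    """Replace a domain or IP address with its MD5 hex digest.

    Identical inputs hash identically, so records from a common source
    still aggregate. Stripping and lowercasing first is an assumption.
    """
    return hashlib.md5(value.strip().lower().encode("utf-8")).hexdigest()


def is_list_traffic(msg: Message) -> bool:
    """Any List-* header field, or "Precedence: list", marks list mail."""
    if any(name.lower().startswith("list-") for name in msg.keys()):
        return True
    return str(msg.get("Precedence", "")).strip().lower() == "list"


if __name__ == "__main__":
    raw = (
        "From: someone@example.com\n"
        "List-Id: <dkim.example.org>\n"
        "Subject: test\n"
        "\n"
        "body\n"
    )
    msg = message_from_string(raw)
    print(describe(1, 1, 0))
    print(anonymize("example.com"), is_list_traffic(msg))
```

An unsalted hash still lets anyone confirm a guessed domain by hashing it. The scheme therefore limits reverse-engineering rather than preventing it.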
The very interesting things to note so far:
- "relaxed" is the most popular header canonicalization, which I think
we expected. "relaxed" is also the most popular body
canonicalization, which is not what we generally advise, though I
suspect this is skewed by gmail.com, which uses it.
- Almost 90% of DKIM signatures survive, except on messages that go
through lists, where the success rate plunges to 32%.
- Just under half of all signed mail passes through five hops total
(some of which may be pre-signature).
- Most DKIM signatures pass as long as they go through three or fewer
hops. After that, survivability drops dramatically.
- Not a single signature has failed as a result of body changes (apart
from what the canonicalizations tolerate).
- Third-party signatures appear to have a much higher failure rate
than author signatures.
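Figures like these are per-group pass rates. Below is a minimal Python sketch of that aggregation. It assumes a hypothetical per-signature record with pass, list and hop-count fields, which is not the actual OpenDKIM schema. The sample records are made up purely to exercise the calculation and are not taken from the report.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, NamedTuple


class SigResult(NamedTuple):
    """Hypothetical per-signature record; field names are illustrative."""
    passed: bool
    via_list: bool
    hops: int


def pass_rates(results: Iterable[SigResult],
               key: Callable[[SigResult], Hashable]) -> Dict[Hashable, float]:
    """Fraction of signatures that passed, per group chosen by `key`."""
    counts = defaultdict(lambda: [0, 0])  # group -> [passed, total]
    for r in results:
        bucket = counts[key(r)]
        bucket[0] += int(r.passed)
        bucket[1] += 1
    return {group: passed / total for group, (passed, total) in counts.items()}


if __name__ == "__main__":
    # Made-up sample records, only to exercise the function.
    sample = [
        SigResult(passed=True, via_list=False, hops=2),
        SigResult(passed=True, via_list=False, hops=3),
        SigResult(passed=False, via_list=True, hops=6),
        SigResult(passed=True, via_list=True, hops=4),
    ]
    print(pass_rates(sample, key=lambda r: r.via_list))
    print(pass_rates(sample, key=lambda r: r.hops <= 3))
```

Passing the grouping as a function keeps one routine for every breakdown: list vs. non-list, hop thresholds, or author vs. third-party signatures.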
Upcoming revisions to our collection mechanisms include:
- Tracking use of "g=" in keys.
- More detailed analysis of ADSP.
- Tracking of DNSSEC use with respect to DKIM keys.
- Ability to produce reports for each reporting site rather than only
in aggregate. (We can do that now, but with our current schema it's
expensive.)
- Ability to exclude anonymized data from certain reports.
- When "z=" tags are used, identifying which header fields are
changed in transit.
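"z=" carries copies of the signed header fields, so spotting changes amounts to decoding it and comparing against what arrived. Here is a minimal Python sketch. It assumes ASCII header values and a hypothetical `received` mapping of lowercase field names to their values at the verifier. Whitespace is collapsed before comparing, so pure refolding is not flagged.

```python
import re
from typing import Dict, List


def decode_dkim_qp(text: str) -> str:
    """Decode DKIM-Quoted-Printable: drop whitespace, expand =XX escapes.

    Assumes the decoded octets are ASCII.
    """
    text = "".join(text.split())
    return re.sub(r"=([0-9A-Fa-f]{2})",
                  lambda m: chr(int(m.group(1), 16)), text)


def normalize(value: str) -> str:
    # Collapse runs of whitespace so folding alone is not a "change".
    return " ".join(value.split())


def changed_fields(z_value: str, received: Dict[str, str]) -> List[str]:
    """Names of z= fields that are missing or differ at the verifier."""
    changed = []
    for item in z_value.split("|"):  # a literal "|" in a value is =7C
        name, _, value = decode_dkim_qp(item).partition(":")
        name = name.strip()
        current = received.get(name.lower())
        if current is None or normalize(current) != normalize(value):
            changed.append(name)
    return changed


if __name__ == "__main__":
    z = "From:foo@example.com|Subject:demo=20message"
    received = {"from": "foo@example.com", "subject": "[list] demo message"}
    print(changed_fields(z, received))
```

The sketch splits on "|" before decoding because a literal "|" inside a copied value is encoded as =7C.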
We need more data! OpenDKIM users are encouraged to enable the
statistics code and participate in the program (though, of course, you
are under no obligation to do so). Instructions were sent to the
opendkim-users list on July 30th, and more information is available in
the stats/README file in the source distribution.
Feedback from both groups is welcome.
-MSK
_______________________________________________
NOTE WELL: This list operates according to
http://mipassoc.org/dkim/ietf-list-rules.html