> My main guideline is that the canonicalization process
> does not undermine the meaning of the data.
I don't understand why preserving the "meaning" of the data is relevant.
The canonicalized form is, after all, a transient form that is never
intended to be used in place of the true or original form.
Making sure we are all very clear about the nature and purpose of
canonicalization, as used by DKIM, is not a small point. Should the
language of the draft be changed to work harder at ensuring that the
reader understands this point?
In terms of the engineering challenges/tradeoffs of canonicalization, I found
one of Mark's notes interesting:
> In the DKIM space, canonicalization has to serve two goals. One is
> survivability across common mangling; the second is resistance to abuse. The
> more flexibility the canonicalization algorithm allows, the more
> possibilities you allow for re-formatting abuse.
> Essentially all canonicalizations throw away some data - put another way,
> they allow the insertion of some data without affecting the results of the
> verification.
> The question you have to ask is this: if you allow the insertion of some data
> in a way that does not affect verification, can a bad guy take advantage of
> that? More importantly, can you assure that a bad guy cannot take advantage
> of that?
> ....
> Is that a risk today? Maybe not. Is it a risk for new forms of content
> invented a year from now? Who knows for sure? I, for one, would not sign my
> name on the dotted line saying that any sort of canonicalization that
> ignores some content is safe from abuse.
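To make the point about "insertion of some data without affecting the results of the verification" concrete, here is a minimal sketch in Python. It is not the algorithm from the draft: the whitespace rules are only loosely modeled on the "relaxed" body rules in RFC 6376, SHA-256 is an assumed hash, and the names relaxed_body and body_hash and the sample messages are hypothetical, chosen for illustration.

```python
import hashlib
import re


def relaxed_body(body: str) -> str:
    """Simplified whitespace-tolerant canonicalization (illustrative only)."""
    lines = body.replace("\r\n", "\n").split("\n")
    # Collapse runs of spaces/tabs to one space, then drop trailing whitespace.
    lines = [re.sub(r"[ \t]+", " ", line).rstrip(" ") for line in lines]
    # Ignore empty lines at the end of the body.
    while lines and lines[-1] == "":
        lines.pop()
    return "".join(line + "\r\n" for line in lines)


def body_hash(body: str) -> str:
    # Assumption: SHA-256 over the canonicalized body stands in for the
    # value a signer would cover with its signature.
    return hashlib.sha256(relaxed_body(body).encode("utf-8")).hexdigest()


signed = "Pay Alice  100 dollars.\r\n"
# Benign mangling by a relay: trailing spaces and an extra blank line.
mangled = "Pay Alice 100 dollars.   \r\n\r\n"
# Deliberate insertion: tabs and fifty blank lines that the hash never sees.
padded = "Pay Alice\t \t100 dollars.\r\n" + "\r\n" * 50
# A change to content that the canonicalization does not ignore.
altered = "Pay Mallory 100 dollars.\r\n"

assert body_hash(signed) == body_hash(mangled) == body_hash(padded)
assert body_hash(signed) != body_hash(altered)
```

The same rule that lets the signature survive the relay's mangling also lets the padded message verify. Whether padding of this kind can be turned into abuse depends on how the ignored content is rendered, which is exactly the open question in Mark's note.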
d/
---
Dave Crocker
Brandenburg InternetWorking
+1.408.246.8253
dcrocker a t ...
WE'VE MOVED to: www.bbiw.net