On Jul 18, 2005, at 9:00 AM, Arvel Hathcock wrote:
>> My main guideline is that the canonicalization process
>> does not undermine the meaning of the data.
>
> I don't understand why preserving the "meaning" of the data is at
> all relevant. The canonicalized form is, after all, a transient
> not intended to be used in place of the true or original form at
> all. The true form remains unchanged and only the signer/verifier
> consumes the canonicalized form and surely it doesn't care at all
> what the "meaning" of the data is. So, I don't understand why
> "meaning" must be preserved in the canonicalization routine. As
> long as the signer and verifier agree on how the canonicalization
> is performed it wouldn't matter even if we strrev'ed the entire
> input would it?
While the statement "only the signer/verifier consumes the
canonicalized form" is definitely valid, injecting and removing
whitespace and newlines at different locations can dramatically
change the meaning of a message without invalidating the signature.
Exploiting this weakness could easily become a favorite pastime, with
people having fun altering the messages they receive.
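
To make the concern concrete, here is a minimal Python sketch. It
assumes "nowsp" can be modeled as deleting every whitespace character
before hashing, and it uses a bare SHA-256 digest as a stand-in for
the data that actually gets signed. The function names and the sample
text are illustrative only and come from no DKIM draft.

```python
import hashlib

def nowsp_canon(text: str) -> bytes:
    # Assumption: "nowsp" is modeled as removing all whitespace
    # (spaces, tabs, CR, LF) before the data is hashed.
    return "".join(text.split()).encode("utf-8")

def digest(text: str) -> str:
    # Stand-in for the signed value: a hash of the canonical form.
    return hashlib.sha256(nowsp_canon(text)).hexdigest()

original = "The deal is nowhere\r\n"
altered = "The deal is now here\r\n"   # one space inserted in transit

# Both bodies reduce to the same canonical bytes, so a signature
# computed over the original would still verify for the altered text.
same_digest = digest(original) == digest(altered)
print(same_digest)
```

A single inserted space reverses the meaning of the sentence, yet the
canonical forms are byte-for-byte identical.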
The "nowsp" method appears a bit extreme. There does seem to be room
for a middle ground. Leaving "nowsp" on the table, would there be
room for a more conservative canonicalization method that does not
attempt to deal with the line-too-long issue? The sender would seem
able to judge the need for this, and may wish to prevent their
messages from being mangled with modified newlines and whitespace and
then re-sent with their signature still valid. The "simple" mode
seems too simple, as it does not handle many common line alterations.
A common ploy to obfuscate text is to shove it well beyond the normal
right margin, and there is seldom any indication that this text is
not visible. Of course, "nowsp" could not detect the use of this
ploy.
Perhaps DKIM could have three canonicalization methods: simple,
moderate, nowsp.
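
As a rough illustration of how three such methods might differ, the
Python sketch below compares them on two alterations: harmless
trailing whitespace with a changed line ending, and the right-margin
padding ploy. The "moderate" rules here (normalize line endings and
strip trailing spaces and tabs, but keep interior whitespace and line
breaks significant) are purely an assumption for the sake of the
example; no such method has been specified. "simple" is modeled as
leaving the body untouched and "nowsp" as deleting all whitespace.

```python
def simple(body: str) -> str:
    # Modeled as: no changes at all.
    return body

def moderate(body: str) -> str:
    # Hypothetical middle ground: normalize line endings to CRLF and
    # drop trailing spaces/tabs on each line. Interior whitespace and
    # the position of line breaks remain significant.
    lines = body.replace("\r\n", "\n").split("\n")
    return "\r\n".join(line.rstrip(" \t") for line in lines)

def nowsp(body: str) -> str:
    # Modeled as: delete every whitespace character.
    return "".join(body.split())

signed = "Offer valid. Void after Friday.\r\n"
# Transport-style damage: trailing space added, CRLF turned into LF.
trailing = "Offer valid. Void after Friday. \n"
# Padding ploy: push the second sentence far past the right margin.
padded = "Offer valid." + " " * 300 + "Void after Friday.\r\n"

def survives(canon, altered: str) -> bool:
    # True when the altered body canonicalizes to the same text as the
    # signed body, i.e. the signature would still verify.
    return canon(signed) == canon(altered)

results = {
    f.__name__: (survives(f, trailing), survives(f, padded))
    for f in (simple, moderate, nowsp)
}
for name, (ok_trailing, ok_padded) in results.items():
    print(f"{name:9} trailing-ws: {ok_trailing}  padding-ploy: {ok_padded}")
```

Under these assumptions, "simple" rejects both alterations, "nowsp"
accepts both, and the hypothetical "moderate" accepts only the
harmless one.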
-Doug