ietf-mailsig

Re: DKIM: Canonicalization

2005-07-17 21:59:39

On July 17, 2005 at 16:16, Tony Hansen wrote:

> Corollary: Yes, we can come up with examples where internal whitespace
> is significant. Yes, we can come up with examples where trailing
> whitespace is significant. Yes, we can come up with examples where
> trailing newlines are significant. (If you can't, try harder.)

Can you please help me out?

> The question for me is: How much do the different algorithms affect
> security for typical use? And how much does the security degrade when
> the non-optimal case is present? And is that acceptable?

Agreed.  I've tried to mention guidelines in previous posts, but
I am not sure I was clear or explicit enough.

My main guideline is that the canonicalization process does not
undermine the meaning of the data.  This is really a security
concern: the integrity of the data needs to be protected.  However,
that integrity does not necessarily imply that the data must be
octet-for-octet the same.

To determine what the "meaning" is, we first look at some standards.
For example, RFC-2822 states that header field names are to be
treated case-insensitively.  Therefore, it seems pretty clear that
all header field names should be converted to lower (or upper) case
during canonicalization.
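A minimal sketch of lowercasing the field name, assuming the name is everything before the first colon. The function name `canon_header_name` is hypothetical; the field value is deliberately left untouched.

```python
def canon_header_name(field: str) -> str:
    """Lowercase the header field name (text before the first ':').

    The value after the colon is returned unchanged, since RFC 2822
    only makes the *name* case-insensitive.
    """
    name, sep, value = field.partition(":")
    return name.lower() + sep + value


print(canon_header_name("Subject: Hello World"))  # subject: Hello World
```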

RFC-2822 states that header fields should be unfolded before
any further processing (including syntactic evaluation) is done.  Therefore,
it seems clear to me that header fields should be unfolded during
canonicalization.
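RFC 2822 (section 2.2.3) defines unfolding as removing any CRLF that is immediately followed by WSP (space or tab). A small sketch of that rule follows; the helper name `unfold` is illustrative only.

```python
import re

# A CRLF is a fold only when the next character is WSP (space or tab).
# The lookahead keeps the WSP itself; only the CRLF is removed.
_FOLD = re.compile(r"\r\n(?=[ \t])")


def unfold(field: str) -> str:
    """Unfold a header field per RFC 2822, section 2.2.3."""
    return _FOLD.sub("", field)


print(repr(unfold("Subject: a long\r\n subject\r\n")))
```

A terminating CRLF that is not followed by WSP is not a fold, so it survives unfolding.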

By the above, the DKIM simple algorithm violates this guideline: it
changes the meaning of the data.  A <CRLF><WS> in a folded header field
becomes significant, when RFC-2822 states it should not be.  The same
holds for the case of header field names.
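To illustrate the point, the sketch below compares two header fields that RFC 2822 treats as equivalent (same name modulo case, one of them folded). It assumes, as a simplification, that "simple" leaves the header octets unchanged; `simple` and `unfold_and_lower` are hypothetical names.

```python
import re

_FOLD = re.compile(r"\r\n(?=[ \t])")  # CRLF followed by WSP is a fold


def simple(field: str) -> str:
    # Assumption: "simple" header canonicalization changes nothing.
    return field


def unfold_and_lower(field: str) -> str:
    # Unfold first, then lowercase only the field name.
    name, sep, value = _FOLD.sub("", field).partition(":")
    return name.lower() + sep + value


a = "Subject: quarterly report\r\n"
b = "SUBJECT: quarterly\r\n report\r\n"  # same field: different case, folded

print(simple(a) == simple(b))                      # False
print(unfold_and_lower(a) == unfold_and_lower(b))  # True
```

Under the identity rule, a relay that merely refolds a line or changes the case of a field name would break the signature even though the field's meaning is unchanged.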

The nowsp algorithm also violates the guideline, with the handling
of whitespace being the source of dispute.
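The dispute can be made concrete with a sketch. It assumes, as a simplification of the draft's wording, that nowsp deletes every space and tab; `nowsp` here is a hypothetical stand-in, not the draft's exact definition. Two fields with different meanings then canonicalize to the same string.

```python
def nowsp(field: str) -> str:
    # Simplifying assumption: delete every WSP character (space, tab).
    return field.replace(" ", "").replace("\t", "")


x = "Subject: now here\r\n"
y = "Subject: nowhere\r\n"

print(nowsp(x))              # "Subject:nowhere" followed by CRLF
print(nowsp(x) == nowsp(y))  # True, although the meanings differ
```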

For headers, again, we look at RFC-2822.  It is clear that there are
cases where whitespace has no significance (FWS), and others where it
is uncertain (as in unstructured header fields).  We also know
that there are cases where whitespace may be added at the
end of a field (although this is becoming rarer over time).

Therefore, to simplify things, we try to come up with an algorithm
that can be applied to all header fields with an acceptably low
probability that canonicalization changes the meaning of a field.

One proposal is to strip all trailing WSP from a header field (after
unfolding).  Can we agree that this is something that should be done?
Ironically (compared to what is done for the body), DKIM leaves
trailing WSP in header fields.  Only FWSP is removed.  I think this is
inconsistent.
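A sketch of the header rule proposed above: unfold, lowercase the field name, and strip trailing WSP, leaving internal whitespace alone. It assumes the input is a single header field ending in CRLF; `canon_header` is a hypothetical name and the sketch is not taken from any DKIM draft.

```python
import re

_FOLD = re.compile(r"\r\n(?=[ \t])")  # CRLF followed by WSP is a fold


def canon_header(field: str) -> str:
    """Proposed header canonicalization (sketch).

    Expects one header field terminated by CRLF.
    1. Unfold (remove CRLF that precedes WSP).
    2. Lowercase the field name.
    3. Strip trailing spaces and tabs from the value.
    Internal whitespace in the value is preserved.
    """
    unfolded = _FOLD.sub("", field)
    if unfolded.endswith("\r\n"):
        unfolded = unfolded[:-2]
    name, sep, value = unfolded.partition(":")
    return name.lower() + sep + value.rstrip(" \t") + "\r\n"


print(repr(canon_header("Subject: hello \t\r\n world  \r\n")))
```

Only the trailing run of WSP is removed; the space-and-tab sequence inside the value survives, which keeps the rule conservative for unstructured fields.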

For the body, we unfortunately have to account for what occurs in the
real world: some systems do modify whitespace.  When coming up with my
proposal (see earlier post), I tried to deal with this without changing
the meaning of the data.

This led to my proposal of stripping all WSP at the end of each line
and removing trailing LWSP at the end of the body.
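The body proposal can be sketched as follows. Assumptions (not stated in the post): lines are separated by CRLF, "trailing LWSP at the end of the body" is read as trailing lines that are empty once their WSP is stripped, and every remaining line is terminated with CRLF. `canon_body` is a hypothetical name.

```python
def canon_body(body: str) -> str:
    """Proposed body canonicalization (sketch).

    - Strip spaces and tabs at the end of every line.
    - Drop trailing lines that are then empty.
    - Terminate each remaining line with CRLF.
    Blank lines inside the body are kept.
    """
    lines = [line.rstrip(" \t") for line in body.split("\r\n")]
    while lines and lines[-1] == "":
        lines.pop()
    return "".join(line + "\r\n" for line in lines)


print(repr(canon_body("Hello  \r\nworld\t\r\n\r\n  \r\n")))  # 'Hello\r\nworld\r\n'
```

Internal blank lines are preserved, so paragraph structure (which can carry meaning) is not altered; only whitespace that relays are known to add or drop is normalized.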

Since you have implied that there are cases in which such canonicalization
rules can change the meaning of data, will you please provide them?
And as you do so, please explain the security risk that such changes
pose in those examples.

--ewh

