ietf-dkim

Re: [ietf-dkim] canonicalized null body and dkim

2007-01-08 04:56:04
On Sun, 07 Jan 2007 18:49:50 -0000, Eric Allman <eric+dkim(_at_)sendmail(_dot_)org> wrote:

I have (finally) managed to slog my way through all the messages on this topic. Let me start out by saying that I don't see the ambiguity in the current text:

        If there is no trailing CRLF on the message, a CRLF is added.
        It makes no other changes to the message body. In more formal
        terms, the "simple" body canonicalization algorithm converts
        "0*CRLF" at the end of the body to a single "CRLF".

So if the message ends without a CRLF (which should only be possible using CHUNKING) one gets added. In particular, this is important because if a message is sent using CHUNKING through one relay and DATA through another, the CRLF will have to be added to get the <CRLF>.<CRLF>.

Indeed there is no ambiguity in that, but that is because you have only quoted half the text. The full text is:

   The "simple" body canonicalization algorithm ignores all empty lines
   at the end of the message body.  An empty line is a line of zero
   length after removal of the line terminator.  If there is no trailing
   CRLF on the message, a CRLF is added.  It makes no other changes to
   the message body.  In more formal terms, the "simple" body
   canonicalization algorithm converts "0*CRLF" at the end of the body
   to a single "CRLF".

Observe carefully that the text sometimes tells you to consider the "message", and sometimes the "message body" (which I take to mean exactly the <body>, if any, defined by RFC 2822).

Consider the example, in DATA format:

   Field: foobar<CRLF>.
   <CRLF>
   <CRLF>
   .<CRLF>

The ".<CRLF>" is evidently not a part of either the message or of the message body. The "message body" consists of "<CRLF>". Let us apply the sentences of 3.4.3 one by one.

   The "simple" body canonicalization algorithm ignores all empty lines
   at the end of the message body.

To see what an "empty line" is we need one more sentence:

   An empty line is a line of zero
   length after removal of the line terminator.

So the line "<CRLF>" (which is the whole of the message body) IS an empty line. That empty line is at the end of the message body, so we ignore it. That leaves the message

   Field: foobar<CRLF>.
   <CRLF>

Take the next sentence:

   If there is no trailing
   CRLF on the message, a CRLF is added.

There IS a trailing CRLF on the message (NB, it does not say "message body" there), so we add nothing. We still have:

   Field: foobar<CRLF>.
   <CRLF>

Take the next sentence:

   It makes no other changes to
   the message body.

So we are finished. We have a message with an empty <body>, so that empty <body> is what we hash.

Take the next sentence:

   In more formal terms, the "simple" body
   canonicalization algorithm converts "0*CRLF" at the end of the body
   to a single "CRLF".

That is supposed to produce the same result, so start over with the original message:

   Field: foobar<CRLF>.
   <CRLF>
   <CRLF>
   .<CRLF>

of which we have already seen that "<CRLF>" is the body.

Indeed it contains "0*CRLF" at its end (in fact, it contains 1*CRLF), so we convert it to "<CRLF>", and that is what we hash.

Therefore, the description in the first four sentences produces a different result from the supposedly identical description in the fifth sentence.

Q.E.D.
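
To make the discrepancy concrete, here is a minimal Python sketch of the two readings applied to the example body "<CRLF>". The function names are purely illustrative and are not taken from any DKIM implementation, and the choice of SHA-256 is an assumption made only to show that the hashes differ. The first function follows sentences one to four as read above, the second follows the "0*CRLF" sentence.

```python
import hashlib

CRLF = b"\r\n"

def simple_stepwise(body: bytes) -> bytes:
    """Sentences 1-4, as read above (hypothetical helper)."""
    # Ignore every empty line at the end of the body.
    while body == CRLF or body.endswith(CRLF + CRLF):
        body = body[:-len(CRLF)]
    # An empty body needs nothing added: the *message* already ends with
    # the CRLF that follows the header. Otherwise terminate the last line.
    if body and not body.endswith(CRLF):
        body += CRLF
    return body

def simple_formal(body: bytes) -> bytes:
    """Sentence 5: convert "0*CRLF" at the end of the body to one CRLF."""
    while body.endswith(CRLF):
        body = body[:-len(CRLF)]
    return body + CRLF

body = CRLF  # the body of the example message
for name, canon in (("stepwise", simple_stepwise), ("formal", simple_formal)):
    out = canon(body)
    print(name, repr(out), hashlib.sha256(out).hexdigest())
```

The two functions agree on any body with at least one non-empty line; they differ only when the body is empty or consists entirely of empty lines.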

Now it appears that some implementations have followed one interpretation and some the other, so something needs to be fixed. My suggested wording is:

   The "simple" body canonicalization removes empty lines from the end of
   the body until either the last line is non-empty, or no lines remain.
   An empty line is a line of zero length after removal of any terminating
   CRLF. If the body is not now empty and the last line is not already
   terminated by CRLF, a CRLF is added to it.

      INFORMATIVE NOTE: Following [RFC 2822], the CRLF which separates the
      header fields from the body is NOT part of the body, and therefore is
      never presented to the signing or verification algorithm. In the case
      of a pure binary message (such as one with a
      Content-Transfer-Encoding of 'binary') the concept of "lines" may not
      be meaningful. Nevertheless, wherever the pair of octets that
      represent CRLF happens to occur, that is to be considered as the end
      of a "line" for the purposes of this canonicalization algorithm.

That follows what I consider to be both the spirit and the letter of the first four sentences, at the expense of ignoring and removing the fifth sentence.

In particular, it leads to the easily remembered invariant:

   "After canonicalization, there will NEVER be an empty line at the end
    of what remains to be hashed."
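
As a rough check of that invariant, the suggested wording can be sketched in Python as follows. The names canonicalize_proposed and ends_with_empty_line are hypothetical, chosen only for this illustration, and the sample bodies are made up. Following the informative note, every CRLF octet pair is treated as a line end, even in binary data.

```python
CRLF = b"\r\n"

def ends_with_empty_line(data: bytes) -> bool:
    # The last line is empty if the data is a lone CRLF or ends with a
    # CRLF that directly follows another CRLF.
    return data == CRLF or data.endswith(CRLF + CRLF)

def canonicalize_proposed(body: bytes) -> bytes:
    # Remove empty lines from the end until the last line is non-empty
    # or no lines remain.
    while ends_with_empty_line(body):
        body = body[:-len(CRLF)]
    # If the body is not now empty and its last line is unterminated,
    # add a CRLF to it.
    if body and not body.endswith(CRLF):
        body += CRLF
    return body

samples = [
    b"",                       # no body at all
    CRLF,                      # body is a single empty line
    CRLF * 3,                  # body is only empty lines
    b"text",                   # unterminated last line (e.g. via CHUNKING)
    b"text" + CRLF * 4,        # trailing empty lines
    b"\x00\x01" + CRLF + b"\xff",  # binary data containing a CRLF pair
]
for sample in samples:
    result = canonicalize_proposed(sample)
    assert not ends_with_empty_line(result)
    print(repr(sample), "->", repr(result))
```

Under this sketch an empty body, or one made up solely of empty lines, canonicalizes to zero octets, so what is hashed is the empty string.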

--
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131     Web: http://www.cs.man.ac.uk/~chl
Email: chl(_at_)clerew(_dot_)man(_dot_)ac(_dot_)uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5