Re: [ietf-dkim] canonicalized null body and dkim

On Sun, 07 Jan 2007 18:49:50 -0000, Eric Allman <eric+dkim(_at_)sendmail(_dot_)org> wrote:

I have (finally) managed to slog my way through all the messages on this topic. Let me start out by saying that I don't see the ambiguity in the current text:
        If there is no trailing CRLF on the message, a CRLF is added.
        It makes no other changes to the message body. In more formal
        terms, the "simple" body canonicalization algorithm converts
        "0*CRLF" at the end of the body to a single "CRLF".
So if the message ends without a CRLF (which should only be possible using CHUNKING) one gets added. In particular, this is important because if a message is sent using CHUNKING through one relay and DATA through another, the CRLF will have to be added to get the <CRLF>.<CRLF>.

Indeed there is no ambiguity in that, but that is because you have only quoted half the text. The full text is:


   The "simple" body canonicalization algorithm ignores all empty lines
   at the end of the message body.  An empty line is a line of zero
   length after removal of the line terminator.  If there is no trailing
   CRLF on the message, a CRLF is added.  It makes no other changes to
   the message body.  In more formal terms, the "simple" body
   canonicalization algorithm converts "0*CRLF" at the end of the body
   to a single "CRLF".

Observe carefully that the text sometimes tells you to consider the "message", and sometimes the "message body" (which I take to mean exactly the <body>, if any, defined by RFC 2822).


Consider the example, in DATA format:

   Field: foobar<CRLF>.
   <CRLF>
   <CRLF>
   .<CRLF>

The ".<CRLF>" is evidently not a part of either the message or of the message body. The "message body" consists of "<CRLF>". Let us apply the sentences of 3.4.3 one by one.


   The "simple" body canonicalization algorithm ignores all empty lines
   at the end of the message body.

To see what an "empty line" is we need one more sentence:

   An empty line is a line of zero
   length after removal of the line terminator.

So the line "<CRLF>" (which is the whole of the message body) IS an empty line. That empty line is at the end of the message body, so we ignore it. That leaves the message:


   Field: foobar<CRLF>.
   <CRLF>

Take the next sentence:

   If there is no trailing
   CRLF on the message, a CRLF is added.

There IS a trailing CRLF on the message (NB, it does not say "message body" there), so we add nothing. We still have:


   Field: foobar<CRLF>.
   <CRLF>

Take the next sentence:

   It makes no other changes to
   the message body.

So we are finished. We have a message with an empty <body>, so that empty <body> is what we hash.
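The reading followed so far can be written down as a short Python sketch. It is only an illustration of the walk-through above, not text from the draft: the function name, the helper and the way the header is passed in (the header fields including their own terminating CRLF) are assumptions made for the example.

```python
CRLF = b"\r\n"


def ends_with_empty_line(body: bytes) -> bool:
    """True if the last line of body has zero length once its CRLF is removed."""
    return body == CRLF or body.endswith(CRLF + CRLF)


def simple_body_first_four(header: bytes, body: bytes) -> bytes:
    """Hypothetical coding of sentences 1-4, read literally.

    header: the header fields, each terminated by CRLF (assumed input shape).
    body:   the RFC 2822 <body>, without the separating CRLF.
    """
    # Sentences 1 and 2: ignore all empty lines at the end of the message body.
    while ends_with_empty_line(body):
        body = body[: -len(CRLF)]

    # Sentence 3: the test is applied to the "message", not the "message body".
    message = header + CRLF + body
    if not message.endswith(CRLF):
        body += CRLF

    # Sentence 4: no other changes are made.
    return body


if __name__ == "__main__":
    header = b"Field: foobar\r\n"
    for sample in (b"\r\n", b"abc", b"abc\r\n\r\n"):
        print(repr(sample), "->", repr(simple_body_first_four(header, sample)))
```

With the one-line body of the example, the loop removes the only line and the separator CRLF still terminates the message, so the function returns an empty body.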


Take the next sentence:

   In more formal terms, the "simple" body
   canonicalization algorithm converts "0*CRLF" at the end of the body
   to a single "CRLF".

That is supposed to produce the same result, so start over with the original message:


   Field: foobar<CRLF>.
   <CRLF>
   <CRLF>
   .<CRLF>

of which we have already seen that "<CRLF>" is the body.

Indeed it contains "0*CRLF" at its end (in fact, it contains 1*CRLF), so we convert it to "<CRLF>", and that is what we hash.
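The fifth sentence on its own is easy to sketch in Python. The function names are made up for the illustration, and SHA-256 with base64 output is an assumption chosen only to show that the value to be hashed differs from an empty body; the argument does not depend on which hash is used.

```python
import base64
import hashlib

CRLF = b"\r\n"


def simple_body_formal(body: bytes) -> bytes:
    """Hypothetical coding of the fifth sentence: "0*CRLF" at the end of
    the body is converted to a single CRLF."""
    # Strip every trailing CRLF (there may be none at all) ...
    while body.endswith(CRLF):
        body = body[: -len(CRLF)]
    # ... and put exactly one back.
    return body + CRLF


def body_hash(canonical_body: bytes) -> str:
    """Base64 of the SHA-256 digest (hash choice is an assumption)."""
    return base64.b64encode(hashlib.sha256(canonical_body).digest()).decode("ascii")


if __name__ == "__main__":
    formal = simple_body_formal(b"\r\n")   # the body of the example
    print(repr(formal), body_hash(formal))
    print(repr(b""), body_hash(b""))       # an empty body, for comparison
```

For the example body this rule yields a single CRLF, so whatever is hashed cannot equal the hash of an empty body.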

Therefore, the description in the first four sentences produces a different result to the supposedly identical description in the fifth sentence.


Q.E.D.

Now it appears that some implementations have followed one interpretation and some the other, so something needs to be fixed. My suggested wording is:

   The "simple" body canonicalization removes empty lines from the end of
   the body until either the last line is non-empty, or no lines remain. An
   empty line is a line of zero length after removal of any terminating
   CRLF. If the body is not now empty and the last line is not already
   terminated by CRLF, a CRLF is added to it.

      INFORMATIVE NOTE: Following [RFC 2822], the CRLF which separates the
      header fields from the body is NOT part of the body, and therefore is
      never presented to the signing or verification algorithm. In the case
      of a pure binary message (such as one with a
      Content-Transfer-Encoding of 'binary') the concept of "lines" may not
      be meaningful. Nevertheless, wherever the pair of octets that
      represent CRLF happens to occur, that is to be considered as the end
      of a "line" for the purposes of this canonicalization algorithm.

That follows what I consider to be both the spirit and the letter of the first four sentences, at the expense of ignoring and removing the fifth sentence.


In particular, it leads to the easily remembered invariant:

   "After canonicalization, there will NEVER be an empty line at the end
    of what remains to be hashed."
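A minimal Python sketch of the suggested wording, with a check of that invariant over a few sample bodies. The function names and the sample inputs are assumptions made for the illustration; nothing here is taken from an existing implementation.

```python
CRLF = b"\r\n"


def ends_with_empty_line(body: bytes) -> bool:
    """True if the last line of body has zero length once its CRLF is removed."""
    return body == CRLF or body.endswith(CRLF + CRLF)


def simple_body_proposed(body: bytes) -> bytes:
    """Hypothetical coding of the suggested wording.

    body is the RFC 2822 <body>; the CRLF separating it from the header
    fields is not part of it and is never passed in.
    """
    # Remove empty lines from the end until the last line is non-empty
    # or no lines remain.
    while ends_with_empty_line(body):
        body = body[: -len(CRLF)]

    # If the body is not now empty and its last line lacks a CRLF, add one.
    if body and not body.endswith(CRLF):
        body += CRLF
    return body


if __name__ == "__main__":
    samples = (b"", b"\r\n", b"\r\n\r\n", b"abc", b"abc\r\n",
               b"abc\r\n\r\n\r\n", b"a\r\n\r\nb")
    for sample in samples:
        result = simple_body_proposed(sample)
        # The invariant: never an empty line at the end of what is hashed.
        assert not ends_with_empty_line(result)
        print(repr(sample), "->", repr(result))
```

Empty lines inside the body, as in the last sample, are left alone; only trailing ones are removed.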

--
Charles H. Lindsey ---------At Home, doing my own thing------------------------

Tel: +44 161 436 6131 Web: http://www.cs.man.ac.uk/~chl

Email: chl(_at_)clerew(_dot_)man(_dot_)ac(_dot_)uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5