On Sun, 07 Jan 2007 18:49:50 -0000, Eric Allman <eric+dkim(_at_)sendmail(_dot_)org> wrote:
> I have (finally) managed to slog my way through all the messages on this
> topic. Let me start out by saying that I don't see the ambiguity in the
> current text:
>
>     If there is no trailing CRLF on the message, a CRLF is added.
>     It makes no other changes to the message body. In more formal
>     terms, the "simple" body canonicalization algorithm converts
>     "0*CRLF" at the end of the body to a single "CRLF".
>
> So if the message ends without a CRLF (which should only be possible
> using CHUNKING) one gets added. In particular, this is important
> because if a message is sent using CHUNKING through one relay and DATA
> through another, the CRLF will have to be added to get the <CRLF>.<CRLF>.
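For illustration only, a minimal Python sketch of the quoted "formal" rule, assuming the body is available as raw bytes already separated from the header. The name `simple_body_formal` is hypothetical, not something taken from the draft.

```python
def simple_body_formal(body: bytes) -> bytes:
    """Apply the quoted 'formal' rule: replace any run of CRLFs at the
    end of the body (including none at all) with exactly one CRLF."""
    # Strip every trailing CRLF pair; a lone CR or LF is left alone.
    while body.endswith(b"\r\n"):
        body = body[:-2]
    # "0*CRLF" becomes a single "CRLF".
    return body + b"\r\n"


# A body received via CHUNKING may lack a final CRLF; one is added.
print(simple_body_formal(b"Hello"))              # b'Hello\r\n'
print(simple_body_formal(b"Hello\r\n\r\n\r\n"))  # b'Hello\r\n'
```

Note that under this rule an empty body also becomes a single CRLF.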
Indeed there is no ambiguity in that, but that is because you have only
quoted half the text. The full text is:
    The "simple" body canonicalization algorithm ignores all empty lines
    at the end of the message body. An empty line is a line of zero
    length after removal of the line terminator. If there is no trailing
    CRLF on the message, a CRLF is added. It makes no other changes to
    the message body. In more formal terms, the "simple" body
    canonicalization algorithm converts "0*CRLF" at the end of the body
    to a single "CRLF".
Observe carefully that the text sometimes tells you to consider the
"message", and sometimes the "message body" (which I take to mean exactly
the <body>, if any, defined by RFC 2822).
Consider the example, in DATA format:
    Field: foobar<CRLF>
    <CRLF>
    <CRLF>
    .<CRLF>
The ".<CRLF>" is evidently not part of either the message or the
message body. The "message body" consists of "<CRLF>". Let us apply the
sentences of section 3.4.3 one by one.
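To make the split concrete, a minimal Python sketch that recovers the message from the DATA example and divides it into header and body. It assumes the stream ends with the usual ".<CRLF>" terminator and contains no dot-stuffed lines; `split_data` is a hypothetical helper name.

```python
def split_data(data: bytes) -> tuple[bytes, bytes]:
    """Return (header, body) for a message sent with SMTP DATA.

    Assumes no dot-stuffing is present (no line starts with '.').
    """
    terminator = b".\r\n"
    if not data.endswith(b"\r\n" + terminator):
        raise ValueError("missing <CRLF>.<CRLF> terminator")
    # The final ".<CRLF>" belongs to the protocol, not to the message.
    message = data[: -len(terminator)]
    # The first empty line separates header from body (RFC 2822);
    # the separating CRLF itself belongs to neither.
    idx = message.find(b"\r\n\r\n")
    if idx == -1:
        return message, b""  # header only, no body
    header = message[: idx + 2]  # keep the CRLF ending the last field
    body = message[idx + 4 :]    # skip the separator line
    return header, body


header, body = split_data(b"Field: foobar\r\n\r\n\r\n.\r\n")
print(header)  # b'Field: foobar\r\n'
print(body)    # b'\r\n'
```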
    The "simple" body canonicalization algorithm ignores all empty lines
    at the end of the message body.
To see what an "empty line" is, we need one more sentence:
    An empty line is a line of zero
    length after removal of the line terminator.
So the line "<CRLF>" (which is the whole of the message body) IS an empty
line.
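A small Python sketch of that definition, treating only the CRLF octet pair as a line terminator. `split_lines` and `is_empty_line` are hypothetical names used for illustration.

```python
def split_lines(body: bytes) -> list[bytes]:
    """Split a body into lines, each keeping its CRLF terminator.

    A final fragment with no CRLF (possible with CHUNKING) is kept
    as an unterminated last line.
    """
    lines = []
    start = 0
    while True:
        end = body.find(b"\r\n", start)
        if end == -1:
            break
        lines.append(body[start : end + 2])
        start = end + 2
    if start < len(body):
        lines.append(body[start:])  # unterminated last line
    return lines


def is_empty_line(line: bytes) -> bool:
    """An empty line has zero length once its CRLF is removed."""
    return line.removesuffix(b"\r\n") == b""


lines = split_lines(b"\r\n")
print(lines)                    # [b'\r\n']
print(is_empty_line(lines[0]))  # True
```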
That empty line is at the end of the message body, so we ignore it. That
leaves the message:
    Field: foobar<CRLF>
    <CRLF>
Take the next sentence:
    If there is no trailing
    CRLF on the message, a CRLF is added.
There IS a trailing CRLF on the message (NB, it does not say "message
body" there), so we add nothing. We still have:
    Field: foobar<CRLF>
    <CRLF>
Take the next sentence:
    It makes no other changes to
    the message body.
So we are finished. We have a message with an empty <body>, so that empty
<body> is what we hash.
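The sentence-by-sentence reading just walked through can be sketched in Python as below. This is one interpretation (the one argued here), not a normative algorithm; the function name and the choice to pass header and body separately are assumptions for illustration. The key point is that the "trailing CRLF" test looks at the whole message, not only the body.

```python
def simple_body_literal(header: bytes, body: bytes) -> bytes:
    """One reading of the first four sentences of 3.4.3 (illustrative)."""
    # Sentence 1: ignore all empty lines at the end of the message body.
    # The last line is empty if the body is just CRLF or ends in CRLF CRLF.
    while body == b"\r\n" or body.endswith(b"\r\n\r\n"):
        body = body[:-2]
    # Sentence 3: if there is no trailing CRLF on the *message*
    # (header + separator CRLF + body), add one.
    message = header + b"\r\n" + body
    if not message.endswith(b"\r\n"):
        body += b"\r\n"
    # Sentence 4: no other changes are made.
    return body


print(simple_body_literal(b"Field: foobar\r\n", b"\r\n"))  # b''
```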
Take the next sentence:
    In more formal terms, the "simple" body
    canonicalization algorithm converts "0*CRLF" at the end of the body
    to a single "CRLF".
That is supposed to produce the same result, so start over with the
original message:
    Field: foobar<CRLF>
    <CRLF>
    <CRLF>
    .<CRLF>
of which we have already seen that "<CRLF>" is the body.
Indeed it contains "0*CRLF" at its end (in fact, it contains 1*CRLF), so
we convert it to "<CRLF>", and that is what we hash.
Therefore, the description in the first four sentences produces a
different result from the supposedly identical description in the fifth
sentence.
Q.E.D.
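The discrepancy can be checked mechanically. A self-contained Python sketch, assuming SHA-256 as the body hash (as with rsa-sha256) and the example's header and body; the two helpers restate the two readings for illustration and are not taken from the draft.

```python
import hashlib


def literal_reading(header: bytes, body: bytes) -> bytes:
    # Drop trailing empty lines, then add a CRLF only if the whole
    # message (header + separator + body) lacks a trailing CRLF.
    while body == b"\r\n" or body.endswith(b"\r\n\r\n"):
        body = body[:-2]
    if not (header + b"\r\n" + body).endswith(b"\r\n"):
        body += b"\r\n"
    return body


def formal_reading(body: bytes) -> bytes:
    # Convert "0*CRLF" at the end of the body to a single CRLF.
    while body.endswith(b"\r\n"):
        body = body[:-2]
    return body + b"\r\n"


header, body = b"Field: foobar\r\n", b"\r\n"
lit = literal_reading(header, body)  # b''
frm = formal_reading(body)           # b'\r\n'
print(lit, frm)
print(hashlib.sha256(lit).digest() == hashlib.sha256(frm).digest())  # False
```

A signer following one reading and a verifier following the other would therefore compute different body hashes for this message.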
Now it appears that some implementations have followed one interpretation
and some the other, so something needs to be fixed. My suggested wording
is:
    The "simple" body canonicalization removes empty lines from the end of
    the body until either the last line is non-empty, or no lines remain.
    An empty line is a line of zero length after removal of any
    terminating CRLF. If the body is not now empty and the last line is
    not already terminated by CRLF, a CRLF is added to it.
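A minimal Python sketch of the suggested wording, for comparison with the two readings discussed above. The function name is hypothetical, and the body is assumed to be raw bytes already separated from the header.

```python
def simple_body_proposed(body: bytes) -> bytes:
    """Sketch of the suggested 'simple' body canonicalization."""
    # Remove empty lines from the end of the body until the last line
    # is non-empty or no lines remain.
    while body == b"\r\n" or body.endswith(b"\r\n\r\n"):
        body = body[:-2]
    # If the body is not now empty and its last line lacks a CRLF,
    # add one.
    if body and not body.endswith(b"\r\n"):
        body += b"\r\n"
    return body


print(simple_body_proposed(b"\r\n"))               # b''
print(simple_body_proposed(b"Hello"))              # b'Hello\r\n'
print(simple_body_proposed(b"Hello\r\n\r\n\r\n"))  # b'Hello\r\n'
```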
    INFORMATIVE NOTE: Following [RFC 2822], the CRLF which separates the
    header fields from the body is NOT part of the body, and therefore is
    never presented to the signing or verification algorithm. In the case
    of a pure binary message (such as one with a Content-Transfer-Encoding
    of 'binary') the concept of "lines" may not be meaningful.
    Nevertheless, wherever the pair of octets that represent CRLF happens
    to occur, that is to be considered as the end of a "line" for the
    purposes of this canonicalization algorithm.
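For binary content, the note amounts to treating every CRLF octet pair as a line end and nothing else. A short Python sketch (helper name hypothetical) showing that a bare LF or bare CR does not end a line, so a trailing run of bare LFs survives while a trailing CRLF-only line is removed.

```python
def strip_trailing_empty_lines(body: bytes) -> bytes:
    # Only the two-octet sequence CR LF ends a "line"; a lone CR or LF
    # is just data, even in a binary body.
    while body == b"\r\n" or body.endswith(b"\r\n\r\n"):
        body = body[:-2]
    return body


binary = b"\x00\xff\n\n\r\n\r\n"
print(strip_trailing_empty_lines(binary))  # b'\x00\xff\n\n\r\n'
```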
That follows what I consider to be both the spirit and the letter of the
first four sentences, at the expense of ignoring and removing the fifth
sentence.
In particular, it leads to the easily remembered invariant:
"After canonicalization, there will NEVER be an empty line at the end
of what remains to be hashed."
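The invariant lends itself to a quick randomized check. A Python sketch, assuming the suggested wording is implemented as below (function names hypothetical); it generates short bodies rich in CR and LF octets and asserts that no result ends in an empty line.

```python
import random


def simple_body_proposed(body: bytes) -> bytes:
    # Drop trailing empty lines, then terminate a non-empty,
    # unterminated last line with CRLF.
    while body == b"\r\n" or body.endswith(b"\r\n\r\n"):
        body = body[:-2]
    if body and not body.endswith(b"\r\n"):
        body += b"\r\n"
    return body


def ends_with_empty_line(data: bytes) -> bool:
    # True if the last CRLF-terminated line has zero length.
    return data == b"\r\n" or data.endswith(b"\r\n\r\n")


# Random bodies built from a small alphabet that favours CR and LF.
rng = random.Random(2007)
for _ in range(10_000):
    body = bytes(rng.choice(b"\r\nx") for _ in range(rng.randrange(12)))
    assert not ends_with_empty_line(simple_body_proposed(body)), body
print("invariant held for all samples")
```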
--
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131
Web: http://www.cs.man.ac.uk/~chl
Email: chl(_at_)clerew(_dot_)man(_dot_)ac(_dot_)uk Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9 Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5