Re: [Fwd: Re: [ietf-dkim] canonicalized null body and dkim]

Charles, you missed the cases introduced by RFC 3030 and the CHUNKING
ESMTP extension.

Further comments below.

        Tony Hansen
        tony(_at_)att(_dot_)com

Charles Lindsey wrote:

Let us be VERY careful here. Start from RFC 2822:

message         =       (fields / obs-fields)
                        [CRLF body]
body            =       *(*998text CRLF) *998text

So a <body> can be EMPTY, and its last line might not have a CRLF.

The CRLF following the header fields is NOT part of the <body>.

If the <body> is absent (indistinguishable from an empty <body>) that
CRLF after the header fields can be omitted.
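
As a rough illustration of the grammar above, here is a minimal Python sketch. The helper name split_message is hypothetical and appears in no RFC, and the sketch assumes the message contains at least one header field. It shows that an absent <body> and an empty <body> both come out as zero octets.

```python
CRLF = b"\r\n"

def split_message(raw: bytes) -> tuple[bytes, bytes]:
    """Split raw message bytes into (header block, body) per the quoted ABNF.

    The CRLF that separates the fields from the body is discarded,
    because it belongs to neither part.
    """
    sep = raw.find(CRLF + CRLF)
    if sep == -1:
        # No separator line: the optional "[CRLF body]" part is absent.
        return raw, b""
    headers = raw[: sep + len(CRLF)]    # keep the CRLF ending the last field
    body = raw[sep + 2 * len(CRLF):]    # skip the separator CRLF
    return headers, body

# Absent body and empty body are indistinguishable after splitting.
print(split_message(b"Last-Header: foobar\r\n"))
print(split_message(b"Last-Header: foobar\r\n\r\n"))
```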

Now look at RFC 2821:

   The mail data is terminated by a line containing only a period, that
   is, the character sequence "<CRLF>.<CRLF>" (see section 4.5.2).  This
   is the end of mail data indication.  Note that the first <CRLF> of
   this terminating sequence is also the <CRLF> that ends the final line
   of the data (message text) or, if there was no data, ends the DATA
   command itself.

So, even if you have a body with no CRLF, as permitted by RFC 2822, you
can't actually transmit it by RFC 2821


Correct, RFC 2821 requires complete lines to be transmitted, that is,
lines ending in CRLF.

(well, you might transmit it by
UUCP, and you might encapsulate it in a message/rfc822 within some
multipart).


Add in RFC 3030 and CHUNKING, and you get ways of transmitting messages
over ESMTP that are RFC 2822-compliant but not RFC 2821-compliant. Add in
the MIME RFCs and you also eliminate the requirements that lines be
limited to 998 characters and consist of text.
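
To make the difference concrete, here is a hedged Python sketch of the two framings. The function names frame_data and frame_bdat are made up for illustration, and the sketch sends the whole message as a single BDAT chunk. With DATA the terminator forces the last line to end in CRLF; with BDAT the octet count delimits the message, so a final line without CRLF survives.

```python
CRLF = b"\r\n"

def frame_data(message: bytes) -> bytes:
    """Frame a message for the RFC 2821 DATA command, with dot-stuffing."""
    if message and not message.endswith(CRLF):
        # The first CRLF of the terminator doubles as the end of the last
        # line, so the receiver always sees a final line ending in CRLF.
        message += CRLF
    lines = message.split(CRLF)[:-1] if message else []
    stuffed = [b"." + ln if ln.startswith(b".") else ln for ln in lines]
    return b"".join(ln + CRLF for ln in stuffed) + b"." + CRLF

def frame_bdat(message: bytes) -> bytes:
    """Frame a message as one RFC 3030 BDAT chunk; octets pass unchanged."""
    return b"BDAT %d LAST" % len(message) + CRLF + message

# DATA cannot distinguish these two inputs; BDAT can.
print(frame_data(b"anything"), frame_data(b"anything\r\n"))
print(frame_bdat(b"anything"), frame_bdat(b"anything\r\n"))
```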

So we have the following cases. The dotted lines enclose what is, by RFC
2822 definition, the <body>, and is therefore what will get hashed or
canonicalized by dkim-base, as currently worded. The ".CRLF" is the RFC
2821 DATA terminator.


Correct.

1) ordinary message with <body> of 1 non-empty line:

Last-Header: foobarCRLF
CRLF
---------------------
barbazCRLF
---------------------
.CRLF

2) <body> consisting of 2 empty lines

Last-Header: foobarCRLF
CRLF
---------------------
CRLF
CRLF
---------------------
.CRLF

3) <body> consisting of 1 empty line

Last-Header: foobarCRLF
CRLF
---------------------
CRLF
---------------------
.CRLF

4) <body> containing no lines

Last-Header: foobarCRLF
CRLF
---------------------
---------------------
.CRLF

5) message with absent <body>

Last-Header: foobarCRLF
.CRLF

Now apply simple canonicalization to all those cases, using:

   "In more formal terms, the "simple" body canonicalization algorithm
    converts "0*CRLF" at the end of the body to a single "CRLF"."

Making the entirely reasonable assumption that "body" means exactly what
RFC 2822 defines it to mean, then here is what gets hashed in all of
those cases:

1) ordinary message with <body> of 1 non-empty line:
---------------------
barbazCRLF
---------------------

2) <body> consisting of 2 empty lines
---------------------
CRLF
---------------------

3) <body> consisting of 1 empty line
---------------------
CRLF
---------------------

4) <body> containing no lines
---------------------
CRLF
---------------------

5) message with absent <body>
---------------------
---------------------


I contend that the current wording in base-07 also requires that example
5 canonicalize into a

---------------------
CRLF
---------------------

Even when the body doesn't exist, it still must be treated as having 0
lines following, which still canonicalize to a CRLF.
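
For what it is worth, the literal reading of the "formal terms" can be sketched in a few lines of Python. This is only an illustration: the name simple_canon_formal and the absent_as_empty switch are my own inventions, with None standing in for case 5. The switch selects between the result listed for case 5 (nothing hashed) and the reading argued for just above (a single CRLF).

```python
from typing import Optional

CRLF = b"\r\n"

def simple_canon_formal(body: Optional[bytes], absent_as_empty: bool = True) -> bytes:
    """Replace zero or more trailing CRLFs with exactly one CRLF.

    body=None models case 5 (no <body> at all). With absent_as_empty=True
    it is treated as a body of zero lines; with False nothing is hashed.
    """
    if body is None:
        if not absent_as_empty:
            return b""
        body = b""
    while body.endswith(CRLF):
        body = body[: -len(CRLF)]
    return body + CRLF

# Cases 1 to 5 in order; None is the absent body.
for case in (b"barbaz\r\n", b"\r\n\r\n", b"\r\n", b"", None):
    print(case, "->", simple_canon_formal(case))
```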

But even with my contention on case #5, I don't disagree with your
conclusions here:

That is undoubtedly what the "formal terms" in dkim-base SAY.

It is NOT what the "informal" words in dkim-base say.
It is NOT what version -01 of DK says.
It is NOT what version -06 of DK says.
It is NOT what Eric's three examples claim.
It is entirely possible that it is NOT what dkim-base was INTENDED to say.


That's why the issue was raised.

I firmly believe that we *intended* to canonicalize each of these cases
into the empty body

---------------------
---------------------
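
A sketch of one rule that would give that outcome for the all-empty bodies follows. It is an assumption on my part, not wording from any draft: trailing empty lines are dropped, a partial last line is completed, and an empty body stays empty. The names simple_canon_intended and body_hash are hypothetical, and SHA-256 is used only to show that the two outcomes hash differently.

```python
import base64
import hashlib

CRLF = b"\r\n"

def simple_canon_intended(body: bytes) -> bytes:
    """Hypothetical rule: drop trailing empty lines, never invent a CRLF
    for a body that has no content at all."""
    while body.endswith(CRLF + CRLF):
        body = body[: -len(CRLF)]
    if body == CRLF:
        return b""          # only empty lines: canonical body is empty
    if body and not body.endswith(CRLF):
        body += CRLF        # complete a final partial line
    return body

def body_hash(canonical: bytes) -> str:
    """Base64 of the SHA-256 digest of the canonicalized body."""
    return base64.b64encode(hashlib.sha256(canonical).digest()).decode("ascii")

# The empty body and the single-CRLF body produce different hashes,
# so signer and verifier must agree on which one is meant.
print(body_hash(simple_canon_intended(b"\r\n\r\n")))
print(body_hash(CRLF))
```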

        Tony Hansen
        tony(_at_)att(_dot_)com

PS. For completeness, the only missing cases, after taking into
consideration RFC 3030 and MIME, are as follows. *These* are the reason
that the 0*CRLF rule was added and where it needs to be applied:

6) ordinary message with <body> of >1 non-empty line, not ending in CRLF

Content-Transfer-Encoding: binary
Last-Header: foobarCRLF
CRLF
---------------------
somethingCRLF
anything
---------------------

7) ordinary message with <body> of 1 non-empty line, not ending in CRLF

Content-Transfer-Encoding: binary
Last-Header: foobarCRLF
CRLF
---------------------
anything
---------------------

Now apply simple canonicalization to all those cases, using:

   "In more formal terms, the "simple" body canonicalization algorithm
    converts "0*CRLF" at the end of the body to a single "CRLF"."

This winds up adding a CRLF to the last line of both of these cases, so
here is what gets hashed in all of these additional cases:

6) ordinary message with <body> of >1 non-empty line, not ending in CRLF

---------------------
somethingCRLF
anythingCRLF
---------------------

7) ordinary message with <body> of 1 non-empty line, not ending in CRLF

---------------------
anythingCRLF
---------------------
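
One practical consequence, sketched in Python under the assumption that some later hop relays the message via DATA and thereby supplies the missing CRLF: the rule makes both forms of the body canonicalize to the same octets. The helper name canon_tail and the two sample variables are illustrative only.

```python
CRLF = b"\r\n"

def canon_tail(body: bytes) -> bytes:
    """Apply the quoted rule: any run of trailing CRLFs becomes one CRLF."""
    while body.endswith(CRLF):
        body = body[: -len(CRLF)]
    return body + CRLF

as_signed = b"something\r\nanything"            # case 6, last line has no CRLF
after_data_hop = b"something\r\nanything\r\n"   # assumed: same body after a DATA hop

# Both forms yield the same canonical body, so the hash is stable.
print(canon_tail(as_signed) == canon_tail(after_data_hop))
```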
