
[ietf-dkim] More on naked CR canonicalization

2006-07-14 16:58:00

In a word, it's very, very real, and I believe that simple canonicalization will need to be changed to accommodate this. I just read through RFC 2822 and it has this to say:

2.3. Body

  The body of a message is simply lines of US-ASCII characters.  The
  only two limitations on the body are as follows:

  - CR and LF MUST only occur together as CRLF; they MUST NOT appear
    independently in the body.

  - Lines of characters in the body MUST be limited to 998 characters,
    and SHOULD be limited to 78 characters, excluding the CRLF.

  Note: As was stated earlier, there are other standards documents,
  specifically the MIME documents [RFC2045, RFC2046, RFC2048, RFC2049]
  that extend this standard to allow for different sorts of message
  bodies.  Again, these mechanisms are beyond the scope of this
  document.

and Section 4:

  Finally, certain characters that were formerly allowed in messages
  appear in this section.  The NUL character (ASCII value 0) was once
  allowed, but is no longer for compatibility reasons.  CR and LF were
  allowed to appear in messages other than as CRLF; this use is also
  shown here.

This says that free (naked) CRs and LFs are specifically *disallowed* in RFC 2822 compliant implementations. However, Section 4 mentions the uncomfortable truth that they were allowed in previous (i.e., 822) revisions. In fact, implementations that still allow them are as common as sendmail: even sendmail 8.13.6 still allows naked CRs to be transmitted rather than stuffing them, as it does with naked LFs.
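
To make the rule concrete, here is a minimal Python sketch that counts naked CRs and naked LFs in a message body. The name count_bare is made up for illustration and comes from no DKIM or sendmail code; it only checks the first body limitation quoted above.

```python
import re
from typing import Tuple

# Hypothetical helper for illustration only.
BARE_CR = re.compile(rb"\r(?!\n)")   # CR that is not followed by LF
BARE_LF = re.compile(rb"(?<!\r)\n")  # LF that is not preceded by CR

def count_bare(body: bytes) -> Tuple[int, int]:
    """Return (number of naked CRs, number of naked LFs) in body."""
    return len(BARE_CR.findall(body)), len(BARE_LF.findall(body))
```

A body that passes the RFC 2822 rule yields (0, 0); anything else contains at least one free CR or LF.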

As I said, what we've been seeing is that naked CRs are a not uncommon occurrence, and my suspicion is that they are often due to the Windows file system's use of CRLF as its line terminator, though I'm sure there are lots of other reasons that this is common.

To my mind, the canonicalization problem is actually a problem for simple too, as we really need to accommodate the entire [2]822 universe. Regardless of whether 2822 outlaws it, there are lots of perfectly useful applications that are unaware or unconcerned that they are not in spec.

I believe that the *only* change needed to simple is to say that free CR or LF
as defined in RFC2822 MUST be canonicalized to CRLF, and this problem
goes away.
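
As a rough sketch of what that one change would mean in practice (Python, and only the proposed line-ending step, not the rest of simple canonicalization), something like the following would do. The function name to_crlf is an assumption made for this example, not taken from any spec or implementation.

```python
import re

# Hypothetical name; matches a CR not followed by LF,
# or an LF not preceded by CR. Existing CRLF pairs never match.
_FREE_CR_OR_LF = re.compile(rb"\r(?!\n)|(?<!\r)\n")

def to_crlf(body: bytes) -> bytes:
    """Rewrite every free CR or free LF as CRLF; leave CRLF untouched."""
    return _FREE_CR_OR_LF.sub(b"\r\n", body)
```

For example, to_crlf(b"a\rb\nc\r\nd") gives b"a\r\nb\r\nc\r\nd". The step is idempotent, so a signer and a verifier that both apply it hash the same bytes whether or not something in between rewrote the free CRs or LFs.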

      Mike

_______________________________________________
NOTE WELL: This list operates according to http://mipassoc.org/dkim/ietf-list-rules.html