In a word, it's very, very real, and I believe that simple
canonicalization will need to be changed to accommodate this. I just
read through RFC 2822 and it has this to say:
2.3. Body
The body of a message is simply lines of US-ASCII characters. The
only two limitations on the body are as follows:
- CR and LF MUST only occur together as CRLF; they MUST NOT appear
independently in the body.
- Lines of characters in the body MUST be limited to 998 characters,
and SHOULD be limited to 78 characters, excluding the CRLF.
Note: As was stated earlier, there are other standards documents,
specifically the MIME documents [RFC2045, RFC2046, RFC2048, RFC2049]
that extend this standard to allow for different sorts of message
bodies. Again, these mechanisms are beyond the scope of this
document.
and Section 4.0:
Finally, certain characters that were formerly allowed in messages
appear in this section. The NUL character (ASCII value 0) was once
allowed, but is no longer for compatibility reasons. CR and LF were
allowed to appear in messages other than as CRLF; this use is also
shown here.
This says that free (naked) CRs and LFs are specifically *disallowed* in
RFC 2822-compliant implementations. However, section 4 mentions the
uncomfortable truth that they were allowed in the previous (i.e., 822)
revision. In fact, implementations that pass them along are as common as
sendmail: even sendmail 8.13.6 still allows naked CRs to be transmitted
rather than stuffing them into CRLF, as it does for naked LFs.
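To make "naked" concrete, here is a minimal Python sketch of a check for
the first limitation in 2.3 quoted above. It assumes the body is already
in hand as raw bytes; the names BARE_CR_OR_LF and find_bare_line_endings
are mine, for illustration only, and are not taken from any MTA or DKIM
implementation.

```python
import re
from typing import List

# A CR that is not followed by LF, or an LF that is not preceded by CR.
# (Illustrative name; not from any existing library.)
BARE_CR_OR_LF = re.compile(rb"\r(?!\n)|(?<!\r)\n")


def find_bare_line_endings(body: bytes) -> List[int]:
    """Return the byte offset of every naked CR or LF in body."""
    return [m.start() for m in BARE_CR_OR_LF.finditer(body)]


if __name__ == "__main__":
    sample = b"line one\r\nnaked CR here\rnaked LF here\nlast\r\n"
    print(find_bare_line_endings(sample))
```

An empty result means the body satisfies the "CR and LF MUST only occur
together as CRLF" rule; anything else is 822-era content.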
As I said, what we've been seeing is that naked CRs are a not uncommon
occurrence, and my suspicion is that they are often due to the Windows
file system's use of CRLF as its line terminator, though I'm sure there
are lots of other reasons that this is common.
To my mind, the canonicalization problem is actually a problem for
simple too, as we really need to accommodate the entire [2]822 universe.
Regardless of whether 2822 outlaws it, there are lots of perfectly
useful applications that are unaware or unconcerned that they are not in
spec.
I believe that the *only* change needed to simple is to say that free CR
or LF as defined in RFC 2822 MUST be canonicalized to CRLF, and this
problem goes away.
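As a rough illustration of that rule, here is a minimal Python sketch.
It assumes the body is available as raw bytes and covers only the line
ending step, not the rest of simple canonicalization; the function name
canonicalize_line_endings is hypothetical and this is not proposed spec
text.

```python
import re

# Match CRLF first so an existing pair is kept as one unit; a lone CR or
# a lone LF falls through to the later alternatives.
# (Illustrative name; not from any existing library.)
ANY_LINE_ENDING = re.compile(rb"\r\n|\r|\n")


def canonicalize_line_endings(body: bytes) -> bytes:
    """Rewrite every free CR or free LF as CRLF; leave CRLF untouched."""
    return ANY_LINE_ENDING.sub(b"\r\n", body)


if __name__ == "__main__":
    signed = b"one\r\ntwo\rthree\nfour\r\n"
    print(canonicalize_line_endings(signed))
```

With this in place, a signer that emits a naked CR and a verifier that
receives it "fixed" to CRLF by an intermediate MTA both hash the same
bytes. Note that an LF followed by a CR is treated as two separate line
breaks, which is a choice of this sketch rather than something RFC 2822
spells out.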
Mike