
Re: Revisiting RFC 2822 grammar (quoted-pair)

2004-01-18 07:45:09

Adam M. Costello wrote:

Bruce Lilly <blilly(_at_)verizon(_dot_)net> wrote:

Adam M. Costello wrote:

Consider the following example:

From: "foo\
bar" <blah(_at_)example>

You have misquoted me (or more likely, your MUA has).  The example was:

From: "foo\
 bar" <blah(_at_)example>

The space before "bar" is crucial, because the alternative (without the
space) is another equally interesting example:

Yes, my MUA (Mozilla 1.6) did it.

In fact, when you presented your own example, I thought you omitted the
space deliberately:

From: "Foo Bar" <"foo\

I did. The CRLF was intended to be part of the local-part (ignoring whether
or not that was sensible or advisable), and I didn't want there to be any
confusion about whether it was CRLF or CRLFSP.

I don't expect us to be able to settle this.  I think RFC 822 is not
self-consistent on this issue.

Agreed, and that's an indication in favor of deprecation.

If backslash-escaped CR and LF (ignoring NUL for the moment) were
permitted, one could have:

From: "foo\CR\LFbar" <blah(_at_)example>

etc., which ought to present no problems;

Until it gets converted to the local line-ending conventions.  Your
example contains all three possibilities: CR not followed by LF, LF not
preceded by CR, and CRLF (terminating the field).  Imagine saving this
message to an mbox file on a Unix machine, where lines are terminated
by LF.  How will you do it?  Normally CRLF gets translated to LF, but
that's not reversible if the input already contains LF not preceded by

In this case, there's no reason to unescape before saving, and therefore there's no
CRLF (as opposed to \CR\LF) to convert.

Conversion of line endings is tricky business, and I'd recommend against it except when saving an attachment of text type. I suspect there would be problems on many systems
saving a message with structure:

multipart/mixed, content-transfer-encoding 8bit
   application/octet-stream, content-transfer-encoding 8bit
       some binary content including 0x0a 0x0d 0x0a 0x0d

Assuming line endings must be converted, doing it correctly requires parsing the
MIME structure -- the embedded 0x0d 0x0a in the binary content must not be
altered. In the example, if 0x0d 0x0a (i.e. CRLF) is converted to a lone 0x0a, the
four-octet sequence becomes the three-octet sequence 0x0a 0x0a 0x0d.  As you
noted, that's not a reversible transformation; converting each 0x0a back to
0x0d 0x0a is likely to yield the five-octet sequence 0x0d 0x0a 0x0d 0x0a 0x0d
rather than the original four octets.
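
To make the octet arithmetic concrete, here is a minimal Python sketch. The
two helper functions are hypothetical stand-ins for a naive converter that
ignores MIME structure; they are not taken from any particular MUA.

```python
# The four octets from the example: LF CR LF CR inside binary content.
original = bytes([0x0A, 0x0D, 0x0A, 0x0D])

def to_local(data: bytes) -> bytes:
    """Naive wire-to-Unix conversion: every CRLF becomes a lone LF."""
    return data.replace(b"\r\n", b"\n")

def to_wire(data: bytes) -> bytes:
    """Naive Unix-to-wire conversion: every LF becomes CRLF."""
    return data.replace(b"\n", b"\r\n")

saved = to_local(original)   # what ends up in the local file
restored = to_wire(saved)    # what comes back out on the wire

print(saved.hex(" "))        # 0a 0a 0d
print(restored.hex(" "))     # 0d 0a 0d 0a 0d
print(restored == original)  # False
```

The round trip fails because the lone 0x0a at the start is indistinguishable,
after saving, from the 0x0a that used to be a CRLF.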

Other problems:  How would this field display?  Could it be cut and

Display is one issue. Cut and paste should be verbatim, i.e. using the on-the-wire

I think any sort of control characters in header fields, other than
CRLF (as a unit) and maybe TAB, is asking for headaches.  Even TAB is
somewhat troublesome.

True, including security implications for some control characters. Probably the
safest for display purposes is to use a textual representation for control
characters, possibly with some form of highlighting to avoid confusion with
literal text.
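
As a rough illustration of what I mean by a textual representation, here is a
Python sketch. The function name `visible` and the angle-bracket notation are
my own assumptions, not an established convention; the brackets merely stand
in for whatever highlighting a real display would use.

```python
def visible(text: str) -> str:
    """Replace C0 control characters and DEL with bracketed names."""
    names = {0x00: "NUL", 0x09: "TAB", 0x0A: "LF", 0x0D: "CR",
             0x1B: "ESC", 0x7F: "DEL"}
    out = []
    for ch in text:
        code = ord(ch)
        if code < 0x20 or code == 0x7F:
            # Unnamed controls fall back to a hex form such as <0x01>.
            out.append("<" + names.get(code, f"0x{code:02X}") + ">")
        else:
            out.append(ch)
    return "".join(out)

# The quoted string with backslash-escaped CR and LF from earlier:
print(visible('"foo\\\r\\\nbar" <blah(_at_)example>'))
# "foo\<CR>\<LF>bar" <blah(_at_)example>
```

Without highlighting, the output cannot be told apart from a field that
literally contains the characters "<CR>", which is why plain substitution
alone is not enough.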

P.S. The tendency of your MUA to drop spaces at the beginnings of lines
is probably related to its use (or misuse) of format=flowed.

I notice that each of your paragraphs consists of multiple "paragraphs"
in the
format=flowed sense, so that they don't actually flow, but instead end
up looking
like this paragraph.

format=flowed is another issue, regarding which I haven't yet added my 2 cents'
worth. So here it is, FWIW. Format=flowed is far too complex for text/plain -- it
amounts to a markup language (a minimalist one, perhaps, but markup nevertheless).
Implementation differences (and/or lack of implementation support) are probably
indicative of that complexity and of its incompatibility with text/plain. IMO it
would be best for it to have its own subtype, just as other markup languages do
(e.g. text/html, text/richtext).
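
On the P.S. about dropped leading spaces: format=flowed has the sender
"space-stuff" a line that begins with a space (prefix one extra space) and the
receiver delete one leading space. The Python sketch below is a simplified
model of that rule only; it ignores quote-depth handling and soft line breaks,
the function names are hypothetical, and it shows one possible mechanism
rather than a diagnosis of what any particular MUA did.

```python
def stuff(line: str) -> str:
    """Sender side: prefix one space to lines starting with ' ', '>' or 'From '."""
    if line.startswith((" ", ">", "From ")):
        return " " + line
    return line

def unstuff(line: str) -> str:
    """Receiver side: delete a single leading space if present."""
    return line[1:] if line.startswith(" ") else line

line = ' bar" <blah(_at_)example>'

# Stuffed by the sender, unstuffed by the receiver: the space survives.
print(repr(unstuff(stuff(line))))

# Unstuffed without having been stuffed: the leading space is lost.
print(repr(unstuff(line)))
```

If either end applies only half of the rule, a continuation line that relies
on its leading space arrives without it.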