Re: Revisiting RFC 2822 grammar (quoted-pair)


Adam M. Costello wrote:

Bruce Lilly <blilly(_at_)verizon(_dot_)net> wrote:

Adam M. Costello wrote:

Consider the following example:

From: "foo\
bar" <blah(_at_)example>


You have misquoted me (or more likely, your MUA has).  The example was:

From: "foo\
 bar" <blah(_at_)example>

The space before "bar" is crucial, because the alternative (without the
space) is another equally interesting example:

Yes, my MUA (Mozilla 1.6) did it.

In fact, when you presented your own example, I thought you omitted the
space deliberately:

From: "Foo Bar" <"foo\
bar"@example>

I did. It was intended that the CRLF was part of the local-part (ignoring
whether or not that was sensible or advisable). And I didn't want there to
be confusion about whether it was CRLF or CRLFSP.

I don't expect us to be able to settle this.  I think RFC 822 is not
self-consistent on this issue.

Agreed, and that's an indication in favor of deprecation.

If backslash-escaped CR and LF (ignoring NUL for the moment) were
permitted, one could have:

From: "foo\CR\LFbar" <blah(_at_)example>

etc., which ought to present no problems;


Until it gets converted to the local line-ending conventions.  Your
example contains all three possibilities: CR not followed by LF, LF not
preceded by CR, and CRLF (terminating the field).  Imagine saving this
message to an mbox file on a Unix machine, where lines are terminated
by LF.  How will you do it?  Normally CRLF gets translated to LF, but
that's not reversible if the input already contains LF not preceded by
CR.

In this case, there's no reason to unescape before saving, and therefore
there's no CRLF (as opposed to \CR\LF) to convert.

Conversion of line endings is tricky business, and I'd recommend against
it except when saving an attachment of text type. I suspect there would be
problems on many systems saving a message with structure:

multipart/mixed, content-transfer-encoding 8bit
   application/octet-stream, content-transfer-encoding 8bit
       some binary content including 0x0a 0x0d 0x0a 0x0d

Assuming line endings must be converted, doing it correctly requires
parsing the MIME structure -- the embedded 0x0d 0x0a in the binary content
must not be altered. In the example, if 0x0d 0x0a (i.e. CRLF) is converted
to a lone 0x0a, the four-octet sequence becomes the three-octet sequence
0x0a 0x0a 0x0d.  As you noted, that's not a reversible transformation;
converting back is likely to yield the five-octet sequence
0x0d 0x0a 0x0d 0x0a 0x0d.
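
The octet arithmetic above can be checked with a minimal Python sketch.
It assumes the most naive converter possible (a blind substitution in
each direction); real MUAs and mbox writers may behave differently, and
the variable names are only illustrative.

```python
# The four-octet binary sequence from the example: LF CR LF CR.
original = bytes([0x0A, 0x0D, 0x0A, 0x0D])

# Naive conversion to Unix conventions: every CRLF becomes a lone LF.
to_unix = original.replace(b"\r\n", b"\n")

# Naive conversion back to the wire format: every LF becomes CRLF.
round_trip = to_unix.replace(b"\n", b"\r\n")

print(to_unix.hex(" "))        # 0a 0a 0d
print(round_trip.hex(" "))     # 0d 0a 0d 0a 0d
print(round_trip == original)  # False
```

The round trip adds an octet and reorders the CR/LF pattern, which is
why a converter has to know which body parts are text before touching
them.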

Other problems:  How would this field display?  Could it be cut and
pasted?

Display is one issue. Cut and paste should be verbatim, i.e. using the
on-the-wire representation.

I think any sort of control characters in header fields, other than
CRLF (as a unit) and maybe TAB, is asking for headaches.  Even TAB is
somewhat troublesome.

True, including security implications for some control characters.
Probably the safest for display purposes is to use a textual
representation for control characters, possibly with some form of
highlighting to avoid confusion with literal text.
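
As a rough illustration of such a textual representation, here is a
minimal Python sketch. The function name show_controls and the <0xNN>
notation are hypothetical choices made for this example, not taken from
any RFC or MUA, and the sketch does nothing to distinguish its output
from a literal "<0x0D>" typed by the sender (which is where highlighting
would come in).

```python
def show_controls(value: str) -> str:
    """Render C0 control characters and DEL as visible <0xNN> tokens."""
    out = []
    for ch in value:
        code = ord(ch)
        if code < 0x20 or code == 0x7F:
            # Control character: show its octet value instead of emitting it.
            out.append(f"<0x{code:02X}>")
        else:
            out.append(ch)
    return "".join(out)


# The backslash-escaped CR and LF example from earlier in the thread.
field_body = '"foo\\\r\\\nbar" <blah(_at_)example>'
print(show_controls(field_body))
# "foo\<0x0D>\<0x0A>bar" <blah(_at_)example>
```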

P.S. The tendency of your MUA to drop spaces at the beginnings of lines
is probably related to its use (or misuse) of format=flowed.

I notice that each of your paragraphs consists of multiple "paragraphs"
in the
format=flowed sense, so that they don't actually flow, but instead end
up looking
like this paragraph.

format=flowed is another issue, regarding which I haven't yet added my
2 cents' worth. So here it is, FWIW. Format=flowed is far too complex for
text/plain -- it amounts to a markup language (a minimalist one, perhaps,
but markup nevertheless). Implementation differences (and/or lack of
implementation support) are probably indicative of its complexity and
incompatibility with text/plain. IMO it would be best for it to have its
own subtype, just as other markup languages do (e.g. text/html,
text/richtext).