Adam M. Costello wrote:
Bruce Lilly <blilly(_at_)verizon(_dot_)net> wrote:
Adam M. Costello wrote:
Consider the following example:
From: "foo\
bar" <blah(_at_)example>
You have misquoted me (or more likely, your MUA has). The example was:
From: "foo\
 bar" <blah(_at_)example>
The space before "bar" is crucial, because the alternative (without the
space) is another equally interesting example:
Yes, my MUA (Mozilla 1.6) did it.
In fact, when you presented your own example, I thought you omitted the
space deliberately:
From: "Foo Bar" <"foo\
bar"@example>
I did. It was intended that the CRLF was part of the local-part
(ignoring whether or not that was sensible or advisable). And I didn't
want there to be confusion about whether it was CRLF or CRLFSP.
I don't expect us to be able to settle this. I think RFC 822 is not
self-consistent on this issue.
Agreed, and that's an indication in favor of deprecation.
If backslash-escaped CR and LF (ignoring NUL for the moment) were
permitted, one could have:
From: "foo\CR\LFbar" <blah(_at_)example>
etc., which ought to present no problems;
Until it gets converted to the local line-ending conventions. Your
example contains all three possibilities: CR not followed by LF, LF not
preceded by CR, and CRLF (terminating the field). Imagine saving this
message to an mbox file on a Unix machine, where lines are terminated
by LF. How will you do it? Normally CRLF gets translated to LF, but
that's not reversible if the input already contains LF not preceded by
CR.
In this case, there's no reason to unescape before saving, and therefore
there's no CRLF (as opposed to \CR\LF) to convert.
Conversion of line endings is tricky business, and I'd recommend against
it except when saving an attachment of text type. I suspect there would
be problems on many systems saving a message with structure:
multipart/mixed, content-transfer-encoding 8bit
    application/octet-stream, content-transfer-encoding 8bit
        some binary content including 0x0a 0x0d 0x0a 0x0d
Assuming line endings must be converted, doing it correctly requires
parsing the MIME structure -- the embedded 0x0d 0x0a in the binary
content must not be altered. In the example, if 0x0d 0x0a (i.e. CRLF) is
converted to a lone 0x0a, the four-octet sequence becomes the
three-octet sequence 0x0a 0x0a 0x0d. As you noted, that's not a
reversible transformation; converting back is likely to yield the
five-octet sequence 0x0d 0x0a 0x0d 0x0a 0x0d.
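The octet arithmetic above can be checked with a minimal Python sketch.
It assumes the naive conversions are plain substitutions (CRLF to LF when
saving, LF to CRLF when restoring) applied without regard to MIME
structure; the variable names are illustrative only.

```python
# The four octets of binary content from the example above.
binary = bytes([0x0A, 0x0D, 0x0A, 0x0D])

# Naive "save on Unix" step: every CRLF pair becomes a lone LF.
saved = binary.replace(b"\r\n", b"\n")

# Naive "restore to wire format" step: every LF becomes CRLF.
restored = saved.replace(b"\n", b"\r\n")

print(saved.hex(" "))     # 0a 0a 0d
print(restored.hex(" "))  # 0d 0a 0d 0a 0d
print(restored == binary) # False
```

The restored content differs from the original in both length and
value, which is why the binary part would have to be left alone.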
Other problems: How would this field display? Could it be cut and
pasted?
Display is one issue. Cut and paste should be verbatim, i.e. using the
on-the-wire representation.
I think any sort of control characters in header fields, other than
CRLF (as a unit) and maybe TAB, is asking for headaches. Even TAB is
somewhat troublesome.
True, including security implications for some control characters.
Probably the safest for display purposes is to use a textual
representation for control characters, possibly with some form of
highlighting to avoid confusion with literal text.
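One possible textual representation is sketched below in Python. The
function name visible() and the bracketed <CR> notation are assumptions
made for illustration, not something any MUA is known to use.

```python
def visible(text: str) -> str:
    """Replace C0 control characters and DEL with bracketed names."""
    # Hypothetical naming scheme; unnamed controls fall back to hex.
    names = {0x00: "NUL", 0x09: "TAB", 0x0A: "LF", 0x0D: "CR", 0x7F: "DEL"}
    out = []
    for ch in text:
        code = ord(ch)
        if code < 0x20 or code == 0x7F:
            out.append("<" + names.get(code, f"0x{code:02X}") + ">")
        else:
            out.append(ch)
    return "".join(out)

# The quoted string from the earlier example: backslash CR backslash LF.
shown = visible('"foo\\\r\\\nbar"')
print(shown)  # "foo\<CR>\<LF>bar"
```

Note that the output cannot be told apart from a field that literally
contains the five characters <CR>, which is where highlighting would
come in.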
P.S. The tendency of your MUA to drop spaces at the beginnings of lines
is probably related to its use (or misuse) of format=flowed.
I notice that each of your paragraphs consists of multiple "paragraphs"
in the
format=flowed sense, so that they don't actually flow, but instead end
up looking
like this paragraph.
format=flowed is another issue, regarding which I haven't yet added my 2
cents' worth.
So here it is, FWIW. Format=flowed is far too complex for text/plain --
it amounts to a markup language (a minimalist one, perhaps, but markup
nevertheless). Implementation differences (and/or lack of implementation
support) are probably indicative of the complexity and incompatibility
with text/plain. IMO it would be best for it to have its own subtype,
just as other markup languages do (e.g. text/html, text/richtext).