Re: Prohibition of EBCDIC in text/plain

1995-06-08 15:31:08
While fighting another battle, I came across this issue again.
Ned's latest draft says:

The canonical form of any MIME text type MUST represent a line
break as a CRLF sequence.  Similarly, any occurrence of CRLF
in text MUST represent a line break.  Use of CR and LF outside
of line break sequences is also forbidden.

This forbids, among others, ISO 10646 UCS-2 and EBCDIC as text/plain
character sets.
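
A minimal sketch of why a 16-bit encoding trips over the rule quoted above. It is an illustration only, not anything from the draft: the helper name has_stray_cr_or_lf is hypothetical, and Python's "utf-16-be" codec stands in for UCS-2, which it matches for the characters used here.

```python
import re

# A CR not followed by LF, or an LF not preceded by CR, is "stray":
# it appears outside a CRLF line break sequence.
_STRAY = re.compile(rb"\r(?!\n)|(?<!\r)\n")


def has_stray_cr_or_lf(octets: bytes) -> bool:
    """Return True if the octet stream contains CR or LF outside a CRLF pair."""
    return _STRAY.search(octets) is not None


text = "one\r\ntwo\r\n"

# In ASCII, each line break is exactly the octet pair 0x0D 0x0A.
ascii_octets = text.encode("ascii")

# In big-endian 16-bit form, CR becomes 0x00 0x0D and LF becomes 0x00 0x0A,
# so the CR and LF octets are never adjacent.
ucs2_octets = text.encode("utf-16-be")

print(has_stray_cr_or_lf(ascii_octets))  # no stray CR or LF
print(has_stray_cr_or_lf(ucs2_octets))   # CR and LF octets are split by 0x00

# The converse rule fails too: the single character U+0D0A encodes as the
# octets 0x0D 0x0A, a CRLF pair that is not a line break.
print("\u0d0a".encode("utf-16-be") == b"\r\n")
```

An agent that looks only at octets, without knowing the charset, cannot tell these cases apart from real line breaks.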


In the transfer form, it is easy to tell why.
However, why should the message I have written here be outlawed?

Mostly because of conversions to and from local canonical form. Many existing
mail systems simply convert text material to local canonical form, which in
turn can change line termination sequences from CRLF to CR, LF, or something
out-of-band. These conversions need to be transparent, so stray CR and LF that
aren't part of a line termination sequence are disallowed.
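
The transparency problem can be sketched as a round trip. This assumes a system whose local form uses a bare LF as its line terminator; the function names to_local and to_canonical are hypothetical, chosen only for this illustration.

```python
def to_local(canonical: bytes) -> bytes:
    """Convert canonical CRLF line breaks to the local form (bare LF here)."""
    return canonical.replace(b"\r\n", b"\n")


def to_canonical(local: bytes) -> bytes:
    """Convert local bare-LF line breaks back to canonical CRLF."""
    return local.replace(b"\n", b"\r\n")


# Text that obeys the rule: every CR and LF is part of a CRLF line break.
clean = b"line one\r\nline two\r\n"

# Text with a stray LF that is not part of a CRLF sequence.
stray = b"bare LF\nthen CRLF\r\n"

round_trip_clean = to_canonical(to_local(clean))
round_trip_stray = to_canonical(to_local(stray))

print(round_trip_clean == clean)  # the conversion is transparent
print(round_trip_stray == stray)  # the stray LF came back as CRLF
```

Once the text is in local form, the stray LF is indistinguishable from a real line break, so the original octets cannot be recovered.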

Transfer encodings do not necessarily protect you from such conversions. See

Was this message legal under RFC 1521 rules?

Yes, but it did not interoperate across platforms.

The best reason I could think of was to keep sanity when crossing gateways
that routinely remove content-transfer-encodings, but that does not strike
me as the most compelling thing in the world.

It seems pretty compelling to me, especially if we ever intend to upgrade the
SMTP transport infrastructure.

However, there are also cases where mail agents (not necessarily gateways)
absolutely have to "routinely remove" transfer encodings. No other course of
action is possible, since people on the non-MIME side of things tend to object
pretty strongly to getting a bunch of "base64 shit" (words I've heard more than
once, I'm afraid) in their mail.

The choice, then, is simple: Either you ban the use of stray CR and LF in text
or else you require agents to maintain a comprehensive list of all the
character sets and whether or not conversion to canonical form is possible
and/or necessary. (Steve Dorner in fact proposed adding a new parameter
available for all content types to indicate whether or not canonicalization
should be done.)

Also, it is "cleaner" to have the number of possible charsets be limited,
but is this better done by recommendation or by fiat?

This was not imposed by fiat. This issue has been discussed endlessly -- I have
many hundreds of messages from a bunch of different lists in my archives on
this topic. The current documents are the result of input from dozens of
people, including John Myers, Chris Newman, Steve Dorner, Keith Moore, John
Klensin, myself, and lots of others as well. The current solution was the best
we could come up with.