Re: Prohibition of EBCDIC in text/plain

1995-06-09 09:25:19
The choice, then, is simple: Either you ban the use of stray CR and LF in 
or else you require agents to maintain a comprehensive list of all the
character sets and whether or not conversion to canonical form is possible
and/or necessary.

There's something interesting about this. We've been unable to control
the proliferation of zillions of charset registrations, so the idea of
'a comprehensive list of all the character sets' sounds daunting.
However, it isn't necessary that the agent know all of the charset
registrations if it were possible to determine SOLELY FROM THE CHARSET
NAME whether the charset used something other than CR and LF as a
line break sequence, even with a stupid lexical trick like

This is just a way of hiding canonicalization information in the charset, as
opposed to having its own new field. As such, it inherits all of the same
problems that the new field approach has, not the least of which is that it is
a fundamental change that would both require resetting the standard to proposed
as well as modifications to all existing MIME agents.

If we were to choose this path I'd prefer to have an explicit field that would
work for types other than text. As long as you're going to break everything you
might as well do it right... But we've already decided against all this -- the
loss of the last four years would effectively kill MIME.

I apologize; I hate to rehash something that you've discussed
endlessly without having also suffered through the hundreds and
hundreds of messages, but I think disallowing the simplest binary
representation of 16-bit charsets on-the-wire seems like a serious
restriction and worthy of just a little more mooting.

Well, I would first take issue with it being categorically simpler. In fact
it depends on how you define "simple" -- my definition of "simple" would
involve backwards compatibility which would translate into the raw 16 bit form
being substantially more complex than UTF-7 or UTF-8.
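
A rough illustration of the backwards-compatibility point, as a sketch rather than anything normative: the snippet below serializes the same text as raw 16-bit code units and as UTF-8, then counts bytes that a line-oriented agent would read as CR or LF. The helper name `stray_line_break_bytes` is made up for this example, and Python's `utf-16-be` codec is assumed to stand in for the raw big-endian 16-bit form.

```python
# Compare how raw 16-bit Unicode and UTF-8 serialize the same text.
# Assumption: "utf-16-be" stands in for the raw big-endian 16-bit form.
CR, LF = 0x0D, 0x0A


def stray_line_break_bytes(data: bytes) -> int:
    """Count bytes that equal CR or LF in an encoded byte string."""
    return sum(1 for b in data if b in (CR, LF))


ascii_text = "Hi"
# U+010D (LATIN SMALL LETTER C WITH CARON) has 0x0D as its low-order byte.
wide_text = "\u010d"

utf16_ascii = ascii_text.encode("utf-16-be")  # every ASCII char gains a 0x00 byte
utf8_ascii = ascii_text.encode("utf-8")       # byte-for-byte identical to ASCII
utf16_wide = wide_text.encode("utf-16-be")    # contains a byte that looks like CR
utf8_wide = wide_text.encode("utf-8")         # only bytes >= 0x80, no CR or LF

print(utf16_ascii, utf8_ascii)
print(stray_line_break_bytes(utf16_wide), stray_line_break_bytes(utf8_wide))
```

In the 16-bit form a non-line-break character can carry a byte that existing text-handling agents would take for a stray CR, while the UTF-8 form leaves ASCII untouched and keeps all other characters out of the ASCII byte range.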

And second, there is nothing about this that disallows the use of 16-bit
Unicode. There is in fact no problem with using it in MIME -- the only problem
is using it in subtypes of text. All that's needed is a definition of either
a subtype of application or the definition of a new top-level content-type
(e.g. widetext). This does not seem like an undue burden given that the
viewing application is guaranteed to be substantially different for information
of this sort.
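
As a minimal sketch of the "subtype of application" route, the following uses Python's standard `email` package to carry 16-bit Unicode as an opaque body part. The subtype name `x-unicode-16` and the function `build_wide_text_message` are hypothetical, chosen only for illustration; no such subtype is being claimed as registered.

```python
from email.message import EmailMessage


def build_wide_text_message(text: str) -> EmailMessage:
    """Wrap 16-bit Unicode text in a non-text MIME body part."""
    msg = EmailMessage()
    msg["Subject"] = "16-bit Unicode example"
    # Raw big-endian 16-bit code units; CR and LF become two-byte sequences.
    payload = text.encode("utf-16-be")
    # "x-unicode-16" is a hypothetical subtype name, not a registered one.
    # Passing bytes makes the library apply base64 by default, so the body
    # is not subject to the line-break rules that apply to text subtypes.
    msg.set_content(payload, maintype="application", subtype="x-unicode-16")
    return msg


msg = build_wide_text_message("Hello\r\nworld")
print(msg.get_content_type())
print(msg["Content-Transfer-Encoding"])
```

Because the part is not a subtype of text, the transport treats it as binary data and the receiving side hands the decoded bytes to whatever viewer understands that type.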