Excerpts from mail: 16-Dec-92 Re: printable multibyte enc..
henry(_at_)zoo(_dot_)toronto(_dot_)edu (1116)
Just a straight 16-bit representation of a 16-bit character has two problems.
First, if most of the characters are in fact ASCII -- often the case -- then
it is twice as big as it needs to be. Second, it often includes octets that
cause trouble for software, e.g. ASCII NULs (not an issue if it's inside
a MIME encoding, but significant in other contexts). Straight-16-bit would
often be the preferred representation inside programs, but for storage and
transmission (and use in filenames etc.), an encoding which avoids these
problems is desirable.
Well, if most of the characters are in fact ASCII, you can then use
quoted-printable, right? And troublesome characters like NUL can be
encoded using either quoted-printable OR base64, right? I still don't
see why 16-bit text needs to be handled any differently than any other
binary data. Again, I may just be very dense on this issue, but I
really don't see any problems.
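To make the two objections and the reply concrete, here is a rough sketch in Python. It assumes "straight 16-bit" means big-endian two-octet code units with no byte-order mark (the codec name "utf-16-be" is my stand-in for that, not something named in the thread), and it uses the standard quopri and base64 modules as the two MIME transport encodings.

```python
import base64
import quopri

text = "Hello, MIME"

# Assumption: "straight 16-bit" = big-endian 16-bit units, no byte-order mark.
raw16 = text.encode("utf-16-be")
raw_ascii = text.encode("ascii")

# Objection 1: all-ASCII text doubles in size.
size_ratio = len(raw16) / len(raw_ascii)

# Objection 2: every ASCII character drags a NUL octet along with it.
has_nul = b"\x00" in raw16

# The reply: treat the octets as opaque binary data and let the existing
# transport encodings deal with the awkward octets.
qp_body = quopri.encodestring(raw16)    # each NUL becomes the three octets "=00"
b64_body = base64.b64encode(raw16)      # no NUL survives in the base64 alphabet

# Both encodings are lossless, so the receiver gets the 16-bit data back.
qp_roundtrip = quopri.decodestring(qp_body)
b64_roundtrip = base64.b64decode(b64_body)
```

Note that quoted-printable does remove the NULs, but it triples each one, so the size objection gets worse rather than better for mostly-ASCII text in a straight 16-bit form.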
UTF-2, in particular, is an encoding of 16-bit characters that represents
ASCII characters as themselves (one octet apiece) and is "file-system
safe", avoiding octets that have special meaning to common software.
That's fine. It seems to me that the right way to do 10646 in MIME is
to have a character set something like "ISO-10646-UTF-2", and to say
that the raw data for a MIME text/* entity of this character set is text
in UTF-2. Then, to encode it for safe mail transport, we treat it as an
octet stream and apply either base64 or quoted-printable as a transport
encoding, depending on whether or not this particular text is mostly
ASCII. I don't see where we need any new mechanisms. -- Nathaniel
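A minimal sketch of that scheme in Python, under several assumptions of mine: Python's "utf-8" codec stands in for UTF-2 (it also leaves ASCII as single octets, but I am not claiming it is octet-for-octet what UTF-2 specified), the charset label is just the name suggested above, and the helper names and the 0.75 "mostly ASCII" threshold are hypothetical choices for illustration only.

```python
import base64
import quopri

# Charset name as suggested in the message; not a registered label here.
CHARSET = "ISO-10646-UTF-2"


def encode_for_transport(text, ascii_threshold=0.75):
    """Return (transfer_encoding, body) for a text/* entity.

    Assumption: "utf-8" stands in for UTF-2. The threshold is arbitrary.
    """
    raw = text.encode("utf-8")  # the "raw data" of the entity
    ascii_octets = sum(1 for octet in raw if octet < 0x80)
    mostly_ascii = len(raw) == 0 or ascii_octets / len(raw) >= ascii_threshold
    if mostly_ascii:
        # Mostly ASCII: quoted-printable keeps the body largely readable.
        return "quoted-printable", quopri.encodestring(raw)
    # Otherwise base64 is more compact than escaping nearly every octet.
    return "base64", base64.encodebytes(raw)


def decode_from_transport(transfer_encoding, body):
    """Undo the transport encoding, then interpret the octets as text."""
    if transfer_encoding == "quoted-printable":
        raw = quopri.decodestring(body)
    else:
        raw = base64.decodebytes(body)
    return raw.decode("utf-8")


# Usage: one mostly-ASCII text and one with no ASCII at all.
latin_cte, latin_body = encode_for_transport("na\u00efve caf\u00e9, mostly ASCII")
kanji_cte, kanji_body = encode_for_transport("\u65e5\u672c\u8a9e\u306e\u30c6\u30ad\u30b9\u30c8")
```

Nothing in the sketch is specific to 10646: the character set only decides how text becomes octets, and the transport step sees an ordinary octet stream.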