ietf-822

Re: printable wide character (was "multibyte") encodings

1993-01-22 00:53:07

2) "UTF-2":  canonical form is a UTF-2 stream.

The problem with UTF-2 is that it uses the 8th bit, necessitating a
Content-Transfer-Encoding of either Base64 (which would make
mostly-English messages unreadable in the installed base) or
Quoted-Printable, which would be rather lengthy.

With pure ASCII, 7BIT encoding is sufficient.
With mostly-ASCII text, Quoted-Printable is probably adequate.
How well it works depends on how often the non-ASCII characters are used.
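
To make the trade-off above concrete, here is a minimal sketch that compares
the encoded size of a text under Quoted-Printable and Base64. It assumes
modern UTF-8 as a stand-in for UTF-2 (an assumption, not something stated in
this thread), and the function name encoded_sizes is hypothetical.

```python
import base64
import quopri


def encoded_sizes(text: str) -> dict:
    """Return byte counts for the raw and transfer-encoded forms of text.

    Assumption: UTF-8 is used as a stand-in for the UTF-2 form discussed
    in the thread.
    """
    raw = text.encode("utf-8")
    return {
        "raw": len(raw),
        # Quoted-Printable leaves printable ASCII alone and expands each
        # 8-bit byte to a three-character "=XX" sequence.
        "quoted-printable": len(quopri.encodestring(raw)),
        # Base64 always emits 4 output characters per 3 input bytes.
        "base64": len(base64.b64encode(raw)),
    }


if __name__ == "__main__":
    samples = {
        "pure ASCII": "Hello, world",
        "mostly ASCII": "Malmo is spelled Malm\u00f6.",
        "mostly non-ASCII": "\u65e5\u672c\u8a9e\u306e\u30c6\u30ad\u30b9\u30c8",
    }
    for label, text in samples.items():
        print(label, encoded_sizes(text))
```

For the mostly-ASCII sample the Quoted-Printable form stays close to the raw
size and remains readable, while for the mostly non-ASCII sample it grows to
roughly three times the raw size and Base64 becomes the shorter of the two.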

For the MIME standard I do not think we should use encodings like UTF-2.
Instead we need an alternative to Base64 for encoding ISO 10646 coded characters.
Note that I say ISO 10646, not Unicode! The encoding must be able to handle
32 bits per character code.
We should probably have an IsoBase64 and an IsoBase128 for efficient transfer
over 7-bit and 8-bit channels.
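
IsoBase64 and IsoBase128 are not defined anywhere in this message, so the
following is only an illustrative sketch of the general idea of carrying
32-bit character codes over a 7-bit channel. It assumes each code is
serialized as four big-endian octets and then passed through ordinary
Base64; the function names are hypothetical and this is not a proposal for
the actual encoding.

```python
import base64
import struct


def encode_ucs4_base64(code_points):
    """Pack each character code as 4 big-endian octets, then Base64 it.

    Assumption: a plain 4-octet form followed by standard Base64; the
    real IsoBase64 is not specified in the thread.
    """
    raw = b"".join(struct.pack(">I", cp) for cp in code_points)
    return base64.b64encode(raw).decode("ascii")


def decode_ucs4_base64(encoded):
    """Reverse encode_ucs4_base64, returning a list of integer codes."""
    raw = base64.b64decode(encoded)
    return [cp for (cp,) in struct.iter_unpack(">I", raw)]


if __name__ == "__main__":
    # 'A', o-diaeresis, a CJK ideograph, and a code far above 16 bits.
    codes = [0x41, 0xF6, 0x4E2D, 0x7FFFFFFF]
    wire = encode_ucs4_base64(codes)
    print(wire)
    print(decode_ucs4_base64(wire) == codes)
```

Note the cost of this naive form: every character, including plain ASCII,
takes four octets before Base64 adds its own one-third overhead, which is
the kind of inefficiency a purpose-built encoding would want to avoid.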



I thought we were originally aiming for ONE universal charset.

I don't think we are likely to settle on any single charset anytime
soon...since no solution is likely to please everybody...but UTF-2 or
Unicode might become the charset of choice for mixed-language text.

We should aim for ONE character coding (ISO 10646) for use when transferring
mail with MIME. What you use locally at your site is your concern.
ISO 10646 is the logical choice. It has ASCII, ISO 8859-1 and Unicode as
true subsets both in character graphs and character coding.
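
The subset claim for ASCII and ISO 8859-1 can be checked with a small
sketch. It assumes that Python's Unicode code points can stand in for
ISO 10646 code values (an assumption about today's aligned versions of the
two standards, not something established in this message); the function
name is hypothetical.

```python
def latin1_codes_match_code_points() -> bool:
    """Check that every ISO 8859-1 octet decodes to the same code value.

    Assumption: Python's Unicode code points are used as a stand-in for
    ISO 10646 code values.
    """
    for octet in range(256):
        char = bytes([octet]).decode("latin-1")
        if ord(char) != octet:
            return False
    return True


def ascii_codes_match_code_points() -> bool:
    """Same check for the 128 ASCII codes."""
    return all(ord(bytes([o]).decode("ascii")) == o for o in range(128))


if __name__ == "__main__":
    print(latin1_codes_match_code_points())
    print(ascii_codes_match_code_points())
```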

    Dan

--
Dan Oscarsson
Telia Research AB                       Email: Dan(_dot_)Oscarsson(_at_)malmo(_dot_)trab(_dot_)se
Box 85
201 20  Malmo, Sweden