Re: printable wide character (was "multibyte") encodings

(Erik's mail that I'm responding to here is rather old, but I'm catching
up on backlog that built up while I was away.)

> what I've been hearing on this list -- a bit recently and a very great
> deal a year ago -- are variations on the theme of "now we have 10646,
> and it is universal, let's try to move quickly to it and drop the use of
> all 'local' character sets (like ASCII) in Internet mail"...
>
> I'd like to hear what Henry's "intent" was.

While there may be some people who would advocate this -- I frankly catch
strong hints of it in the "let's forget encodings and just rewrite all
the protocols for 16-bit characters" suggestions -- I'm not among them.

As a low-level example, even if Unicode becomes the standard character
set everywhere, I strongly suspect that there will be differences in
encoding:  the English-speakers won't want to use two octets for each
ASCII character, so they would use something like UTF-2, while the
Japanese-speakers (who need two octets pretty well all the time anyway)
might want to use the flat encoding to minimize overhead for Kanji.
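
To make the size trade-off concrete, here is a rough sketch that counts
octets for one English and one Japanese sample. It rests on assumptions
that are not in the text above: Python's "utf-8" codec stands in for
UTF-2 (UTF-8 is the later name for essentially that scheme), "utf-16-be"
without a byte-order mark stands in for the flat two-octet encoding, and
the helper name encoded_sizes and both sample strings are made up for
illustration.

```python
# Assumption: "utf-8" approximates UTF-2, and "utf-16-be" (no byte-order
# mark) approximates the flat two-octets-per-character encoding.

def encoded_sizes(text):
    """Return (variable_length_octets, flat_octets) for text."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-be"))

english = "character set"   # 13 ASCII characters
japanese = "文字集合"        # 4 Kanji

for label, sample in (("English", english), ("Japanese", japanese)):
    variable, flat = encoded_sizes(sample)
    print(f"{label}: {variable} octets variable-length, {flat} octets flat")
```

Under these assumptions the English sample takes 13 octets in the
variable-length form against 26 flat, while the Kanji sample takes 12
against 8, which is the difference in preference described above.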

And of course, that "if" is a big one.  In particular, 2022-JP is
not going to go away any time soon, and at least some Japanese users
dislike Unicode enough to prefer alternatives.

The most we can hope for, I think, is to establish a preferred default
that applies in the absence of pre-agreement between the two ends, or
in cases where the material is being archived or broadcast and there
is no possibility of pre-agreement.  For this purpose, there is strong
reason to prefer something like UTF-2 that will minimize breakage in
the vast body of 8-bit code.
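
As a sketch of what "breakage" can mean for 8-bit code, the snippet
below looks at which octets each encoding produces for a string mixing
ASCII and one Kanji. The same assumptions apply as for any modern
stand-in: "utf-8" approximates UTF-2, "utf-16-be" approximates the flat
encoding, and the helper ascii_range_octets and the sample string are
hypothetical.

```python
# Assumption: "utf-8" approximates UTF-2; "utf-16-be" approximates the
# flat two-octet encoding. The helper name is made up for illustration.

def ascii_range_octets(data):
    """Return only the octets of data that fall in the 7-bit ASCII range."""
    return bytes(b for b in data if b < 0x80)

text = "a/b 文"
variable = text.encode("utf-8")
flat = text.encode("utf-16-be")

# In the variable-length form, the ASCII characters appear as themselves
# and the Kanji contributes only octets with the high bit set.
print(ascii_range_octets(variable))

# The flat form puts a NUL octet before every ASCII character, and the
# Kanji's own octets can land in the ASCII range.
print(b"\x00" in variable, b"\x00" in flat)
print(ascii_range_octets(flat))
```

Code that scans for "/" or treats NUL as a string terminator keeps
working on the first form but not on the second, which is one plausible
reading of the breakage argument.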

                                         Henry Spencer at U of Toronto Zoology
                                          
henry(_at_)zoo(_dot_)toronto(_dot_)edu   utzoo!henry