Re: printable wide character (was "multibyte") encodings

Most of this is personal speculation and bias, not any claim to
knowledge or crystal-ball-ownership.


Yes, unfortunately, a lot of standards work involves "personal
speculation and bias", since nobody can accurately predict the future.

   SC2 claims this is a 32 bit standard.  Until and unless they change
that, it is what I was referring to.


Good thing I asked then, coz my initial assumption was that you were
referring to the 16-bit form, since you had been talking about the
number of octets required for Asian characters in UTF-2 (i.e. 3, 4 or
more).
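
To put rough numbers on those octet counts, here is a minimal sketch.
It assumes that modern UTF-8, as implemented by Python's "utf-8" codec,
is a fair stand-in for UTF-2 for these particular characters; that is an
assumption made for illustration, not something established in this
thread.  The helper name octets_per_char is hypothetical.

```python
# Assumption: Python's "utf-8" codec stands in for UTF-2 here.
# octets_per_char is a hypothetical helper, named for illustration only.

def octets_per_char(text, encoding="utf-8"):
    """Return how many octets each character of `text` needs in `encoding`."""
    return [len(ch.encode(encoding)) for ch in text]

# One ASCII letter, one Latin-1 letter (e with acute), one kanji (U+65E5).
sample = "A\u00e9\u65e5"

for ch, n in zip(sample, octets_per_char(sample)):
    print(f"U+{ord(ch):04X} takes {n} octet(s)")
```

Under that assumption the ASCII letter stays at one octet while the
kanji needs three, which is the asymmetry being argued about.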

  It seems to me that you are making a series of closely-related
assumptions here.  The main one is that, within a group of people
communicating in a particular language (English within the US, Japanese
within Japan, even conversations *in* Japanese between people in Japan
and Japanese-speakers in the US), we are never going to see 10646 used
and, hence, what we need is some (possibly extraordinary) mechanism to
handle an "occasional Japanese character" now and then.


Yes, that is what I'm assuming, except for the 3rd situation that you
mentioned ("conversations *in* Japanese between people in Japan and
Japanese-speakers in the US").  I can imagine such conversations being
carried out either in a UTF-2 based encoding or in ISO-2022-JP, even
if such traffic were limited.

  But what I've been hearing on this list -- a bit recently and a very great
deal a year ago -- are variations on the theme of "now we have 10646,
and it is universal, let's try to move quickly to it and drop the use of
all 'local' character sets (like ASCII) in Internet mail".  If that is
the intent, then a system that rewards some languages and penalizes
others is a pretty terrible idea.


I'd like to hear what Henry's "intent" was.


Taking a different perspective for the moment, two of the main things
that people strive for in this area are:

    (1) to avoid hostility to the installed base

    (2) to use only one multilingual charset for all Internet mail

The email gurus on this list have frequently talked about (1), and as
you may guess, I agree with them rather strongly.  I also agree with
the people who talk about (2).

I hate to narrow things down like this in a forum such as this one, but
for the purposes of discussion let's focus on Japan and the US, and then
try to reach goals (1) and (2).  It is immediately obvious (at least to
me) that an ISO-2022-JP based method is the only way to comply with both
(1) and (2) in Internet email: it is what the Japanese installed base
already uses, and plain ASCII text is already valid ISO-2022-JP.

Unfortunately, the average American does not even want to discuss
2022, let alone implement and install it.  But both the Japanese and
the Americans want to extend their current encodings for
multilinguality.  That is why I'm saying that we're doomed to at least
two multilingual encodings on the Internet.

However, I don't have a crystal ball either, so...


Cheers,
Erik


PS  Usenet news in America would have problems with ESC, so an ISO-2022-JP
    based multilingual encoding would be hostile to that installed base.

    This, and arguments about statefulness, etc., are the probable reasons
    for the average American's reluctance to discuss 2022.

    Sigh.  But it's not the end of the world.  Let's just get on with UTF-2.
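
For anyone who wants to see the ESC and statefulness issue concretely,
here is a small sketch using Python's "iso2022_jp" and "utf-8" codecs.
The variable and function names are hypothetical, and using UTF-8 in
place of UTF-2 is again an assumption made for illustration.

```python
ESC = 0x1B  # the escape octet that ISO-2022-JP uses to switch character sets

def strip_esc(data):
    """Simulate a transport that silently drops ESC octets (hypothetical)."""
    return data.replace(bytes([ESC]), b"")

japanese = "\u65e5\u672c\u8a9e"  # three kanji

as_2022 = japanese.encode("iso2022_jp")
as_utf = japanese.encode("utf-8")  # assumption: UTF-8 stands in for UTF-2

# ISO-2022-JP is 7-bit clean but depends on ESC; the UTF form is the reverse.
print("2022 uses ESC:", ESC in as_2022, " 7-bit:", all(b < 0x80 for b in as_2022))
print("UTF  uses ESC:", ESC in as_utf, " 7-bit:", all(b < 0x80 for b in as_utf))

# Statefulness: without the escape sequences the same octets read as ASCII,
# so the kanji are lost rather than merely garbled in place.
damaged = strip_esc(as_2022).decode("iso2022_jp")
print("round trip intact:", as_2022.decode("iso2022_jp") == japanese)
print("after ESC removal:", damaged == japanese)
```

The meaning of each octet depends on the most recent escape sequence,
so a path that drops or mangles ESC changes how everything after it is
read.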