Re: printable multibyte encodings

Masataka Ohta,

Your arguments are generally of the category "non sequitur", the logical
fallacy that the conclusion "does not follow" from its premises.  This
gets a little annoying after a while, so please try to apply more rigor
to your arguments and analysis.

Apart from the non sequiturs, you are factually wrong in directing your
flames against the Europeans.

ISO DIS 10646, which provided different planes for Chinese, Japanese,
and Korean, was in the main a European proposal.  This DIS was voted
down, as you might recall, in part because of the appearance of a
renegade group of character set people representing U.S. companies who
openly threatened to ignore an ISO standard, and started to create their
own, resulting in a dramatically inferior product in many serious and
knowledgeable minds.  Europeans were not even in on this shenanigan to
start with, and European input has been minor after JTC 1/SC 2/WG 2 went
ahead with a merger, effectively giving in to Unicode.

Many are those who have regretted this, and thought that ISO should take
a stronger stance against such irresponsibility and "we have market
share" bullying, including the intensive marketing of a competing
standard by the Unicode consortium, to the direct detriment of ISO 10646
as proposed.  The inclusion of a "UCS Transformation Format" to make it
possible to use existing equipment is a blemish of biblical proportions
on this standard, and an inescapable proof of correctness of the
argument that caused the large unused areas in the original proposal and
the complex encoding scheme in the initial DIS.  The need for more code
space also turned out to be hysterical, as the unused codes in ISO
10646-1:1993 number more than those taken by the C0 and C1 space of ISO
DIS 10646 (the former).

So who did this?  The Unicoders.  Not Americans, but a few American
companies with "market share" behind them.  Argumentum ad baculum, if I
ever saw it in practice.  The Europeans were largely overrun by this
argument, and those who approved the first DIS were seriously arguing
against a merger.  Because ISO rules make it very difficult to get past
two negative votes in a row, and there was no way but to attempt a
compromise, we were pragmatic enough to yield to Unicode.  Pragmatism
never pays off, and the pragmatics of the Japanese, Chinese and perhaps
Korean position today is to continue to use ISO 2022-based multibyte
standards.  Thus, with severe myopia in hindsight, it might look as if
Unicode was pushed by Europeans because Europeans are the only ones who
are likely to use it, and even they will continue to use ISO 8859-1 for
a long time.

If you look for the facts, Masataka Ohta, you'll find that Europeans
were as much overrun by Unicode as you feel that you are.  Why, then,
did this thing get adopted?  Excellent question.  Welcome to the
political world of ISO standards, and the fact that lots of countries
rubber-stamp the proposals and vote regardless of the opinions of
national experts, who avoid the process precisely because of the mis-
representations that some countries and individuals engage in, and the
need to fight it.  For instance, Turkey pushed the replacement of
Icelandic characters in the ISO 8859-1 page (U00xx) with their own, not
even bothering to put the Icelandic characters somewhere else.  Large
amounts of money and time went down the drain to stop this silliness.
Voting on ISO 10646 was overwhelming, partly because of this turkey.  It
may have been easier for Japan to argue against unification, and be
heard, in the absence of the Turkish plot.  Go flame them, if you
please.

I can understand that Americans get really pissed off by the America-
bashing that some European countries (e.g., Denmark in SC 22/WG 14)
engage in as a matter of routine, and I can understand that Europeans
get really pissed off at people who blame them for things they didn't
have any choice in (however real or perceived that is).  Finally, the
message I get from you is a Nippon superiority complex that has been
hurt, much like some European countries (e.g., Denmark in SC 22/WG 14)
display an inferiority complex and use it to instill guilt in the U.S.
delegations for being larger than them.  After having seen a couple of
countries send delegates to "voice the Irish opinion" or some such
bullshit at the WG level, I can understand that people get disgusted and
leave.  That we get a lot of this nationalistic bullshit from Japan to
boot is enough to make me puke, and if it weren't for the fact that I
_care_ about these standards and about character sets, I wouldn't have
held out this long.  No wonder people leave this arena.  And you're
causing serious people who could have helped your case to leave by being
a major pain in the butt, Masataka Ohta.



ANYWAY, now that we have an International Standard for a Universal
Multiple-Octet Coded Character Set, let's figure out ways to convert
into and out of this set.  I welcome the unique names that have been
assigned to each character, because each gives us a unique handle that we
can refer to in whatever coding or representation that we actually use.
No, I don't expect ISO 10646 to be widely used or supported.  The
doubling of the bandwidth required to handle it in the raw, and/or the
doubled memory requirements for processing it (regardless of UTF
encoding) are enough reasons to avoid it for a while.  That said, I
expect it to work as an interchange and reference standard, much like
ISO 6937 was a reference standard seldom used (and now discontinued as a
project in ISO, or to be discontinued, I don't have the timing info).
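
To put rough numbers on the doubling, and to show what a name used as a
handle looks like, here is a minimal sketch.  It assumes a present-day
Python, whose unicodedata module carries the character names, and it
uses the "utf-16-be" codec as a stand-in for the raw two-octet form,
which it matches for characters like these.  The function names are
made up for the illustration.

```python
import unicodedata


def octet_counts(text):
    """Return (octets in ISO 8859-1, octets in the raw two-octet form)."""
    local = text.encode("iso-8859-1")
    # Assumption: utf-16-be stands in for the two-octet form of ISO 10646;
    # the two coincide for the characters used here.
    raw = text.encode("utf-16-be")
    return len(local), len(raw)


def character_names(text):
    """Return the character name that serves as the handle for each character."""
    return [unicodedata.name(ch) for ch in text]


sample = "\u00deing"  # Icelandic word starting with capital thorn
print(octet_counts(sample))
for name in character_names(sample):
    print(name)
```

The names come out the same whichever coding the text was stored in,
which is the point of having them.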

The task, then, must be to describe other encoding schemes in terms of
ISO 10646, and provide mappings between the actually used encodings and
the Platonic ideal of ISO 10646.  For interchange purposes, this can be
used to map out of and into either a character code or a markup scheme
without information loss, where the "local" form is optimized for
processing, or storage, or whatever.  Awareness of which character set
and encoding is used must necessarily increase, and that's good.  Over
time, those systems that can handle the interchange form raw will win
out as more systems do interchange in ISO 10646, and we will gradually
move towards acceptance.  This might take ten years, or more, and in
that time, we might have a new contender for the acronym "UCS".
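
As a rough sketch of what "describing an encoding in terms of ISO 10646"
could mean in practice: the local code is written down as a table from
octets to character names, and both directions of the mapping are
derived from it.  The table below is a hypothetical four-entry fragment
(it happens to agree with ISO 8859-1), the function names are invented,
and the SGML-style numeric reference used for characters the local code
lacks is only one possible markup fallback, not something any standard
prescribes here.  Python's unicodedata module resolves the names.

```python
import unicodedata

# Hypothetical fragment of a local 8-bit code, described by character names.
LOCAL_TABLE = {
    0x41: "LATIN CAPITAL LETTER A",
    0xD0: "LATIN CAPITAL LETTER ETH",
    0xDE: "LATIN CAPITAL LETTER THORN",
    0xFE: "LATIN SMALL LETTER THORN",
}

# Derive both directions from the one description.
TO_UCS = {octet: ord(unicodedata.lookup(name))
          for octet, name in LOCAL_TABLE.items()}
FROM_UCS = {code_point: octet for octet, code_point in TO_UCS.items()}


def to_interchange(octets):
    """Map local octets to a list of ISO 10646 code points."""
    return [TO_UCS[octet] for octet in octets]


def from_interchange(code_points):
    """Map code points back to local octets, using markup where needed."""
    out = []
    for code_point in code_points:
        if code_point in FROM_UCS:
            out.append(bytes([FROM_UCS[code_point]]))
        else:
            # Assumed fallback: a numeric character reference in markup.
            out.append(b"&#%d;" % code_point)
    return b"".join(out)


print(to_interchange(b"\xdeA"))
print(from_interchange([0xDE, 0x41, 0x3042]))
```

A real converter would also have to read the markup form back in, which
this sketch leaves out.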

So, in conclusion, stop the whining and get on with your life, Ohta-san.
If you think ISO 10646 is so bad, come up with something better.  ISO
10646 is due for formal review in 1998 at the latest, and standards do
get demoted to "historical" in ISO, too.  You can continue to use ISO
2022 as long as you want.  If it covers your needs, great.  If you wish
to talk to someone who can't hack it, agree on an interchange form.  My
bet is that that's going to be ISO 10646.  Remember that ASCII means
"American Standard Code for Information Interchange", a surprisingly
accurate name.  Think of UCS as UCSII.

Now, let's return to whatever it was that we were doing before this
erupted.  Non-denominational seasonal greetings to all of you, and a
special plea to Ohta-san to do his part in making peace on earth by
taking a long vacation.

Thank you for your time.

Best regards,
</Erik>
----
Erik Naggum                 ISO  8879 SGML                    +47 295 0313
                            ISO 10744 HyTime
<erik(_at_)naggum(_dot_)no>            ISO  9899 C                 Memento, terrigena
<enag(_at_)ifi(_dot_)uio(_dot_)no>           ISO 10646 UCS             Memento, vita brevis