ietf-822

Re: Ohta

1993-03-01 14:54:12
> ... Han unification is a data error.  So-called ``identical'' characters
> have different graphical characteristics, different linguistic semantics, and
> different interpretations in the different East Asian languages which use
> them.  It is very important to have the characters ordered in the proper (and
> unique) order for each of these languages.

As I understand it, Unicode/10646/whatever makes no attempt to address
issues of semantics, interpretation, or collating order; criticizing it
for not solving these problems strikes me as distinctly peculiar.
Semantics and interpretation have never been character-set issues at all;
ASCII does not tell you that "PUXE" on a door in Brazil is pronounced
(roughly) "push", or that it in fact means "pull" (!), never mind telling
you that it indicates the direction in which the door opens.  And there
has never been any particularly strong relationship between character-set
ordering and any other kind of ordering; not even English sorts in ASCII
order (with uppercase Z preceding lowercase A).  These issues are clearly
important, but they are irrelevant to the merits (or lack thereof) of a
character set.
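
To make the ordering point concrete, here is a minimal Python sketch. The word list is invented for illustration, and the case-folding key is only a rough stand-in for real dictionary collation, which depends on language and locale rather than on the character set.

```python
# Hypothetical sample data, chosen only to show the effect.
words = ["apple", "Zebra", "banana", "Apple", "zebra"]

# A plain sort compares code points, so every uppercase letter
# (65-90 in ASCII) lands before every lowercase one (97-122).
code_point_order = sorted(words)

# Dictionary-style order has to be imposed separately from the
# character set.  Here the primary key ignores case and the original
# spelling only breaks ties.
dictionary_order = sorted(words, key=lambda w: (w.casefold(), w))

print(code_point_order)
print(dictionary_order)
```

In the first list "Zebra" sorts ahead of "apple"; in the second it does not. The encoding is the same in both cases, and only the comparison rule layered on top of it has changed.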

The fundamental problem of Han unification is whether the graphical
characteristics of supposedly-identical characters are close enough
that pretending they are identical will not cause problems.  It is not
necessary that their appearances really *be* identical; there are at
least three distinct graphical forms of the digit "4" in use in North
America (in fact, a credit-card bill I happen to have handy uses all
three on the same sheet of paper...), with no problems resulting.
What matters is questions like "will these different forms ever
appear in the same document with distinct meanings, in the absence
of clear contextual distinctions (e.g. different languages)?".

> Han unification seems to me to be a very Euro-centric viewpoint...
> ... a character set issue that is of major
> importance to nearly a 1/3 of the world's population is treated lightly.

Considering an issue with great care and deep insight is not incompatible
with deciding that other issues are more important.  As anyone who has
worked on standards knows, there is a very large difference between
ignoring a proposal and rejecting it.  Of course, it's common for the
proponents of rejected proposals to claim that the merit of their idea
is so obvious that it would surely have been accepted if the standards
committee had really paid attention to it.  Not true.

> These 16-bit efficiency issues will come to be seen as being as silly as the
> concerns that somebody is ``wasting system resources'' by running a 32K
> program instead of a 24K program that uses overlays seems today.

I believe that Knuth once observed, roughly, "in no other engineering
discipline is a well-understood, easily-obtained 10% improvement viewed
with contempt".

Henry Spencer at U of Toronto Zoology
henry(_at_)zoo(_dot_)toronto(_dot_)edu   utzoo!henry
