Re: 10646, and all that

     Now the issue we are fighting with is whether the mappings of 10646
provide sufficiently precise character abstractions that the results can
be rendered into displayed characters (in some choice of font) without
loss of information.  For Western European languages, the answer is
pretty much "yes, it does".  For Asian languages, the answer is, at
best, more complicated, and we can clearly construct examples of where
it does not.


John, I'm trying not to get on your nerves, but can you give us an
example of an Asian character where the rendering in some font would
lead to a "loss of information"?

Here are some examples that have been brought to my attention, all
code positions (in hex) from the 2nd DIS of 10646: 4e0e, 5094, and
8aa7.
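
For anyone who wants to look these positions up, here is a minimal
Python sketch.  It assumes that the three DIS positions correspond to
the same code points in a current Unicode character database; that is
my assumption, not something the DIS text above guarantees.  The
helper name describe() is made up for this example.  It reports only
the formal character name, because the glyph you actually see depends
on whichever font your system picks.

```python
import unicodedata

# Code positions quoted above (hex).  Assumption: they match the code
# points of the same number in today's Unicode character database.
POSITIONS = ["4e0e", "5094", "8aa7"]


def describe(hex_position):
    """Return (character, formal name) for a hex code position."""
    ch = chr(int(hex_position, 16))
    return ch, unicodedata.name(ch, "<unnamed>")


if __name__ == "__main__":
    for pos in POSITIONS:
        ch, name = describe(pos)
        # The printed glyph is whatever the local font supplies.
        print(f"U+{pos.upper()}  {name}  {ch}")
```

Note that nothing in the output says "Japanese" or "Chinese"; the code
point alone carries no such information.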

Re: "loss of information": If the receiver is Japanese and he sets his
Han font to a Japanese font, then he will see the character displayed
the way he is used to.  So there is no "loss of information".

On the other hand, if the sender is Japanese and he wants to show how
he writes his name to his Chinese colleague, and if his name just
happened to contain one of the few characters where there is a
noticeable difference in the typical CJK renderings, then there would
be "loss of information" if he sent the 10646 characters without any
language info and without any font info.  But that's *his* fault, for
not providing that info.
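
To make the point concrete, here is a rough sketch of what "providing
that info" could look like.  Everything in it is hypothetical: the
TaggedText record, the pick_font() helper, and the font names are
invented for illustration and do not come from 10646 or from any real
mail system.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical font names, for illustration only.
FONT_FOR_LANG = {
    "ja": "ExampleMincho-J",
    "zh": "ExampleSong-C",
    "ko": "ExampleBatang-K",
}


@dataclass
class TaggedText:
    text: str
    lang: Optional[str] = None  # e.g. "ja"; None means no hint was sent


def pick_font(msg, receiver_default):
    """Prefer the sender's language hint; else use the receiver's font."""
    if msg.lang in FONT_FOR_LANG:
        return FONT_FOR_LANG[msg.lang]
    return receiver_default
```

With a "ja" tag, the Chinese colleague's software can pick a Japanese
font for the name; without the tag, it falls back to his own default
and the sender's intended form may be lost.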

Whether he omitted that info deliberately or by accident is, of
course, another issue.  That's the "education" issue that I mentioned
earlier.  Users of 10646 need to be educated about the glyph
differences, in much the same way that ASCII users are educated about
the differences between the various glyphs for "a".

I will add that many Japanese are already acutely aware of glyph
differences, particularly when it comes to personal names.  Why, just
the other day, a Japanese guy mentioned on fj.kanji that the name of
the essayist Hyakken Uchida could not be written in JIS since JIS
unified two of the glyphs for "ken".  This was a particularly
delightful example, since I pointed out in my reply that Unicode does
*not* unify these two, so that essayist's name can be rendered
properly.  So, in a sense, Unicode is "better" than JIS.
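
Here is a small check of that claim.  It assumes the two forms in
question are the ones now encoded at U+9593 (gate with sun) and U+9592
(gate with moon); the fj.kanji article did not give code points, so
that identification is mine.  The helper stays_distinct() is likewise
just a name made up for this sketch.

```python
import unicodedata

KEN_COMMON = "\u9593"   # assumed: the form that JIS provides
KEN_VARIANT = "\u9592"  # assumed: the form used in the essayist's name


def stays_distinct(a, b):
    """True if the two strings still differ after NFKC normalization."""
    na = unicodedata.normalize("NFKC", a)
    nb = unicodedata.normalize("NFKC", b)
    return na != nb


if __name__ == "__main__":
    for ch in (KEN_COMMON, KEN_VARIANT):
        print(f"U+{ord(ch):04X}", unicodedata.name(ch))
    print("distinct:", stays_distinct(KEN_COMMON, KEN_VARIANT))
```

If the two really are separate code points, as assumed, no font
trickery is needed to tell them apart.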

Unicode did not unify those two because at least one of the East
Asian standards it draws on keeps them distinct.  Since Unicode
"incorporates" many of those standards and preserves each one's
distinctions, it is effectively less unified than any single one of
the CJKT standards (in some sense).

This is still not a "font" issue; it is an issue of whether a
particular code point is displayed in an acceptable (any acceptable)
font for Japanese or in an acceptable font for Chinese or Korean,
given that those font choices are largely disjoint.


Choosing to use the word "acceptable" is exactly right.  Some Japanese
seem simply not to *accept* CJK unification.  That's their
prerogative, of course, but I will remind them that PCs and other
personal computing devices are getting more powerful and cheaper, and
the majority of good operating system software originates in the US.
*Some* Japanese may end up using Unicode without really knowing it!
(Yes, the "education" issue could be a problem, but I doubt that it
would matter for "most" people.)

*Other* Japanese would refuse to use Unicode products or the Unicode
features of products.  It'll be interesting to see just how successful
(if at all) Unicode is in Japan.

(Note that I'm *not* saying that Japanese cannot write software.  Many
of them write very good software.  Some Americans use such software
every day.  But it is fair to say that most of the OSs actually used in
Japan today originate in the US.)


Erik