Re: Unicode. Perl does the right thing?




On Fri, 25 Oct 2002, Autrijus Tang wrote:

On Fri, Oct 25, 2002 at 02:53:43PM +0900, Dan Kogai wrote:

use charnames ":zh";
print "\N{sheng1}";


17 characters from the Big5 range have the 'sheng1' pronunciation;
no doubt many more in the Unihan range do.

use charnames ":zh";
print "\N{saeng}";


  Needless to say, there are many CJK characters with the Korean
pronunciation 'saeng', not to mention a Hangul syllable with that
pronunciation. Besides, some characters have multiple readings.
So this doesn't work for Korean, either.

This "internal code of Han characters" has been discussed in depth
here by Mr Zhu Bang-Fu and friends; the consensus is that there's
no way to uniquely distinguish one character from another using
only a single 'natural' index (Cang-Jie, pinyin, etc) -- you
will end up with a fixed ordering ("\N{sheng1-0001}") instead, which
is no more legible than "\x{751f}".


  In a sense, it's even worse than "\x{751f}" unless there's a
machine-readable mapping table (as well as a printed, human-readable
one) from the sheng1-NNNN names to Unicode code points. Otherwise, one
would have to refer to the Unicode code charts anyway.
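
  To make concrete what such a machine-readable table would involve,
here is a minimal Perl sketch. The %by_reading table, the resolve()
helper, and the ordering of the three sample characters are all made
up for illustration; a real table would have to be generated from the
Unihan data, and its ordering would have to be fixed and published
somewhere.

```perl
use strict;
use warnings;

# Hypothetical sample table: reading => ordered list of code points.
# Only three of the many 'sheng1' characters are listed, and the
# order is arbitrary; nothing here comes from an existing module.
my %by_reading = (
    'sheng1' => [ 0x751F, 0x5347, 0x8072 ],
);

# Resolve a name such as "sheng1-0001" to a code point.
# Returns undef if the name is malformed or has no entry.
sub resolve {
    my ($name) = @_;
    my ($reading, $n) = $name =~ /^([a-z]+[1-5])-(\d{4})$/
        or return;
    my $list = $by_reading{$reading} or return;
    return if $n < 1 || $n > @$list;
    return $list->[ $n - 1 ];    # names are 1-based, arrays 0-based
}

for my $name ('sheng1-0001', 'sheng1-0003', 'sheng1-0004') {
    my $cp = resolve($name);
    printf "%s => %s\n", $name,
        defined $cp ? sprintf('U+%04X', $cp) : 'no such entry';
}
```

  The point of the sketch is that "sheng1-0001" means nothing without
the table: whoever reads the source needs the same table, in the same
order, just to learn that it stands for U+751F.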

  How about a radical-stroke-pronunciation index? Even with this
triple index, there may still be collisions left to resolve...

  Another possibility is a 'meaning-pronunciation' index. I believe
this is one of the few ways to refer to CJK characters (say, over the
phone) in all CJK countries. However, to do this, we would need much
more raw data (more or less a small dictionary) than the Unihan
database provides, because it lists the meanings of characters in
English only.


  Jungshik