perl-unicode

Re: Encode::CJKguide

2002-03-26 17:34:58
On Wed, 27 Mar 2002, Markus Kuhn wrote:

Dan Kogai wrote on 2002-03-26 22:35 UTC:
Side note: I still think Encode should have used the encoding tables
that are already provided by the operating system where available. For
example on Linux, the iconv() function with glibc 2.2 or newer does
already provide access to all the necessary tables. I observe at the
moment that almost a dozen different programming language communities
reinvent the recoding wheel simultaneously and independently, even
though portable C libraries such as libiconv are already available for
exactly the same purpose.

  I certainly feel the same way as you do. I thought
a portable implementation of iconv() in libiconv would prevent
the proliferation of (potentially incompatible) encoding converters.
I was wrong.  I found myself
having to check and, if necessary, contribute to or correct all the
incarnations of encoding converters (involving Korean
and sometimes other CJK) in Perl, Java, ICU, PHP, Mozilla, X11,
libiconv/glibc and so forth. It would be much better if
libiconv/glibc were used everywhere.  Encode lacks many
encodings, all of which are available in iconv() (both glibc's
and libiconv's).

Please clarify that this text represents Dan Kogai's personal and
possibly uninformed opinion on character encodings and their history,
and not some consensus of everyone involved in the Perl 5.8 release.
I think this text is still in very early alpha testing ...

  As I already wrote, this disclaimer absolutely needs to be included.

Many of which have a rather Japan-specific and sometimes semi-informed
view of Unicode and often do not at all represent Chinese or Korean
views on issues such as Han unification. Please remember: CJK != Japan
and there are also many good or better Korean and Chinese web pages on
these issues.

   Koreans are almost unanimously in favor of Unicode.  Han Unification
has never been as large an issue in Korea as in Japan.

You should definitely also add a pointer to the Unihan database, which
is the most comprehensive existing source of cross-reference and
encoding conversion data between the different Han encodings:

http://www.unicode.org/Public/UNIDATA/Unihan.txt
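
For readers who have not looked at the file: Unihan.txt is plain text with
one tab-separated record per line (code point, field tag, value) and '#'
comment lines. The following is only a rough sketch of how such records
could be collected per code point. The function name parse_unihan and the
SAMPLE records are hypothetical and exist only for illustration; the sample
values are not copied from the real file and should be checked against it.

```python
import io
from collections import defaultdict


def parse_unihan(lines):
    """Collect Unihan fields per code point from an iterable of text lines.

    Assumes the 'U+XXXX<TAB>kField<TAB>value' record layout.
    """
    table = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\r\n")
        # Skip blank lines and '#' comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t", 2)
        # Ignore anything that does not look like a data record.
        if len(parts) != 3 or not parts[0].startswith("U+"):
            continue
        codepoint = int(parts[0][2:], 16)
        table[codepoint][parts[1]] = parts[2]
    return dict(table)


# Hypothetical sample in the file's format (illustrative values only).
SAMPLE = (
    "# sample records, not taken from the real file\n"
    "U+4E00\tkBigFive\tA440\n"
    "U+4E00\tkGB0\t5027\n"
    "U+4E00\tkJis0\t1676\n"
    "U+4E00\tkKSC0\t7673\n"
)

if __name__ == "__main__":
    fields = parse_unihan(io.StringIO(SAMPLE))
    for tag, value in sorted(fields[0x4E00].items()):
        print(f"U+4E00 {tag} = {value}")
```

With the real file, the same function could be fed an open file handle
instead of the in-memory sample; which field tags are present depends on
the version of the database.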

  I would also like to add that ISO/IEC 10646-1:2000 and ISO/IEC 10646-2:2001
need to be consulted before making any premature judgement on
Han Unification. As you or someone else mentioned in another
forum, TUS 3.0 created some misconceptions about Han Unification
by listing a single glyph for each Han ideograph. On the other hand,
ISO/IEC 10646-1:2000 and ISO/IEC 10646-2:2001 list five glyphs (SC, TC,
K, J, and V), and browsing through the tables, one realizes how little
difference there is among them (sure, there are differences, but
I don't think those differences warrant so much fuss about Han
Unification). More often than not, I thought the IRG didn't go
far enough in Han Unification, because some characters look to me
as if they should have been unified (perhaps the source separation
rule kept them distinct).

   Jungshik Shin