perl-unicode

Re: ICU's uconv vs Linux iconv and UTF-8

2002-02-01 11:58:06
Marco,

  Thank you for elaborating on my points.

On 2002.02.02, at 01:40, Marco Cimarosti wrote:
<< The entire former contents of this directory are obsolete and have been
moved to the OBSOLETE directory.  The latest information may be found
in the Unihan.txt file in the latest Unicode Character Database.
August 1, 2001. >>

And don't bother to download the 23 MB
<http://www.unicode.org/Public/UNIDATA/Unihan.txt> file, because it contains
only mappings for kanji.

Yes, that's point #0: Unihan.txt is no replacement for MAPPINGS. Perhaps I could write a script that generates a mapping table from it, but this kind of attitude is far from nice.
  Unihan.txt also lacks 8-bit mappings such as JIS X 0201.
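  For what it's worth, such a script could start out like the sketch below. It assumes Unihan.txt's tab-separated `U+XXXX<TAB>field<TAB>value` layout, with `#` comment lines, and a `kJis0` field holding a JIS X 0208 row/cell pair as four decimal digits (e.g. `1676` for row 16, cell 76). The two-column output format and the function names are made up for illustration. As noted above, it can only ever cover ideographs; JIS X 0201 and the non-kanji rows of JIS X 0208 would still be missing.

```python
def kjis0_table(lines):
    """Collect (JIS X 0208 code, Unicode code point) pairs from kJis0 fields.

    Assumes Unihan.txt-style lines: "U+XXXX<TAB>field<TAB>value".
    """
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) != 3:
            continue
        cp, tag, value = fields
        if tag != "kJis0" or len(value) != 4 or not value.isdigit():
            continue  # only plain four-digit row/cell values are handled
        row, cell = int(value[:2]), int(value[2:])
        # Row/cell (1..94) to the 7-bit JIS code: add 0x20 to each byte.
        jis = ((row + 0x20) << 8) | (cell + 0x20)
        pairs.append((jis, int(cp[2:], 16)))
    return sorted(pairs)


def format_table(pairs):
    """Render pairs as a hypothetical two-column 'JIS<TAB>Unicode' table."""
    return "\n".join(f"0x{jis:04X}\t0x{ucs:04X}" for jis, ucs in pairs)


if __name__ == "__main__":
    sample = [
        "# sample lines in Unihan.txt layout",
        "U+4E00\tkJis0\t1676",
        "U+4E00\tkRSUnicode\t1.0",
    ]
    print(format_table(kjis0_table(sample)))
```

With the sample lines, U+4E00 (row 16, cell 76) comes out as JIS 0x306C.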

So, go directly to
<http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/>, where you can
find the old data, along with a note about mapping errors:

  But this time, they are right to call it OBSOLETE.

Below is some analysis by Asmus Freytag of specific problems raised by T.
Kubota in this document:
        http://www.debian.or.jp/~kubota/unicode-symbols.html

  An English version is also available at:

        http://www.debian.or.jp/~kubota/unicode-symbols.html.en

  Let me quote the significant part.

ASCII and JIS X 0201 Roman

When converting EUC-JP and Shift_JIS, the handling of 0x5c and 0x7e can be a problem. Since both encodings have a long history and Japanese people have a lot of experience in handling them, I will describe how it is done.

The solution is very simple: just regard YEN SIGN and REVERSE SOLIDUS as different glyphs of the same character. Then the distinction between ASCII and JIS X 0201 Roman can be ignored.
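  To make the idea concrete, here is a small Python sketch. The same byte 0x5c decodes to REVERSE SOLIDUS under an ASCII reading and to YEN SIGN under a JIS X 0201 Roman reading. Folding one onto the other makes text from either converter compare equal. The 0x7e pair (TILDE vs. OVERLINE, the JIS X 0201 Roman character at that position) is folded the same way by analogy. The tables and function names here are hypothetical and for illustration only, not any converter's actual API.

```python
# Two hypothetical readings of the bytes that differ between ASCII and
# JIS X 0201 Roman; all other 7-bit bytes are the same in both.
ASCII_READING = {0x5C: "\u005c", 0x7E: "\u007e"}      # REVERSE SOLIDUS, TILDE
JIS_ROMAN_READING = {0x5C: "\u00a5", 0x7E: "\u203e"}  # YEN SIGN, OVERLINE

# Fold the JIS X 0201 Roman readings onto the ASCII ones, i.e. treat
# each pair as different glyphs of one character.
FOLD = str.maketrans({"\u00a5": "\u005c", "\u203e": "\u007e"})


def decode_7bit(data: bytes, reading: dict) -> str:
    """Decode 7-bit bytes, using `reading` for 0x5C and 0x7E."""
    return "".join(reading.get(b, chr(b)) for b in data)


def fold_jis_roman(text: str) -> str:
    """Map YEN SIGN/OVERLINE to REVERSE SOLIDUS/TILDE."""
    return text.translate(FOLD)


def same_text(a: str, b: str) -> bool:
    """Compare two strings, ignoring the ASCII vs. JIS X 0201 Roman difference."""
    return fold_jis_roman(a) == fold_jis_roman(b)


if __name__ == "__main__":
    raw = b"C:\x5cdir\x7e"
    a = decode_7bit(raw, ASCII_READING)
    b = decode_7bit(raw, JIS_ROMAN_READING)
    print(a == b, same_text(a, b))
```

The two decodings differ as strings, but after folding they are equal. This is the sense in which the ASCII/JIS X 0201 Roman distinction "can be ignored."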

  Has anyone at the Unicode Consortium seen this?

Dan