perl-unicode

RE: ICU's uconv vs Linux iconv and UTF-8

2002-02-01 09:46:51
Dan Kogai wrote:
   As I addressed to unicode(_at_)unicode(_dot_)org,  Yet another problems
that ftp://ftp.unicode.org/Public/MAPPINGS/EASTASIA/ is now gone so I don't
have a practical way to check the mapping.  I want the mapping back!

The Unicode site is a bit labyrinthine at times.

The web version of the data seems more up to date than the FTP site. But
don't bother going to <http://www.unicode.org/Public/MAPPINGS/EASTASIA/>,
because it contains only a note, which reads:

<< The entire former contents of this directory are obsolete and have been
moved to the OBSOLETE directory.  The latest information may be found
in the Unihan.txt file in the latest Unicode Character Database.
August 1, 2001. >>

And don't bother downloading the 23 MB
<http://www.unicode.org/Public/UNIDATA/Unihan.txt> file, because it contains
mappings for kanji only.

So, go directly to
<http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/>, where you can
find the old data, along with a note about mapping errors:

<< [...]
Below is some analysis by Asmus Freytag of specific problems raised by T.
Kubota in this document:
        http://www.debian.or.jp/~kubota/unicode-symbols.html
[...]
The following are available as Full Width characters in the FFxx range.
Therefore, the mappings of these characters are incorrect. This appears to
be a *mapping file issue* as far as these characters are concerned
        FILE JIS0208.TXT------
        0x2140  U+005C  Na  # REVERSE SOLIDUS
        0x215D  U+2212  N  # MINUS SIGN
        0x2171  U+00A2  Na  # CENT SIGN
        0x2172  U+00A3  Na  # POUND SIGN
        0x224C  U+00AC  Na  # NOT SIGN
[...]
        FILE JIS0212.TXT------
        0x2243  U+00A6  Na  # BROKEN BAR
        0x2234  U+00AF  Na  # MACRON
        0x2237  U+007E  Na  # TILDE
[...] >>
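
If you want to check entries like these yourself, here is a minimal sketch
in Python. It assumes lines shaped like the ones quoted in the note above
(JIS code, Unicode code point, East Asian Width, name), which is not
necessarily the layout of the mapping files themselves. The names
parse_line and fullwidth_counterpart are mine, purely for illustration. It
looks in the FFxx range for a character whose compatibility decomposition
is "<wide>" plus the mapped code point, using whatever Unicode data ships
with your Python.

```python
import re
import unicodedata

# Line shape as quoted in the note (an assumption, not the real file layout):
#   0x2140  U+005C  Na  # REVERSE SOLIDUS
LINE_RE = re.compile(
    r"^\s*0x([0-9A-Fa-f]+)\s+U\+([0-9A-Fa-f]+)\s+(\w+)\s+#\s*(.+?)\s*$"
)

def parse_line(line):
    """Return (jis_code, code_point, width_class, name), or None if no match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return int(m.group(1), 16), int(m.group(2), 16), m.group(3), m.group(4)

def fullwidth_counterpart(code_point):
    """Find a character in U+FF00..U+FFEF that decomposes to <wide> code_point."""
    target = "<wide> %04X" % code_point
    for candidate in range(0xFF00, 0xFFF0):
        if unicodedata.decomposition(chr(candidate)) == target:
            return candidate
    return None

if __name__ == "__main__":
    sample = [
        "        0x2140  U+005C  Na  # REVERSE SOLIDUS",
        "        0x215D  U+2212  N  # MINUS SIGN",
        "        0x2171  U+00A2  Na  # CENT SIGN",
    ]
    for line in sample:
        jis, cp, width, name = parse_line(line)
        wide = fullwidth_counterpart(cp)
        wide_text = "none found" if wide is None else "U+%04X" % wide
        print("0x%04X  U+%04X  %-16s fullwidth: %s" % (jis, cp, name, wide_text))
```

Note that this lookup finds nothing for U+2212 MINUS SIGN: the nearest
fullwidth character, U+FF0D, decomposes to U+002D HYPHEN-MINUS, so that
entry has to be judged by hand.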

I don't know whether this helps solve your issues.

_ Marco