perl-unicode

Re: ICU's uconv vs Linux iconv and UTF-8

2002-02-01 09:47:12

The goal for ICU is to be charset neutral, and support all of the
conversions that are in modern use. There are a large number of
variants of character sets; you can use the one you want. See:

http://oss.software.ibm.com/icu/charset/index.html

Mark

----- Original Message -----
From: "Dan Kogai" <dankogai(_at_)dan(_dot_)co(_dot_)jp>
To: "Nick Ing-Simmons" <nick(_dot_)ing-simmons(_at_)elixent(_dot_)com>
Cc: "Nick Ing-Simmons" <nick(_at_)ing-simmons(_dot_)net>; "SADAHIRO Tomoyuki"
<bqw10602(_at_)nifty(_dot_)com>; <perl-unicode(_at_)perl(_dot_)org>; 
<unicode(_at_)unicode(_dot_)org>
Sent: Friday, February 01, 2002 07:46
Subject: Re: ICU's uconv vs Linux iconv and UTF-8


On 2002.02.02, at 00:37, Nick Ing-Simmons wrote:
Oh, yes.  This is the problem with the original Unicode 2.x map: it is
not ASCII-preserving.  I posted this problem to
perl-unicode(_at_)perl(_dot_)org when I first released Jcode.  Several
discussions later, I made Jcode preserve ASCII by default and added
$Jcode::Unicode::PEDANTIC to change the behavior.

Ah. I take your point. If we used ICU's pedantic form, both UNIX ~/foo
and MS C:\Foo would get mangled.

EXACTLY!
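
To make the mangling concrete, here is a minimal sketch in Python. It assumes
the "pedantic" form follows JIS X 0201 Roman, which differs from ASCII at two
code points: 0x5C is YEN SIGN (U+00A5) and 0x7E is OVERLINE (U+203E). The
function name decode_low_range and the table PEDANTIC_OVERRIDES are
hypothetical and exist only for this illustration; this is not the Jcode or
ICU implementation, and it covers only bytes 0x00-0x7F.

```python
# Hypothetical sketch: two ways to decode the single-byte range (0x00-0x7F)
# of a Shift_JIS-style byte string. Not taken from Jcode or ICU.

# Assumption: the pedantic map follows JIS X 0201 Roman, which replaces
# two ASCII characters with different symbols.
PEDANTIC_OVERRIDES = {
    0x5C: "\u00a5",  # YEN SIGN instead of backslash
    0x7E: "\u203e",  # OVERLINE instead of tilde
}


def decode_low_range(data: bytes, pedantic: bool = False) -> str:
    """Decode bytes 0x00-0x7F, ASCII-preserving unless pedantic is set."""
    out = []
    for b in data:
        if b > 0x7F:
            raise ValueError("this sketch only handles bytes 0x00-0x7F")
        if pedantic and b in PEDANTIC_OVERRIDES:
            out.append(PEDANTIC_OVERRIDES[b])
        else:
            out.append(chr(b))
    return "".join(out)


if __name__ == "__main__":
    for path in (b"~/foo", b"C:\\Foo"):
        # Left: ASCII-preserving result. Right: pedantic result.
        print(decode_low_range(path), "->", decode_low_range(path, pedantic=True))
```

With the pedantic flag set, the tilde in ~/foo and the backslash in C:\Foo
come out as different characters, so neither path survives a round trip
through tools that expect ASCII.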

The other differences (having looked at the diff in yudit) seem to be
the mapping of ¢ (cent), £ (pound), ¬ (not) and one of the longer dashes
to different width variants (full width for ICU).

I am going off ICU ...

As I wrote to unicode(_at_)unicode(_dot_)org, yet another problem is
that ftp://ftp.unicode.org/Public/MAPPINGS/EASTASIA/ is now gone, so I
don't have a practical way to check the mapping.  I want the mapping back!

Dan