perl-unicode

RE: ICU's uconv vs Linux iconv and UTF-8

2002-02-01 11:33:52
As part of the mystery of CJK encodings I notice that IBM's ICU's 
uconv and SuSE 6.4 Linux iconv differ as to the UTF-8 representation 
of table.euc

Both converters will round-trip with themselves and give a byte-exact 
copy of table.euc.

Weirdly they differ in how they map '\' and '~' in ASCII space as 
well as some spots in higher characters.

That is understandable if they use different tables. The question is which
one is the "right" EUC-JP, and which one do users want? ICU, as well as
iconv, could have two tables with the different mappings. The question then
is how to label them, and whether the labeling should be compatible between
the two.
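
To illustrate how two converters can each round-trip cleanly and still disagree on the UTF-8, here is a minimal sketch. The two tables are hypothetical and are not taken from ICU or iconv. They assume one plausible cause, namely that one table reads bytes 0x5C and 0x7E as ASCII (backslash, tilde) and the other reads them as JIS X 0201 Roman (yen sign, overline). Whether that is what actually happens with table.euc is not established in this thread.

```python
# Hypothetical mapping tables for the two disputed bytes; every other byte
# in this toy example maps to the code point of the same value.
TABLE_A = {0x5C: 0x005C, 0x7E: 0x007E}  # assumed ASCII reading: \ and ~
TABLE_B = {0x5C: 0x00A5, 0x7E: 0x203E}  # assumed JIS X 0201 Roman reading


def to_utf8(data: bytes, table: dict) -> bytes:
    """Map each (ASCII-range) byte through `table`, then encode as UTF-8."""
    return "".join(chr(table.get(b, b)) for b in data).encode("utf-8")


def from_utf8(data: bytes, table: dict) -> bytes:
    """Invert `to_utf8` using the same table."""
    reverse = {cp: b for b, cp in table.items()}
    return bytes(reverse.get(ord(ch), ord(ch)) for ch in data.decode("utf-8"))


sample = b"a\\b~c"
utf8_a = to_utf8(sample, TABLE_A)
utf8_b = to_utf8(sample, TABLE_B)

# Each table round-trips with itself, yet the UTF-8 output differs.
print(from_utf8(utf8_a, TABLE_A) == sample)
print(from_utf8(utf8_b, TABLE_B) == sample)
print(utf8_a == utf8_b)
```

Both outputs are well-formed UTF-8. They simply encode different code points, which is why the labeling of the tables matters.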

Linux iconv will not take ICU's UTF-8.
ICU's uconv will read the iconv output but does produce same as
original table.euc.

I find that statement confusing. Are you saying that uconv's UTF-8 is
ill-formed? Nick, would you mind emailing me (and just me, not the list) your
table.euc sample file?
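
One way to separate "ill-formed UTF-8" from "well-formed but different code points" is sketched below. It is only an illustration: the function names are made up for this example, and it uses Python's strict UTF-8 decoder as the well-formedness check instead of either converter.

```python
def first_problem(data: bytes):
    """Return None if `data` is well-formed UTF-8, else (byte offset, reason)."""
    try:
        data.decode("utf-8")  # strict decoding rejects ill-formed sequences
    except UnicodeDecodeError as err:
        return err.start, err.reason
    return None


def first_difference(a: bytes, b: bytes):
    """Return (index, code point in a, code point in b) for the first
    differing code point, or None if the common prefix is identical.
    Both inputs must already be well-formed UTF-8."""
    text_a, text_b = a.decode("utf-8"), b.decode("utf-8")
    for i, (x, y) in enumerate(zip(text_a, text_b)):
        if x != y:
            return i, f"U+{ord(x):04X}", f"U+{ord(y):04X}"
    return None
```

Running `first_problem` on each converter's output would show whether either one is actually ill-formed. If both pass, `first_difference` on the two outputs points at the first code point where the tables disagree.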

Thanks,
YA