perl-unicode

Re: ICU's uconv vs Linux iconv and UTF-8

2002-02-01 11:50:39
I'll answer this one.

On 2002.02.02, at 03:28, Yves Arrouye wrote:
That is understandable if they use different tables. The question is which
one is the "right" EUC-JP, and which one do users want? ICU, as well as
iconv, could have two tables with the different mappings. The question then
is how to label them, and whether the labeling should be compatible between
the two.

I don't know which one is 'right'. But the most practical and widely used definition of euc-jp is as follows:

\x00     - \x7f         Maps to US-ASCII
\xa1a1   - \xfefe       Maps to JISX-0208 (aka Zenkaku)
\x8ea1   - \x8edf       Maps to JISX-0201 (aka Hankaku)

In addition, the extended form of euc-jp also includes:

\x8fa1a1 - \x8ffefe     Maps to JISX-0212

That's what iconv, Tcl's *.enc, and my humble Jcode think euc-jp is.
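
To make the ranges above concrete, here is a minimal sketch in Python that walks a byte string and labels each sequence by the character set it falls into. The function name classify_euc_jp and the "invalid" label are made up for illustration; this is not how iconv, Tcl, or Jcode implement it, only a restatement of the table as code.

```python
def classify_euc_jp(data: bytes):
    """Split bytes into (charset, raw_bytes) pairs using the euc-jp ranges above.

    Hypothetical helper for illustration; it only checks byte ranges and
    does not map anything to Unicode.
    """
    out = []
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b <= 0x7F:
            # \x00 - \x7f: single byte, US-ASCII
            out.append(("US-ASCII", data[i:i + 1]))
            i += 1
        elif b == 0x8E and i + 1 < n and 0xA1 <= data[i + 1] <= 0xDF:
            # \x8ea1 - \x8edf: JISX-0201 (Hankaku)
            out.append(("JISX-0201", data[i:i + 2]))
            i += 2
        elif (b == 0x8F and i + 2 < n
              and 0xA1 <= data[i + 1] <= 0xFE
              and 0xA1 <= data[i + 2] <= 0xFE):
            # \x8fa1a1 - \x8ffefe: JISX-0212 (extended form)
            out.append(("JISX-0212", data[i:i + 3]))
            i += 3
        elif 0xA1 <= b <= 0xFE and i + 1 < n and 0xA1 <= data[i + 1] <= 0xFE:
            # \xa1a1 - \xfefe: JISX-0208 (Zenkaku)
            out.append(("JISX-0208", data[i:i + 2]))
            i += 2
        else:
            # Anything else does not fit the table; flag one byte and move on.
            out.append(("invalid", data[i:i + 1]))
            i += 1
    return out


if __name__ == "__main__":
    sample = b"A\xa4\xa2\x8e\xb1\x8f\xb0\xa1"
    for charset, raw in classify_euc_jp(sample):
        print(charset, raw.hex())
```

A converter that implements only the first three rows would report the \x8f sequence as invalid, which is one way two "EUC-JP" tables can disagree on the same input.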

I find the same statement confusing. Are you saying that uconv's UTF-8 is
ill-formed? Nick, would you mind emailing me (and just me, not the list) your
table.euc sample file?

Go get Jcode.pm via http://search.cpan.org/search?dist=Jcode and look under the t/ directory. You will find table.euc and x0212.euc there.

Dan