Re: ICU's uconv vs Linux iconv and UTF-8

On 2002.02.01, at 23:57, Mark Leisher wrote:

    Dan> FYI I have reported this brain-dead mapping problem to the Unicode
    Dan> Consortium but never got an answer. Well, they are not a public
    Dan> society, in that they charge for membership to say anything. One
    Dan> of the reasons so many Japanese love to hate Unicode...
This kind of false information is why many Japanese continue to love to hate Unicode. If you were actually on the Unicode mailing list, you wouldn't be
repeating garbage like this.
Sign up and send a message about the mapping tables. You will get an answer.

I signed up to unicode(_at_)unicode(_dot_)org long ago, or so I thought, since I am still getting invitations to conferences and such. But when I checked with lister(_at_)unicode(_dot_)org, it subscribed my address again instead of returning an error saying I was already subscribed. Hmm... Anyway, I have resubscribed, so here I go. Okay, let me begin with the original message. Sorry for the repetition, folks on perl-unicode(_at_)perl(_dot_)org.

On 2002.02.01, at 19:24, Nick Ing-Simmons wrote:
As part of the mystery of CJK encodings, I notice that IBM ICU's uconv
and SuSE 6.4 Linux iconv differ as to the UTF-8 representation of table.euc.
Both converters will round-trip with themselves and give a byte-exact
copy of table.euc.

Weirdly, they differ in how they map '\' and '~' in ASCII space, as
well as at some spots among the higher characters.
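The situation described here, where each converter round-trips with itself yet the two disagree on the intermediate UTF-8, can be illustrated with a small Python sketch. It uses Python's built-in `euc_jp` codec purely as a stand-in for one converter (it is neither ICU's uconv nor glibc iconv), and the helper names are hypothetical.

```python
def round_trips(data: bytes, encoding: str = "euc_jp") -> bool:
    """True if data -> UTF-8 -> data reproduces the original bytes exactly."""
    utf8 = data.decode(encoding).encode("utf-8")
    return utf8.decode("utf-8").encode(encoding) == data


def utf8_of(data: bytes, encoding: str = "euc_jp") -> bytes:
    """The intermediate UTF-8 this converter emits for data."""
    return data.decode(encoding).encode("utf-8")


sample = "ASCII \\ ~ 日本語".encode("euc_jp")
print(round_trips(sample))
# Python's codec keeps 0x5C and 0x7E as U+005C and U+007E.
print(utf8_of(b"\\~").hex())
```

Passing `round_trips` only shows that a converter is self-consistent; it says nothing about which code points it picked for 0x5C and 0x7E, which is why two converters can each pass and still produce different UTF-8.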
Oh, yes. This is the problem of the original Unicode 2.x map: it is not ASCII-preservative. I posted this problem to perl-unicode(_at_)perl(_dot_)org when I first released Jcode. Several discussions later, I made Jcode preserve ASCII by default and added $Jcode::Unicode::PEDANTIC to change the behavior.
  Here is the excerpt from Jcode::Unicode:

VARIABLES
       $Jcode::Unicode::PEDANTIC
           When set to non-zero, x-to-unicode conversion becomes
           pedantic.  That is, '\' (chr(0x5c)) is converted to
           zenkaku backslash and '~' (chr(0x7e)) to JIS-x0212
           tilde.

           By default, Jcode::Unicode leaves ASCII ([0x00-0x7f])
           as it is.
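A toggle in the spirit of `$Jcode::Unicode::PEDANTIC` might look like the following Python sketch. It is not Jcode's implementation: `euc_to_unicode` and `PEDANTIC_MAP` are hypothetical names, U+FF3C is FULLWIDTH REVERSE SOLIDUS (zenkaku backslash), and U+FF5E for the tilde is a placeholder chosen for illustration, since the excerpt says only "JIS-x0212 tilde" without naming a code point. The split works on bytes because in EUC-JP every byte of a multibyte character is 0x8E or higher, so 0x5C and 0x7E can only be the ASCII characters themselves.

```python
import re

# Hypothetical pedantic targets for the two contested ASCII bytes.
PEDANTIC_MAP = {
    b"\\": "\uFF3C",  # FULLWIDTH REVERSE SOLIDUS, i.e. zenkaku backslash
    b"~": "\uFF5E",   # PLACEHOLDER: use your table's JIS X 0212 tilde mapping
}

# 0x5C or 0x7E; these never occur inside an EUC-JP multibyte character.
_CONTESTED = re.compile(rb"([\\~])")


def euc_to_unicode(data: bytes, pedantic: bool = False) -> str:
    """Decode EUC-JP, leaving ASCII alone unless pedantic mode is requested."""
    if not pedantic:
        return data.decode("euc_jp")
    out = []
    # With a capturing group, re.split puts the matched bytes at odd indices.
    for i, part in enumerate(_CONTESTED.split(data)):
        out.append(PEDANTIC_MAP[part] if i % 2 else part.decode("euc_jp"))
    return "".join(out)


print(euc_to_unicode(b"C:\\dir ~user"))
print(euc_to_unicode(b"C:\\dir ~user", pedantic=True))
```

Keeping the default non-pedantic matches the excerpt's stated behavior: ASCII passes through untouched unless the caller explicitly opts in.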
Linux iconv will not take ICU's UTF-8.
ICU's uconv will read the iconv output but does produce the same as the original
table.euc.
As far as I can see, Linux iconv is ASCII-preservative while ICU's is Unicode-strict.
  From Perl's point of view, ASCII-preservative should be the default.
FYI I have reported this brain-dead mapping problem to the Unicode Consortium but never got an answer. Well, they are not a public society, in that they charge for membership to say anything. One of the reasons so many Japanese love to hate Unicode...
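The "ASCII-preservative" versus "Unicode-strict" distinction can be tested mechanically: decode each byte 0x00-0x7F on its own and see whether it comes back as the same code point. A minimal Python sketch follows. It takes any decode function, so a converter under test would have to be wrapped as one (here it is only demonstrated with Python's own `euc_jp` codec, not ICU or iconv), and `non_ascii_preserving` is a hypothetical name.

```python
from typing import Callable, List


def non_ascii_preserving(decode: Callable[[bytes], str]) -> List[int]:
    """Return the bytes in 0x00-0x7F that do not decode to the identical code point."""
    offenders = []
    for b in range(0x80):
        try:
            ok = decode(bytes([b])) == chr(b)
        except UnicodeDecodeError:
            ok = False  # a byte the converter rejects is not preserved either
        if not ok:
            offenders.append(b)
    return offenders


print(non_ascii_preserving(lambda raw: raw.decode("euc_jp")))
```

An ASCII-preservative converter yields an empty list; one that maps '\' and '~' elsewhere would report 0x5C and 0x7E.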
Our current euc-jp.ucm is compatible with Linux iconv.
  Right choice.
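The same property can be checked directly in a mapping table. Below is a rough Python sketch that scans the mapping lines of a .ucm file, assuming ICU's line syntax `<Uhhhh> \xhh |n`, where `|0` marks a round-trip mapping and other values mark fallbacks or substitutions. `ascii_roundtrip_ok` is a hypothetical helper, not part of Encode or ICU, and treating a line with no precision indicator as round-trip is an assumption.

```python
import re

# Assumed ICU .ucm mapping line, e.g. "<U005C> \x5C |0"
UCM_LINE = re.compile(r"<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s*(?:\|(\d))?")
BYTE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def ascii_roundtrip_ok(ucm_text: str) -> bool:
    """True if every byte 0x00-0x7F has a round-trip mapping to the identical code point."""
    preserved = set()
    for line in ucm_text.splitlines():
        m = UCM_LINE.match(line.strip())
        if not m:
            continue  # headers, comments, CHARMAP / END CHARMAP markers
        code_point = int(m.group(1), 16)
        raw = bytes(int(h, 16) for h in BYTE.findall(m.group(2)))
        precision = m.group(3) or "0"  # ASSUMPTION: no indicator means round-trip
        if precision == "0" and len(raw) == 1 and raw[0] < 0x80 and raw[0] == code_point:
            preserved.add(raw[0])
    return preserved == set(range(0x80))
```

A table that maps 0x5C round-trip to some code point other than U+005C, keeping U+005C only as a one-way fallback, fails this check even though every byte is still covered.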

Dan the Man with So Many Charsets to Deal With

Now let me repeat the same question I asked long ago. Why does the Unicode - JISX2xxx map remain one that does not preserve the ASCII part, despite the fact that most converters ignore the original map and leave the ASCII part as is? One more question: where have the contents of ftp://ftp.unicode.org/Public/MAPPINGS/EASTASIA/ gone?


_____  Dan Kogai
  __/ ____   CEO, DAN co. ltd.
 /__ /-+-/  2-8-14-418 Shiomi Koto-ku Tokyo 135-0052 Japan
   /--/--- mailto: dankogai(_at_)dan(_dot_)co(_dot_)jp / http://www.dan.co.jp/ 
---------
__/  /    Tel:+81 3-5665-6131   Fax:+81 3-5665-6132
         GPG Key: http://www.dan.co.jp/~dankogai/dankogai.gpg.asc