On 2002.02.01, at 23:57, Mark Leisher wrote:
Dan> FYI I have reported this brain-dead mapping problem to the Unicode
Dan> Consortium but never got an answer. Well, they are not a public
Dan> society, in that they charge for membership to say anything. One
Dan> of the reasons so many Japanese love to hate Unicode...
This kind of false information is why many Japanese continue to love to
hate Unicode. If you were actually on the Unicode mailing list, you
wouldn't be repeating garbage like this.
Sign up and send a message about the mapping tables. You will get an
answer.
I signed up to unicode(_at_)unicode(_dot_)org long ago, or at least I thought I
did, since I am still getting invitations to conferences and such. But when I
checked with lister(_at_)unicode(_dot_)org, it subscribed my address again instead
of returning an error message saying I was already subscribed. Hmm....
Anyway, I have resubscribed, so here I go....
Okay, here it is. Let me begin with the original message. Sorry for the
repetition, folks on perl-unicode(_at_)perl(_dot_)org.
On 2002.02.01, at 19:24, Nick Ing-Simmons wrote:
As part of the mystery of CJK encodings, I notice that IBM's ICU uconv
and SuSE 6.4 Linux iconv differ as to the UTF-8 representation of
table.euc.
Both converters will round-trip with themselves and give a byte-exact
copy of table.euc.
Weirdly, they differ in how they map '\' and '~' in ASCII space, as
well as at some spots in the higher characters.
Oh, yes. This is the problem of the original Unicode 2.x map; it is
not ASCII-preserving. I posted this problem to
perl-unicode(_at_)perl(_dot_)org when I first released Jcode. Several discussions
later, I made Jcode preserve ASCII by default and added
$Jcode::Unicode::PEDANTIC to change that behavior.
Here is the excerpt from Jcode::Unicode:
VARIABLES
$Jcode::Unicode::PEDANTIC
When set to non-zero, x-to-unicode conversion becomes
pedantic. That is, '\' (chr(0x5c)) is converted to
zenkaku backslash and '~' (chr(0x7e)) to JIS-x0212
tilde.
By default, Jcode::Unicode leaves ASCII ([0x00-0x7f])
as it is.
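The excerpt describes a default that passes the ASCII range through unchanged and a flag that remaps '\' and '~'. Below is a minimal Python sketch of that policy, not Jcode's code. It assumes Python's built-in euc_jp codec for the actual decoding. The function name euc_jp_to_unicode and the pedantic targets are hypothetical: U+FF3C is FULLWIDTH REVERSE SOLIDUS (the zenkaku backslash), while U+FF5E FULLWIDTH TILDE is only a stand-in for the JIS X 0212 tilde, whose exact code point the excerpt does not give.

```python
# Hypothetical sketch of an ASCII-preserving default vs. an opt-in "pedantic"
# mode for EUC-JP -> Unicode conversion. Not Jcode's actual code or tables.

# Assumed pedantic targets: U+FF3C is FULLWIDTH REVERSE SOLIDUS ("zenkaku
# backslash"); U+FF5E (FULLWIDTH TILDE) is only a stand-in for the JIS X 0212 tilde.
PEDANTIC_MAP = str.maketrans({"\\": "\uFF3C", "~": "\uFF5E"})


def euc_jp_to_unicode(data: bytes, pedantic: bool = False) -> str:
    # Python's built-in euc_jp codec passes bytes 0x00-0x7F through unchanged,
    # so the default result keeps '\' and '~' as U+005C and U+007E.
    text = data.decode("euc_jp")
    if pedantic:
        # Rewrite every decoded backslash and tilde to the assumed pedantic targets.
        text = text.translate(PEDANTIC_MAP)
    return text


sample = b"C:\\dir ~user"
print(euc_jp_to_unicode(sample))               # C:\dir ~user
print(ascii(euc_jp_to_unicode(sample, True)))  # 'C:\uff3cdir \uff5euser'
```

Keeping the remapping behind a flag means the common case, plain ASCII in Japanese text, survives conversion untouched, and only callers who explicitly want the strict table pay for it.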
Linux iconv will not take ICU's UTF-8.
ICU's uconv will read the iconv output, and it does produce the same as the
original table.euc.
So far as I can see, Linux iconv is ASCII-preserving while ICU's is
Unicode-strict.
From Perl's point of view, ASCII-preserving should be the default.
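Byte-exact round-tripping and ASCII preservation are two separate properties, which is why two converters can both round-trip table.euc and still disagree. The sketch below applies both checks to Python's euc_jp codec, used here only as a stand-in since it is neither ICU's uconv nor Linux iconv; round_trips and ascii_preserving are hypothetical helper names.

```python
# Sketch: the two properties discussed in the thread, checked against
# Python's own euc_jp codec (a stand-in, not ICU uconv or glibc iconv).


def round_trips(data: bytes, codec: str = "euc_jp") -> bool:
    """True if bytes -> Unicode -> bytes yields a byte-exact copy."""
    return data.decode(codec).encode(codec) == data


def ascii_preserving(codec: str = "euc_jp") -> bool:
    """True if every byte 0x00-0x7F decodes to the code point of the same value."""
    return all(bytes([b]).decode(codec) == chr(b) for b in range(0x80))


sample = "\\ ~ 日本語".encode("euc_jp")
print(round_trips(sample))  # True
print(ascii_preserving())   # True
```

A converter can pass the first check while failing the second: a Unicode-strict table that sends '\' and '~' elsewhere and back again still round-trips with itself, but its UTF-8 differs from that of an ASCII-preserving converter.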
FYI I have reported this brain-dead mapping problem to the Unicode
Consortium but never got an answer. Well, they are not a public society,
in that they charge for membership to say anything. One of the
reasons so many Japanese love to hate Unicode...
Our current euc-jp.ucm is compatible with Linux iconv.
Right choice.
Dan the Man with So Many Charsets to Deal With
Now let me repeat the question I asked long ago. Why does the
Unicode-JISX2xxx map still not preserve the ASCII range, even though
most converters ignore the original map and leave the ASCII range as is?
One more question. Where have the contents of
ftp://ftp.unicode.org/Public/MAPPINGS/EASTASIA/ gone?
_____ Dan Kogai
__/ ____ CEO, DAN co. ltd.
/__ /-+-/ 2-8-14-418 Shiomi Koto-ku Tokyo 135-0052 Japan
/--/--- mailto: dankogai(_at_)dan(_dot_)co(_dot_)jp / http://www.dan.co.jp/
---------
__/ / Tel:+81 3-5665-6131 Fax:+81 3-5665-6132
GPG Key: http://www.dan.co.jp/~dankogai/dankogai.gpg.asc