On 2002.02.01, at 19:58, Dan Kogai wrote:
On 2002.02.01, at 19:24, Nick Ing-Simmons wrote:
Weirdly they differ in how they map '\' and '~' in ASCII space as
well as some spots in higher characters.
Oh, yes. This is the problem with the original Unicode 2.x map; it is
not ASCII-preserving.
[snip]
As far as I can see, Linux iconv is ASCII-preserving while ICS's is
Unicode-strict.
From Perl's point of view, ASCII-preserving should be the default.
With good reason. The original Unicode mapping renders any
(EUC|JIS|SHIFTJIS)-written Perl scripts (or C code) unusable. In
Japan '\' has been mapped to the yen mark (because it happened to sit
in the localizable area of ASCII; I believe that localizable area
causes a lot of headaches for people such as the Danes, whose character
set exploits it to the fullest extent). So source code in Japan comes
with lots of yen marks instead of backslashes.
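To make the difference concrete, here is a minimal Python sketch of the
two policies for the 7-bit range. The names decode_7bit and
JISX0201_ROMAN_OVERRIDES are made up for illustration and do not come
from Perl, iconv, or any library. The two overrides are the JIS X 0201
Roman assignments: 0x5C to U+00A5 (YEN SIGN) and 0x7E to U+203E
(OVERLINE).

```python
# Hypothetical table: the two positions where JIS X 0201 Roman
# differs from ASCII.
JISX0201_ROMAN_OVERRIDES = {0x5C: "\u00A5", 0x7E: "\u203E"}


def decode_7bit(data: bytes, ascii_preserving: bool = True) -> str:
    """Decode 7-bit bytes under one of the two policies (sketch only)."""
    out = []
    for b in data:
        if b > 0x7F:
            raise ValueError("this sketch only handles 7-bit bytes")
        if not ascii_preserving and b in JISX0201_ROMAN_OVERRIDES:
            # Unicode-strict: '\' becomes YEN SIGN, '~' becomes OVERLINE.
            out.append(JISX0201_ROMAN_OVERRIDES[b])
        else:
            # ASCII-preserving: every byte keeps its ASCII code point.
            out.append(chr(b))
    return "".join(out)


# A one-line script containing a backslash escape.
source = b'print "a\\n";'
print(decode_7bit(source))                          # backslash survives
print(decode_7bit(source, ascii_preserving=False))  # backslash turns into a yen sign
```

Under the strict policy the escape sequence no longer contains
U+005C, which is why a script decoded that way stops working.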
Most Japanese (at least coders) unconsciously think "they may look
different, as long as they mean the same", so they kept using the yen
mark.
Then along came nice people from Unicode who said "Map whatever looks
the same to the same code point". To most Japanese this was
definitely "Chiisana Shinsetsu, Ookina Osewa" (So kind of you, but none
of your biz).
Well, the Unicode Consortium did make a compromise. They made the code
mapping a superset of JISX2xx so you can map EUC/JIS/SHIFTJIS to Unicode
and back and still get the same result (even the much-debated
'hankaku kana' (halfwidth kana) got distinct code points). I don't know
why they didn't leave ASCII alone....
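The round trip for halfwidth kana can be checked with a short Python
sketch. It assumes Python's built-in "shift_jis" codec, which is only
one implementation and not the Perl or iconv code under discussion.
The sample bytes are my own choice: 0xB1 is halfwidth katakana A, and
0x83 0x41 is the fullwidth one.

```python
# Assumption: Python's standard "shift_jis" codec stands in for the
# converters discussed in this thread.
data = b"\xb1\x83\x41"  # halfwidth A (one byte), fullwidth A (two bytes)

text = data.decode("shift_jis")
for ch in text:
    # The two characters land on different code points.
    print(f"U+{ord(ch):04X}")

# Encoding back should reproduce the original bytes exactly.
round_trip = text.encode("shift_jis")
print(round_trip == data)
```

Because the two forms keep separate code points, nothing is lost on
the way back to SHIFTJIS.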
Dan