perl-unicode

More on Unicode Mappings

2002-02-01 04:22:23
On 2002.02.01, at 19:58, Dan Kogai wrote:
On 2002.02.01, at 19:24, Nick Ing-Simmons wrote:
Weirdly they differ in how they map '\' and '~' in ASCII space as
well as some spots in higher characters.

Oh, yes. This is a problem with the original Unicode 2.x map: it is not ASCII-preserving.
 [snip]
As far as I can see, Linux iconv is ASCII-preserving while ICS's is Unicode-strict.
From Perl's point of view, ASCII-preserving should be the default.

With good reason. The original Unicode mapping renders any (EUC|JIS|SHIFTJIS)-written Perl scripts (or C code) unusable. In Japan, '\' has been mapped to the yen mark, because it happened to sit in the localizable area of ASCII. (I believe the localizable area of ASCII causes a lot of headaches for folks such as the Danes, who exploit this feature to the fullest extent.) So source code in Japan comes with lots of yen marks instead of backslashes. Most Japanese (at least coders) unconsciously think "they may look different, so long as they mean the same", so they kept using the yen mark.

Then along came the nice people from Unicode, who said "Map whatever looks the same to the same code point". To most Japanese this was definitely "Chiisana Shinsetsu, Ookina Osewa" (so kind of you, but none of your biz).

Well, the Unicode Consortium did make a compromise. They made the code mapping a superset of JISX2xx, so you can map EUC/JIS/SHIFTJIS to Unicode, map it back, and still get the same result (even the much-debated 'hankaku kana' (halfwidth kana) get distinct code points). I don't know why they didn't leave ASCII alone....
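
To make the difference concrete, here is a minimal sketch in Python (not Perl, and not taken from any converter mentioned in this thread). The function `decode_7bit` and the table `STRICT_OVERRIDES` are hypothetical names invented for illustration; the table only encodes the two 7-bit positions where JIS X 0201 Roman differs from ASCII (0x5C as YEN SIGN, 0x7E as OVERLINE). The last few lines assume Python's built-in `shift_jis` codec and show the halfwidth kana round trip.

```python
# Hypothetical, hand-built table for illustration only; real converters
# ship complete mapping files.
# The two 7-bit positions where a "Unicode-strict" JIS X 0201 Roman
# mapping differs from ASCII.
STRICT_OVERRIDES = {
    0x5C: "\u00A5",  # YEN SIGN instead of REVERSE SOLIDUS (backslash)
    0x7E: "\u203E",  # OVERLINE instead of TILDE
}


def decode_7bit(data: bytes, strict: bool) -> str:
    """Decode 7-bit bytes under one of the two conventions.

    strict=False: ASCII-preserving (every byte maps to the same code point).
    strict=True:  0x5C and 0x7E map to YEN SIGN and OVERLINE.
    """
    out = []
    for b in data:
        if b >= 0x80:
            raise ValueError("this sketch only handles 7-bit bytes")
        if strict and b in STRICT_OVERRIDES:
            out.append(STRICT_OVERRIDES[b])
        else:
            out.append(chr(b))
    return "".join(out)


# A one-line script as it would sit on disk; the escape is byte 0x5C.
source = b'print "a\\n";'

ascii_view = decode_7bit(source, strict=False)
strict_view = decode_7bit(source, strict=True)

# ascii() keeps the output readable on terminals without Japanese fonts.
print(ascii(ascii_view))
print(ascii(strict_view))

# Round trip for halfwidth kana, assuming Python's built-in shift_jis codec:
# halfwidth and fullwidth katakana A are different bytes and stay different
# code points, so converting back yields the original bytes.
halfwidth = b"\xb1".decode("shift_jis")
fullwidth = b"\x83\x41".decode("shift_jis")
print(ascii(halfwidth), ascii(fullwidth))
```

Under the strict table the decoded script no longer contains a backslash at all, so a parser working on the Unicode text would not see an escape sequence. That is the breakage described above, and it is why an ASCII-preserving default is attractive for source code.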

Dan