More on Unicode Mappings

On 2002.02.01, at 19:58, Dan Kogai wrote:

On 2002.02.01, at 19:24, Nick Ing-Simmons wrote:
Weirdly, they differ in how they map '\' and '~' in the ASCII range, as
well as at some spots among the higher characters.
Oh, yes. This is the problem with the original Unicode 2.x map: it is not ASCII-preserving.
[snip]
So far as I can see, Linux iconv is ASCII-preserving while ICS's is Unicode-strict.
From Perl's point of view, ASCII-preserving should be the default.
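To make the two policies concrete, here is a minimal Python sketch of decoding the 7-bit half of Shift_JIS either way. The names `decode_roman`, `ascii_preserving`, and `STRICT_OVERRIDES` are hypothetical, not the API of iconv, ICS, or Perl's Encode; the strict policy assumes the JIS X 0201 Roman reading, where 0x5C is YEN SIGN (U+00A5) and 0x7E is OVERLINE (U+203E).

```python
# Hypothetical sketch: decode the 7-bit (JIS X 0201 Roman) half of
# Shift_JIS under two policies. Only bytes 0x5C and 0x7E differ.

# Strict JIS X 0201 reading of the two contested positions.
STRICT_OVERRIDES = {
    0x5C: "\u00a5",  # YEN SIGN instead of REVERSE SOLIDUS (backslash)
    0x7E: "\u203e",  # OVERLINE instead of TILDE
}


def decode_roman(data: bytes, ascii_preserving: bool = True) -> str:
    """Decode 7-bit bytes, resolving 0x5C/0x7E by the chosen policy."""
    out = []
    for b in data:
        if b > 0x7F:
            raise ValueError(f"byte 0x{b:02X} is not in the 7-bit range")
        if not ascii_preserving and b in STRICT_OVERRIDES:
            out.append(STRICT_OVERRIDES[b])
        else:
            out.append(chr(b))  # same code point as ASCII
    return "".join(out)


if __name__ == "__main__":
    # A fragment of Perl as it might sit on disk in a Japanese shop.
    source = b'print "a\\tb\\n";'
    pos = source.index(0x5C)  # offset of the first backslash byte
    for policy in (True, False):
        text = decode_roman(source, ascii_preserving=policy)
        print(f"ascii_preserving={policy}: U+{ord(text[pos]):04X}")
    # prints:
    # ascii_preserving=True: U+005C
    # ascii_preserving=False: U+00A5
```

Under the strict policy the Perl escapes "\t" and "\n" come out as "¥t" and "¥n", which is why a strict mapping breaks scripts written in EUC, JIS, or Shift_JIS.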

With good reason. The original Unicode mapping renders any (EUC|JIS|SHIFTJIS)-written Perl scripts (or C code) unusable. In Japan, '\' has been mapped to the yen mark (because it happened to sit in a localizable area of ASCII; I believe the localizable area of ASCII causes a lot of headaches for folks such as the Danes, who exploit this feature to the fullest extent). So source code in Japan comes with lots of yen marks instead of backslashes.

Most Japanese (at least coders) unconsciously think "they may look different so long as they mean the same," so they kept using the yen mark. Then came nice people from Unicode who said "Map whatever looks the same to the same code point." To most Japanese this was definitely "Chiisana Shinsetsu, Ookina Osewa" (So kind of you, but none of your business).

Well, the Unicode Consortium did make a compromise. They made the code mapping a superset of JIS X 02xx, so you can map EUC/JIS/SHIFTJIS to Unicode, map it back, and still get the same result (even the much-debated 'hankaku kana' (halfwidth kana) get distinct code points). I don't know why they didn't leave ASCII alone....
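The round-trip property described above can be spot-checked with CPython's built-in `shift_jis` codec, used here as a modern stand-in rather than the 2002 mapping tables under discussion. The helper names `roundtrips` and `check_hankaku` are hypothetical. The sketch checks that every single-byte halfwidth kana (0xA1 to 0xDF) decodes to its own code point in U+FF61..U+FF9F and encodes back to the same byte, and that fullwidth katakana A keeps a code point separate from its halfwidth form.

```python
def roundtrips(raw: bytes, encoding: str = "shift_jis") -> bool:
    """True if raw survives a decode/encode cycle unchanged."""
    return raw.decode(encoding).encode(encoding) == raw


def check_hankaku() -> None:
    # Single-byte halfwidth katakana occupy 0xA1..0xDF in Shift_JIS.
    for b in range(0xA1, 0xE0):
        ch = bytes([b]).decode("shift_jis")
        # JIS X 0201 katakana map linearly onto U+FF61..U+FF9F.
        assert ord(ch) == 0xFF61 + (b - 0xA1), hex(b)
        assert roundtrips(bytes([b])), hex(b)

    # Fullwidth KATAKANA LETTER A (U+30A2) is a two-byte JIS X 0208
    # character, distinct from HALFWIDTH KATAKANA LETTER A (U+FF71).
    assert "\u30a2".encode("shift_jis") == b"\x83\x41"
    assert b"\xb1".decode("shift_jis") == "\uff71"


if __name__ == "__main__":
    check_hankaku()
    print("halfwidth kana round-trip OK")
```

This only exercises the kana range; whether 0x5C and 0x7E round-trip depends on which of the two ASCII policies a given converter follows.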

Dan