[Encode] How to support (Apple's) compound Unicode characters?

On Saturday, March 30, 2002, at 03:24 , Dan Kogai wrote:

  Okay.  I've checked

http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/
one more time, and it seems that the other missing encodings, such as Korean, are available as well. I'll look into that.

I think I have found the reason why some of the encodings were missing from Tcl's *.enc files, which later turned into *.ucm. Apple makes such extensive use of Unicode compound characters that its tables don't go well with *.ucm, not to mention *.enc.


http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/JAPANESE.TXT

# Apple additions - vertical forms
0xEB41  0x3001+0xF87E   # vertical form for IDEOGRAPHIC COMMA

(First column: Mac Japanese code; second column: Unicode character sequence.)
Encode/macJapan.ucm

<UF8B5> \xEB\x41 |0 # Private Use

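So the two are already conflicting: Apple's table maps 0xEB41 to a sequence of two code points, while the .ucm maps it to a single private-use character. While MacJapanese doesn't have many such mappings, MacKorean has lots of them. No wonder it is not listed in Tcl.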
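To make the mismatch concrete, here is a minimal sketch (in Python, not part of Encode) that reads one data line of Apple's table and returns the Mac code together with its Unicode sequence. The function name parse_apple_mapping_line is hypothetical, and the sketch assumes the plain "0xNNNN  0xNNNN+0xNNNN  # comment" layout quoted above; direction hints and other annotations that some Apple tables use are not handled.

```python
def parse_apple_mapping_line(line):
    """Parse one line of an Apple mapping table (hypothetical helper).

    Returns (mac_code, [unicode code points]) or None for comment/blank lines.
    Assumes the simple layout quoted above: two hex columns, then a comment.
    """
    body = line.split("#", 1)[0].strip()   # drop the trailing comment
    if not body:
        return None                        # comment-only or blank line
    fields = body.split()
    mac_code = int(fields[0], 16)
    # A '+' joins the code points of a compound (multi-code-point) mapping.
    unicode_seq = [int(cp, 16) for cp in fields[1].split("+")]
    return mac_code, unicode_seq


line = "0xEB41  0x3001+0xF87E   # vertical form for IDEOGRAPHIC COMMA"
mac_code, unicode_seq = parse_apple_mapping_line(line)
print(hex(mac_code), [hex(cp) for cp in unicode_seq])
print("compound mapping:", len(unicode_seq) > 1)
```

Any entry whose sequence has more than one code point is exactly the kind that a one-character-per-entry table cannot express directly.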

I wonder which one I should trust, but I have reasons to believe Apple still considers the map @ unicode.org canonical. Take HFS+, for example. The word 'Hangul' consists of two syllables, which are two characters in KSC5601 (han-gul). But on HFS+, it is broken up into h-a-n-g-u-l.

Though it is possible to mangle enc2xs to make such mappings (it can handle, in theory, any nbyte-nbyte conversion), the UCM format does not seem to be designed that way.

Hmm.... Let me think about it for a while... Well, it's only a vendor mapping, and Encode support has already matched that of the major browsers. So it is already practical enough, and I believe the level of support is good enough for 5.8.0. Maybe the vendor mappings that are missing could be diverted to Encode::Vendors::(Apple|MS) or something....
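The HFS+ behaviour mentioned above can be approximated with standard Unicode canonical decomposition. This is only a sketch in Python: HFS+ uses Apple's own variant of decomposition, and using plain NFD as a stand-in is my assumption, though for Hangul syllables it shows the same han-gul to h-a-n-g-u-l split.

```python
import unicodedata

# The word "Hangul": two precomposed syllables (han, gul).
hangul = "\ud55c\uae00"

# NFD splits each precomposed syllable into its conjoining jamo.
decomposed = unicodedata.normalize("NFD", hangul)


def code_points(text):
    """Return the code points of text as 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


print(code_points(hangul))      # ['U+D55C', 'U+AE00']
print(code_points(decomposed))  # ['U+1112', 'U+1161', 'U+11AB', 'U+1100', 'U+1173', 'U+11AF']
```

Two characters become six, which is the kind of one-to-many mapping that does not fit naturally into a table of single code points.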


Dan the Encode Maintainer