perl-unicode

[Encode] How to support (Apple's) compound Unicode characters?

2002-03-30 01:02:47
On Saturday, March 30, 2002, at 03:24 , Dan Kogai wrote:
  Okay.  I've checked

http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/

one more time, and it seems that the other missing encodings, such as Korean, are available there as well. I'll look into that.

I think I have found the reason why some of the encodings were missing from Tcl's *.enc files, which later turned into *.ucm. Apple uses Unicode compound characters (one legacy character mapped to a sequence of several Unicode code points) extensively, which doesn't go well with .ucm, not to mention *.enc.

http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/JAPANESE.TXT
# Apple additions - vertical forms
0xEB41  0x3001+0xF87E   # vertical form for IDEOGRAPHIC COMMA
  ^^^^^^Mac Japanese, then Unicode Character
Encode/macJapan.ucm
<UF8B5> \xEB\x41 |0 # Private Use

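So the two already conflict: Apple's table maps 0xEB41 to a two-code-point sequence, while the .ucm maps it to a single private-use character. While MacJapanese doesn't have many such mappings, MacKorean has lots of them. No wonder it is not listed in Tcl.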
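To make the one-to-many shape of such an entry concrete, here is a minimal sketch in Python (not Perl, and not part of Encode or enc2xs). The helper name parse_apple_mapping is hypothetical, and it assumes every Unicode field is a "+"-joined list of hex code points, as in the JAPANESE.TXT line quoted above; anything else in Apple's tables is not handled.

```python
def parse_apple_mapping(line):
    """Parse one data line of an Apple mapping table (hypothetical helper).

    Returns (mac_bytes, unicode_text), or None for comment and blank lines.
    Assumes the second field is hex code points joined by '+'.
    """
    body = line.split("#", 1)[0].strip()      # drop the trailing comment
    fields = body.split()
    if len(fields) < 2:
        return None
    mac_field, uni_field = fields[0], fields[1]
    # "0xEB41" has 4 hex digits, so it is a 2-byte Mac code
    nbytes = (len(mac_field) - 2 + 1) // 2
    mac_bytes = int(mac_field, 16).to_bytes(nbytes, "big")
    # "0x3001+0xF87E" becomes a 2-character string
    text = "".join(chr(int(cp, 16)) for cp in uni_field.split("+"))
    return mac_bytes, text


line = "0xEB41  0x3001+0xF87E   # vertical form for IDEOGRAPHIC COMMA"
mac, uni = parse_apple_mapping(line)
print(mac, [f"U+{ord(c):04X}" for c in uni])
```

The point is only that the right-hand side is a string of two code points, whereas the macJapan.ucm line above has room for exactly one (<UF8B5>).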

I wonder which one I should trust, but I have reasons to believe Apple still considers the maps at unicode.org canonical. Take HFS+, for example. The word 'Hangul' consists of two syllables, which are two characters in KSC5601 (han-gul). But on HFS+, it is broken up into six: h-a-n-g-u-l.

Though it is possible to mangle enc2xs to generate such mappings (it can handle, in theory, any nbyte-nbyte conversion), the UCM format does not seem to be designed that way. Hmm.... Let me think about it for a while...

Well, these are only vendor mappings, and Encode's coverage already matches that of the major browsers. So it is already practical, and I believe the level of support is good enough for 5.8.0. Maybe the vendor mappings that are missing could be diverted to Encode::Vendors::(Apple|MS) or something....
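The han-gul versus h-a-n-g-u-l split can be reproduced with a short Python sketch. It uses standard Unicode NFD normalization as a stand-in for what HFS+ does to file names (an assumption; HFS+'s own decomposition rules may differ in detail), and Python's euc_kr codec as a stand-in for KSC5601.

```python
import unicodedata

hangul = "\ud55c\uae00"  # the word 'Hangul': two precomposed syllables

# NFD splits each syllable into its leading consonant, vowel and final consonant
decomposed = unicodedata.normalize("NFD", hangul)
print(len(hangul), len(decomposed))
print([f"U+{ord(c):04X}" for c in decomposed])

# A one-to-many table of the kind a decomposing mapping would need:
# one 2-byte legacy character on the left, several code points on the right.
table = {
    syllable.encode("euc_kr"): unicodedata.normalize("NFD", syllable)
    for syllable in hangul
}
for legacy, text in table.items():
    print(legacy.hex(), "->", "+".join(f"0x{ord(c):04X}" for c in text))
```

Each 2-byte legacy character maps to three code points here, which is the n-to-m case that enc2xs could in theory handle but UCM was not really designed around.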

Dan the Encode Maintainer
