On Saturday, March 30, 2002, at 03:24, Dan Kogai wrote:
> Okay. I've checked
> http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/
> one more time, and it seems that the other missing encodings, such as
> Korean, are available as well. I'll look into that.
I think I have found the reason why some of the encodings were missing
from Tcl's *.enc files, which later became our *.ucm files.
Apple makes extensive use of Unicode compound characters (one native
character mapped to a sequence of Unicode code points), which doesn't fit
well with .ucm, let alone *.enc:
http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/JAPANESE.TXT
# Apple additions - vertical forms
0xEB41 0x3001+0xF87E # vertical form for IDEOGRAPHIC COMMA
(first column: the Mac Japanese code; second column: the Unicode character sequence)
Encode/macJapan.ucm
<UF8B5> \xEB\x41 |0 # Private Use
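As a rough illustration of the conflict, here is a small Python sketch (not part of Encode or enc2xs) that parses lines in the format of Apple's JAPANESE.TXT quoted above and flags entries where one native character maps to more than one Unicode code point. The helper names (parse_apple_line, is_compound, to_ucm_line) are hypothetical, and the .ucm-style output only mimics the single one-to-one line shown above; it is not a full implementation of either format.

```python
import re

# Matches one data line of Apple's mapping file (format assumed from the excerpt), e.g.
#   0xEB41  0x3001+0xF87E  # vertical form for IDEOGRAPHIC COMMA
# Group 1: native code in hex; group 2: one or more Unicode code points joined by '+'.
LINE_RE = re.compile(r"^0x([0-9A-Fa-f]+)\s+(0x[0-9A-Fa-f]+(?:\+0x[0-9A-Fa-f]+)*)")


def parse_apple_line(line):
    """Return (native_bytes, [code points]), or None for comment/blank lines."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    native_hex = m.group(1)
    if len(native_hex) % 2:  # pad so bytes.fromhex() receives whole bytes
        native_hex = "0" + native_hex
    # int(..., 16) accepts the "0x" prefix, so "0x3001" -> 0x3001
    codepoints = [int(cp, 16) for cp in m.group(2).split("+")]
    return bytes.fromhex(native_hex), codepoints


def is_compound(codepoints):
    """True if one native character maps to a sequence of Unicode code points."""
    return len(codepoints) > 1


def to_ucm_line(native, codepoints):
    """Render a one-to-one mapping in the style of the .ucm line quoted above.
    Compound mappings do not fit that one-to-one shape, so return None for them."""
    if is_compound(codepoints):
        return None
    raw = "".join(f"\\x{b:02X}" for b in native)
    return f"<U{codepoints[0]:04X}> {raw} |0"


if __name__ == "__main__":
    entry = parse_apple_line("0xEB41 0x3001+0xF87E # vertical form for IDEOGRAPHIC COMMA")
    print(entry, is_compound(entry[1]), to_ucm_line(*entry))
```

Running such a check over a whole vendor file would show how many entries are compound, which is where MacJapanese and MacKorean differ.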
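So the two already conflict: Apple maps 0xEB41 to U+3001 U+F87E, while
our .ucm maps it to the single Private Use character U+F8B5. MacJapanese
has only a few such compound mappings, but MacKorean has lots of them.
No wonder MacKorean is not among Tcl's encodings.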
I'm not sure which one to trust, but I have reason to believe Apple
still considers the map at unicode.org canonical. Take HFS+, for
example. The word 'Hangul' consists of two syllables, which are two
characters in KSC5601 (han-gul). But on HFS+, it is broken up into
h-a-n-g-u-l.
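The breakup follows from Unicode's algorithmic decomposition of precomposed Hangul syllables into conjoining jamo; HFS+ stores file names in a decomposed form (a variant of Unicode NFD), so each syllable becomes two or three jamo. The Python sketch below uses the constants of the Unicode Standard's Hangul syllable decomposition and checks the result against the standard library's unicodedata; the function name decompose_hangul is illustrative only.

```python
import unicodedata

# Constants from the Unicode Standard's Hangul syllable decomposition algorithm
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11172 precomposed syllables


def decompose_hangul(ch):
    """Split one precomposed syllable into leading consonant, vowel and
    optional trailing consonant; other characters are returned unchanged."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch
    lead = chr(L_BASE + s_index // N_COUNT)
    vowel = chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
    t_offset = s_index % T_COUNT
    trail = chr(T_BASE + t_offset) if t_offset else ""  # 0 means no final consonant
    return lead + vowel + trail


word = "\uD55C\uAE00"  # "Hangul": two precomposed syllables, han + gul
decomposed = "".join(decompose_hangul(c) for c in word)

if __name__ == "__main__":
    print(len(word), len(word.encode("euc_kr")), len(decomposed))
    print(" ".join(f"U+{ord(c):04X}" for c in decomposed))
```

The two syllables take two characters (four bytes) in EUC-KR, the encoding form of KSC5601, but six code points once decomposed.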
Though it is possible to hack enc2xs to produce such mappings (in
theory it can handle any n-byte to n-byte conversion), the UCM format
does not seem to be designed that way.
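To illustrate what an n-to-m table involves, here is a hypothetical Python sketch (not enc2xs's actual data structure) of an encoder whose keys are sequences of Unicode code points. It uses greedy longest match, so U+3001 U+F87E becomes \xEB\x41 while a lone U+3001 falls back to its ordinary code. The toy TABLE is an assumption: the 0xEB41 entry comes from the JAPANESE.TXT excerpt above, and 0x8141 is the Shift_JIS code for U+3001.

```python
# Hypothetical toy table: keys are tuples of code points, values are native bytes.
TABLE = {
    (0x3001,): b"\x81\x41",         # IDEOGRAPHIC COMMA (Shift_JIS code)
    (0x3001, 0xF87E): b"\xEB\x41",  # Apple's vertical-form compound, from JAPANESE.TXT
    (0x0041,): b"\x41",             # LATIN CAPITAL LETTER A
}
MAX_KEY = max(len(k) for k in TABLE)  # longest Unicode sequence in the table


def encode_longest_match(text):
    """Encode text by always trying the longest code point sequence first."""
    cps = [ord(c) for c in text]
    out = bytearray()
    i = 0
    while i < len(cps):
        for n in range(min(MAX_KEY, len(cps) - i), 0, -1):
            key = tuple(cps[i:i + n])
            if key in TABLE:
                out += TABLE[key]
                i += n
                break
        else:  # no prefix of any length matched
            raise ValueError(f"unmappable code point U+{cps[i]:04X} at index {i}")
    return bytes(out)


if __name__ == "__main__":
    print(encode_longest_match("A\u3001\uF87EA"))
    print(encode_longest_match("\u3001"))
```

The lookahead is the crux: a one-to-one table can be indexed by a single code point, whereas here the encoder must inspect what follows, which is presumably why such mappings sit awkwardly in a one-line-per-character format.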
Hmm... let me think about it for a while. Well, these are only vendor
mappings, and Encode's support already matches that of the major
browsers. So it is practical enough, and I believe the level of support
is good enough for 5.8.0. Maybe the missing vendor mappings could be
split off into Encode::Vendors::(Apple|MS) or something...
Dan the Encode Maintainer