On Thu, 27 Mar 2003 10:02:28 +0900
Dan Kogai <dankogai(_at_)dan(_dot_)co(_dot_)jp> wrote:
SADAHIRO-san and cp9?? experts,
On Thursday, Mar 27, 2003, at 00:44 Asia/Tokyo, SADAHIRO Tomoyuki wrote:
+<U20AC> \x80 |0 # EURO SIGN
Is this right? Yes, U20AC is indeed missing from cp936.ucm but see
this;
(snip)
As far as I can tell from Microsoft's pages
http://www.microsoft.com/typography/unicode/cscp.htm ->
http://www.microsoft.com/globaldev/reference/wincp.mspx ->
http://www.microsoft.com/globaldev/reference/dbcs/936.htm
it indeed does use \x80 (though only \x00-\xFF are covered; where the
heck is the FULL MAP!?). But it seems this only applies to 936. 932
(Japanese; Shift_JIS based), 949 (Korean; euc-kr based) and 950
(Traditional Chinese; Big5-based) all leave \x80 blank.
I would like more confirmation from experts; cp936.ucm was
overhauled with the help of MORIYAMA-san, and at that time the
FULL map was available from the URIs above. And I think \x80 was not
used for EURO SIGN back then.
I'm no expert, but I can at least tell you
that you can get the official full maps
by clicking a gray box (like [81], [82], ..., [FE])
in http://www.microsoft.com/globaldev/reference/dbcs/936.htm
or http://www.microsoft.com/globaldev/reference/dbcs/936/936_81.htm
http://www.microsoft.com/globaldev/reference/dbcs/936/936_82.htm
etc.
Like the table provided on unicode.org,
this table does not include any UDC mappings.
I don't know why Microsoft has stopped providing UDC mappings.
http://http.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP936.TXT
Oh, I still have a copy of the full mapping that was once available via the URI
above. Let's see...
cp936.txt says...
CODEPAGE 936 ; PRC GBK (XGB) - ANSI, OEM
CPINFO 2 0x3f 0x003f ; DBCS CP, Default Char = Question Mark
MBTABLE 130
0x00 0x0000 ;Null
[snip]
0x20 0x0020 ;Space
[snip]
0x7f 0x007f ;^?
0x80 0x0080 ;<80>
0xff 0xf8f5 ;<FF>
\x80 is mentioned but not mapped to EURO SIGN.
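The check above can be reproduced mechanically. The following is only a minimal Python sketch, assuming the table lines have the shape `0xNN 0xNNNN ;comment` seen in the excerpt; the helper name `parse_mbtable` and the `SAMPLE` string are illustrative and not part of Encode or any Microsoft tool.

```python
import re

# Sample lines copied from the cp936.txt excerpt quoted above.
SAMPLE = """\
0x00 0x0000 ;Null
0x20 0x0020 ;Space
0x7f 0x007f ;^?
0x80 0x0080 ;<80>
0xff 0xf8f5 ;<FF>
"""

# Assumed line shape: byte value, Unicode value, optional ';' comment.
LINE_RE = re.compile(r"^\s*0x([0-9A-Fa-f]{2})\s+0x([0-9A-Fa-f]{4})\s*(?:;(.*))?$")

def parse_mbtable(text):
    """Return {byte: code point} for single-byte entries (hypothetical helper)."""
    table = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            table[int(m.group(1), 16)] = int(m.group(2), 16)
    return table

table = parse_mbtable(SAMPLE)
EURO = 0x20AC
print(f"0x80 -> U+{table[0x80]:04X}")
print("maps to EURO SIGN:", table[0x80] == EURO)
```

With these sample lines, 0x80 comes out as U+0080 rather than U+20AC, which matches the observation above.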
Please somebody tell me where to find the FULL map.
Dan the Encode Maintainer with Too Many (Dead) Links to Follow
IBM's ICU provides another table, which includes UDC mappings
and Unicode-to-CodePage fallbacks (denoted by |1).
http://oss.software.ibm.com/cvs/icu/charset/data/ucm/windows-936-2000.ucm
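To make the |0 and |1 notation concrete, here is a rough Python sketch that splits one .ucm mapping line into its parts. It assumes the line shape `<UXXXX> \xHH... |N` used in the patch line quoted earlier; `parse_ucm_line` is a hypothetical helper, not something provided by Encode or ICU.

```python
import re

# Assumed shape of a .ucm mapping line: <UXXXX> \xHH[\xHH...] |N  # comment
UCM_RE = re.compile(r"^<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s+\|(\d)")

def parse_ucm_line(line):
    """Return (code point, byte string, flag), or None for non-mapping lines."""
    m = UCM_RE.match(line.strip())
    if m is None:
        return None
    code_point = int(m.group(1), 16)
    raw = bytes(int(h, 16) for h in re.findall(r"\\x([0-9A-Fa-f]{2})", m.group(2)))
    return code_point, raw, int(m.group(3))

cp, raw, flag = parse_ucm_line(r"<U20AC> \x80 |0 # EURO SIGN")
# |0 marks a round-trip mapping; |1 marks a Unicode-to-code-page fallback.
print(f"U+{cp:04X} -> {raw!r}, flag |{flag}")
```

Collecting the flag separately makes it easy to tell round-trip entries from fallback-only ones when comparing two tables.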
EURO SIGN was assigned between Unicode versions 2.0 and 2.1.
cf. Unicode 2.1, UTR #8, http://www.unicode.org/reports/tr8/
Your table must be an older one, predating Unicode 2.1.
SADAHIRO Tomoyuki