SADAHIRO-san and cp9?? experts,
On Thursday, Mar 27, 2003, at 00:44 Asia/Tokyo, SADAHIRO Tomoyuki wrote:
+<U20AC> \x80 |0 # EURO SIGN
Is this right? Yes, U20AC is indeed missing from cp936.ucm, but see
this:
grep U20AC ucm/cp*.ucm
/Users/dankogai/work/Encode/ucm/cp1250.ucm:<U20AC> \x80 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp1251.ucm:<U20AC> \x88 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp1252.ucm:<U20AC> \x80 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp1253.ucm:<U20AC> \x80 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp1254.ucm:<U20AC> \x80 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp1255.ucm:<U20AC> \x80 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp1256.ucm:<U20AC> \x80 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp1257.ucm:<U20AC> \x80 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp1258.ucm:<U20AC> \x80 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp874.ucm:<U20AC> \x80 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp949.ucm:<U20AC> \xA2\xE6 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp950.ucm:<U20AC> \xA3\xE1 |0 # EURO SIGN
\x80 SEEMS right for the single-byte CPs (cp1251 uses \x88 instead),
but EURO SIGN is mapped to two-byte sequences in CP949 and CP950.
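
For anyone who wants to repeat the comparison without eyeballing grep
output, here is a rough Python sketch. It is not part of Encode; the
function name group_by_bytes and the regex are my own assumptions about
the "file:<Uxxxx> \xNN |flag" shape seen above, and the sample is only a
subset of the grep lines with the paths shortened to file names.

```python
import re
from collections import defaultdict

# Subset of the grep output above; paths shortened to bare file names.
GREP_OUTPUT = r"""
cp1250.ucm:<U20AC> \x80 |0 # EURO SIGN
cp1251.ucm:<U20AC> \x88 |0 # EURO SIGN
cp1252.ucm:<U20AC> \x80 |0 # EURO SIGN
cp949.ucm:<U20AC> \xA2\xE6 |0 # EURO SIGN
cp950.ucm:<U20AC> \xA3\xE1 |0 # EURO SIGN
"""

# Assumed line shape: <file>:<Uxxxx> <one or more \xNN> |<flag>
LINE_RE = re.compile(
    r"^(?P<file>[^:]+):<U(?P<cp>[0-9A-Fa-f]{4,6})>\s+"
    r"(?P<bytes>(?:\\x[0-9A-Fa-f]{2})+)\s+\|(?P<flag>\d)"
)


def group_by_bytes(text, codepoint=0x20AC):
    """Group file names by the byte sequence they assign to `codepoint`."""
    groups = defaultdict(list)
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m or int(m.group("cp"), 16) != codepoint:
            continue  # blank line, other character, or unexpected format
        hex_pairs = re.findall(r"\\x([0-9A-Fa-f]{2})", m.group("bytes"))
        raw = bytes(int(h, 16) for h in hex_pairs)
        groups[raw].append(m.group("file"))
    return dict(groups)


if __name__ == "__main__":
    for raw, files in sorted(group_by_bytes(GREP_OUTPUT).items()):
        print(raw.hex(), files)
```

Grouping by byte sequence rather than by file makes the odd ones out
(cp1251, cp949, cp950) stand apart from the \x80 crowd at a glance.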
As far as I can tell from Microsoft's pages
http://www.microsoft.com/typography/unicode/cscp.htm ->
http://www.microsoft.com/globaldev/reference/wincp.mspx ->
http://www.microsoft.com/globaldev/reference/dbcs/936.htm
it does indeed use \x80 (though only \x00-\xFF are covered; where the
heck is the FULL MAP!?). But it seems this applies only to 936. 932
(Japanese; Shift_JIS-based), 949 (Korean; EUC-KR-based) and 950
(Traditional Chinese; Big5-based) all leave \x80 blank.
I would like more confirmation from experts; cp936.ucm was overhauled
with the help of MORIYAMA-san, and at that time the FULL map was
available from the URIs above. And I think \x80 was not used for
EURO SIGN back then.
Oh, I still have a copy of the full mapping that was once available via
the URIs above. Let's see...
cp936.txt says...
CODEPAGE 936 ; PRC GBK (XGB) - ANSI, OEM
CPINFO 2 0x3f 0x003f ; DBCS CP, Default Char = Question Mark
MBTABLE 130
0x00 0x0000 ;Null
[snip]
0x20 0x0020 ;Space
[snip]
0x7f 0x007f ;^?
0x80 0x0080 ;<80>
0xff 0xf8f5 ;<FF>
\x80 is mentioned but not mapped to EURO SIGN.
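
The same check can be done mechanically. Below is a minimal Python
sketch that reads the MBTABLE lines quoted above and looks for EURO
SIGN. The parse_table helper is hypothetical, the "0xNN 0xNNNN ;comment"
layout is assumed from the excerpt, and it only sees the five entries I
pasted here, not the full cp936.txt.

```python
# MBTABLE excerpt as quoted above; the [snip] markers are left in place.
CP936_TXT = """\
MBTABLE 130
0x00 0x0000 ;Null
[snip]
0x20 0x0020 ;Space
[snip]
0x7f 0x007f ;^?
0x80 0x0080 ;<80>
0xff 0xf8f5 ;<FF>
"""


def parse_table(text):
    """Return {byte: code point} for lines shaped like '0xNN 0xNNNN ;comment'."""
    table = {}
    for line in text.splitlines():
        fields = line.split(";", 1)[0].split()  # drop the trailing comment
        if len(fields) == 2 and all(f.lower().startswith("0x") for f in fields):
            table[int(fields[0], 16)] = int(fields[1], 16)
    return table


table = parse_table(CP936_TXT)

if __name__ == "__main__":
    print("0x80 maps to U+%04X" % table[0x80])
    print("EURO SIGN present in excerpt:", 0x20AC in table.values())
```

Run against the complete file, the same lookup would show whether any
byte at all was assigned to U+20AC in that older version of the map.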
Please somebody tell me where to find the FULL map.
Dan the Encode Maintainer with Too Many (Dead) Links to Follow