perl-unicode

Re: [Patch] Encode.pm : euro sign missing in cp936.ucm

2003-03-26 18:30:05
SADAHIRO-san and cp9?? experts,

On Thursday, Mar 27, 2003, at 00:44 Asia/Tokyo, SADAHIRO Tomoyuki wrote:
+<U20AC> \x80     |0 # EURO SIGN

Is this right? Yes, U20AC is indeed missing from cp936.ucm, but see this:

grep U20AC ucm/cp*.ucm
/Users/dankogai/work/Encode/ucm/cp1250.ucm:<U20AC> \x80 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp1251.ucm:<U20AC> \x88 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp1252.ucm:<U20AC> \x80 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp1253.ucm:<U20AC> \x80 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp1254.ucm:<U20AC> \x80 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp1255.ucm:<U20AC> \x80 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp1256.ucm:<U20AC> \x80 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp1257.ucm:<U20AC> \x80 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp1258.ucm:<U20AC> \x80 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp874.ucm:<U20AC> \x80 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp949.ucm:<U20AC> \xA2\xE6 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp950.ucm:<U20AC> \xA3\xE1 |0 # EURO SIGN

\x80 SEEMS right for single-byte CPs (though cp1251 uses \x88), but EURO SIGN is mapped differently in CP949 and CP950.
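
For anyone who wants to repeat the comparison, here is a minimal sketch that groups grep output like the above by the byte sequence each file assigns to U+20AC. It assumes the one-line-per-entry ucm layout shown in this message; the names GREP_OUTPUT, UCM_LINE and group_by_bytes are made up for illustration, and only a few of the lines are pasted in.

```python
import re
from collections import defaultdict

# A few lines from the grep output, with the directory prefix dropped.
GREP_OUTPUT = r"""
cp1250.ucm:<U20AC> \x80 |0 # EURO SIGN
cp1251.ucm:<U20AC> \x88 |0 # EURO SIGN
cp1252.ucm:<U20AC> \x80 |0 # EURO SIGN
cp874.ucm:<U20AC> \x80 |0 # EURO SIGN
cp949.ucm:<U20AC> \xA2\xE6 |0 # EURO SIGN
cp950.ucm:<U20AC> \xA3\xE1 |0 # EURO SIGN
"""

# Captures: file name (without .ucm), code point, byte sequence, fallback flag.
UCM_LINE = re.compile(
    r"^(?:.*/)?([^/:]+)\.ucm:<U([0-9A-Fa-f]{4,6})>\s+"
    r"((?:\\x[0-9A-Fa-f]{2})+)\s+\|(\d)"
)


def group_by_bytes(text):
    """Return {encoded bytes: [code page names]} for grep-style ucm lines."""
    groups = defaultdict(list)
    for line in text.splitlines():
        m = UCM_LINE.match(line.strip())
        if m is None:
            continue  # blank or unrelated line
        name, _codepoint, raw, _flag = m.groups()
        encoded = bytes(int(h, 16) for h in re.findall(r"\\x([0-9A-Fa-f]{2})", raw))
        groups[encoded].append(name)
    return dict(groups)


if __name__ == "__main__":
    for encoded, names in group_by_bytes(GREP_OUTPUT).items():
        print(encoded.hex(), " ".join(names))
```
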
As far as I can tell from Microsoft's pages

http://www.microsoft.com/typography/unicode/cscp.htm ->
http://www.microsoft.com/globaldev/reference/wincp.mspx ->
http://www.microsoft.com/globaldev/reference/dbcs/936.htm

it indeed does use \x80 (though only \x00-\xFF are covered; where the heck is the FULL MAP!?). But it seems this applies only to 936. 932 (Japanese; Shift_JIS-based), 949 (Korean; euc-kr-based) and 950 (Traditional Chinese; Big5-based) all leave \x80 blank.

I would like more confirmation from experts; cp936.ucm was overhauled with the help of MORIYAMA-san, and at that time the FULL map was available from the URIs above. And I think \x80 was not used for EURO SIGN back then.

Oh, I still have a copy of the full mapping that was once available via the URIs above. Let's see...

cp936.txt says...
CODEPAGE 936            ; PRC GBK (XGB) - ANSI, OEM

CPINFO 2 0x3f 0x003f    ; DBCS CP, Default Char = Question Mark

MBTABLE 130

0x00    0x0000  ;Null
[snip]
0x20    0x0020  ;Space
[snip]
0x7f    0x007f  ;^?
0x80    0x0080  ;<80>
0xff    0xf8f5  ;<FF>

\x80 is mentioned but not mapped to EURO SIGN.
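
The same check can be sketched mechanically. The snippet below parses only the rows quoted above (the [snip] parts are not reconstructed), so it says nothing about the rest of the file; parse_mbtable and bytes_for are hypothetical helper names, and the row layout is assumed to be "0xBYTE 0xCODEPOINT ;comment" as in the excerpt.

```python
import re

# Only the rows quoted from cp936.txt; snipped rows are deliberately absent.
CP936_EXCERPT = """
CODEPAGE 936            ; PRC GBK (XGB) - ANSI, OEM
CPINFO 2 0x3f 0x003f    ; DBCS CP, Default Char = Question Mark
MBTABLE 130
0x00    0x0000  ;Null
0x20    0x0020  ;Space
0x7f    0x007f  ;^?
0x80    0x0080  ;<80>
0xff    0xf8f5  ;<FF>
"""

# A table row is two hex numbers, optionally followed by a ';' comment.
ROW = re.compile(r"^\s*0x([0-9A-Fa-f]+)\s+0x([0-9A-Fa-f]+)\s*(?:;.*)?$")


def parse_mbtable(text):
    """Return {byte value: Unicode code point} for the rows that match ROW."""
    table = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # header lines such as CODEPAGE, CPINFO, MBTABLE
        table[int(m.group(1), 16)] = int(m.group(2), 16)
    return table


def bytes_for(table, codepoint):
    """List the byte values that map to the given code point."""
    return sorted(b for b, cp in table.items() if cp == codepoint)


if __name__ == "__main__":
    table = parse_mbtable(CP936_EXCERPT)
    print("0x80 ->", "U+%04X" % table[0x80])
    print("bytes mapped to U+20AC:", bytes_for(table, 0x20AC))
```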

Please somebody tell me where to find the FULL map.

Dan the Encode Maintainer with Too Many (Dead) Links to Follow