perl-unicode

Re: [Patch] Encode.pm : euro sign missing in cp936.ucm

2003-03-27 07:30:03

On Thu, 27 Mar 2003 10:02:28 +0900
Dan Kogai <dankogai(_at_)dan(_dot_)co(_dot_)jp> wrote:

SADAHIRO-san and cp9?? experts,

On Thursday, Mar 27, 2003, at 00:44 Asia/Tokyo, SADAHIRO Tomoyuki wrote:
+<U20AC> \x80     |0 # EURO SIGN

Is this right?  Yes, U20AC is indeed missing from cp936.ucm, but see 
this:
(snip)

As far as I can tell from Microsoft's pages

http://www.microsoft.com/typography/unicode/cscp.htm ->
http://www.microsoft.com/globaldev/reference/wincp.mspx ->
http://www.microsoft.com/globaldev/reference/dbcs/936.htm

it does indeed use \x80 (though only \x00-\xFF are covered;  Where the 
heck is the FULL MAP!?).  But it seems this applies only to 936.  932 
(Japanese; Shift_JIS based), 949 (Korean; euc-kr based) and 950 
(Traditional Chinese; Big5-based) all leave \x80 blank.

I would like more confirmation from experts;  cp936.ucm was 
overhauled with the help of MORIYAMA-san, and at that time the 
FULL map was available from the URIs above.  And I think \x80 was not 
used for EURO SIGN back then.

I'm no expert, but I can at least tell you
that you can get the official full maps
by clicking a gray box (like [81], [82], ..., [FE]) 
in http://www.microsoft.com/globaldev/reference/dbcs/936.htm

or http://www.microsoft.com/globaldev/reference/dbcs/936/936_81.htm
   http://www.microsoft.com/globaldev/reference/dbcs/936/936_82.htm
etc.

Like the table provided on unicode.org,
this table does not include any UDC mappings.
I don't know why Microsoft has ceased to provide UDC mappings.

http://http.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP936.TXT

Oh, I still have a copy of the full mapping that was once available via 
the URI above.  Let's see...

cp936.txt says...
CODEPAGE 936            ; PRC GBK (XGB) - ANSI, OEM

CPINFO 2 0x3f 0x003f    ; DBCS CP, Default Char = Question Mark

MBTABLE 130

0x00    0x0000  ;Null
[snip]
0x20    0x0020  ;Space
[snip]
0x7f    0x007f  ;^?
0x80    0x0080  ;<80>
0xff    0xf8f5  ;<FF>

\x80 is mentioned but not mapped to EURO SIGN.
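
For anyone who wants to repeat this check against their own copy of the
table, here is a minimal sketch in Python. It assumes the file uses the
"0xNN  0xUUUU  ;comment" layout shown above; the names parse_mbtable and
EXCERPT are made up for illustration, and the data is only the snipped
excerpt quoted in this message, not the full map.

```python
# Sketch: look up one byte in a Microsoft-style "0xNN  0xUUUU  ;comment" table.
# EXCERPT holds only the lines quoted in this thread, not the full cp936.txt.
EXCERPT = """\
0x00    0x0000  ;Null
0x20    0x0020  ;Space
0x7f    0x007f  ;^?
0x80    0x0080  ;<80>
0xff    0xf8f5  ;<FF>
"""

EURO_SIGN = 0x20AC


def parse_mbtable(text):
    """Return {byte: code point} for every well-formed mapping line."""
    table = {}
    for line in text.splitlines():
        # Drop the ";comment" part, then expect exactly two hex tokens.
        fields = line.split(";", 1)[0].split()
        if len(fields) != 2:
            continue  # blank lines, [snip] markers, CPINFO lines, ...
        if not all(f.lower().startswith("0x") for f in fields):
            continue  # header lines such as "MBTABLE 130"
        table[int(fields[0], 16)] = int(fields[1], 16)
    return table


table = parse_mbtable(EXCERPT)
maps_to_euro = table.get(0x80) == EURO_SIGN
print(f"0x80 -> U+{table[0x80]:04X}; mapped to EURO SIGN? {maps_to_euro}")
```

On the excerpt above this reports that 0x80 maps to U+0080 rather than
U+20AC, which is the observation made here.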

Please somebody tell me where to find the FULL map.

Dan the Encode Maintainer with Too Many (Dead) Links to Follow


IBM's ICU provides another table, which includes UDC mappings
and Unicode-to-CodePage fallbacks (i.e. entries marked with |1).

http://oss.software.ibm.com/cvs/icu/charset/data/ucm/windows-936-2000.ucm
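
To make the |0 / |1 distinction concrete, here is a rough Python sketch
that splits ucm mapping lines into round-trip entries (|0) and
Unicode-to-CodePage fallbacks (|1). The helper name parse_ucm_line is
hypothetical, the first sample line is the one proposed in the patch, and
the second sample line is made up purely for illustration; it is not
taken from windows-936-2000.ucm.

```python
import re

# Matches "<UXXXX> \xNN[\xNN...] |F", optionally followed by "# comment".
UCM_LINE = re.compile(
    r"^<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s+\|(\d)"
)


def parse_ucm_line(line):
    """Return (code_point, byte_sequence, flag), or None for other lines."""
    m = UCM_LINE.match(line.strip())
    if m is None:
        return None
    code_point = int(m.group(1), 16)
    hex_pairs = re.findall(r"\\x([0-9A-Fa-f]{2})", m.group(2))
    raw = bytes(int(h, 16) for h in hex_pairs)
    return code_point, raw, int(m.group(3))


SAMPLE = [
    r"<U20AC> \x80     |0 # EURO SIGN",  # the line proposed in the patch
    r"<U00A5> \x5C |1 # made-up fallback entry, for illustration only",
]

roundtrip, fallback = {}, {}
for line in SAMPLE:
    parsed = parse_ucm_line(line)
    if parsed is None:
        continue
    code_point, raw, flag = parsed
    if flag == 0:
        roundtrip[code_point] = raw   # maps both ways
    elif flag == 1:
        fallback[code_point] = raw    # Unicode -> code page only

print(roundtrip, fallback)
```

Keeping the two kinds of entry in separate dictionaries makes it easy to
compare only the round-trip part of one table against another table that
has no fallback entries.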

EURO SIGN was assigned between Unicode versions 2.0 and 2.1.
cf. Unicode 2.1, UTR #8, http://www.unicode.org/reports/tr8/

Your table is probably an older one, based on Unicode 2.0 or earlier.

SADAHIRO Tomoyuki