perl-unicode

Re: Jcode->new(q(~Greetings!~), 'utf8')->sjis eq '~Greetings~' ?

1999-07-18 10:19:38
[copy mailed to author]

On Sun, 18 Jul 1999 04:17:36   Dan Kogai wrote:

> * AND HERE IS THE PROBLEM.  The conversion table
>
>  ftp://ftp.unicode.org/Public/MAPPINGS/EASTASIA/JIS/JIS0201.TXT
>
> states that 2 charcodes in ASCII area [\x00-\xff] are mapped to oblivion.

No, it doesn't say that at all. \x7E is properly defined as the 'overline' 
character (U+203E) and \x5C is properly defined as the 'yen' symbol (U+00A5). 
JIS X 0201 is not compatible with ASCII at these two positions. The mapping 
table is correct. 

> SHIFT_JIS, even though SHIFT_JIS (the most widely-used Japanese Charset so 
> far) is SUPPOSED TO BE compatible with ASCII.


No, that is not correct either. Shift_JIS is not compatible with ASCII, 
since it too is based on JIS X 0201. Note, however, that the Microsoft Windows 
Shift_JIS variant called 'code page 932' is ASCII compatible at these two 
positions. See the table in 

ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT

I believe you meant that _code page 932_ is 'the most widely-used Japanese 
encoding so far'.

In any case, this incompatibility has caused problems for decades. That is why 
many mapping tables allow these two characters to pass to and from 
ASCII-compatible character encodings unchanged.

The Unicode mapping tables don't do this because they are based on the official 
standard. 
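
To make the two behaviours concrete, here is a minimal sketch in Python 
(not Jcode, and not anyone's actual implementation). It covers only the 
0x00-0x7F half of JIS X 0201. The function and table names are made up for 
illustration; the only facts taken from JIS0201.TXT are the two code points 
for \x5C and \x7E.

```python
# Hypothetical sketch: names are illustrative, not from Jcode or any library.
# Per JIS0201.TXT, only these two positions differ from ASCII below 0x80.
OFFICIAL_OVERRIDES = {0x5C: 0x00A5, 0x7E: 0x203E}  # YEN SIGN, OVERLINE
OFFICIAL_REVERSE = {u: b for b, u in OFFICIAL_OVERRIDES.items()}


def decode_low(data, customary=False):
    """Decode bytes in 0x00-0x7F using the official or customary mapping."""
    out = []
    for b in data:
        if b > 0x7F:
            raise ValueError("byte 0x%02X is outside this sketch's range" % b)
        if customary:
            out.append(chr(b))  # pass \x5C and \x7E through as ASCII
        else:
            out.append(chr(OFFICIAL_OVERRIDES.get(b, b)))
    return "".join(out)


def encode_low(text, customary=False):
    """Encode text back to bytes; raise if a character has no mapping."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if not customary and cp in OFFICIAL_REVERSE:
            out.append(OFFICIAL_REVERSE[cp])  # yen -> 0x5C, overline -> 0x7E
        elif cp <= 0x7F and (customary or cp not in OFFICIAL_OVERRIDES):
            out.append(cp)
        else:
            # In official mode, U+005C and U+007E have no target byte.
            raise ValueError("U+%04X has no mapping in this mode" % cp)
    return bytes(out)
```

With the official table, encode_low("~") raises, which is the "mapped to 
oblivion" effect from the original question. With customary=True the tilde 
goes through as \x7E unchanged.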

Your solution seems reasonable in that you provide an option to control whether 
the 'official' or the 'customary' mapping is used. You might consider providing 
separate mappings for code page 932 and generic Shift_JIS since they are not 
the same thing. There are more differences than just these two characters. An 
analysis of the tables provided on ftp.unicode.org will reveal the differences.

I would recommend the book 'CJKV Information Processing' by Ken Lunde. It is a 
great way to learn about Asian character encodings.

=Ed Batutis
independent i18n consultant


