[copy mailed to author]
On Sun, 18 Jul 1999 04:17:36 Dan Kogai wrote:
* AND HERE IS THE PROBLEM. The conversion table
ftp://ftp.unicode.org/Public/MAPPINGS/EASTASIA/JIS/JIS0201.TXT
states that 2 charcodes in ASCII area [\x00-\xff] are mapped to oblivion.
No, it doesn't say that at all. \x5C is properly defined as the yen sign
(U+00A5) and \x7E as the overline (U+203E). JIS X 0201 is not compatible with
ASCII at those two positions. The mapping table is correct.
SHIFT_JIS, even though SHIFT_JIS (the most widely-used Japanese Charset so
far) is SUPPOSED TO BE compatible with ASCII.
No, that is not correct either. Shift_JIS is not fully compatible with ASCII,
since its single-byte range is based on JIS X 0201. Note, however, that the
Microsoft Windows Shift_JIS variant called 'code page 932' is ASCII compatible.
See the table in
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
I believe you meant that _code page 932_ is 'the most widely-used Japanese
encoding so far'.
In any case, this incompatibility has caused problems for decades. That is why
many mapping tables allow these two characters to pass to and from
ASCII-compatible character encodings unchanged.
The Unicode mapping tables don't do this because they are based on the official
standard.
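
To make the distinction concrete, here is a minimal sketch of the two behaviours
for the 7-bit range only. The function name, the table name and the 'customary'
flag are hypothetical, invented for illustration; this is not how any particular
converter is implemented.

```python
# Illustration only: hypothetical names, 7-bit JIS X 0201 Roman range only.
# The two positions where JIS X 0201 differs from ASCII, per JIS0201.TXT.
OFFICIAL_OVERRIDES = {
    0x5C: "\u00a5",  # YEN SIGN
    0x7E: "\u203e",  # OVERLINE
}


def decode_jis_roman(data: bytes, customary: bool = False) -> str:
    """Decode 7-bit bytes using the 'official' or the 'customary' mapping."""
    out = []
    for byte in data:
        if byte > 0x7F:
            raise ValueError(f"byte 0x{byte:02X} is outside the 7-bit range")
        if not customary and byte in OFFICIAL_OVERRIDES:
            # Official mapping: follow the standard.
            out.append(OFFICIAL_OVERRIDES[byte])
        else:
            # Customary mapping (and every other byte): pass through as ASCII.
            out.append(chr(byte))
    return "".join(out)


official = decode_jis_roman(b"C:\\tmp~")
customary = decode_jis_roman(b"C:\\tmp~", customary=True)
```

With the official mapping the backslash and tilde bytes come out as the yen sign
and the overline; with the customary mapping they round-trip unchanged.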
Your solution seems reasonable in that you provide an option to control whether
the 'official' or the 'customary' mapping is used. You might consider providing
separate mappings for code page 932 and generic Shift_JIS since they are not
the same thing. There are more differences than just these two characters. An
analysis of the tables provided on ftp.unicode.org will reveal the differences.
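
Such an analysis can be sketched roughly as follows. It assumes the usual layout
of those tables (a hex code, a hex Unicode value, then a '#' comment) and uses
short inline samples in place of the downloaded files; the function names are
hypothetical.

```python
# Illustration only: hypothetical helper names, tiny inline samples.
def parse_mapping(text: str) -> dict:
    """Parse lines of the form '0x5C<TAB>0x00A5<TAB># YEN SIGN'."""
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # comment-only, blank, or undefined entry
        table[int(fields[0], 16)] = int(fields[1], 16)
    return table


def diff_tables(a: dict, b: dict) -> dict:
    """Return {code: (value_in_a, value_in_b)} for every code that differs."""
    return {
        code: (a.get(code), b.get(code))
        for code in sorted(set(a) | set(b))
        if a.get(code) != b.get(code)
    }


# Samples written in the style of the mapping tables (three entries each).
JIS_SAMPLE = (
    "0x5B\t0x005B\t# LEFT SQUARE BRACKET\n"
    "0x5C\t0x00A5\t# YEN SIGN\n"
    "0x7E\t0x203E\t# OVERLINE\n"
)
CP932_SAMPLE = (
    "0x5B\t0x005B\t# LEFT SQUARE BRACKET\n"
    "0x5C\t0x005C\t# REVERSE SOLIDUS\n"
    "0x7E\t0x007E\t# TILDE\n"
)

differences = diff_tables(parse_mapping(JIS_SAMPLE), parse_mapping(CP932_SAMPLE))
for code, (left, right) in differences.items():
    print(f"0x{code:02X}: U+{left:04X} vs U+{right:04X}")
```

Run against the full tables instead of the samples, the same comparison would
list every code point on which two mappings disagree.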
I would recommend the book 'CJKV Information Processing' by Ken Lunde. It is a
great way to learn about Asian character encodings.
=Ed Batutis
independent i18n consultant