Hello folks,
Hi. My name is Dan Kogai. I recently uploaded the Jcode Perl module to
CPAN. This module is designed as a successor to jcode.pl (if you are a
Perl coder in Japan, you gotta know that script). You can find more
about it at http://openlab.ring.gr.jp/Jcode/ . One of the major
enhancements of Jcode.pm over jcode.pl is the ability to handle Unicode
(UCS2 and UTF8, so far).
Now here is the question. My first implementation of UCS2 <-> EUC-JP
conversion was very simple: just faithfully obey the rules that Unicode
Inc. lays down. It seemed okay until one of my friends gave me the
following complaint.
Jcode->new('~k16', 'utf8')->sjis doesn't return '~k16' !!
And here is why.
* Jcode->new stores the string in EUC-JP (that's the only code perl can
swallow in a script; jperl can swallow SHIFT_JIS as well, but that's
another issue). Conversion is done if necessary.
* Here we have explicitly stated that '~k16' is a UTF8 string, so
Jcode->new tries to convert it.
* UTF8 is first converted to UCS2 then EUC.
* Since UTF8 leaves ASCII as it is, '~k16' is just '~k16'. In UCS2, that's
"\x00~\x00k\x001\x006". So far so good.
* AND HERE IS THE PROBLEM. The conversion table
ftp://ftp.unicode.org/Public/MAPPINGS/EASTASIA/JIS/JIS0201.TXT
states that 2 charcodes in the ASCII area [\x00-\x7f] are mapped to
oblivion. They are '\' (chr(0x5c)) and '~' (chr(0x7e)); the table gives
those two slots to YEN SIGN (U+00A5) and OVERLINE (U+203E) instead. In
this case, '~' is mapped to "\x8f\xa2\xb7" in EUC-JP.
* However, "\x8f\xa2\xb7" belongs to JIS0212, which is UNSUPPORTED in
SHIFT_JIS, even though SHIFT_JIS (the most widely-used Japanese charset so
far) is SUPPOSED TO BE compatible with ASCII.
* So you end up with "[unknown]k16", instead of "~k16".
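
To make the chain above concrete, here is a minimal sketch in plain
Perl. It is not Jcode's code: the table %PEDANTIC_TABLE, the subs
ucs2_to_euc and euc_to_sjis, and the '[unknown]' marker are all
hypothetical names made up for illustration, and the table holds only
the four entries needed for '~k16'.

```perl
use strict;
use warnings;

# Hypothetical, hand-written excerpt of a pedantic UCS2 -> EUC-JP table.
# JIS X 0201 has no slot for U+007E, so the tilde falls through to
# JIS X 0212 (0x2237), which is "\x8f\xa2\xb7" in EUC-JP.
my %PEDANTIC_TABLE = (
    0x007e => "\x8f\xa2\xb7",
    0x006b => "k",
    0x0031 => "1",
    0x0036 => "6",
);

# Convert a big-endian UCS2 string to EUC-JP using the table above.
sub ucs2_to_euc {
    my ($ucs2) = @_;
    my $euc = '';
    for my $code (unpack 'n*', $ucs2) {
        $euc .= exists $PEDANTIC_TABLE{$code} ? $PEDANTIC_TABLE{$code} : '?';
    }
    return $euc;
}

# EUC-JP -> SHIFT_JIS: three-byte JIS X 0212 sequences (lead byte 0x8f)
# have no SHIFT_JIS form, so this sketch replaces them with a marker.
# ASCII bytes pass through; JIS X 0208 handling is omitted.
sub euc_to_sjis {
    my ($euc) = @_;
    $euc =~ s/\x8f[\xa1-\xfe][\xa1-\xfe]/[unknown]/g;
    return $euc;
}

my $ucs2 = pack 'n*', map { ord($_) } split //, '~k16';  # "\x00~\x00k\x001\x006"
my $sjis = euc_to_sjis(ucs2_to_euc($ucs2));
print "$sjis\n";   # prints "[unknown]k16"
```

The tilde is lost at the EUC-JP step, not at the SHIFT_JIS step; the
last conversion only exposes it.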
So I tweaked the code a little bit. After ver. 0.40, Jcode.pm leaves
ASCII as it is unless $Jcode::Unicode::PEDANTIC is set to non-zero.
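
The switch can be pictured roughly like this. This is only a sketch of
the idea, not the code inside Jcode.pm: $PEDANTIC here is a stand-in
for $Jcode::Unicode::PEDANTIC, and %PEDANTIC_TABLE, ucs2_char_to_euc
and ascii_to_euc are hypothetical names that cover the ASCII range
only.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $Jcode::Unicode::PEDANTIC (not the real one).
our $PEDANTIC = 0;

# Hypothetical one-entry excerpt of the pedantic table:
# U+007E -> JIS X 0212 0x2237, i.e. "\x8f\xa2\xb7" in EUC-JP.
my %PEDANTIC_TABLE = (0x007e => "\x8f\xa2\xb7");

# Map one UCS2 code point to EUC-JP bytes (ASCII range only here).
sub ucs2_char_to_euc {
    my ($code) = @_;
    die sprintf("U+%04X is outside this sketch\n", $code) if $code >= 0x80;
    return $PEDANTIC_TABLE{$code}
        if $PEDANTIC && exists $PEDANTIC_TABLE{$code};
    return chr $code;    # default: leave ASCII as it is
}

sub ascii_to_euc {
    my ($text) = @_;
    return join '', map { ucs2_char_to_euc(ord($_)) } split //, $text;
}

my $relaxed = ascii_to_euc('~k16');    # '~k16'
my $strict  = do { local $PEDANTIC = 1; ascii_to_euc('~k16') };
                                       # "\x8f\xa2\xb7k16"
```

With the flag off, the tilde survives the trip to EUC-JP and therefore
to SHIFT_JIS; with it on, the behavior follows the mapping table to the
letter.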
My question is: will the Unicode-savvy perl behave pedantically or not?
If so, all the tildes, used so often in Perl, will be nothing but line
noise on most platforms used in Japan.
I don't know whom to ask about this except this ML...
Dan the Camel Abuser
________ DAN Kogai (CEO, DAN co. ltd.)
_/ __ Tel:+81 3-5433-7565 Fax:+81 3-5433-7566
/_ /+/ 6-35-5 Shimouma Setagaya Tokyo 154-0002 Japan
_/-/---- http://www.dan.co.jp/ -----------------------------------