perl-unicode

Jcode->new(q(~Greetings!~), 'utf8')->sjis eq '~Greetings~' ?

1999-07-17 12:32:27
Hello folks,

  Hi.  My name is Dan Kogai.  I recently uploaded the Jcode Perl module to 
CPAN.  This module is designed as a successor to jcode.pl (if you are a 
Perl coder in Japan, you gotta know that script).  You can find more 
about it at http://openlab.ring.gr.jp/Jcode/ .  One of the major 
enhancements of Jcode.pm over jcode.pl is the ability to handle Unicode 
(UCS2 and UTF8, so far).
  Now here is the question.  My first implementation of UCS2 <-> EUC-JP 
conversion was very simple: it just faithfully obeyed the mapping tables 
that Unicode Inc. publishes.  It seemed okay until one of my friends gave 
me the following complaint.

    Jcode->new('~k16', 'utf8')->sjis doesn't return '~k16' !!

  And here is why.

* Jcode->new stores the string in EUC-JP (that's the only character code 
perl can swallow in a script; jperl can swallow SHIFT_JIS as well, but 
that's another issue).  Conversion is performed if necessary.
* Here we have explicitly stated that '~k16' is a UTF8 string, so Jcode->new 
tries to convert it.
* UTF8 is first converted to UCS2, then to EUC-JP.
* Since UTF8 leaves ASCII as it is, '~k16' is just '~k16'.  In UCS2, that's 
"\x00~\x00k\x001\x006".  So far so good.
* AND HERE IS THE PROBLEM.  The conversion table

   ftp://ftp.unicode.org/Public/MAPPINGS/EASTASIA/JIS/JIS0201.TXT

states that 2 character codes in the ASCII area [\x00-\x7f] are mapped to 
oblivion (the table gives 0x5C to YEN SIGN and 0x7E to OVERLINE instead).  
They are '\' (chr(0x5c)) and '~' (chr(0x7e)).  In this case, '~' is mapped 
to "\x8f\xa2\xb7" in EUC-JP.

* However, "\x8f\xa2\xb7" belongs to JIS0212, which is UNSUPPORTED in 
SHIFT_JIS, even though SHIFT_JIS (the most widely used Japanese charset so 
far) is SUPPOSED TO BE compatible with ASCII.

* So you end up with "[unknown]k16", instead of "~k16".
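
  To make the steps above concrete, here is a minimal sketch in plain Perl.  
It is NOT the actual Jcode.pm code: the sub names are hypothetical, the 
table is a hand-copied excerpt holding only the four characters of '~k16', 
and the SHIFT_JIS step simply marks any JIS0212 sequence as unconvertible.

```perl
use strict;
use warnings;

# Hypothetical excerpt of a pedantic UCS2 -> EUC-JP table, keyed by code
# point.  Only the characters of '~k16' are listed.
my %ucs2_to_euc = (
    0x007E => "\x8f\xa2\xb7",    # TILDE lands in JIS0212 (3 bytes in EUC-JP)
    0x006B => 'k',
    0x0031 => '1',
    0x0036 => '6',
);

# ASCII-only UTF8 -> UCS2 (big-endian): every byte gains a leading \x00.
sub ascii_utf8_to_ucs2 {
    my ($str) = @_;
    return pack('n*', unpack('C*', $str));
}

# UCS2 -> EUC-JP through the table; unlisted code points become '?'.
sub ucs2_to_euc {
    my ($ucs2) = @_;
    return join '',
        map { exists $ucs2_to_euc{$_} ? $ucs2_to_euc{$_} : '?' }
        unpack('n*', $ucs2);
}

# EUC-JP -> SHIFT_JIS for this sketch: ASCII passes through untouched, and
# a JIS0212 sequence (lead byte \x8f) has no SHIFT_JIS counterpart.
sub euc_to_sjis {
    my ($euc) = @_;
    (my $sjis = $euc) =~ s/\x8f[\xa1-\xfe]{2}/[unknown]/g;
    return $sjis;
}

my $ucs2 = ascii_utf8_to_ucs2('~k16');    # "\x00~\x00k\x001\x006"
my $euc  = ucs2_to_euc($ucs2);            # "\x8f\xa2\xb7k16"
print euc_to_sjis($euc), "\n";            # prints [unknown]k16
```

  The point of the sketch is only that the damage happens in the UCS2 -> 
EUC-JP table lookup; the SHIFT_JIS step merely exposes it.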

  So I tweaked the code a little bit.  After ver. 0.40, Jcode.pm leaves 
ASCII as it is unless $Jcode::Unicode::PEDANTIC is set to non-zero.
  My question is: will the Unicode-savvy perl behave pedantically or not?  
If so, all the tildes, used so often in Perl, will be nothing but line 
noise on most platforms used in Japan.
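
  For illustration, the two behaviors can be sketched roughly like this.  
Again, this is an assumption-laden toy and not the module's source: 
$PEDANTIC below is a hypothetical stand-in for $Jcode::Unicode::PEDANTIC, 
the sub names are made up, and only the tilde entry is filled in.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $Jcode::Unicode::PEDANTIC (default: off).
our $PEDANTIC = 0;

# Pedantic EUC-JP mapping for the tilde only, as described above.
my %pedantic_euc = ( 0x7E => "\x8f\xa2\xb7" );

# Convert one UCS2 code point to EUC-JP.  ASCII is left alone unless the
# pedantic switch is on; anything else is out of scope for this sketch.
sub ucs2_char_to_euc {
    my ($code) = @_;
    die sprintf("U+%04X is outside this ASCII-only sketch\n", $code)
        if $code >= 0x80;
    return $pedantic_euc{$code}
        if $PEDANTIC && exists $pedantic_euc{$code};
    return chr $code;
}

# Convert a big-endian UCS2 string, one 16-bit unit at a time.
sub ucs2_to_euc {
    my ($ucs2) = @_;
    return join '', map { ucs2_char_to_euc($_) } unpack('n*', $ucs2);
}

my $ucs2 = "\x00~\x00k\x001\x006";

print ucs2_to_euc($ucs2), "\n";    # prints ~k16

{
    local $PEDANTIC = 1;
    my $euc = ucs2_to_euc($ucs2);  # "\x8f\xa2\xb7k16", i.e. JIS0212 tilde
    print length($euc), "\n";      # prints 6
}
```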

  I don't know whom to ask about this other than this ML...

Dan the Camel Abuser

________ DAN Kogai (CEO, DAN co. ltd.)
      _/ __  Tel:+81 3-5433-7565          Fax:+81 3-5433-7566
     /_ /+/  6-35-5 Shimouma Setagaya Tokyo 154-0002 Japan
     _/-/---- http://www.dan.co.jp/ -----------------------------------
