perl-unicode

jisx0212 support in Encode::JP is close

2002-03-19 02:14:36
Perl Encode Hackers,

I have been annoyed by the fact that Encode::JP does not yet support JISX0212-1990. Though this charset is hardly used, it is an official part of euc-jp today, as well as of iso-2022-jp. Without it, euc-jp support is hardly complete. In desperation I reviewed Nick's compile implementation once again and found that there is no reason compile cannot handle 3-byte codes. "Oh man!" I shouted, because JISX0212 in euc-jp is represented as 3 bytes: 0x8F followed by the two bytes of (jisx0212 | 0x8080).
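
For readers who want to see the 3-byte form spelled out, here is a minimal sketch in Python (used for illustration only; it is not part of Encode). The helper name jisx0212_to_euc_jp is hypothetical, and the sample code 0x2242 is simply the \x8F\xA2\xC2 row from the table with the high bits cleared.

```python
def jisx0212_to_euc_jp(code: int) -> bytes:
    """Hypothetical helper: 16-bit JIS X 0212 code -> 3-byte euc-jp sequence."""
    hi, lo = code >> 8, code & 0xFF
    # Both bytes of a JIS X 0212 code lie in the 94x94 range 0x21..0x7E.
    if not (0x21 <= hi <= 0x7E and 0x21 <= lo <= 0x7E):
        raise ValueError(f"not a JIS X 0212 code: {code:#06x}")
    shifted = code | 0x8080  # set the high bit of both bytes
    # 0x8F announces that the next two bytes belong to JIS X 0212.
    return bytes([0x8F, shifted >> 8, shifted & 0xFF])


print(jisx0212_to_euc_jp(0x2242).hex())  # 8fa2c2
```

Note that the operation is a bitwise OR, not an AND: the high bit of each byte is set, and the 0x8F prefix is prepended as a separate byte.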
  I have created a ucm file called euc-jp+0212.ucm; its diff against euc-jp.ucm looks like this:

> diff -u Encode/euc-jp.ucm Encode/euc-jp+0212.ucm | less
--- Encode/euc-jp.ucm   Tue Mar 12 04:56:36 2002
+++ Encode/euc-jp+0212.ucm      Tue Mar 19 17:51:32 2002
@@ -1,7 +1,7 @@
 # compile -o Encode/euc-jp.ucm Encode/euc-jp.enc
 <code_set_name> "euc-jp"
 <mb_cur_min> 1
-<mb_cur_max> 2
+<mb_cur_max> 3
 <subchar> \x3F
 #
 CHARMAP
@@ -210,7 +210,6 @@
 <UFF9D> \x8E\xDD |0 # HALFWIDTH KATAKANA LETTER N
 <UFF9E> \x8E\xDE |0 # HALFWIDTH KATAKANA VOICED SOUND MARK
 <UFF9F> \x8E\xDF |0 # HALFWIDTH KATAKANA SEMI-VOICED SOUND MARK
-<U008F> \x8F |0 # <control>
 <U0090> \x90 |0 # <control>
 <U0091> \x91 |0 # <control>
 <U0092> \x92 |0 # <control>
@@ -7106,4 +7105,6149 @@
 <U7464> \xF4\xA4 |0 # CJK Ideograph
 <U51DC> \xF4\xA5 |0 # CJK Ideograph
 <U7199> \xF4\xA6 |0 # CJK Ideograph
+<U00A1> \x8F\xA2\xC2 |0 # CJK Ideograph
....

  That is,

* <mb_cur_max> is now 3 instead of 2.
* \x8F is no longer a control character but the first byte of the 3-byte representation of jisx0212.
* The rest of the table I grabbed from Jcode (Jcode/Unicode/table.h)

  and modified JP/Makefile.PL so that it uses the new table.  Voila!  It worked!
Since Encode/JP/JIS.pm and Encode/JP/ISO_2022_JP are already coded to handle jisx0212 (if euc-jp supports it), this automagically adds jisx0212 support to the other encodings as well. I still need to fix the pod and t/JP.t so that it tests the 0212 part, but I will upload a new Encode package within 24 hours.
  Thank you Nick for making compile this smart!

Dan the Man with a New Encoding
