perl-unicode

Re: 5.8 roadmap and Encode

2002-03-02 05:14:16
On Sat, Mar 02, 2002 at 11:12:42AM +0000, Nick Ing-Simmons wrote:
> > This and euc-tw use 1, 2 or 4-byte encoding. Any points on how to use
> > that functionality for Encode.pm?
> The .ucm format can cope:

Thanks! I'm done with conversion and tested against libiconv. Patch follows;
files are available at <http://autrijus.org/ucm.tar.gz>.

Libiconv's GB18030 table elicited some warnings from compile:

    Unicode character 0xfdXX is illegal at ../compile line 81, <E> line 39659.

The range in question is fdxx and ffxx. Is that anything to worry about?
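
(A note on the warning: it resembles the one perl emits for Unicode
noncharacters, i.e. U+FDD0..U+FDEF and code points whose low 16 bits are
FFFE or FFFF. That reading is an assumption and has not been checked against
the table itself. Below is a minimal sketch for listing such entries in a
.ucm file, assuming the usual "<UXXXX> \xNN... |n" mapping-line layout. The
helper names and the sample rows are hypothetical.)

```perl
use strict;
use warnings;

# Hypothetical helper: true if $cp is a Unicode noncharacter, i.e.
# U+FDD0..U+FDEF or any code point whose low 16 bits are FFFE or FFFF.
sub is_noncharacter {
    my ($cp) = @_;
    return 1 if $cp >= 0xFDD0 && $cp <= 0xFDEF;
    return 1 if ($cp & 0xFFFE) == 0xFFFE;
    return 0;
}

# Hypothetical helper: scan .ucm text and return the mapping lines whose
# Unicode side is a noncharacter. Lines that are not mappings are skipped.
sub noncharacter_entries {
    my ($ucm_text) = @_;
    my @hits;
    for my $line (split /\n/, $ucm_text) {
        next unless $line =~ /^<U([0-9A-Fa-f]{4,6})>\s+(?:\\x[0-9A-Fa-f]{2})+/;
        push @hits, $line if is_noncharacter(hex $1);
    }
    return @hits;
}

# Sample rows for illustration only; the byte values for the last two
# entries are made up and not taken from the real gb18030 table.
my $sample = join "\n",
    '<U4E00> \xD2\xBB |0',
    '<UFDD0> \x84\x30\x85\x30 |0',
    '<UFFFE> \x84\x31\xA4\x38 |0';

print "$_\n" for noncharacter_entries($sample);
```

If every flagged line falls in those ranges, the warnings would only concern
code points that are not meant for interchange anyway.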

Also, the resulting file size is quite hefty:

    -rw-r--r--  1 root  512  1688107 Mar  2 19:51 euc-tw.ucm
    -rw-r--r--  1 root  512  1543333 Mar  2 19:51 gb18030.ucm

And they add ~600k to the compressed perl distribution. Is that acceptable?

The good news is there won't be anything else that big coming from the Chinese
front; aside from HZ, perl's support could be considered complete.

Thanks,
/Autrijus/

diff -ur Encode/CN/Makefile.PL Encode.new/CN/Makefile.PL
--- Encode/CN/Makefile.PL       Sat Mar  2 11:45:11 2002
+++ Encode.new/CN/Makefile.PL   Sat Mar  2 19:52:53 2002
@@ -6,6 +6,7 @@
              GBK      => ['gbk.enc'],
              GB2312   => ['gb2312.enc'],
              GB12345  => ['gb12345.enc'],
+             GB18030  => ['gb18030.ucm'],
              CP936    => ['cp936.enc'],
              'ISO-IR-165' => ['iso-ir-165.enc'],
              );
--- Encode/Encode.pm    Sat Mar  2 11:45:11 2002
+++ Encode.new/Encode.pm        Sat Mar  2 20:10:56 2002
@@ -170,7 +170,7 @@
 # TODO: HP-UX '8' encodings arabic8 greek8 hebrew8 kana8 thai8 turkish8
 # TODO: HP-UX '15' encodings japanese15 korean15 roi15
 # TODO: Cyrillic encoding ISO-IR-111 (useful?)
-# TODO: Chinese encodings GB18030 EUC-TW HZ
+# TODO: Chinese encodings HZ
 # TODO: Armenian encoding ARMSCII-8
 # TODO: Hebrew encoding ISO-8859-8-1
 # TODO: Thai encoding TCVN
diff -ur Encode/TW/Makefile.PL Encode.new/TW/Makefile.PL
--- Encode/TW/Makefile.PL       Sat Mar  2 11:45:11 2002
+++ Encode.new/TW/Makefile.PL   Sat Mar  2 19:52:46 2002
@@ -5,6 +5,7 @@
 my %tables = ('BIG5'           => ['big5.enc'],
              'BIG5-HKSCS'      => ['big5-hkscs.enc'],
              'CP950'           => ['cp950.enc'],
+             'EUC-TW'          => ['euc-tw.ucm'],
              );
 
 my $name = 'TW';

