On Sat, Mar 02, 2002 at 11:12:42AM +0000, Nick Ing-Simmons wrote:
> > This and euc-tw use 1, 2 or 4-byte encoding. Any pointers on how to use
> > that functionality for Encode.pm?
>
> The .ucm format can cope:
Thanks! I'm done with conversion and tested against libiconv. Patch follows;
files are available at <http://autrijus.org/ucm.tar.gz>.
Libiconv's GB18030 table elicited some warnings from compile:
Unicode character 0xfdXX is illegal at ../compile line 81, <E> line 39659.
The range in question is fdxx and ffxx. Is that anything to worry about?
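
One possible reading, not confirmed anywhere in this thread, is that the
warnings concern Unicode noncharacters: U+FDD0..U+FDEF, plus the last two
code points of each plane (U+FFFE and U+FFFF in the BMP). A minimal sketch
of that check follows, in Python purely for illustration; the function name
is_noncharacter is hypothetical and is not part of Encode or compile.

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if cp is one of the 66 Unicode noncharacters.

    Assumption: the 'is illegal' warnings above are about noncharacters.
    This helper is illustrative only and not part of any Perl tool.
    """
    if not 0 <= cp <= 0x10FFFF:
        return False
    # Contiguous block of 32 noncharacters in the Arabic Presentation Forms-A area.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # Last two code points of every plane: U+nFFFE and U+nFFFF.
    return (cp & 0xFFFE) == 0xFFFE


if __name__ == "__main__":
    for cp in (0xFDCF, 0xFDD0, 0xFDEF, 0xFFFD, 0xFFFE, 0xFFFF):
        print(f"U+{cp:04X} noncharacter={is_noncharacter(cp)}")
```

If that reading is right, a table that maps bytes onto those code points
would draw the warning even though the mapping itself is intentional.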
Also, the resulting files are quite hefty:
-rw-r--r-- 1 root 512 1688107 Mar 2 19:51 euc-tw.ucm
-rw-r--r-- 1 root 512 1543333 Mar 2 19:51 gb18030.ucm
And they add ~600k to the compressed perl distribution. Is that acceptable?
The good news is there won't be anything else that big coming from the Chinese
front; aside from HZ, perl's support could be considered complete.
Thanks,
/Autrijus/
diff -ur Encode/CN/Makefile.PL Encode.new/CN/Makefile.PL
--- Encode/CN/Makefile.PL Sat Mar 2 11:45:11 2002
+++ Encode.new/CN/Makefile.PL Sat Mar 2 19:52:53 2002
@@ -6,6 +6,7 @@
GBK => ['gbk.enc'],
GB2312 => ['gb2312.enc'],
GB12345 => ['gb12345.enc'],
+ GB18030 => ['gb18030.ucm'],
CP936 => ['cp936.enc'],
'ISO-IR-165' => ['iso-ir-165.enc'],
);
--- Encode/Encode.pm Sat Mar 2 11:45:11 2002
+++ Encode.new/Encode.pm Sat Mar 2 20:10:56 2002
@@ -170,7 +170,7 @@
# TODO: HP-UX '8' encodings arabic8 greek8 hebrew8 kana8 thai8 turkish8
# TODO: HP-UX '15' encodings japanese15 korean15 roi15
# TODO: Cyrillic encoding ISO-IR-111 (useful?)
-# TODO: Chinese encodings GB18030 EUC-TW HZ
+# TODO: Chinese encodings HZ
# TODO: Armenian encoding ARMSCII-8
# TODO: Hebrew encoding ISO-8859-8-1
# TODO: Thai encoding TCVN
diff -ur Encode/TW/Makefile.PL Encode.new/TW/Makefile.PL
--- Encode/TW/Makefile.PL Sat Mar 2 11:45:11 2002
+++ Encode.new/TW/Makefile.PL Sat Mar 2 19:52:46 2002
@@ -5,6 +5,7 @@
my %tables = ('BIG5' => ['big5.enc'],
'BIG5-HKSCS' => ['big5-hkscs.enc'],
'CP950' => ['cp950.enc'],
+ 'EUC-TW' => ['euc-tw.ucm'],
);
my $name = 'TW';