I just finished the HZ support. Its escape rules go a little beyond
what the "E" format can express (specifically, the ~~ escape and the
to-be-ignored ~\n), so I opted for a regex-based approach.
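To make those two cases concrete, here is a minimal sketch of the regex-based idea. It is not the attached HZ.pm: the name hz_segments and its [mode, bytes] return format are hypothetical, the byte ranges are assumed from the HZ specification (RFC 1843), and it only splits the input into ASCII and GB runs without mapping the GB pairs to Unicode.

```perl
use strict;
use warnings;

# Hypothetical helper: split an HZ byte string into [mode, bytes] segments,
# where mode is 'ascii' or 'gb'. GB segments hold the raw 7-bit byte pairs.
sub hz_segments {
    my ($str) = @_;
    my @segments;
    my $in_gb = 0;    # HZ text starts in ASCII mode

    # Append to the last segment if the mode is unchanged, else start a new one.
    my $emit = sub {
        my ($mode, $bytes) = @_;
        if (@segments && $segments[-1][0] eq $mode) {
            $segments[-1][1] .= $bytes;
        }
        else {
            push @segments, [$mode, $bytes];
        }
    };

    pos($str) = 0;
    while (pos($str) < length $str) {
        if ($in_gb) {
            if    ($str =~ /\G~\}/gc) { $in_gb = 0 }    # "~}" returns to ASCII
            elsif ($str =~ /\G((?:[\x21-\x77][\x21-\x7E])+)/gc) {
                $emit->('gb', $1);                      # run of two-byte GB codes
            }
            else { die "malformed GB sequence at offset " . pos($str) . "\n" }
        }
        else {
            if    ($str =~ /\G~~/gc)      { $emit->('ascii', '~') }  # escaped tilde
            elsif ($str =~ /\G~\n/gc)     { }                        # "~\n" is ignored
            elsif ($str =~ /\G~\{/gc)     { $in_gb = 1 }             # "~{" enters GB mode
            elsif ($str =~ /\G([^~]+)/gc) { $emit->('ascii', $1) }   # plain ASCII run
            else { die "bad escape at offset " . pos($str) . "\n" }
        }
    }
    return @segments;
}
```

Each alternative is anchored with \G and uses /gc, so the scan consumes the string left to right and a failed match never resets the position.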
I tested it against libiconv's test file and found no problems. However,
I ran into a small problem when trying to apply this patch (it seems that
'patch' won't create the new directory for me), so HZ.pm is attached
separately. Please:
1. remove Encode/Encode/HZ.enc
2. mkdir Encode/lib/Encode/CN, and put HZ.pm there
3. apply the rest of the patch, as included below.
Aside from HZ support, this patch also makes ::TW and ::CN try to
autoload Encode::HanExtra. The reason is that whether we choose to put
a given encoding into the core or not should be transparent to the user
(once they have downloaded HanExtra.pm). It also fixes a couple of \t
nits in the POD.
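The optional-load idiom used in the patch can be sketched on its own as follows. The helper name try_load is hypothetical and not part of the patch; it wraps the same string-eval "use" so that a missing module is silently skipped and the caller's $@ is left alone.

```perl
use strict;
use warnings;

# Hypothetical helper: try to load an optional module by name.
# Returns 1 if it loaded, 0 if it is missing or failed to compile.
sub try_load {
    my ($module) = @_;
    # The name is interpolated into a string eval, so accept only
    # plain package names such as Foo or Foo::Bar.
    return 0 unless $module =~ /\A\w+(?:::\w+)*\z/;
    local $@;    # do not clobber the caller's $@
    return eval "use $module; 1" ? 1 : 0;
}

# Load extra encodings if they exist, and carry on either way.
print try_load('Encode::HanExtra')
    ? "Encode::HanExtra loaded\n"
    : "Encode::HanExtra not installed\n";
```

The trailing "1" inside the eval string matters: a successful load then always yields a true value, so success is told apart from failure without inspecting $@.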
I'll work on the tests when I find some time...
Thanks,
/Autrijus/
diff -dur Encode/CN/CN.pm Encode.new/CN/CN.pm
--- Encode/CN/CN.pm Tue Mar 5 06:50:47 2002
+++ Encode.new/CN/CN.pm Tue Mar 5 09:59:25 2002
@@ -1,8 +1,13 @@
package Encode::CN;
-use Encode;
our $VERSION = '0.02';
+
+use Encode;
+use Encode::CN::HZ;
use XSLoader;
XSLoader::load('Encode::CN',$VERSION);
+
+local $@;
+eval "use Encode::HanExtra"; # load extra encodings if they exist
1;
__END__
@@ -25,7 +29,8 @@
gb2312 The raw (low-bit) GB2312 character map
gb12345 Traditional chinese counterpart to GB2312 (raw)
iso-ir-165 GB2312 + GB6345 + GB8565 + additions
- cp936 Code Page 936, also known as GBK (Extended GuoBiao)
+ cp936 Code Page 936, also known as GBK (Extended GuoBiao)
+ hz 7-bit escaped GB2312 encoding
To find how to use this module in detail, see L<Encode>.
@@ -35,9 +40,10 @@
separately on CPAN, under the name L<Encode::HanExtra>. That module
also contains extra Taiwan-based encodings.
-=head1 BUGS
+This module will automatically load L<Encode::HanExtra> if you have it on
+your machine.
-The C<HZ> (Hanzi) escaped encoding is not supported.
+=head1 BUGS
ASCII part (0x00-0x7f) is preserved for all encodings, even though it
conflicts with mappings by the Unicode Consortium. See
diff -dur Encode/KR/KR.pm Encode.new/KR/KR.pm
--- Encode/KR/KR.pm Tue Mar 5 06:50:47 2002
+++ Encode.new/KR/KR.pm Tue Mar 5 10:01:05 2002
@@ -1,6 +1,7 @@
package Encode::KR;
-use Encode;
our $VERSION = '0.02';
+
+use Encode;
use XSLoader;
XSLoader::load('Encode::KR',$VERSION);
@@ -23,7 +24,7 @@
euc-kr EUC (Extended Unix Character)
ksc5601 Korean standard code set
- cp949 Code Page 949 (EUC-KR + Unified Hangul Code)
+ cp949 Code Page 949 (EUC-KR + Unified Hangul Code)
To find how to use this module in detail, see L<Encode>.
diff -dur Encode/MANIFEST Encode.new/MANIFEST
--- Encode/MANIFEST Tue Mar 5 06:50:47 2002
+++ Encode.new/MANIFEST Tue Mar 5 10:00:38 2002
@@ -95,7 +95,6 @@
Encode/gb1988.enc
Encode/gb2312.enc
Encode/gsm0338.enc
-Encode/HZ.enc
Encode/iso-ir-165.enc
Encode/ir-197.enc
Encode/jis0201.enc
@@ -155,6 +154,7 @@
lib/Encode/Unicode.pm
lib/Encode/utf8.pm
lib/Encode/XS.pm
+lib/Encode/CN/HZ.pm
lib/Encode/Tcl/Escape.pm
lib/Encode/Tcl/Extended.pm
lib/Encode/Tcl/HanZi.pm
diff -dur Encode/TW/TW.pm Encode.new/TW/TW.pm
--- Encode/TW/TW.pm Tue Mar 5 06:50:47 2002
+++ Encode.new/TW/TW.pm Tue Mar 5 09:59:21 2002
@@ -1,9 +1,13 @@
package Encode::TW;
-use Encode;
our $VERSION = '0.02';
+
+use Encode;
use XSLoader;
XSLoader::load('Encode::TW',$VERSION);
+local $@;
+eval "use Encode::HanExtra"; # load extra encodings if they exist
+
1;
__END__
=head1 NAME
@@ -23,7 +26,7 @@
big5 The original Big5 encoding
big5-hkscs Big5 plus Cantonese characters in Hong Kong
- cp950 Code Page 950 (Big5 + Microsoft vendor mappings)
+ cp950 Code Page 950 (Big5 + Microsoft vendor mappings)
To find how to use this module in detail, see L<Encode>.
@@ -32,6 +35,9 @@
Due to size concerns, C<EUC-TW> (Extended Unix Character) and C<BIG5PLUS>
(CMEX's Big5+) are distributed separately on CPAN, under the name
L<Encode::HanExtra>. That module also contains extra China-based encodings.
+
+This module will automatically load L<Encode::HanExtra> if you have it on
+your machine.
=head1 BUGS
--- Encode/Encode.pm Tue Mar 5 06:50:47 2002
+++ Encode.new/Encode.pm Tue Mar 5 10:05:33 2002
@@ -173,7 +173,6 @@
# TODO: HP-UX '8' encodings arabic8 greek8 hebrew8 kana8 thai8 turkish8
# TODO: HP-UX '15' encodings japanese15 korean15 roi15
# TODO: Cyrillic encoding ISO-IR-111 (useful?)
-# TODO: Chinese encodings HZ
# TODO: Armenian encoding ARMSCII-8
# TODO: Hebrew encoding ISO-8859-8-1
# TODO: Thai encoding TCVN
HZ.pm
Description: Perl program