On Thu, Feb 28, 2002 at 06:21:03PM +0200, Jarkko Hietaniemi wrote:
- 'hz' and 'iso-2022-cn', two different encoding tables for gb2312
described above.
This isn't there? I remember seeing HZ.enc?
Apparently its support was never completed, since it requires escape
sequence processing.
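
For reference, the escape handling HZ needs is fairly small. A rough sketch, assuming the RFC 1843 rules (`~{` enters GB mode, `~}` leaves it, `~~` is a literal tilde, `~` followed by a newline is a line continuation). The helper name `hz_to_euc_cn` is hypothetical, and this only maps HZ bytes to EUC-CN bytes; it is not how Encode does or will do it:

```perl
use strict;
use warnings;

# Hypothetical helper: convert an HZ byte string to EUC-CN bytes.
# Malformed bytes inside GB mode are silently skipped (an assumption
# made for brevity, not a recommendation).
sub hz_to_euc_cn {
    my ($hz) = @_;
    my $out     = '';
    my $gb_mode = 0;
    while (length $hz) {
        if (!$gb_mode) {
            if    ($hz =~ s/\A~\{//)       { $gb_mode = 1 }   # enter GB mode
            elsif ($hz =~ s/\A~~//)        { $out .= '~' }    # escaped tilde
            elsif ($hz =~ s/\A~\n//)       { }                # line continuation
            elsif ($hz =~ s/\A([^~]+|~)//) { $out .= $1 }     # plain ASCII run
        }
        else {
            if ($hz =~ s/\A~\}//) { $gb_mode = 0 }            # back to ASCII
            elsif ($hz =~ s/\A([\x21-\x7E]{2})//) {
                # Set the high bit on both bytes to get the EUC-CN pair.
                $out .= pack 'C2', map { $_ | 0x80 } unpack 'C2', $1;
            }
            else { substr($hz, 0, 1, '') }                    # skip bad byte
        }
    }
    return $out;
}

print unpack('H*', hz_to_euc_cn("ab~{VPND~}c~~")), "\n";  # prints 6162d6d0cec4637e
```

The mode flag is the part a plain .enc table cannot express, which is why 'hz' does not fall out of the table-driven encodings for free.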
- 'gb18030', used in glibc2.2, is a superset of gbk, which is a
superset of gb2312; we should use that instead of 'gbk' if we want gbk
support.
This and euc-tw use 1, 2 or 4-byte encoding. Any pointers on how to
support that in Encode.pm?
Anyway, 'gbk' is done, which is probably more usable (and recognizable).
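
To make the 1/2/4-byte question concrete for gb18030: the sequence length can be read off the first two bytes. A minimal sketch, assuming the usual GB18030 byte ranges (lead byte 0x81-0xFE; a second byte of 0x30-0x39 marks a four-byte sequence, anything else a two-byte one). The function name is made up, and it does no validation of trail bytes:

```perl
use strict;
use warnings;

# Hypothetical helper: return the byte length of each character in a
# GB18030 byte string. It classifies by lead bytes only and does not
# check that the remaining bytes are valid.
sub gb18030_char_lengths {
    my ($bytes) = @_;
    my @lengths;
    my $pos = 0;
    my $len = length $bytes;
    while ($pos < $len) {
        my $b1 = ord(substr($bytes, $pos, 1));
        my $b2 = $pos + 1 < $len ? ord(substr($bytes, $pos + 1, 1)) : -1;
        my $n;
        if    ($b1 < 0x81 || $b1 > 0xFE)   { $n = 1 }   # single byte
        elsif ($b2 >= 0x30 && $b2 <= 0x39) { $n = 4 }   # four-byte form
        else                               { $n = 2 }   # two-byte (gbk-style)
        push @lengths, $n;
        $pos += $n;
    }
    return @lengths;
}

print join(',', gb18030_char_lengths("A" . "\xD6\xD0" . "\x81\x30\x81\x30")), "\n";  # prints 1,2,4
```

euc-tw gets to four bytes differently (a single-shift byte plus a plane byte in front of a two-byte code), so it would need its own variant of this.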
- 'iso-ir-165', a different extension to gb2312, adding gb6345 and
gb8565 support. Not in wide use.
Done as 'iso-ir-165'.
- 'big5p', the Big5+ Traditional Chinese encoding, is similarly a
This was deemed unnecessary, as it doesn't have an IANA entry and isn't
really deployed in any sense.
- 'big5-hkscs', a different extension to big5, adding characters used
in Hong Kong, incompatible with big5p.
This, however, is dominant in Hong Kong. Done as 'big5-hkscs'.
All trivial Chinese encodings are done; a patch against blead follows. They
were generated from GNU libiconv's test/*.TXT and tested against GNU iconv.
The TODO is now 'euc-tw' (which covers most of the modern cns-11643),
'gb18030', and 'hz'.
The patch also corrects a small problem in Encode/lib/Encode/XS.pm, which says
its version is 0.30 and so disagrees with the 0.40 in Encode.pm. Oh, and the HZ
comment doesn't belong in TW, so it moves to CN.
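
In case the $VERSION line in XS.pm looks opaque: it pulls the digit groups out of the RCS Revision keyword and formats them, so changing the keyword text changes the version. A small sketch of the same expression, wrapped in a made-up `rcs_version` helper purely for illustration:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the idiom used in Encode/lib/Encode/XS.pm.
sub rcs_version {
    my ($keyword) = @_;
    my @r = ($keyword =~ /\d+/g);             # e.g. ('0', '40')
    # "%d." for the first group, then one "%02d" per remaining group.
    return sprintf "%d." . "%02d" x $#r, @r;
}

print rcs_version('$Revision: 0.40 $'), "\n";  # prints 0.40
```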
The actual encoding files (to be placed in Encode/) are available separately
at <http://autrijus.org/zh_enc.tar.gz>.
Thanks,
/Autrijus/
diff -dur Encode/CN/CN.pm Encode.2/CN/CN.pm
--- Encode/CN/CN.pm Sun Feb 17 01:12:34 2002
+++ Encode.2/CN/CN.pm Sat Mar 2 07:18:43 2002
@@ -6,3 +6,4 @@
1;
__END__
+todo: HZ (Escape-based)
diff -dur Encode/CN/Makefile.PL Encode.2/CN/Makefile.PL
--- Encode/CN/Makefile.PL Tue Feb 26 06:59:47 2002
+++ Encode.2/CN/Makefile.PL Sat Mar 2 07:17:43 2002
@@ -3,9 +3,11 @@
use ExtUtils::MakeMaker;
my %tables = (EUC_CN => ['euc-cn.enc'],
+ GBK => ['gbk.enc'],
GB2312 => ['gb2312.enc'],
GB12345 => ['gb12345.enc'],
CP936 => ['cp936.enc'],
+ 'ISO-IR-165' => ['iso-ir-165.enc'],
);
my $name = 'CN';
diff -dur Encode/Encode.pm Encode.2/Encode.pm
--- Encode/Encode.pm Fri Mar 1 11:18:44 2002
+++ Encode.2/Encode.pm Sat Mar 2 07:57:21 2002
@@ -170,7 +170,7 @@
# TODO: HP-UX '8' encodings arabic8 greek8 hebrew8 kana8 thai8 turkish8
# TODO: HP-UX '15' encodings japanese15 korean15 roi15
# TODO: Cyrillic encoding ISO-IR-111 (useful?)
-# TODO: Chinese encodings GB18030 GBK Big5-HSKCS EUC-TW
+# TODO: Chinese encodings GB18030 EUC-TW HZ
# TODO: Armenian encoding ARMSCII-8
# TODO: Hebrew encoding ISO-8859-8-1
# TODO: Thai encoding TCVN
Only in Encode.2: Makefile.old
diff -dur Encode/TW/Makefile.PL Encode.2/TW/Makefile.PL
--- Encode/TW/Makefile.PL Tue Feb 26 06:59:47 2002
+++ Encode.2/TW/Makefile.PL Sat Mar 2 07:56:04 2002
@@ -2,8 +2,9 @@
use strict;
use ExtUtils::MakeMaker;
-my %tables = (BIG5 => ['big5.enc'],
- CP950 => ['cp950.enc'],
+my %tables = ('BIG5' => ['big5.enc'],
+ 'BIG5-HKSCS' => ['big5-hkscs.enc'],
+ 'CP950' => ['cp950.enc'],
);
my $name = 'TW';
diff -dur Encode/TW/TW.pm Encode.2/TW/TW.pm
--- Encode/TW/TW.pm Sun Feb 17 01:12:34 2002
+++ Encode.2/TW/TW.pm Sat Mar 2 07:18:36 2002
@@ -6,5 +6,3 @@
1;
__END__
-
-todo: HZ (Escape-based)
diff -dur Encode/lib/Encode/XS.pm Encode.2/lib/Encode/XS.pm
--- Encode/lib/Encode/XS.pm Tue Jan 29 23:12:34 2002
+++ Encode.2/lib/Encode/XS.pm Sat Mar 2 07:39:16 2002
@@ -1,6 +1,6 @@
package Encode::XS;
use strict;
-our $VERSION = do {my @r=(q$Revision: 0.30 $ =~ /\d+/g); sprintf "%d."."%02d" x $#r, @r};
+our $VERSION = do {my @r=(q$Revision: 0.40 $ =~ /\d+/g); sprintf "%d."."%02d" x $#r, @r};
use base 'Encode::Encoding';
1;
__END__