perl-unicode

Re: 5.8 roadmap and Encode

2002-03-01 17:13:34
On Thu, Feb 28, 2002 at 06:21:03PM +0200, Jarkko Hietaniemi wrote:
  - 'hz' and 'iso-2022-cn', two different encoding tables for gb2312
    described above.
This isn't there?  I remember seeing HZ.enc?

Apparently its support was never completed, since it requires escape
sequence processing.
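
For reference, the escape handling itself is small. A rough sketch, assuming the HZ framing described in RFC 1843 (`~{` enters GB mode, `~}` leaves it, `~~` is a literal tilde, `~` followed by a newline is a line continuation). The name hz_to_euc_cn is made up for illustration and is not part of Encode; the output is EUC-CN bytes, which an existing euc-cn table could then map to Unicode:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Encode): convert an HZ byte string
# into EUC-CN bytes by tracking the ~{ ... ~} escape state.
sub hz_to_euc_cn {
    my ($hz) = @_;
    my $out     = '';
    my $gb_mode = 0;    # 0 = ASCII mode, 1 = GB2312 mode

    while (length $hz) {
        if ($gb_mode) {
            if ($hz =~ s/\A~\}//) {
                $gb_mode = 0;    # escape back to ASCII mode
            }
            elsif ($hz =~ s/\A([\x21-\x77][\x21-\x7E])//) {
                # A 7-bit GB2312 pair: set the high bit on both bytes.
                $out .= pack 'C*', map { $_ | 0x80 } unpack 'C*', $1;
            }
            else {
                die "malformed HZ data in GB mode\n";
            }
        }
        else {
            if    ($hz =~ s/\A~\{//)     { $gb_mode = 1 }    # enter GB mode
            elsif ($hz =~ s/\A~~//)      { $out .= '~' }     # literal tilde
            elsif ($hz =~ s/\A~\n//)     { }                 # line continuation
            elsif ($hz =~ s/\A([^~]+)//) { $out .= $1 }      # plain ASCII run
            else { die "malformed HZ escape in ASCII mode\n" }
        }
    }
    return $out;
}
```

This only covers the framing; a real implementation would also have to encode in the other direction and cope with input split across buffers.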

  - 'gb18030', used in glibc2.2, is a superset of gbk, which is a
    superset of gb2312; we should use that instead of 'gbk' if we want gbk
    support.

This and euc-tw use 1, 2 or 4-byte encoding. Any pointers on how to use
that functionality for Encode.pm?

Anyway, 'gbk' is done, which is probably more usable (and recognizable).
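
On the 1, 2 or 4-byte question, at least the sequence length can be read off the first two bytes. A minimal sketch for gb18030 only, assuming the byte ranges from the GB18030 standard (lead byte 0x81-0xFE; a second byte of 0x30-0x39 signals a four-byte sequence, 0x40-0x7E or 0x80-0xFE a two-byte one). gb18030_seq_len is a hypothetical helper, and how this would hook into the Encode table machinery is left open:

```perl
use strict;
use warnings;

# Hypothetical helper: return the length in bytes (1, 2 or 4) of the
# GB18030 sequence starting at offset $pos in $bytes, or 0 if the
# sequence is malformed or truncated.
sub gb18030_seq_len {
    my ($bytes, $pos) = @_;
    my @b = unpack 'C*', substr($bytes, $pos, 4);
    return 0 unless @b;

    return 1 if $b[0] <= 0x7F;                        # single byte (ASCII)
    return 0 if $b[0] == 0x80 || $b[0] == 0xFF;       # not a valid lead byte
    return 0 if @b < 2;                               # truncated

    # Two-byte form: trail byte 0x40-0x7E or 0x80-0xFE.
    return 2 if $b[1] >= 0x40 && $b[1] <= 0xFE && $b[1] != 0x7F;

    # Four-byte form: digit, lead-range byte, digit.
    return 4
        if @b == 4
        && $b[1] >= 0x30 && $b[1] <= 0x39
        && $b[2] >= 0x81 && $b[2] <= 0xFE
        && $b[3] >= 0x30 && $b[3] <= 0x39;

    return 0;
}
```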

  - 'iso-ir-165', a different extension to gb2312, adding gb6345 and
    gb8565 support. Not in wide use.

Done as 'iso-ir-165'.

  - 'big5p', the Big5+ Traditional Chinese encoding, is similarly a

This was deemed unnecessary, as it doesn't have an IANA entry and isn't
really deployed anywhere.

  - 'big5-hkscs', a different extension to big5, adding characters used
    in Hong Kong, incompatible with big5p.

This, however, is dominant in Hong Kong. Done as 'big5-hkscs'.

All trivial Chinese encodings are done; patch against blead follows. They
were generated from GNU libiconv's test/*.TXT, and tested against GNU iconv.

The TODO list is now 'euc-tw' (which covers most of the modern cns-11643),
'gb18030', and 'hz'.

The patch also corrects a small problem in Encode/lib/Encode/XS.pm, which
gave its version as 0.30 and so disagreed with the 0.40 in Encode.pm. Oh, and
the HZ comment doesn't belong in TW, so it moves to CN.
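
For anyone puzzled by the $VERSION line touched in the XS.pm hunk: it derives the version string from the RCS Revision keyword. A small standalone sketch of what that expression evaluates to (version_from_keyword is a made-up wrapper for illustration only; the module itself uses the inline do block):

```perl
use strict;
use warnings;

# Illustrative wrapper around the same expression used in XS.pm:
# collect the digit groups from the keyword, print the first as-is
# and each remaining group zero-padded to two digits.
sub version_from_keyword {
    my ($keyword) = @_;
    my @r = ($keyword =~ /\d+/g);
    return sprintf "%d." . "%02d" x $#r, @r;
}

print version_from_keyword(q$Revision: 0.40 $), "\n";    # prints 0.40
```

So changing the keyword from 0.30 to 0.40 is all it takes to bring XS.pm in line with Encode.pm.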

The actual encoding files (to be placed in Encode/) are available separately
at <http://autrijus.org/zh_enc.tar.gz>.

Thanks,
/Autrijus/

diff -dur Encode/CN/CN.pm Encode.2/CN/CN.pm
--- Encode/CN/CN.pm     Sun Feb 17 01:12:34 2002
+++ Encode.2/CN/CN.pm   Sat Mar  2 07:18:43 2002
@@ -6,3 +6,4 @@
 
 1;
 __END__
+todo: HZ (Escape-based)
diff -dur Encode/CN/Makefile.PL Encode.2/CN/Makefile.PL
--- Encode/CN/Makefile.PL       Tue Feb 26 06:59:47 2002
+++ Encode.2/CN/Makefile.PL     Sat Mar  2 07:17:43 2002
@@ -3,9 +3,11 @@
 use ExtUtils::MakeMaker;
 
 my %tables = (EUC_CN   => ['euc-cn.enc'],
+             GBK      => ['gbk.enc'],
              GB2312   => ['gb2312.enc'],
              GB12345  => ['gb12345.enc'],
              CP936    => ['cp936.enc'],
+             'ISO-IR-165' => ['iso-ir-165.enc'],
              );
 
 my $name = 'CN';
diff -dur Encode/Encode.pm Encode.2/Encode.pm
--- Encode/Encode.pm    Fri Mar  1 11:18:44 2002
+++ Encode.2/Encode.pm  Sat Mar  2 07:57:21 2002
@@ -170,7 +170,7 @@
 # TODO: HP-UX '8' encodings arabic8 greek8 hebrew8 kana8 thai8 turkish8
 # TODO: HP-UX '15' encodings japanese15 korean15 roi15
 # TODO: Cyrillic encoding ISO-IR-111 (useful?)
-# TODO: Chinese encodings GB18030 GBK Big5-HSKCS EUC-TW
+# TODO: Chinese encodings GB18030 EUC-TW HZ
 # TODO: Armenian encoding ARMSCII-8
 # TODO: Hebrew encoding ISO-8859-8-1
 # TODO: Thai encoding TCVN
Only in Encode.2: Makefile.old
diff -dur Encode/TW/Makefile.PL Encode.2/TW/Makefile.PL
--- Encode/TW/Makefile.PL       Tue Feb 26 06:59:47 2002
+++ Encode.2/TW/Makefile.PL     Sat Mar  2 07:56:04 2002
@@ -2,8 +2,9 @@
 use strict;
 use ExtUtils::MakeMaker;
 
-my %tables = (BIG5   => ['big5.enc'],
-             CP950  => ['cp950.enc'],
+my %tables = ('BIG5'           => ['big5.enc'],
+             'BIG5-HKSCS'      => ['big5-hkscs.enc'],
+             'CP950'           => ['cp950.enc'],
              );
 
 my $name = 'TW';
diff -dur Encode/TW/TW.pm Encode.2/TW/TW.pm
--- Encode/TW/TW.pm     Sun Feb 17 01:12:34 2002
+++ Encode.2/TW/TW.pm   Sat Mar  2 07:18:36 2002
@@ -6,5 +6,3 @@
 
 1;
 __END__
-
-todo: HZ (Escape-based)
diff -dur Encode/lib/Encode/XS.pm Encode.2/lib/Encode/XS.pm
--- Encode/lib/Encode/XS.pm     Tue Jan 29 23:12:34 2002
+++ Encode.2/lib/Encode/XS.pm   Sat Mar  2 07:39:16 2002
@@ -1,6 +1,6 @@
 package Encode::XS;
 use strict;
-our $VERSION = do {my @r=(q$Revision: 0.30 $ =~ /\d+/g); sprintf "%d."."%02d" x $#r, @r};
+our $VERSION = do {my @r=(q$Revision: 0.40 $ =~ /\d+/g); sprintf "%d."."%02d" x $#r, @r};
 use base 'Encode::Encoding';
 1;
 __END__
