perl-unicode

Encode-0.99 is now available

2002-03-25 12:55:19
Encode Hackers,

  As the title says, Encode-0.99 is now available as

http://www.dan.co.jp/~dankogai/Encode-0.99.tar.gz

  or from CPAN.  Here are the Changes:

0.99  Tue Mar 26 2002
- lib/Encode/JP/Const.pm
+ lib/Encode/CJKConstants.pm
+ lib/Encode/CN/2022_CN.pm
+ lib/Encode/KR/2022_KR.pm
+ t/KR.t
+ t/gb2312.euc
+ t/gb2312.ref
+ t/ksc5601.euc
+ t/ksc5601.ref
+ t/table.euc
+ t/table.ref
+ ucm2table
  * Support for ISO-2022-KR and ISO-2022-CN added.
  * t/KR.t added!
  * More t/*.{euc,ref} added, which were autogenerated by ucm2table.
  * ucm2table autogenerates a character table out of UCM files.
- engine.c
+ encengine.c
- lib/Encode/Supports.pod
+ lib/Encode/Supported.pod
  Names reverted due to popular demand.
  8.3 rule applies only when there is a conflict.
  Message-Id: <20020325095924(_dot_)GD44120(_at_)not(_dot_)autrijus(_dot_)org>
! */Makefile.PL
- Encode/*.enc
+ Encode/*.ucm
- lib/Tcl*
- lib/Encode/Format/Enc.pod
- t/Tcl.t
  * Character tables are now 100% ucm.
  * All files under Encode/ are now 8.3-compliant.
  * Some of the missing encodings added (i.e. gsm0338 and nextstep).
  * Vendor mappings aggregated with the appropriate national standard in
    Makefile.PL, resulting in smaller *.so files, especially for CJK.
    The following is the result on Dan's FreeBSD box.
                                                  Now        Then
  ---------------------------------------------------------------
  blib/arch/auto/Encode/Byte/Byte.so          157,279     171,042
  blib/arch/auto/Encode/CN/CN.so            1,634,476   1,626,685
  blib/arch/auto/Encode/EBCDIC/EBCDIC.so       18,476      18,476
  blib/arch/auto/Encode/Encode.so              27,791      27,791
  blib/arch/auto/Encode/JP/JP.so            1,408,056   1,832,811
  blib/arch/auto/Encode/KR/KR.so            1,156,518   1,329,587
  blib/arch/auto/Encode/Symbol/Symbol.so       23,940      20,990
  blib/arch/auto/Encode/TW/TW.so*             948,761   1,316,437
  ---------------------------------------------------------------
  Total                                     5,375,297   6,343,819
  Saving                                      968,522
  * As a result of the ucm transition, Encode::Tcl was dropped because
    Encode::Tcl requires *.enc.
    Encode::Tcl will be supplied in a separate tarball with *.enc.
  Message-Id: <C024E294-3FC3-11D6-8347-00039301D480(_at_)dan(_dot_)co(_dot_)jp>
! compile
- encengine.c
+ encode.c
! Encode.pm
- lib/Encode/Supported.pod
+ lib/Encode/Supports.pod
- lib/Encode/iso10646_1.pm
+ lib/Encode/10646_1.pm
- lib/Encode/EncFormat.pod
+ lib/Encode/Format/Enc.pod
  Files renamed for 8.3 filename compliance.  Affected modules/scripts revised.
- lib/Encode/JP/Constants.pm
+ lib/Encode/JP/Consts.pm
! lib/Encode/JP/JIS.pm
! lib/Encode/JP/H2Z.pm
  Version nit problem and 8.3 rule fix.
  > Package namespace         installed    latest  in CPAN file
  > Encode::JP::Constants          0.92      1.02  J/JH/JHI/perl-5.7.3.tar.gz
  was noted by jhi; Dan then discovered that "Constants.pm" does not comply
  with the 8.3 rule.  Constants.pm was renamed to Consts.pm and the affected
  modules were fixed accordingly.  In addition, legacy "use vars qw()..."
  declarations were replaced with "our".
  Message-Id: <20020325011248(_dot_)D1561(_at_)alpha(_dot_)hut(_dot_)fi>
  Message-Id: <41023D51-3FB5-11D6-8347-00039301D480(_at_)dan(_dot_)co(_dot_)jp>
! JP/JP.pm
- lib/Encode/JP/ISO_2022_JP.pm
- lib/Encode/JP/ISO_2022_JP_1.pm
+ lib/Encode/JP/2022_JP.pm
+ lib/Encode/JP/2022_JP1.pm
                01234567.012
  8.3 naming conflict for vanilla FAT addressed by jhi
  Message-Id: <20020324201931(_dot_)V22596(_at_)alpha(_dot_)hut(_dot_)fi>
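
The 8.3 conflict behind the ISO_2022_JP renames above can be sketched in a few lines. This is only an illustration in Python, not part of Encode. It assumes plain truncation to 8 base characters and 3 extension characters, which simplifies what a FAT filesystem without long-filename support does. The helper names truncate_83 and find_conflicts are hypothetical.

```python
from collections import defaultdict


def truncate_83(filename):
    """Truncate a name to 8 base chars + 3 extension chars (assumed rule)."""
    base, dot, ext = filename.rpartition(".")
    if not dot:  # no extension at all
        base, ext = filename, ""
    short = base[:8] + ("." + ext[:3] if ext else "")
    return short.upper()  # FAT short names are case-insensitive


def find_conflicts(filenames):
    """Return {short_name: [original names]} for names that collide."""
    groups = defaultdict(list)
    for name in filenames:
        groups[truncate_83(name)].append(name)
    return {short: names for short, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    old = ["ISO_2022_JP.pm", "ISO_2022_JP_1.pm"]
    new = ["2022_JP.pm", "2022_JP1.pm"]
    print(find_conflicts(old))  # both names truncate to ISO_2022.PM
    print(find_conflicts(new))  # no collision after the rename
```

Under this assumed rule the old pair collapses to the same short name, while the renamed pair stays distinct. That matches the "8.3 rule applies only when there is a conflict" policy noted in the Changes.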

As you can see, the biggest difference is that Encode no longer uses *.enc, Tcl's encoding table format. Instead it uses IBM's ucm format. As a result:

* Encode::Tcl is detached from the main Encode. Encode::Tcl will be made available as a separate package.
* The tarball is significantly larger; it is now over 1 MB.
* In exchange, it compiles faster and the resulting table is smaller.
* But the best thing about ucm is that it is now much easier to debug/hack
  the table!  This was nearly impossible with *.enc.
* t/KR.t is added at last.
* ISO-2022-(KR|CN) added.  But are there any apps that handle these?
  Most browsers and mailers only support EUC-KR and EUC-CN (though their MIME names are mostly not EUC-* but the charset name, such as gb2312 and KS_C_5601. Strange!). I am sure my implementation is correct, but I can't verify that for myself....
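
To illustrate why a ucm table is easy to inspect, here is a minimal sketch in Python (not part of Encode) that reads the CHARMAP section of a ucm-style file. Each mapping line pairs a Unicode code point with its byte sequence and a fallback flag. The sample text is hand-written for this example and is not taken from the files shipped with Encode; the function name parse_charmap is hypothetical, and real ucm files may carry header fields and variants this sketch ignores.

```python
import re

# One mapping line looks like:  <UAC00> \xB0\xA1 |0
MAPPING = re.compile(
    r"^<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s+\|(\d)"
)

# Hand-written sample, assumed to resemble a small ucm file.
SAMPLE = r"""
<code_set_name> "sample"
<mb_cur_min> 1
<mb_cur_max> 2
CHARMAP
<U0041> \x41 |0
<UAC00> \xB0\xA1 |0
END CHARMAP
"""


def parse_charmap(text):
    """Return {character: (bytes, fallback_flag)} from a ucm-style text."""
    table = {}
    in_charmap = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "CHARMAP":
            in_charmap = True
            continue
        if line == "END CHARMAP":
            break
        if not in_charmap or not line or line.startswith("#"):
            continue  # skip header fields, blanks, and comments
        match = MAPPING.match(line)
        if match:
            char = chr(int(match.group(1), 16))
            data = bytes.fromhex(match.group(2).replace("\\x", ""))
            table[char] = (data, int(match.group(3)))
    return table


if __name__ == "__main__":
    for char, (data, flag) in parse_charmap(SAMPLE).items():
        print(f"U+{ord(char):04X} -> {data.hex()} |{flag}")
```

Because every mapping is one line of plain text, a suspect entry can be found with grep or patched in an editor, which is the debugging advantage described above.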

Yours,

Dan the Encode Maintainer
