perl-unicode

Re: Encode; Should we aggregate all EUCs?

2002-02-05 01:41:03
Dan Kogai <dankogai(_at_)dan(_dot_)co(_dot_)jp> writes:
Folks,

  First, thank you for perl@14550.
  Based upon that, I tried aggregating all EUCs (euc-(cn|jp|kr)) as Nick
suggested.  It worked nicely except for the compile time, which is
awfully long.
  The tty goes silent for about 3 minutes at "Writing compiled form".
With EUC alone taking so much time, we have to think carefully about how
to distribute encoding tables.
  Nick and I suggested that we distribute perl5.7.3 (and 5.8.0) sans CJK
then use CPAN to add more encodings.  I have thought it over and
concluded that, though this is technically correct, it may not be so
politically correct.

That is my worry as well.

Perhaps we make "Build CJK encodings?" a Configure question?
We could determine the default based on locale, or (as I once
did for a UK/USA paper size choice) by TZ.
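
A minimal sketch of how such a default might be computed, written in
Python purely for illustration (Configure itself is a shell script).
The function name, the language list and the TZ values are all
assumptions made for this example; none of them come from perl's
Configure.

```python
import os
import re

# Hypothetical values chosen for illustration only.
CJK_LANGUAGES = {"ja", "ko", "zh"}
CJK_TIMEZONES = {"Asia/Tokyo", "Asia/Seoul", "Asia/Shanghai",
                 "Asia/Taipei", "Asia/Hong_Kong", "JST-9"}

def default_build_cjk(env=None):
    """Guess the default answer to a "Build CJK encodings?" question."""
    env = os.environ if env is None else env
    # POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var, "")
        if value:
            # "ja_JP.eucJP" -> "ja"; strips territory, codeset, modifier.
            language = re.split(r"[_.@]", value)[0].lower()
            if language in CJK_LANGUAGES:
                return True
            break  # the first variable that is set decides the locale
    # Fall back to the time zone when the locale gives no CJK hint.
    return env.get("TZ", "") in CJK_TIMEZONES
```

Either way this only picks the default; the person running Configure
would still be free to answer the question differently.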


I want to show the perl community in the CJK world
that we care.  I now believe we should do our best to include CJK
support in the next perl because that is what Unicode support is all about.
After all Tcl comes with those.
  But how we do that can be a problem....

  I am also checking to see....

* if iconv tables can be used (there is already a CPAN module that
claims to do so but it didn't work in my environment).
* if my humble version of the encoding schemes, which I prepared for
Jcode-NG, can be used

  jhi, how fast do you want perl 5.7.3 released?  I know you are dying
to release ASAP.  But at the same time the compiled version of Encode
definitely needs some work besides the code.  Here is my suggestion:

* If you want 5.7.3 out in a week or so, drop EUC_JP and release the
rest.  Encode::Tcl may be slow but it works (thanks to Sadahiro).
* If you can wait for one extra week, I think I can make Encode::EUC and
other compile-based encodings together with Encode::(JP|CN|KR) which call
them.  I think Sadahiro's Encode::Tcl::Escape can be used to implement
Encode::ISO2022 (or whatever that is).

Dan the Man with Too Many Encodings to Handle

/usr/bin/time -l make
cp EUC.pm blib/lib/Encode/EUC.pm
/usr/home/dankogai/bin/perl5.7.2 ../compile -o EUC_JP.xs -f EUC_JP.fnm
M encoded euc-cn
M encoded euc-jp
M encoded euc-kr
Writing compiled form
96677 bytes in string tables
107853 bytes (112%) saved spotting duplicates

Probably worth keeping.

22801 bytes (23.6%) saved using substrings

That is where the time goes - there is a loop which uses index()
on all existing strings to see if it can re-use one.
It saves 22K, but is that worthwhile?
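
To illustrate the trade-off, here is a minimal sketch of that kind of
loop, in Python rather than Perl, with bytes.find() standing in for
index().  It is not the code from the compile script; the function
name and the reuse_substrings switch are made up for this example.
Exact duplicates cost one hash lookup each, while the substring pass
scans every string stored so far.

```python
def build_string_table(strings, reuse_substrings=True):
    """Pack byte strings into one blob; return (blob, offsets).

    offsets[i] is the position of strings[i] inside blob.
    """
    emitted = []   # (string, offset) for every string physically stored
    exact = {}     # string -> offset, for cheap duplicate spotting
    offsets = []
    size = 0
    for s in strings:
        pos = exact.get(s)              # duplicate check: one lookup
        if pos is None and reuse_substrings:
            # The slow part: search every stored string for s.
            for t, t_off in emitted:
                i = t.find(s)
                if i >= 0:
                    pos = t_off + i
                    break
        if pos is None:                 # nothing reusable, store s itself
            pos = size
            emitted.append((s, pos))
            size += len(s)
        exact[s] = pos
        offsets.append(pos)
    blob = b"".join(t for t, _ in emitted)
    return blob, offsets
```

With reuse_substrings=False the inner loop disappears, so the work per
string no longer grows with the number of strings already stored; the
price is a larger table.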

/usr/home/dankogai/bin/perl5.7.2 ../../../lib/ExtUtils/xsubpp  -typemap
.../../../lib/ExtUtils/typemap  EUC_JP.xs > EUC_JP.xsc && mv EUC_JP.xsc
EUC_JP.c
Please specify prototyping behavior for EUC_JP.xs (see perlxs manual)
cc -c  -I..  -DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H -fno-strict-aliasing
-I/usr/local/include -O    -DVERSION=\"0.02\"  -DXS_VERSION=\"0.02\"
-DPIC -fpic -I../../..   EUC_JP.c
Running Mkbootstrap for Encode::EUC ()
chmod 644 EUC.bs
rm -f blib/arch/auto/Encode/EUC/EUC.so
LD_RUN_PATH="" cc  -shared  -L/usr/local/lib EUC_JP.o  -o
blib/arch/auto/Encode/EUC/EUC.so
chmod 755 blib/arch/auto/Encode/EUC/EUC.so
cp EUC.bs blib/arch/auto/Encode/EUC/EUC.bs
chmod 644 blib/arch/auto/Encode/EUC/EUC.bs
      189.74 real       174.47 user         0.59 sys
     20028  maximum resident set size
       746  average shared memory size
     17693  average unshared data size
       128  average unshared stack size
     13210  page reclaims
         0  page faults
         0  swaps
         0  block input operations
        79  block output operations
         0  messages sent
         0  messages received
         0  signals received
        91  voluntary context switches
      5049  involuntary context switches
--
Nick Ing-Simmons
http://www.ni-s.u-net.com/