perl-unicode

Re: Encode; Should we aggregate all EUCs?

2002-02-05 09:30:14
Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> writes:
On Tue, Feb 05, 2002 at 05:07:00PM +0100, Andreas Marcel Riechert wrote:
Nick Ing-Simmons <nick(_dot_)ing-simmons(_at_)elixent(_dot_)com> writes:

Perhaps we make "Build CJK encodings?" a Configure question?
We could determine default based on locale, or (as I once
did for a UK/USA paper size choice) by TZ.

Doing so, we will again have Perl and something like JPerl, as
we always had. Close to 100% of my Perl coding is related to
CJK and I don't like the idea of having two Perls again. That is,
I don't like having to tell customers etc. to reinstall Perl with
"CJK" support.

Well, it wouldn't be "reinstall", it would be more like "get
more modules from CPAN".

It isn't even "get more modules from CPAN" but
cd ext/Encode/EUC-JP
perl Makefile.PL
make
make install

But I agree: we shouldn't diverge.

Given ...

Nicholas Clark <nick(_at_)unfortu(_dot_)net> writes:
On Tue, Feb 05, 2002 at 08:38:28AM +0000, Nick Ing-Simmons wrote:
Dan Kogai <dankogai(_at_)dan(_dot_)co(_dot_)jp> writes:

Perhaps we make "Build CJK encodings?" a Configure question?
We could determine default based on locale, or (as I once
did for a UK/USA paper size choice) by TZ.

22801 bytes (23.6%) saved using substrings

That is where the time goes - there is a loop which uses index()
on all existing strings to see if it can re-use one.
It saves 22K but is that worth while?

Then surely this extra searching becomes the configure question?

 Try harder to compress CJK encodings (this will slow your build considerably)?
 [no]

Just to prove my assumption I just added a -O switch to ext/Encode/compile
which enables the substring search:

nick(_at_)bactrian 537$ time ../../../perl -I../../../lib ../compile -o EUC_JP.xs ../Encode/euc-jp.ucm
Reading euc-jp (euc-jp)
Writing compiled form
34803 bytes in string tables
35066 bytes (101%) saved spotting duplicates

real    0m6.764s
user    0m6.660s
sys     0m0.100s
nick(_at_)bactrian 538$ time ../../../perl -I../../../lib ../compile -O -o EUC_JP.xs .../Encode/euc-jp.ucm
Reading euc-jp (euc-jp)
Writing compiled form
33414 bytes in string tables
33678 bytes (101%) saved spotting duplicates
2777 bytes (8.31%) saved using substrings

real    0m54.631s
user    0m54.540s
sys     0m0.080s
nick(_at_)bactrian 539$

If I throw jis208.enc into the pot, then without -O it is 12s
and with -O approx 4 minutes for a trivial saving.
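
For anyone wondering why the substring search costs so much, here is a
minimal sketch of the idea. It is not the code in ext/Encode/compile:
build_table() and its arguments are hypothetical names made up for this
illustration, and the input strings are toy data. Exact duplicates are
caught with a hash lookup, which is cheap; the optional pass then calls
index() against every string already stored, so the work grows roughly
with the square of the number of strings.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only (not taken from ext/Encode/compile).
# Lays the given strings out in one shared table and returns
# (total bytes stored, hash of string => offset).
# When $optimise is true, a string that already occurs inside an
# earlier stored string is reused rather than stored again.
sub build_table {
    my ($optimise, @strings) = @_;
    my @stored;     # strings physically kept in the table
    my @start;      # start offset of each stored string
    my %offset;     # string => offset within the table
    my $size = 0;

    STRING: for my $s (@strings) {
        next if exists $offset{$s};     # exact duplicate: cheap hash lookup
        if ($optimise) {
            # The expensive part: one index() call per stored string.
            for my $i (0 .. $#stored) {
                my $pos = index($stored[$i], $s);
                if ($pos >= 0) {
                    $offset{$s} = $start[$i] + $pos;
                    next STRING;
                }
            }
        }
        push @stored, $s;
        push @start,  $size;
        $offset{$s} = $size;
        $size += length $s;
    }
    return ($size, \%offset);
}

# Toy input: two strings are substrings of the first, one is a duplicate.
my @strings = ("abcdef", "cd", "cdef", "abcdef");
my ($plain)            = build_table(0, @strings);
my ($packed, $offsets) = build_table(1, @strings);
printf "duplicates only: %d bytes, with substring search: %d bytes\n",
    $plain, $packed;
```

On the toy input the substring pass halves the table, but only because
the data was chosen to make it do so. With real encoding tables the
inner loop runs for every string against every stored string, which
would fit the jump from seconds to minutes shown in the timings above.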




As a heavy user of Perl I just wanted to voice my objections.

Your mileage may vary,

Andreas Marcel Riechert

--
Nick Ing-Simmons
http://www.ni-s.u-net.com/