perl-unicode

Re: Encode; Should we aggregate all EUCs?

2002-02-05 10:27:11
Dan Kogai <dankogai(_at_)dan(_dot_)co(_dot_)jp> writes:
Folks,

  First, thank you for perl@14550.
  Based upon that, I tried aggregating all EUCs (euc-(cn|jp|kr)) as Nick
suggested.  It worked nicely except for the time it takes to compile: an
awful lot of time.

I tweaked some things in //depot/perlio/...@14563.

I made the expensive sub-string search optional, and then did not use
it for EUC_JP. I bundled jis0201 (which is ASCII+) in with other ASCII-oids
in the baseline bundle. I added
               'jis0208.enc',
               'jis0212.enc',
               'shiftjis.enc',
to the EUC_JP bundle.

Even bundling all those, build time without substring search is better
than for EUC_JP alone with substring search.

It seems to me, as a westerner, that the correct bundles might be
by country/language rather than by organization, e.g.
euc-jp + jis0208 + jis0212 + shiftjis  => Encode::Japanese

rather than

euc-jp + euc-kr + euc-cn => Encode::EUC

but I _really_ don't know the regional politics well enough to know
if that will upset anyone.

Whichever way they are bundled, we need to teach Encode.pm that they
exist so it can demand-load them without the user's script having to know
how they are bundled.
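
A minimal sketch of that demand-loading idea, not the actual Encode.pm
code: a table maps each encoding name to the module that bundles it, and
the module is loaded the first time one of its encodings is requested.
The table contents, the name load_bundle_for and the $loader argument are
all hypothetical; the module name Encode::Japanese is just the bundling
proposed above.

```perl
use strict;
use warnings;

# Hypothetical table: encoding name => module that bundles it.
# Names follow the bundling suggested in this message; illustrative only.
my %bundle_for = (
    'euc-jp'   => 'Encode::Japanese',
    'jis0208'  => 'Encode::Japanese',
    'jis0212'  => 'Encode::Japanese',
    'shiftjis' => 'Encode::Japanese',
);

my %loaded;    # modules that have already been loaded

# Return the module providing $name, loading it on first use.
# $loader is a code ref so the sketch runs without the real modules;
# a real version might instead do: eval "require $module; 1" or die $@;
sub load_bundle_for {
    my ($name, $loader) = @_;
    my $module = $bundle_for{ lc $name };
    return unless defined $module;      # unknown encoding
    unless ($loaded{$module}) {
        $loader->($module);
        $loaded{$module} = 1;
    }
    return $module;
}

# Usage: a stand-in loader that only records what it was asked to load.
my @required;
my $fake_loader = sub { push @required, $_[0] };

print load_bundle_for('euc-jp',   $fake_loader), "\n";
print load_bundle_for('shiftjis', $fake_loader), "\n";
print "modules loaded: ", scalar(@required), "\n";
```

The point of the table is that the user's script only ever names the
encoding; how the encodings are grouped into modules can change without
the script noticing.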


* if iconv tables can be used (there is already a CPAN module that
claims to do so, but it didn't work in my environment).
* my humble version of encoding schemes which I prepared for Jcode-NG

  jhi, how fast do you want perl 5.7.3 released?  I know you are dying
to release ASAP.  But at the same time the compiled version of Encode
definitely needs some work besides the code itself.

Can you list what that other work is?


Here is my suggestion;

* If you want 5.7.3 out in a week or so, drop EUC_JP and release the
rest.  Encode::Tcl may be slow but it works (thanks to Sadahiro).
* If you can wait one extra week, I think I can make Encode::EUC and the
other compile-based encodings, together with the Encode::(JP|ZN|KR)
modules which call them.  I think Sadahiro's Encode::Tcl::Escape can be
used to implement Encode::ISO2022 (or whatever that is).

The snag with keeping Encode::Tcl is that we need to keep the *.enc form
of the encodings in addition to the more human-readable and flexible *.ucm
forms.

I can still convince myself in optimistic moments that the encengine.c
scheme can assist escape encodings (perhaps with some tweaks), or, given
the regex-like nature of them, perhaps leaving them as perl code is just
as good.  Maybe this is a 5.7.3+ series of experiments.

--
Nick Ing-Simmons
http://www.ni-s.u-net.com/