Nick Ing-Simmons <nick(_dot_)ing-simmons(_at_)elixent(_dot_)com> writes:
Encode::Tcl is too slow, even for 8-bit encodings, which is why I wrote the
engine that works from the "compiled" form.
Have you tried using ext/Encode/compile to build an XS module for
EUC?
The example above took about two seconds to show the result on my FreeBSD
box (Pentium III 800 MHz, 512MB RAM). Its performance is not too bad once
the internal table is full.
If I had _ANY_ test data, I would run the compiled test and give you
the comparative numbers.
You can use t/table.euc from the Jcode module, for instance; table.utf8
in my code example is just a UTF-8 version of it. It contains every
character defined in EUC (well, JISX0212 is actually not included, but
very few environments can display JISX0212 anyway).
It is really great to have some valid data!
For a start, it has found a bug in the :encoding layer; I knew there must be some...
(I think I have rediscovered the bug where a multi-byte character spans a
buffer boundary, which I could not reproduce before.)
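To illustrate the kind of buffer-boundary problem described above: a two-byte EUC-JP character can be split between two reads, and a decoder must carry the first half over instead of treating it as an error. A minimal sketch, assuming a present-day Encode where `decode()` with `Encode::FB_QUIET` returns what it could decode and overwrites its source argument with the unprocessed tail. The helper name `decode_in_chunks` and the chunk sizes are hypothetical, and this is not how the :encoding layer itself is implemented.

```perl
use strict;
use warnings;
use Encode qw(encode find_encoding);

# Hypothetical helper: decode $bytes in fixed-size pieces, as a buffered
# reader would, carrying any incomplete trailing character into the next piece.
sub decode_in_chunks {
    my ($name, $bytes, $chunk_size) = @_;
    my $enc = find_encoding($name) or die "Unknown encoding $name";
    my ($out, $pending) = ('', '');
    for (my $pos = 0; $pos < length $bytes; $pos += $chunk_size) {
        # Append the next "read" to whatever was left over last time.
        $pending .= substr($bytes, $pos, $chunk_size);
        # FB_QUIET: decode as much as possible; $pending is overwritten with
        # the unprocessed remainder (e.g. the first byte of a split character).
        $out .= $enc->decode($pending, Encode::FB_QUIET);
    }
    die "Trailing incomplete or invalid bytes" if length $pending;
    return $out;
}

# Usage: 3-byte chunks split the second two-byte EUC-JP character.
my $text  = "\x{65E5}\x{672C}\x{8A9E}";    # "nihongo", 2 bytes each in EUC-JP
my $bytes = encode('euc-jp', $text);
print decode_in_chunks('euc-jp', $bytes, 3) eq $text ? "ok\n" : "mismatch\n";
```

Decoding the whole buffer at once hides the problem; only small, arbitrary chunk sizes expose it.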
But avoiding that with this script:
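    use Encode;
    use Encode::Tcl;

    # Slurp the whole EUC-JP file.
    open(my $jp, "<", "table.euc") || die "Cannot open table.euc: $!";
    my $text = join('', <$jp>);
    close($jp);

    my $enc = find_encoding('euc-jp');
    if ($enc)
    {
        # With a true CHECK argument, decode() removes the bytes it has
        # consumed from $text, so anything left over means it stopped early.
        my $uni = $enc->decode($text, 1);
        if (length $text)
        {
            die "Failed to translate";
        }
        # Write the result out as UTF-8.
        open(my $un, ">:utf8", "table.utf8") || die "Cannot open table.utf8: $!";
        print $un $uni;
        close($un);
    }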
I get
nick(_at_)bactrian 624$ time ../../perl -I../../lib try2
real 0m1.389s
user 0m1.370s
sys 0m0.020s
nick(_at_)bactrian 624$
And the file is binary-identical to the output of Linux iconv.
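The byte-for-byte comparison can also be done from Perl. A minimal sketch, assuming the iconv output has been saved as `expected` (as in the iconv command further down) and using the core File::Compare module; the helper name `same_bytes` is hypothetical.

```perl
use strict;
use warnings;
use File::Compare qw(compare);

# Hypothetical helper: true when two files have byte-for-byte identical contents.
# File::Compare::compare returns 0 for equal, 1 for different, -1 on error.
sub same_bytes {
    my ($got, $want) = @_;
    my $rc = compare($got, $want);
    die "Cannot compare $got with $want: $!" if $rc < 0;
    return $rc == 0;
}

# Usage, assuming table.utf8 (from the script) and expected (from iconv) exist.
if (-e 'table.utf8' && -e 'expected') {
    print same_bytes('table.utf8', 'expected') ? "identical\n" : "differ\n";
}
```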
If I run the compile script on it, build Encode::EUC_JP as an XS
extension and change the "use Encode::Tcl;" line to:
    use Encode::EUC_JP;
I get
nick(_at_)bactrian 626$ time ../../perl -I../../lib try2
real 0m0.197s
user 0m0.170s
sys 0m0.030s
nick(_at_)bactrian 626$
Which is still worse than:
nick(_at_)bactrian 626$ time iconv -f EUC-JP -t UTF-8 table.euc > expected
real 0m0.026s
user 0m0.010s
sys 0m0.020s
nick(_at_)bactrian 627$
But IO is sub-optimal.
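One way to see how much of the remaining gap is IO rather than conversion is to time only the decode step. A minimal sketch, assuming a current Perl where `find_encoding('euc-jp')` loads the compiled Encode::JP module on demand and Time::HiRes is available; the helper name `time_decode` and the repetition count are hypothetical choices for illustration.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);
use Time::HiRes qw(time);

# Hypothetical helper: decode $bytes $reps times and return the last result
# together with the average wall-clock seconds per decode. File reading and
# writing are deliberately kept outside the timed loop.
sub time_decode {
    my ($name, $bytes, $reps) = @_;
    my $enc = find_encoding($name) or die "Unknown encoding $name";
    my $uni;
    my $start = time;
    for (1 .. $reps) {
        my $copy = $bytes;    # decode() with CHECK set modifies its input
        $uni = $enc->decode($copy, 1);
        die "Failed to translate" if length $copy;
    }
    return ($uni, (time - $start) / $reps);
}

# Usage, assuming table.euc is in the current directory.
if (open(my $jp, '<:raw', 'table.euc')) {
    my $bytes = do { local $/; <$jp> };
    close $jp;
    my (undef, $secs) = time_decode('euc-jp', $bytes, 10);
    printf "%.4f s per decode of %d bytes\n", $secs, length $bytes;
}
```

Comparing this per-decode figure with the `time` totals above gives a rough idea of how much is spent outside the conversion itself.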
--
Nick Ing-Simmons
http://www.ni-s.u-net.com/