On Thu, Jan 31, 2002 at 04:19:23AM +0900, Dan Kogai wrote:
> And the speed of the compile script may be a problem if we want all
> CJK encodings to be XS-based. It takes roughly 25 seconds to compile a
> single CJK encoding on my FreeBSD box. Well, I can live with that too,
> but other porters may find it frustrating....
Having re-read this message, I've only just noticed that paragraph.
I did get frustrated with it:
1: It's too slow.
2: It uses too much RAM. (Well, that's subjective, but my FreeBSD box only
has 16MB in total, and it was not a happy bunny: it swapped like crazy and took
over an hour to get through 5 minutes' worth of CPU time.)
So I've been re-jigging it, and Jarkko has been committing the improvements
to bleadperl (I'm not sure whether you're subscribed to p5p).
By yesterday I think it was 37% faster at compiling EUC_JP, and I've found
some more things to tweak today.
[e.g. I've just found that using (unpack "n*", pack "H*", $line) makes it 2.5%
faster than (map {hex $_} $line =~ /(....)/g). I think that is portable to
big-endian and to 64-bit platforms.]
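A minimal sketch to check that the two forms agree. It assumes $line holds nothing but hex digits, four per code point, with no trailing newline or whitespace: the regex version would silently skip a "\n", but pack "H*" would turn it into a spurious nybble. The sample line and the sub names are made up for illustration. The "n" template always reads an unsigned 16-bit big-endian value regardless of the host's byte order, which is why the unpack form should be portable.

```perl
use strict;
use warnings;

# Hypothetical sample: four hex digits per 16-bit value, nothing else.
my $line = '00410042FFFD3042';

# Old way: grab each 4-character chunk and convert it with hex().
sub parse_with_hex {
    my ($l) = @_;
    return map { hex $_ } $l =~ /(....)/g;
}

# New way: "H*" packs the hex digits into raw bytes (high nybble first),
# then "n*" reads them back as unsigned 16-bit big-endian values.
# "n" is big-endian by definition, whatever the host's byte order.
sub parse_with_unpack {
    my ($l) = @_;
    return unpack "n*", pack "H*", $l;
}

my @old = parse_with_hex($line);
my @new = parse_with_unpack($line);
print join(' ', map { sprintf '%04X', $_ } @new), "\n";   # prints "0041 0042 FFFD 3042"
print "@old" eq "@new" ? "identical\n" : "different\n";   # prints "identical"
```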
I hope I've not been trampling on anything you've been doing. It still
produces output files that are byte-for-byte identical to those generated by
last week's original.
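One way to check the byte-for-byte claim mechanically is the core File::Compare module. This is only a sketch: differing_files and the old/new directory layout are hypothetical, not part of the Encode build.

```perl
use strict;
use warnings;
use File::Compare qw(compare);

# Hypothetical helper: compare each named file in $old_dir with its
# namesake in $new_dir and return the names that do not match.
sub differing_files {
    my ($old_dir, $new_dir, @names) = @_;
    my @bad;
    for my $name (@names) {
        # compare() returns 0 if identical, 1 if different, -1 on error
        # (e.g. a missing file), so anything non-zero is a mismatch.
        my $rc = compare("$old_dir/$name", "$new_dir/$name");
        push @bad, $name if $rc != 0;
    }
    return @bad;
}
```

Called as differing_files('old', 'new', @names), an empty return list means every regenerated file matched; a file missing from either side is also reported.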
I've got a question about FFFD. The original compile script does this:
  for (my $j = 0; $j < 16; $j++)
  {
    no strict 'refs';
    my $ech = &{"encode_$type"}($ch,$page);
    my $val = hex(substr($line,0,4,''));
    next if $val == 0xFFFD;
    if ($val || (!$ch && !$page))
    {
      my $el = length($ech);
      $max_el = $el if (!defined($max_el) || $el > $max_el);
      $min_el = $el if (!defined($min_el) || $el < $min_el);
      my $uch = encode_U($val);
      if (exists $seen{$uch})
      {
        warn sprintf("U%04X is %02X%02X and %02X%02X\n",
                     $val,$page,$ch,@{$seen{$uch}});
      }
      else
      {
        $seen{$uch} = [$page,$ch];
      }
      enter($e2u,$ech,$uch,$e2u,0);
      enter($u2e,$uch,$ech,$u2e,0);
    }
    else
    {
      # No character at this position
      # enter($e2u,$ech,undef,$e2u);
    }
    $ch++;
  }
Is there a bug?
Should the $ch++ happen even when $val == 0xFFFD?
Currently it looks as though $ch is not incremented when the input value is
0xFFFD, because the next skips the rest of the loop body, including the
$ch++ at the end.
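To make the question concrete, here is a cut-down model of the loop. It is a sketch only: columns_original, columns_fixed and the sample row are hypothetical, and the encode_*/enter() machinery is left out. In a C-style for loop, next still runs the step expression (it behaves like a continue block), so moving $ch++ there is one possible fix, assuming $ch is meant to advance in step with $j.

```perl
use strict;
use warnings;

# Hypothetical row of 16 values; column 7 holds the 0xFFFD marker.
my @row = (0x41 .. 0x47, 0xFFFD, 0x49 .. 0x50);

# As in the original: $ch++ is the last statement of the body, so
# "next" skips it and every later value lands one column too early.
sub columns_original {
    my @vals = @_;
    my ($ch, %map) = (0);
    for (my $j = 0; $j < 16; $j++) {
        my $val = $vals[$j];
        next if $val == 0xFFFD;
        $map{$ch} = $val;
        $ch++;
    }
    return \%map;
}

# Possible fix: advance $ch in the step expression, which "next"
# still executes, so $ch stays equal to $j.
sub columns_fixed {
    my @vals = @_;
    my ($ch, %map) = (0);
    for (my $j = 0; $j < 16; $j++, $ch++) {
        my $val = $vals[$j];
        next if $val == 0xFFFD;
        $map{$ch} = $val;
    }
    return \%map;
}

# 0x50 is the last value in the row, i.e. column 15.
my $target = 0x50;
my %where_orig  = reverse %{ columns_original(@row) };
my %where_fixed = reverse %{ columns_fixed(@row) };
printf "original: 0x50 at \$ch %d\n", $where_orig{$target};   # prints 14
printf "fixed:    0x50 at \$ch %d\n", $where_fixed{$target};  # prints 15
```

With this invented row, the original version assigns 0x49 (column 8) to $ch 7, while the fixed version leaves $ch 7 unmapped, as the "no character at this position" branch intends.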
Nicholas Clark
--
EMCFT http://www.ccl4.org/~nick/CV.html