Thank you for your description of the issue.
At Fri, 17 Jun 2005 12:04:21 -0500,
Earl Hood wrote:
Wrt mknmz, with a locale of "C" or "en_US", by default, the strings are
_not_ utf-8. Even the mknmz code invokes binmode() on filehandles to
prevent Perl from applying any character encoding semantics (Perl 5.8.x
supports character encoding/decoding on file handles similar to Java).
binmode() was formerly used for Win32; I did not know about this side effect.
The problem trigger is in decode_numbered_entity() in html.pl and
If $num is > 255, Perl ends up creating a utf-8 sequence (because
of the "%c" format), causing the string that contains the decoded
entity to get its utf-8 flag set (regardless of the current locale setting).
Subsequently, any character-based operations (like regexes or file
writes) cause Perl to generate warnings. It also causes mis-behavior
and probably corruption in Namazu.
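
To illustrate the behavior described above, here is a minimal sketch. It is
not taken from html.pl; the helper has_utf8_flag() and the sample values are
hypothetical and exist only for this demonstration. It relies on the built-in
utf8::is_utf8() to inspect the internal flag.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if Perl has flagged the string as
# internally utf-8, otherwise 0.
sub has_utf8_flag {
    my ($str) = @_;
    return utf8::is_utf8($str) ? 1 : 0;
}

my $low  = sprintf("%c", 233);    # code point fits in a single byte
my $high = sprintf("%c", 8364);   # code point above 255

# Concatenating a plain byte string with a flagged string yields a
# flagged result, so the flag spreads to the surrounding text.
my $text = "price: " . $high;

printf "233  -> flag %d\n", has_utf8_flag($low);
printf "8364 -> flag %d\n", has_utf8_flag($high);
printf "text -> flag %d\n", has_utf8_flag($text);
```

Only the flags are printed here, not the characters themselves, so the
sketch does not trigger the warnings mentioned above.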
Therefore, my initial fix was to drop any $num >= 255. This would
preserve the 8-bit agnostic behavior of namazu.
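
A rough sketch of what such a guard could look like follows. This is an
assumption for illustration, not the actual patch: the function name
decode_numbered_entity_sketch() and the sample input are hypothetical, and
the threshold is copied as written above (255 itself would still fit in a
single byte, so "> 255" would also avoid the utf-8 flag).

```perl
use strict;
use warnings;

# Hypothetical stand-in for the entity decoder; not the real html.pl code.
sub decode_numbered_entity_sketch {
    my ($num) = @_;
    # Drop entities at or above the threshold stated in the message,
    # so sprintf("%c") never produces a utf-8 flagged string.
    return "" if $num >= 255;
    return sprintf("%c", $num);
}

my $html = "caf&#233; costs 5 &#8364;";

# Copy the input, then replace each decimal numeric entity in the copy.
(my $decoded = $html) =~ s/&#(\d+);/decode_numbered_entity_sketch($1)/ge;

printf "utf-8 flag: %d\n", utf8::is_utf8($decoded) ? 1 : 0;
printf "length:     %d\n", length($decoded);
```

With this guard the decoded string stays a plain byte string, at the cost
of silently losing entities that cannot be represented in 8 bits.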
Hmm, that seems sufficient to me. I want to apply it to the stable
branch and HEAD.
Do you have any objection to it, Teranishi-san?
knok(_at_)namazu(_dot_)org / knok(_at_)debian(_dot_)org
Namazu-users-en mailing list