namazu-users-en

Re: Malformed UTF-8 character ...

To: namazu-users-en@namazu.org
Subject: Re: Malformed UTF-8 character ...
From: Earl Hood <earl@earlhood.com>
Date: Fri, 07 May 2004 14:49:42 -0500
Reply-to: Earl Hood <earl@earlhood.com>
Message-id: <200405071949.i47Jngc06355@gator.earlhood.com>
On May 6, 2004 at 10:40, Tadamasa Teranishi wrote:

>> Figuring it was a LANG environment variable setting, I explicitly
>> set LANG to en_US (it defaulted to en_US.UTF-8), but that did not
>> fix it. Maybe I should try en_US.ISO-8859-1?
>
> xxxx.UTF-8 is not supported.

I'm aware of this.

> Instead of "en_US.UTF-8", you have to set "C".
>
> Probably "LC_ALL" or "LC_CTYPE" etc. is set to en_US.UTF-8.
> Please set LC_ALL=C and then run mknmz.

Is there any drawback of including the "use bytes" pragma to
avoid this problem?  Is there a need to support older versions
of perl that do not support the pragma?

As a sanity check, namazu could do a locale check (checking the
various locale environment variables), and if a UTF-8 locale is set,
it could either generate a warning and fall back to the C locale, or
error out stating that the locale is unsupported.
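
To make the idea concrete, here is a rough sketch of such a check as a
shell wrapper rather than as a change to namazu itself. The function
names (effective_ctype, is_utf8_locale, check_locale) are made up for
illustration and are not part of namazu; the sketch only assumes the
usual POSIX precedence of LC_ALL over LC_CTYPE over LANG.

```sh
#!/bin/sh
# Hypothetical helper (not part of namazu): print the locale that governs
# character handling, using POSIX precedence: LC_ALL, LC_CTYPE, then LANG.
effective_ctype() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_CTYPE:-}" ]; then
        printf '%s\n' "$LC_CTYPE"
    else
        printf '%s\n' "${LANG:-C}"
    fi
}

# Hypothetical helper: succeed if the locale name carries a UTF-8 codeset,
# e.g. en_US.UTF-8 or ja_JP.utf8.
is_utf8_locale() {
    case "$1" in
        *.[Uu][Tt][Ff]-8*|*.[Uu][Tt][Ff]8*) return 0 ;;
        *) return 1 ;;
    esac
}

# Warn on stderr and fall back to the C locale when a UTF-8 locale is active.
check_locale() {
    current=$(effective_ctype)
    if is_utf8_locale "$current"; then
        echo "warning: locale '$current' is unsupported; falling back to C" >&2
        LC_ALL=C
        export LC_ALL
    fi
}
```

A wrapper script would call check_locale before invoking mknmz. The
"error out" variant would print the message and exit non-zero instead
of exporting LC_ALL=C.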

With recent Linux distributions now defaulting to UTF-8-based locales,
such a check may eliminate user mail to the list about this.

--ewh
