namazu-users-en

Re: Malformed UTF-8 character ...

2004-05-07 12:49:45
On May 6, 2004 at 10:40, Tadamasa Teranishi wrote:

Figuring it was a LANG envariable setting, I explicitly set LANG
to en_US (it had defaulted to en_US.UTF-8), but that did not fix it.
Maybe I should try en_US.ISO-8859-1?

xxxx.UTF-8 is not supported.

I'm aware of this.

Instead of "en_US.UTF-8", you have to set "C".

Probably "LC_ALL" or "LC_CTYPE" etc. is set to en_US.UTF-8.
Please set LC_ALL=C and then run mknmz.
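
For reference, the advice above can be sketched as a small shell helper.
This is only an illustration: run_in_c_locale is a made-up name, and the
mknmz arguments in the usage comment are placeholders, not taken from
this thread.

```shell
# Run any command with every locale category forced to "C".
# LC_ALL takes precedence over LC_CTYPE and LANG, so a UTF-8 value in
# either of those no longer matters for the command being run.
run_in_c_locale() {
    env LC_ALL=C "$@"
}

# Hypothetical usage (index and document paths are placeholders):
#   run_in_c_locale mknmz -O /path/to/index /path/to/documents
```

The override applies only to the one command, so the login shell keeps
its UTF-8 locale for everything else.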

Is there any drawback to including the "use bytes" pragma to
avoid this problem?  Is there a need to support older versions
of perl that do not support the pragma?

As a sanity check, namazu could do a locale check (checking the various
envariables), and if a UTF-8 locale is set, it could either generate
a warning and fall back to the C locale, or error out stating that
the locale is unsupported.
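
A rough sketch of the kind of check I mean, written as a shell wrapper
rather than as a change to namazu itself. The function names are made
up for illustration, and the pattern match on the locale name is an
assumption about how UTF-8 locales are usually spelled.

```shell
# Print the effective character-type locale, following the POSIX
# precedence LC_ALL > LC_CTYPE > LANG. Prints "C" if none is set.
effective_ctype() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_CTYPE:-}" ]; then
        printf '%s\n' "$LC_CTYPE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf 'C\n'
    fi
}

# Succeed if the locale name carries a UTF-8 codeset suffix
# (for example en_US.UTF-8 or en_US.utf8).
is_utf8_locale() {
    case "$1" in
        *.[Uu][Tt][Ff]-8*|*.[Uu][Tt][Ff]8*) return 0 ;;
        *) return 1 ;;
    esac
}

# Warn on stderr and fall back to the C locale when a UTF-8 locale is
# in effect; otherwise leave the environment untouched.
check_locale() {
    current=$(effective_ctype)
    if is_utf8_locale "$current"; then
        echo "warning: locale '$current' is not supported; falling back to C" >&2
        LC_ALL=C
        export LC_ALL
    fi
}
```

Calling check_locale before the indexer runs gives the "warn and fall
back" variant; replacing the assignment with "exit 1" gives the "error
out" variant.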

With newer Linux distributions now defaulting to UTF-8-based locales,
such a check might eliminate user mail to the list about this.

--ewh
