At Mon, 2 Jun 2003 16:26:26 -0500,
dchartrand(_at_)scclab(_dot_)com wrote:
Namazu is having problems displaying and understanding French characters
such as "àéèçô..." when used to search Mailman archives. A word like
"troisième" is displayed (and searched) as "troisime" in Namazu... Notice
the missing "è".
I have received a similar report in the past, so I tried to reproduce the
problem with the following steps:
1. Saved the mail <1054589186(_dot_)3edbc102c03c0(_at_)scclab(_dot_)com> as a
text file named "docs/french-text.txt".
2. Typed "LANG=C mknmz -O index ./docs" to build the index.
3. Typed "LANG=C namazu -h ` sed -n '78p' index/NMZ.w` index > foo" to
search for the word "troisième". The query is taken from line 78 of
index/NMZ.w because I don't know how to input French characters.
4. Checked the file foo. It looks like this:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<!-- LINK-REV-MADE -->
<link rev=made
href="mailto:webmaster(_at_)puti(_dot_)knok(_dot_)daionet(_dot_)gr(_dot_)jp">
<!-- LINK-REV-MADE -->
<title>Namazu: a Full-Text Search Engine: <troisième></title>
:
(snip)
:
<h2>Results:</h2>
<p>
References: [ troisième: 1 ]
:
(snip)
Hmm, it seems to work without any problem for me.
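The check in step 4 can also be done at the byte level, which helps when the terminal or mail client hides what is really in the file. The following is only a rough sketch, not something that was run as part of the steps above: the function name classify_troisieme is made up for illustration, and it assumes the word is stored either as ISO-8859-1 or as UTF-8. It uses only od, tr and the shell, and reports whether the file contains "troisième" with the "è" intact or the damaged form "troisime".

```shell
#!/bin/sh
# classify_troisieme FILE  (hypothetical helper, for illustration only)
# Prints one of:
#   latin1    the word is present with "è" as the single byte 0xE8
#   utf8      the word is present with "è" as the bytes 0xC3 0xA8
#   stripped  only "troisime" is present (the accented letter was dropped)
#   none      none of the above byte sequences was found
classify_troisieme() {
    # Dump the file as space-separated hex bytes on a single line.
    # -v keeps od from collapsing repeated lines into "*".
    hex=$(od -An -v -tx1 "$1" | tr '\n' ' ' | tr -s ' ')
    case "$hex" in
        *"74 72 6f 69 73 69 e8 6d 65"*)    echo latin1 ;;
        *"74 72 6f 69 73 69 c3 a8 6d 65"*) echo utf8 ;;
        *"74 72 6f 69 73 69 6d 65"*)       echo stripped ;;
        *)                                 echo none ;;
    esac
}
```

For example, "classify_troisieme foo" could be run on the search result, and the same function on docs/french-text.txt, to see whether the character is already missing in the input or only in the output.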
I am using Mailman 2.1 and Namazu 2.0.12.
How about your environment? The following is mine:
Debian GNU/Linux (today's unstable)
Linux 2.4.21-pre4
glibc 2.3.1
--
NOKUBI Takatsugu
E-mail: knok(_at_)daionet(_dot_)gr(_dot_)jp
knok(_at_)namazu(_dot_)org / knok(_at_)debian(_dot_)org