namazu-users-en

Re: Mailman & "Charactères Français"

2003-06-02 22:38:52
At Mon,  2 Jun 2003 16:26:26 -0500,
dchartrand(_at_)scclab(_dot_)com wrote:
Namazu is having problems displaying and understanding french characters such as
"àéèçô..." when used to search Mailman archives. A word like "troisième" is
displayed (and searched) as "troisime" in Namazu... Notice the missing "è".

I received a similar report in the past, so I tried to reproduce the
problem with the following steps:

1. Saved the mail <1054589186(_dot_)3edbc102c03c0(_at_)scclab(_dot_)com> as a
   text file named "docs/french-text.txt".

2. Typed "LANG=C mknmz -O index ./docs" to build the index.

3. Typed "LANG=C namazu -h `sed -n '78p' index/NMZ.w` index > foo" to
   search for the word "troisième". I took the word from line 78 of
   index/NMZ.w because I don't know how to type French characters.

4. Checked the file foo. It looks like this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
        "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<!-- LINK-REV-MADE -->
<link rev=made href="mailto:webmaster(_at_)puti(_dot_)knok(_dot_)daionet(_dot_)gr(_dot_)jp">
<!-- LINK-REV-MADE -->
<title>Namazu: a Full-Text Search Engine: &lt;troisième&gt;</title>

  :
(snip)
  :
<h2>Results:</h2>
<p>
References:  [ troisième: 1 ] 
  :
(snip)

Hmm, it seems to work without any problem for me.
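
The steps above can be scripted roughly as follows. This is only a
sketch under some assumptions: the scratch directory and the one-line
test document are made up for illustration (the real test used the saved
mail), the document is written as ISO-8859-1, and the word is looked up
with grep instead of a fixed line number, since the position in NMZ.w
depends on the indexed text. The index and search steps are skipped if
mknmz and namazu are not installed.

```shell
#!/bin/sh
# Sketch of the reproduction steps; assumes mknmz and namazu are on PATH.
set -u

# Hypothetical scratch layout, not the directories used in the mail.
workdir=$(mktemp -d)
mkdir -p "$workdir/docs" "$workdir/index"

# Step 1: a small test document. \350 is the ISO-8859-1 byte for "è",
# so the file contains the word "troisième" (encoding is an assumption).
printf 'Ceci est la troisi\350me ligne.\n' > "$workdir/docs/french-text.txt"

if command -v mknmz >/dev/null 2>&1 && command -v namazu >/dev/null 2>&1; then
    # Step 2: build the index.
    LANG=C mknmz -O "$workdir/index" "$workdir/docs"

    # Step 3: take the word from NMZ.w instead of typing it.
    word=$(LC_ALL=C grep -a 'trois' "$workdir/index/NMZ.w" | head -n 1)
    LANG=C namazu -h "$word" "$workdir/index" > "$workdir/foo"

    # Step 4: show the hit count line from the HTML result.
    LC_ALL=C grep -a 'References:' "$workdir/foo" || echo "no References line found"
else
    echo "mknmz/namazu not found; skipping the index and search steps"
fi
```

If the "References:" line shows the word with its accent intact, the
index and search side is probably fine and the problem is more likely in
the environment.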

I am using Mailman 2.1 and Namazu 2.0.12.

How about your environment? The following is mine:

Debian GNU/Linux (today's unstable)
Linux 2.4.21-pre4
glibc 2.3.1
-- 
NOKUBI Takatsugu
E-mail: knok(_at_)daionet(_dot_)gr(_dot_)jp
        knok(_at_)namazu(_dot_)org / knok(_at_)debian(_dot_)org