namazu-users-en

[Namazu-users-en] Re: namazu stopped working

2005-11-24 22:33:36
IEM - network operating center wrote:

international mailing list, language is english (but there are a lot of
subscribers with special characters in their names, especially spanish ones).
the list in question is the one with the most traffic: the archive
starts in 1998 and by now there are about 37850 files in it; without
attachments (which i exclude from indexing via the
"exclude-pattern" flag) there are 33419.

Namazu supports only English and Japanese.
Spanish text cannot be processed correctly.
In short, correct operation on Spanish input is not guaranteed.

i guess it is a problem with some multi-byte characters.

The cause might be something else.
If you can identify the document that triggers the problem and
make that file available, it should be
possible to pinpoint the cause.
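
One way to narrow down candidate documents, assuming the trouble really is related to non-ASCII bytes (which is only a guess at this point), is to list the archive files that contain any byte in the range 0x80-0xff. The following is a minimal Python sketch, not part of Namazu; the function names and the directory argument are hypothetical.

```python
import os


def has_high_bytes(data):
    """Return True if any byte of `data` is in the range 0x80-0xff."""
    return any(b >= 0x80 for b in data)


def find_non_ascii_files(root):
    """Walk `root` and return a sorted list of files holding a byte >= 0x80."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Read as raw bytes so no decoding error can occur.
            with open(path, "rb") as fh:
                if has_high_bytes(fh.read()):
                    hits.append(path)
    return sorted(hits)
```

Indexing a small directory that holds only some of the listed files, and halving that set until the problem reappears, would then point at a single document that can be sent to the list.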

By the way,
I think the warning quoted below can be reduced by the following patch
(no guarantee).

(which reminds me that when i build the index i get some warnings:
"Wide character in print at /usr/bin/mknmz line 2447, <GEN7162> line
158600.")

--- namazu-2.0.14/scripts/mknmz.in      2004-04-08 17:34:42.000000000 +0900
+++ mknmz.in    2005-11-25 14:21:26.000000000 +0900
@@ -2250,7 +2250,7 @@ sub count_words ($$$$) {
     $$contref =~ tr/A-Z/a-z/;

     # Remove control char.
-    $$contref =~ tr/\x00-\x08\x0b-\x0c\x0e-\x1a/ /;
+    $$contref =~ tr/\x00-\x08\x0b-\x0c\x0e-\x1a\x80-\xff/ /;

     # Do wakatigaki if necessary.
     if (util::islang("ja")) {
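
To show what the changed tr/// line does to the text, here is a small Python sketch of the same byte mapping. It assumes the content is handled as single bytes; the names build_table, ORIGINAL and PATCHED are made up for this illustration and do not appear in mknmz.

```python
def build_table(strip_high_bytes):
    """Build a 256-entry translation table that maps the listed byte
    ranges to a space and leaves every other byte unchanged."""
    table = bytearray(range(256))
    # Ranges from the original line: control characters except
    # tab (0x09), newline (0x0a) and carriage return (0x0d).
    ranges = [(0x00, 0x08), (0x0B, 0x0C), (0x0E, 0x1A)]
    if strip_high_bytes:
        # Range added by the patch: every byte with the high bit set.
        ranges.append((0x80, 0xFF))
    for lo, hi in ranges:
        for b in range(lo, hi + 1):
            table[b] = 0x20
    return bytes(table)


ORIGINAL = build_table(False)
PATCHED = build_table(True)

# "señor" in Latin-1, followed by a control character and "x".
sample = "se\u00f1or\x01x".encode("latin-1")

print(sample.translate(ORIGINAL))  # b'se\xf1or x'
print(sample.translate(PATCHED))   # b'se or x'
```

As the second line of output shows, the patched mapping also turns accented letters into spaces, so a word such as "señor" would be counted as "se" and "or".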
-- 
=====================================================================
TADAMASA TERANISHI  yw3t-trns(_at_)asahi-net(_dot_)or(_dot_)jp
http://www.asahi-net.or.jp/~yw3t-trns/index.htm
Key fingerprint =  474E 4D93 8E97 11F6 662D  8A42 17F5 52F4 10E7 D14E

_______________________________________________
Namazu-users-en mailing list
Namazu-users-en(_at_)namazu(_dot_)org
http://www.namazu.org/cgi-bin/mailman/listinfo/namazu-users-en
