IEM - network operating center wrote:
international mailing list, language is English (but there are a lot of
subscribers with special characters in their names, especially Spanish ones)
the list in question is the one with the most traffic: the archive
starts in 1998 and by now there are about 37850 files in it; without
attachments (which i exclude from indexing via the
"exclude-pattern"-flag) there are 33419.
Namazu supports only English and Japanese.
Spanish text is not processed correctly.
In short, correct operation on Spanish input is not guaranteed.
i guess it is a problem with some multi-byte characters.
The cause may be something else.
If you can identify the document that causes the trouble and make
that file available, it should be possible to pinpoint the cause.
By the way,
I think the following change should help with that warning
(no guarantee).
(which reminds me that when i build the index i get some warnings:
"Wide character in print at /usr/bin/mknmz line 2447, <GEN7162> line
158600.")
--- namazu-2.0.14/scripts/mknmz.in 2004-04-08 17:34:42.000000000 +0900
+++ mknmz.in 2005-11-25 14:21:26.000000000 +0900
@@ -2250,7 +2250,7 @@ sub count_words ($$$$) {
     $$contref =~ tr/A-Z/a-z/;
 
     # Remove control char.
-    $$contref =~ tr/\x00-\x08\x0b-\x0c\x0e-\x1a/ /;
+    $$contref =~ tr/\x00-\x08\x0b-\x0c\x0e-\x1a\x80-\xff/ /;
 
     # Do wakatigaki if necessary.
     if (util::islang("ja")) {
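
To make the effect of the changed tr/// line easier to see, here is a
rough sketch in Python (not Perl, and not part of mknmz) that emulates
both versions on raw bytes. It assumes byte semantics and a Latin-1
sample string; the names OLD_RANGES, NEW_RANGES, make_table and
strip_bytes are made up for this illustration only.

```python
# Illustration only: emulates the old and new tr/// ranges from the
# patch on raw bytes. All names here are hypothetical, not mknmz code.

# Byte ranges that get replaced by a space before the patch ...
OLD_RANGES = [(0x00, 0x08), (0x0B, 0x0C), (0x0E, 0x1A)]
# ... and after the patch, which additionally covers 0x80-0xff.
NEW_RANGES = OLD_RANGES + [(0x80, 0xFF)]


def make_table(ranges):
    """Build a 256-entry translation table mapping the ranges to ' '."""
    table = bytearray(range(256))
    for lo, hi in ranges:
        for b in range(lo, hi + 1):
            table[b] = 0x20
    return bytes(table)


def strip_bytes(data, ranges):
    """Replace every byte that falls in one of the ranges with a space."""
    return data.translate(make_table(ranges))


# Assumed sample: Latin-1 bytes for "señor", a control char, " lópez".
sample = b"se\xf1or\x01 l\xf3pez"

old = strip_bytes(sample, OLD_RANGES)
new = strip_bytes(sample, NEW_RANGES)

print(old)
print(new)
```

Under these assumptions the old ranges only blank out the control
character, while the new ranges also blank out the accented letters.
That would explain why the warning may go away, but it also means a
name containing such a letter is split at that letter, so this looks
like a workaround rather than real support for Spanish.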
--
=====================================================================
TADAMASA TERANISHI yw3t-trns(_at_)asahi-net(_dot_)or(_dot_)jp
http://www.asahi-net.or.jp/~yw3t-trns/index.htm
Key fingerprint = 474E 4D93 8E97 11F6 662D 8A42 17F5 52F4 10E7 D14E
_______________________________________________
Namazu-users-en mailing list
Namazu-users-en(_at_)namazu(_dot_)org
http://www.namazu.org/cgi-bin/mailman/listinfo/namazu-users-en