[Namazu-users-en] 2.0.13 and wide characters problem + index exists but no results problem

2004-06-27 04:46:01
Hi,

I'm having a few serious problems with namazu. I'm using Linux (glibc 2.3.3, perl 
5.008004) and:

- with vanilla namazu there are a bunch of
Malformed UTF-8 character (unexpected continuation byte
XYZ, with no preceding start byte) messages

There are other reports about this, too:
http://www.mhonarc.org/archive/html/mharc-users/2004-05/msg00008.html

A patch from Fedora fixed these 
(http://cvs.pld-linux.org/cgi-bin/cvsweb/SOURCES/namazu-fixinutf8.patch?rev=1.1)
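
To see which archive files contain the bytes behind the "unexpected continuation byte" message, a small standalone check can help. This is only a sketch in Python and is not part of namazu or of the patch above; the function names first_invalid_utf8 and scan are made up for illustration, and the assumption is that the offending files hold 8-bit text (for example ISO-8859-2) that perl is trying to read as UTF-8.

```python
import sys
from pathlib import Path


def first_invalid_utf8(data: bytes):
    """Return (offset, byte) of the first byte that breaks UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, data[err.start]
    return None


def scan(root: str):
    """Yield (path, offset, byte) for each .html file under root that is not valid UTF-8."""
    for path in sorted(Path(root).rglob("*.html")):
        hit = first_invalid_utf8(path.read_bytes())
        if hit is not None:
            yield path, hit[0], hit[1]


if __name__ == "__main__" and len(sys.argv) > 1:
    # The directory to scan is given on the command line.
    for path, offset, byte in scan(sys.argv[1]):
        print(f"{path}: invalid UTF-8 at byte {offset} (0x{byte:02x})")
```

Run it with the archive directory as the only argument. A byte in the range 0x80-0xbf reported at the failing offset is a UTF-8 continuation byte with no start byte before it, which is the situation the perl message describes.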

- unfortunately that's not all; even with the patch above I'm getting:
65/74 - /var/spool/mailman/archives/public/feedback/2002-July/003643.html 
[text/html; x-type=pipermail]
66/74 - /var/spool/mailman/archives/public/feedback/2002-July/003644.html 
[text/html; x-type=pipermail]
Wide character in print at /usr/bin/mknmz line 710, <GEN3> line 66.
67/74 - /var/spool/mailman/archives/public/feedback/2002-July/003645.html 
[text/html; x-type=pipermail]
Wide character in print at /usr/bin/mknmz line 710, <GEN3> line 67.
68/74 - /var/spool/mailman/archives/public/feedback/2002-July/003646.html 
[text/html; x-type=pipermail]
Wide character in print at /usr/bin/mknmz line 710, <GEN3> line 68.
69/74 - /var/spool/mailman/archives/public/feedback/2002-July/003647.html 
[text/html; x-type=pipermail]
70/74 - /var/spool/mailman/archives/public/feedback/2002-July/author.html is 
Pipermail's index file! skipped.
70/73 - /var/spool/mailman/archives/public/feedback/2002-July/date.html is 
Pipermail's index file! skipped.
70/72 - /var/spool/mailman/archives/public/feedback/2002-July/index.html is 
Pipermail's index file! skipped.
70/71 - /var/spool/mailman/archives/public/feedback/2002-July/subject.html is 
Pipermail's index file! skipped.
70/70 - /var/spool/mailman/archives/public/feedback/2002-July/thread.html is 
Pipermail's index file! skipped.
Wide character in print at /usr/bin/mknmz line 2475.
Wide character in print at /usr/bin/mknmz line 2475.
Wide character in print at /usr/bin/mknmz line 2475.
Wide character in print at /usr/bin/mknmz line 2475.
Wide character in print at /usr/bin/mknmz line 2475.
Wide character in print at /usr/bin/mknmz line 2475.
(tons of these)

I can work around these by placing "use bytes;" before and "no bytes;" after the 
print on line 2475 (and on the other lines where this occurs).

Finally, when I have all the indexes:
[Base]
Date:                Sat Jun 26 23:37:08 2004
Added Documents:     3,730
Size (bytes):        11,472,325
Total Documents:     3,730
Added Keywords:      84,273
Total Keywords:      84,273
Wakati:              module_kakasi -ieuc -oeuc -w
Time (sec):          77
File/Sec:            48.44
System:              linux
Perl:                5.008004
Namazu:              2.0.13

it doesn't find anything in them:
[root(_at_)anduril /root]# namazu pld
Results:

References:  [ pld: 0 ]

No document matching your query.
[root(_at_)anduril /root]# namazu linux
Results:

References:  [ linux: 0 ]

No document matching your query.

[root(_at_)anduril /root]# namazu -C
Loaded rcfile: /etc/namazu/namazurc
--
Index:        /var/lib/namazu/index
Logging:      off
Lang:         C
Scoring:      tfidf
Template:     /var/lib/namazu/index
MaxHit:       10000
MaxMatch:     1000
EmphasisTags: <strong class="keyword">  </strong>
Replace: /var/spool/mailman/archives/private/   
http://lists.pld-linux.org/pipermail/

[root(_at_)anduril /root]# mknmz -C
Loaded rcfile: /etc/namazu/mknmzrc
System: linux
Namazu: 2.0.13
Perl: 5.008004
File-MMagic: 1.22
NKF: module_nkf
KAKASI: module_kakasi -ieuc -oeuc -w
ChaSen: module_chasen -j -F '%m '
Wakati: module_kakasi -ieuc -oeuc -w
Lang_Msg: C
Lang: C
Coding System: euc
CONFDIR: /etc/namazu
LIBDIR: /usr/share/namazu/pl
FILTERDIR: /usr/share/namazu/filter
TEMPLATEDIR: /usr/share/namazu/template
Supported media types:   (18)
Unsupported media types: (16) marked with minus (-) probably missing 
application in your $path.
- application/excel: excel.pl
  application/ichitaro5: taro56.pl
  application/ichitaro6: taro56.pl
- application/ichitaro7: taro7_10.pl
  application/macbinary: macbinary.pl
- application/msword: msword.pl
- application/pdf: pdf.pl
- application/postscript: postscript.pl
- application/powerpoint: powerpoint.pl
- application/rtf: rtf.pl
- application/vnd.sun.xml.calc: ooo.pl
- application/vnd.sun.xml.draw: ooo.pl
- application/vnd.sun.xml.impress: ooo.pl
- application/vnd.sun.xml.writer: ooo.pl
  application/x-apache-cache: apachecache.pl
  application/x-bzip2: bzip2.pl
  application/x-compress: compress.pl
- application/x-deb: deb.pl
- application/x-dvi: dvi.pl
  application/x-gzip: gzip.pl
- application/x-js-taro: taro7_10.pl
  application/x-rpm: rpm.pl
- application/x-tex: tex.pl
- audio/mpeg: mp3.pl
  message/news: mailnews.pl
  message/rfc822: mailnews.pl
  text/hnf: hnf.pl
  text/html: html.pl
  text/html; x-type=mhonarc: mhonarc.pl
  text/html; x-type=pipermail: pipermail.pl
  text/plain
  text/plain; x-type=rfc: rfc.pl
  text/x-hdml: hdml.pl
  text/x-roff: man.pl

I'm also using the pipermail.pl filter from http://mm.tkikuchi.net/pipermail.pl. 
It has the same license as the filters in the namazu tarball, so it would be nice 
if it also made it into the official namazu tarball.

There is also a patch that fixes German support:
http://cvs.pld-linux.org/cgi-bin/cvsweb/SOURCES/namazu-de.patch?rev=1.1

-- 
Arkadiusz Miśkiewicz     CS at FoE, Wroclaw University of Technology
arekm.pld-linux.org, 1024/3DB19BBD, JID: arekm.jabber.org, PLD/Linux