namazu-users-en

[Namazu-users-en] Re: Problems with mknmz and Perl 5.8.6

2005-06-11 12:39:35
On June 11, 2005 at 13:11, Tadamasa Teranishi wrote:

Perhaps, NMZ.field.subject.i is broken. 
What is the version of Namazu used?

2.0.14.

Also, the "Malformed UTF-8 ..." warnings are popping up, regardless
of what LANG or LC_ALL are set to.  I had to add a 'use bytes' pragma
to mailnews.pl at line 212 to get rid of the warnings.

Please try. 

$ env LC_ALL=C mknmz ...

I have, in a myriad of ways.  I just recreated things on one of my
local systems to make analysis easier.

I've made the command used and the output of a stock namazu 2.0.14
installation available for your examination at
<http://www.mhonarc.org/tmp/mknmz-out.txt.gz>.  That is, no modifications
were made to the namazu code, so the many "malformed utf-8 ..." messages
are present.  Perl also complains about wide characters in print.
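
For readers less familiar with this class of warning, here is a small
Python sketch of the general byte-versus-character distinction behind
messages of this kind.  It is an illustration only: the sample word and
the choice of Latin-1 are assumptions, not data taken from the archive
described here, and it does not reproduce Perl's own checks.

```python
# Illustration only: the sample text and the Latin-1 encoding are
# assumptions, not data from the archive discussed in this message.
sample = "Übersicht"

latin1_bytes = sample.encode("latin-1")  # one byte per character
utf8_bytes = sample.encode("utf-8")      # "Ü" needs two bytes in UTF-8


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Character count versus byte count in each encoding.
print(len(sample), len(latin1_bytes), len(utf8_bytes))
# The Latin-1 bytes are not a well-formed UTF-8 sequence; the UTF-8 bytes are.
print(is_valid_utf8(latin1_bytes), is_valid_utf8(utf8_bytes))
```

The point of the sketch is that 8-bit data treated as UTF-8 can be
rejected as malformed, and that character counts and byte counts stop
agreeing as soon as non-ASCII text is involved.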

I've also made available the input files and NMZ.* files at
the following locations:
<http://www.mhonarc.org/tmp/namazu-users-en_NMZ_files.tar.gz>
<http://www.mhonarc.org/tmp/namazu-users-en_input_files.tar.gz>.

The following is version information from mknmz:

mknmz -C
Namazu: 2.0.14
Perl: 5.008006
File-MMagic: 1.20
NKF: /usr/bin/nkf
KAKASI: no
ChaSen: no
Lang_Msg: C
Lang: C
Coding System: euc
CONFDIR: /usr/local/etc/namazu
LIBDIR: /usr/local/share/namazu/pl
FILTERDIR: /usr/local/share/namazu/filter
TEMPLATEDIR: /usr/local/share/namazu/template
Supported media types:   (23)
Unsupported media types: (10) marked with minus (-) probably missing application in your $path.
- application/excel: excel.pl
  application/ichitaro5: taro56.pl
  application/ichitaro6: taro56.pl
- application/ichitaro7: taro7_10.pl
  application/macbinary: macbinary.pl
  application/msword: msword.pl
- application/pdf: pdf.pl
  application/postscript: postscript.pl
- application/powerpoint: powerpoint.pl
- application/rtf: rtf.pl
  application/vnd.sun.xml.calc: ooo.pl
  application/vnd.sun.xml.draw: ooo.pl
  application/vnd.sun.xml.impress: ooo.pl
  application/vnd.sun.xml.writer: ooo.pl
  application/x-apache-cache: apachecache.pl
  application/x-bzip2: bzip2.pl
  application/x-compress: compress.pl
- application/x-deb: deb.pl
- application/x-dvi: dvi.pl
  application/x-gzip: gzip.pl
- application/x-js-taro: taro7_10.pl
  application/x-rpm: rpm.pl
- application/x-tex: tex.pl
- audio/mpeg: mp3.pl
  message/news: mailnews.pl
  message/rfc822: mailnews.pl
  text/hnf: hnf.pl
  text/html: html.pl
  text/html; x-type=mhonarc: mhonarc.pl
  text/plain
  text/plain; x-type=rfc: rfc.pl
  text/x-hdml: hdml.pl
  text/x-roff: man.pl

The following is the output of doing a search via `namazu' from the
command-line:

  namazu -s -n 3 -f cgi-bin/.namazurc '+from:earl' \
         ~/archive/html/namazu-users-en
  Results:

  References:  [ +from:earl: 49 ] 

   Total 49 documents matching your query.

  1. er things I want hidden (score: 1)
  /~listsarc/archive/html/namazu-users-en/2004-09/msg00005.html (8,178 bytes)

  2. g indexing (score: 1)
  /~listsarc/archive/html/namazu-users-en/2004-05/msg00011.html (7,732 bytes)

  3. med UTF-8 character ... (score: 1)
  /~listsarc/archive/html/namazu-users-en/2004-05/msg00004.html (8,738 bytes)

  Current List: 1 - 3


Notice how the first part of each subject string is clipped.  Doing
a search for "PHP" returns no hits, though it should.
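
One way to test the suggestion that NMZ.field.subject.i is broken would
be to check whether every offset stored in it points at the start of a
line in NMZ.field.subject.  The Python sketch below does that under a
loud assumption: that the .i file is a plain sequence of 4-byte
big-endian unsigned offsets, one per document.  That layout is not
confirmed anywhere in this thread, and the function name
misaligned_offsets is hypothetical.

```python
import struct
from pathlib import Path


def misaligned_offsets(field_path, index_path):
    """Return (entry_number, offset) pairs that do not point at a line start."""
    data = Path(field_path).read_bytes()
    index = Path(index_path).read_bytes()
    # ASSUMPTION: each index entry is a 4-byte big-endian unsigned integer.
    usable = index[: len(index) // 4 * 4]  # ignore any trailing partial entry
    bad = []
    for entry, (offset,) in enumerate(struct.iter_unpack(">I", usable)):
        # A good offset lies inside the file and is either 0 or
        # immediately preceded by a newline byte.
        in_range = offset < len(data)
        at_line_start = in_range and (offset == 0 or data[offset - 1] == 0x0A)
        if not at_line_start:
            bad.append((entry, offset))
    return bad
```

Calling misaligned_offsets("NMZ.field.subject", "NMZ.field.subject.i")
from inside the index directory would list the suspect entries.  An
empty list would suggest the clipping happens somewhere other than the
offsets, provided the assumed layout is right.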

If you require any other information, I will provide it.

Thanks for your help,

--ewh
-- 
Earl Hood, <earl(_at_)earlhood(_dot_)com>
Web: <http://www.earlhood.com/>
PGP Public Key: <http://www.earlhood.com/gpgpubkey.txt>
_______________________________________________
Namazu-users-en mailing list
Namazu-users-en(_at_)namazu(_dot_)org
http://www.namazu.org/cgi-bin/mailman/listinfo/namazu-users-en