IEM - network operating center wrote:
debugging output:
running "namazu" from the command line with debugging options gives me
the following ("OSC" is a keyword which pops up every now and then):
%> namazu -c -d --config=pd-list "OSC"
namazu(debug): NAMAZUNORC: ''
namazu(debug): load_rcfile: /etc/namazu/namazurc loaded
namazu(debug): 5: Directive: [Index]
namazu(debug): Argument 1: [/var/lib/namazu/index/pd-list]
namazu(debug): 12: Directive: [Template]
namazu(debug): Argument 1: [/var/lib/namazu/index/pd-list]
namazu(debug): 14: Directive: [Replace]
namazu(debug): Argument 1: [/var/lib/mailman/archives/private]
namazu(debug): Argument 2: [/pipermail]
namazu(debug): 20: Directive: [Logging]
namazu(debug): Argument 1: [on]
namazu(debug): load_rcfile: pd-list loaded
namazu(debug): -n: 20
namazu(debug): -w: 0
namazu(debug): query: [OSC]
namazu(debug): Index name [0]: /var/lib/namazu/index/pd-list
namazu(debug): set_phrase_trick: OSC
namazu(debug): set_regex_trick: OSC
namazu(debug): query.tokennum: 1
namazu(debug): query.tab[0]: OSC
namazu(debug): size of /var/lib/namazu/index/pd-list/NMZ.t: 132748
namazu(debug): before nmz_strlower: [OSC]
namazu(debug): after nmz_strlower: [osc]
namazu(debug): do WORD search
namazu(debug): size of /var/lib/namazu/index/pd-list/NMZ.ii: 1492960
namazu(debug): l:0: !
namazu(debug): r:373239: µÎ¬
namazu(debug): searching: ..)
namazu(debug): searching:
namazu(debug): searching: khz.
...
so after a bit more research i found that NMZ.ii does not return the
correct offset.
as far as i understand it, search::nmz_binsearch() performs a binary
search for the keyword, using NMZ.wi to look up the byte offset of a given
line in NMZ.w (which has each keyword on a separate line).
it first starts with line 186620 [=(373239+1)/2=(r+1)/2], which in fact
contains "clean;", but namazu thinks that it contains "..)".
more research revealed that the byte offset returned from NMZ.wi points
into the middle of a line "clean....)"; since the term found there
is "..)", the binary search fails miserably.
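to make the lookup described above concrete, here is a minimal python sketch of a binary search over a newline-separated word list through a separate offset index. it is not namazu's code: the 4-byte little-endian offset format and the names build_index, word_at and binsearch are assumptions made up for illustration.

```python
import struct

def build_index(words):
    """Build a word list (one word per line) plus an offset index.

    Assumption: offsets are stored as 4-byte little-endian integers;
    the real NMZ.wi layout may differ.
    """
    data = b""
    offsets = []
    # Sort by encoded bytes so byte-wise comparison in the search is valid.
    for w in sorted(words, key=lambda s: s.encode("utf-8")):
        offsets.append(len(data))  # byte offset where this line starts
        data += w.encode("utf-8") + b"\n"
    index = b"".join(struct.pack("<I", o) for o in offsets)
    return data, index

def word_at(data, index, i):
    """Return the word on line i, located via the offset index."""
    (off,) = struct.unpack_from("<I", index, 4 * i)
    end = data.index(b"\n", off)
    return data[off:end]

def binsearch(data, index, key):
    """Return the line number of key, or -1 if it is not in the list."""
    key = key.encode("utf-8")
    lo, hi = 0, len(index) // 4 - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        w = word_at(data, index, mid)
        if w == key:
            return mid
        if w < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

data, index = build_index(["osc", "clean", "khz", "pd", "zexy"])
found = binsearch(data, index, "osc")
```

the search only works if every offset in the index points at the first byte of a line; a single offset that lands mid-line makes word_at return a fragment, and every comparison after that is made against the wrong term.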
i guess it is a problem with some multi-byte characters.
(which reminds me that when i build the index i get some warnings:
"Wide character in print at /usr/bin/mknmz line 2447, <GEN7162> line
158600.")
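the multi-byte guess can be illustrated with a small python sketch. it only shows the suspected mechanism (offsets counted in characters while the file holds UTF-8 bytes); whether mknmz actually does this is not confirmed here, and the word list and the helper line_at are made up for the example.

```python
# Hypothetical word list; "größe" contains two 2-byte UTF-8 characters.
words = ["clean", "größe", "osc"]

char_offsets = []  # offsets counted in characters (the suspected bug)
byte_offsets = []  # offsets counted in bytes (what a seek needs)
text = ""
for w in words:
    char_offsets.append(len(text))
    byte_offsets.append(len(text.encode("utf-8")))
    text += w + "\n"

data = text.encode("utf-8")  # what actually ends up on disk

def line_at(data, off):
    """Read from byte offset off up to the next newline."""
    return data[off:data.index(b"\n", off)]

good = line_at(data, byte_offsets[2])  # starts at the beginning of a line
bad = line_at(data, char_offsets[2])   # starts inside the previous line
```

here every offset after the first multi-byte word is too small by the number of extra bytes, so the read starts in the middle of an earlier line and returns only its tail.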
any hints how i should proceed?
mfg.asdr
IOhannes
_______________________________________________
Namazu-users-en mailing list
Namazu-users-en(_at_)namazu(_dot_)org
http://www.namazu.org/cgi-bin/mailman/listinfo/namazu-users-en