namazu-users-en

[Namazu-users-en] Re: Single word search not working

2004-11-21 02:26:21
On Sat, Nov 20, 2004 at 05:11:45PM +0000, John Levon wrote:

However, single-word searches on this index often fail.  If I restrict

Some testing (exhaustively printing every entry in nmz_binsearch())
has shown that sometimes the word is in the index but is missed by the
search, and sometimes it is not present at all. For example:

$ grep ^the$ /archives/html/NMZ.w
the
$ ./src/namazu -d the /archives/html/ >a 2>&1
$ grep :the: a
at 60653:1030</errorcode><errmsg>95'smi-00114:the:
at 61243:106</errorcode><errmsg>smi-00114:the:
at 253524:the:
at 260503:the:
at 266275:the:
at 270982:the:
at 272918:the:
at 272986:the:
at 273348:the:
at 274611:the:
at 289542:e:the:
$ tail -25 a
namazu(debug): searching 179592: s
namazu(debug): searching 269388: les
namazu(debug): searching 314286: dterm
namazu(debug): searching 336735: /rpc,
namazu(debug): searching 347960: _exit
namazu(debug): searching 353572: ishes
namazu(debug): searching 356378: n.com
namazu(debug): searching 357781: e:fs>
namazu(debug): searching 358483: ldirs
namazu(debug): searching 358834: 75
namazu(debug): searching 359009:
namazu(debug): searching 359097: done;
namazu(debug): searching 359141: ~~~~~
namazu(debug): searching 359119: ~~~~~
namazu(debug): searching 359108: only;
namazu(debug): searching 359113: first
namazu(debug): searching 359116: ~~~~~
namazu(debug): searching 359114: splay
namazu(debug): searching 359115: y
hlist.stat = 0
Results:

References:  [ the: 0 ]

No document matching your query.

$ grep ':[st]' a | more
...
at 179955:s:
at 179956:t:
at 179958:s:
...

The array is unsorted! But NMZ.w itself is sorted (apart from some binary junk).

Or:

$ grep ^oprofile$ /archives/html/NMZ.w
oprofile
$ grep :oprofile: a
$

So it seems there are two serious bugs here: the index array is not
sorted correctly, so the binary search cannot work, and the index is
missing items anyway. The debug changes were:

    /* Debug: dump every term in the range being searched. */
    for (x = l; x < r; ++x) {
        fseek(Nmz.w, nmz_getidxptr(Nmz.wi, x), SEEK_SET);
        fgets(term, BUFSIZE - 1, Nmz.w);
        nmz_chomp(term);
        printf("at %d:%s:\n", x, term);
    }


...

        nmz_debug_printf("searching %d: %s", x, term);

...

regards
john