namazu-users-en

Forward: NOT MEMBER article from ot(_at_)w3(_dot_)org (namazu-users-en ML)

2003-10-16 20:06:23
The following mail was rejected, so I'll forward it.

--- Begin Message ---
NOT MEMBER article from ot(_at_)w3(_dot_)org


Original mail as follows:

   From owner-namazu-users-en(_at_)karin(_dot_)namazu(_dot_)org  Thu Oct 16 16:52:15 2003
   Return-Path: <owner-namazu-users-en(_at_)karin(_dot_)namazu(_dot_)org>
   Delivered-To: namazu-users-en(_at_)namazu(_dot_)org
   Received: from toro.w3.mag.keio.ac.jp (toro.w3.mag.keio.ac.jp [133.27.228.201])
        by karin.namazu.org (Postfix) with ESMTP id 4210EF861
        for <namazu-users-en(_at_)namazu(_dot_)org>; Thu, 16 Oct 2003 16:52:15 +0900 (JST)
   Received: from w3.org (navi.w3.mag.keio.ac.jp [133.27.228.212])
        by toro.w3.mag.keio.ac.jp (Postfix) with ESMTP id 33E6AA8E
        for <namazu-users-en(_at_)namazu(_dot_)org>; Thu, 16 Oct 2003 16:52:12 +0900 (JST)
   Date: Thu, 16 Oct 2003 16:52:11 +0900
   Mime-Version: 1.0 (Apple Message framework v552)
   Content-Type: text/plain; charset=US-ASCII; format=flowed
   Subject: namazu 2.0.12 massively dumping keywords from index
   From: Olivier Thereaux <ot(_at_)w3(_dot_)org>
   To: namazu-users-en(_at_)namazu(_dot_)org
   Content-Transfer-Encoding: 7bit
   Message-Id: <A6054190-FFAD-11D7-B8D3-000393A63FC8(_at_)w3(_dot_)org>
   X-Mailer: Apple Mail (2.552)
   
   Greetings.
   
   Here is a puzzling case for your consideration. Hopefully some of the 
   namazu users and developers on this list have seen this before; I 
   would appreciate any input.
   
   Now for the story:
   
   The users of my namazu-based system recently started complaining that, 
   for the main (big) indexes, namazu does not seem to find results older 
   than a few days, even though the system indexes documents dating from 
   1994 to the present.
   
   A quick look at the NMZ.log file for these indexes shows something very 
   strange: namazu is apparently getting rid of keywords.
   
    > grep "Added Keywords:" NMZ.log | tail -20
   Added Keywords:      160
   Added Keywords:      58
   Added Keywords:      95
   Added Keywords:      331
   Added Keywords:      286
   Added Keywords:      30
   Added Keywords:      -105,957
   Added Keywords:      545
   Added Keywords:      552
   Added Keywords:      176
   Added Keywords:      -215,331
   Added Keywords:      1,175
   Added Keywords:      1,300
   Added Keywords:      958
   Added Keywords:      -120,305
   Added Keywords:      1,965
   Added Keywords:      -1,017,652
   Added Keywords:      1,521
   Added Keywords:      2,287
   Added Keywords:      11,221
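
   For anyone who wants to tally these figures rather than eyeball them, 
   here is a minimal Python sketch. It assumes the lines have exactly the 
   "Added Keywords: N" shape shown above, with commas as thousands 
   separators; the function name is only illustrative, not part of namazu.

```python
import re

# Matches lines such as "Added Keywords:      -105,957".
# Assumption: counts use commas as thousands separators, as in the output above.
ADDED_RE = re.compile(r"^\s*Added Keywords:\s*(-?\d[\d,]*)\s*$")

def added_keyword_counts(lines):
    """Return the integer 'Added Keywords' value from each matching line."""
    counts = []
    for line in lines:
        match = ADDED_RE.match(line)
        if match:
            counts.append(int(match.group(1).replace(",", "")))
    return counts

# Three lines copied from the grep output above.
sample = """\
Added Keywords:      30
Added Keywords:      -105,957
Added Keywords:      545
""".splitlines()

counts = added_keyword_counts(sample)
negatives = [c for c in counts if c < 0]
print("runs:", len(counts), "negative:", len(negatives), "net:", sum(counts))
```

   Over the 20 lines above, that gives 4 negative runs and a net change 
   of -1,436,585 keywords.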
   
   According to my logs, it started here, for no apparent reason:
   
   [Append]
   Date:                Fri Oct 10 20:42:01 2003
   Added Documents:     84
   Updated Documents:   2
   Size (bytes):        345,675
   Total Documents:     366,050
   Added Keywords:      -362,979
   Total Keywords:      5,447,163
   Wakati:              module_kakasi -ieuc -oeuc -w
   Time (sec):          1,429
   File/Sec:            0.06
   System:              linux
   Perl:                5.006001
   Namazu:              2.0.12
   
   That's right: adding 84 documents removed over 360,000 keywords.
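
   To find every such entry and its date, something like the following 
   Python sketch could be run over the log. It assumes each entry starts 
   with a bracketed line such as "[Append]" followed by "Key: value" 
   lines, as in the entry above; the helper names are illustrative and 
   not part of namazu.

```python
def parse_entries(text):
    """Split NMZ.log-style text into one dict per bracketed entry."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # A line such as "[Append]" opens a new entry.
            current = {"Type": line[1:-1]}
            entries.append(current)
        elif current is not None and ":" in line:
            # Split on the first colon only, so times like 20:42:01 survive.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return entries

def to_int(value):
    """Convert a count such as '-362,979' to an integer."""
    return int(value.replace(",", ""))

def suspicious(entries):
    """Entries that add documents yet report a negative keyword count."""
    return [e for e in entries
            if to_int(e.get("Added Documents", "0")) > 0
            and to_int(e.get("Added Keywords", "0")) < 0]

# Fields copied from the entry above.
sample = """\
[Append]
Date:                Fri Oct 10 20:42:01 2003
Added Documents:     84
Total Documents:     366,050
Added Keywords:      -362,979
Total Keywords:      5,447,163
"""

for entry in suspicious(parse_entries(sample)):
    print(entry["Date"], entry["Added Documents"], entry["Added Keywords"])
```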
   
   My first guess, that namazu might drop very popular keywords in order 
   to improve performance, does not hold up against figures this large. 
   Maybe it's a bug, then. Is it a known bug? Any idea what I could do? 
   I know my document base is a bit big for namazu2, but I'd rather hear 
   something other than "don't use namazu" since, performance aside, 
   namazu does everything I need perfectly ;)
   
   Thanks.
   -- 
   olivier
   
   


--- End Message ---