
[namazu-users-en] Re: Hello..

2000-03-17 09:29:23
Peter Marelas <maral(_at_)phase-one(_dot_)com(_dot_)au> wrote:

Thank you for your information.  It sounds great.  Since
Namazu's indexer, mknmz, is written in Perl, indexing
takes rather a long time.  Ryuji Abe has a plan to rewrite
mknmz in C.  It would be great if we could employ mifluz for
the task.

Certainly mifluz is up to the task. You may have already read
that mifluz is designed to index a large number of words (10
million+). Mifluz relies on a modified version of Berkeley DB's
B+Tree (we added compression) for storing its index. The
structure employed makes updates very fast. There is some work
going on to improve the structure.

Speaking of Namazu, as the README says, it is "for a small or
medium scale Web search engine"; Namazu is not designed to
index a large number of documents.  As far as I know, the
largest Namazu index ever made is as follows:

  Documents:  878,914 files
  Total size: 2,167,480,108 bytes

On the other hand, the mifluz Web site says:

<http://www.senga.org/mifluz/html/description.html>
|   mifluz has been designed with the further upper limits in mind : 500
|   million documents, 50 giga words, 20 million document updates per day.

It is terrific!


I would be interested if the people who designed Namazu's
index structure criticized the mifluz structure, as the
structure is the key to fast updates and query performance.

I am the designer of Namazu's index structure.  The
structure is a very simple inverted index.  It makes both the
indexer and the search engine easy to implement, but it is
not fast to update.  See the following page for details.
<http://www.namazu.org/doc/nmz.html.en>
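
To illustrate what "a very simple inverted index" means in
general, here is a toy in-memory sketch.  It is not Namazu's
actual on-disk format (see the page above for that), and the
names build_index and search are hypothetical, chosen only
for this example.

```python
from collections import defaultdict


def build_index(docs):
    """Map each word to the sorted list of IDs of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in text.lower().split():
            postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}


def search(index, query):
    """Return IDs of documents that contain every word of the query (AND)."""
    words = query.lower().split()
    if not words:
        return []
    result = set(index.get(words[0], []))
    for word in words[1:]:
        result &= set(index.get(word, []))
    return sorted(result)


# Hypothetical sample documents, for illustration only.
docs = [
    "namazu is a search engine",
    "mifluz is an index library",
    "namazu index structure",
]
index = build_index(docs)
```

With the sample data, search(index, "namazu index") matches
only the third document.  If the postings lists were kept as
sorted, contiguous records in a file, adding one document
would touch the list of every word it contains, which is
probably the main reason a structure like this is slow to
update.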

I just printed out mifluz.texinfo and read it.  I notice
that it is really a high-performance library.  But at the
moment, I don't know whether or not it is good to employ
mifluz for Namazu.

Since Namazu is an easy-to-use search system, the features
mifluz provides are perhaps too much.  We mainly use Namazu
on intranets or for personal use.  In my opinion, the latter
will become more important because people get a lot of
email nowadays.  That's why Namazu emphasizes mail/news and
MHonArc support.

For the present, we at the Namazu project are concentrating
on the development of Namazu 2.x.  The TODOs are:

  * Support index compression with zlib.
  * Improve index merging.  O(n^2) -> O(n log n)
  * Rewrite query operations with lex and yacc.
  * Make the source code clear.  Throw legacy code away.
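
As a rough idea of how the merging item could reach
O(n log n)-like behavior: one common technique is a
heap-based k-way merge of word lists that are already sorted.
The sketch below is only an assumption about the general
approach, not the algorithm planned for Namazu.  The name
merge_indices is hypothetical, and it assumes document IDs
have already been renumbered so they do not collide across
indices.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_indices(*indices):
    """Merge sorted sequences of (word, doc_ids) into one sorted list.

    Each input must already be sorted by word.  heapq.merge keeps one
    entry per input in a heap, so merging n entries from k inputs costs
    about O(n log k) comparisons.
    """
    merged = heapq.merge(*indices, key=itemgetter(0))
    result = []
    for word, group in groupby(merged, key=itemgetter(0)):
        # Combine the postings of the same word from different indices.
        ids = sorted({doc_id for _, doc_ids in group for doc_id in doc_ids})
        result.append((word, ids))
    return result


# Hypothetical sample indices, for illustration only.
a = [("index", [1]), ("namazu", [0, 1])]
b = [("index", [5]), ("mifluz", [4, 5])]
merged = merge_indices(a, b)
```

Here the entries for "index" from both inputs end up as a
single entry whose postings cover documents 1 and 5.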

When the above TODOs are completed, we will move on to 3.0
development and decide whether to employ mifluz.  I hope
mifluz's APIs will be fixed and well documented by that
time. :-)

-- Satoru Takabayashi

