On July 2, 2002 at 09:10, Ben Ocean wrote:
I do not know if LDAP would be efficient for this. If you want
to do fulltext searching, I would not recommend LDAP. What kind
of searching would you like to do?
I presume it's called full text searching. The *standard* kind of searching
one does on any search for discussion lists such as for python.org,
I wanted to be sure since some types of searching could be appropriate
for LDAP. For example, you could store mail header information in
LDAP to provide queries for items like, "give me all messages from
a given author."
I thought LDAP would be appropriate here because the data
But LDAP is not really designed to do full text searching. LDAP's
roots come from X.500, which is basically a standard for providing
distributed directory services (addresses, organizations, etc.).
The directory service is not intended to support transactions or
frequent modifications (but later X.500/LDAP implementations probably
handle data modification fairly efficiently). Read-only queries are
what X.500/LDAP is designed and optimized to handle efficiently.
Are you saying MySQL is more appropriate?
It could be, and some users have asked for such a thing. However,
when it comes to full text retrieval, a traditional RDBMS is not
as efficient as a full text search engine. Reason in
a nutshell: full text search engines index the data into structures
(like hashes) to provide fast query results while RDBMS is basically
doing a fancy grep wrt large text columns (which would be needed
to store message body text).
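
To make the difference concrete, here is a minimal Python sketch
(not how any particular search engine or RDBMS is implemented). The
sample messages and all function names are made up for illustration.
It builds a word-to-message index once, then answers a query with a
single dictionary lookup, while the grep-style version has to read
every message body on every query.

```python
import re
from collections import defaultdict

# Hypothetical sample messages (id -> body text); not from a real archive.
MESSAGES = {
    1: "LDAP is designed for directory lookups",
    2: "Full text search engines build an index",
    3: "An RDBMS can store message headers",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(messages):
    """Map each word to the set of message ids that contain it."""
    index = defaultdict(set)
    for msg_id, body in messages.items():
        for word in tokenize(body):
            index[word].add(msg_id)
    return index

def search_index(index, word):
    """Indexed search: one dictionary lookup, no message bodies are read."""
    return sorted(index.get(word.lower(), set()))

def search_scan(messages, word):
    """Grep-style search: tokenize and check every body on every query."""
    word = word.lower()
    return sorted(i for i, body in messages.items() if word in tokenize(body))

# The index is built once, up front; queries then reuse it.
INDEX = build_index(MESSAGES)
```

Both functions return the same message ids for a given word. The
cost of the indexed lookup stays roughly flat as the archive grows,
while the scan grows with the total amount of body text.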
Companies like Oracle do provide some full text indexing add-ons
to their RDBMS, but I hear it takes some work to configure and
may not be that mature.
If you have a lot of computing resources, you could dump everything
into a database and it can do all your searches. But it will not
scale well and will definitely not give you the performance of
full text search engines. You would also have to determine what
you want to do with attachments (probably store file references to
them instead of as blobs in the database), and since the text data
of message bodies can be large, this could impact how you design
your schema and overall database performance.
Where RDBMS, or LDAP, can be very useful is in meta-based searches.
For example, storing message header information as mentioned above
to allow useful meta-based searches and dynamic archive navigation
capabilities beyond the static ones provided by MHonArc.
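
As a rough illustration of such a meta-based search, here is a small
Python sketch using the standard sqlite3 module. The table layout,
column names, and sample rows are assumptions made up for this
example and are not anything MHonArc defines. Only header data and
the name of the archived file are stored; message bodies stay out of
the database.

```python
import sqlite3

def open_archive_db(path=":memory:"):
    """Open (or create) a database holding message header data only."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               msg_id   TEXT PRIMARY KEY,  -- Message-ID header
               author   TEXT NOT NULL,     -- From address
               subject  TEXT,
               sent     TEXT,              -- ISO 8601 date, sorts as text
               filename TEXT               -- archived page, not the body
           )"""
    )
    # Index the column used by the "all messages from an author" query.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_author ON messages (author)")
    return conn

def store_header(conn, msg_id, author, subject, sent, filename):
    """Insert one message's header data, replacing any earlier copy."""
    conn.execute(
        "INSERT OR REPLACE INTO messages VALUES (?, ?, ?, ?, ?)",
        (msg_id, author, subject, sent, filename),
    )

def messages_from(conn, author):
    """Return (subject, filename) for every message from one author."""
    rows = conn.execute(
        "SELECT subject, filename FROM messages WHERE author = ? ORDER BY sent",
        (author,),
    )
    return rows.fetchall()

# Hypothetical sample data.
conn = open_archive_db()
store_header(conn, "<1@example.org>", "alice@example.org",
             "LDAP question", "2002-07-01T10:00:00", "msg00001.html")
store_header(conn, "<2@example.org>", "bob@example.org",
             "Re: LDAP question", "2002-07-01T11:30:00", "msg00002.html")
store_header(conn, "<3@example.org>", "alice@example.org",
             "Re: LDAP question", "2002-07-02T09:10:00", "msg00003.html")
```

Calling messages_from(conn, "alice@example.org") gives the two
messages from that author in date order, each with the file to link
to. The same table could drive by-date or by-subject navigation with
different WHERE and ORDER BY clauses.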
In newer versions of MHonArc, a minimal Perl API exists to allow
something like this. The API is documented in an appendix section
of the documentation. In a nutshell, you can create a callback
function to take the message header data obtained from MHonArc,
and store that information into an RDBMS using the Perl DBI modules,
or if you like LDAP, you can use the Perl LDAP modules.