Procmail and indexing

1998-03-11 16:13:04
Howdy folks,

This question is really only peripherally related to procmail, so if you
know of a better forum for it, please let me know.

As an admin for a fairly large number of machines, I make heavy use of
procmail to sort incoming mail into a manageable hierarchy.  Counting the
various mailing lists, this comes to 161 directories holding close to
60MB of mail in 10,000 files.

Trying to find a particular piece of information in all this can sometimes
be tedious.

At the moment, I'm using 'glimpse'[1] to index the lot.  While this is a
reasonably effective solution, it has some disadvantages -- primarily the
fact that glimpse doesn't understand the format of email messages.

I've been looking for an indexing program that is either (a) targeted at
mail indexing, or (b) allows the user to flexibly define the structure of
documents.  My goal is to be able to restrict searches to specific
headers; something along the lines of:

        (SUBJECT "procmail" and FROM "lars") or BODY "index"
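To make the intent of that query concrete, here is a minimal Python sketch
that evaluates it by brute force, with no index at all.  It assumes one
message per file under a mail root; the names load_messages, header_has,
body_has and matches are made up for illustration and do not come from
glimpse, swish, or procmail.

```python
import email
from email import policy
from pathlib import Path


def load_messages(root):
    """Yield (path, message) for each file under root.

    Assumption: every file holds exactly one RFC 822 message.
    """
    for path in Path(root).rglob("*"):
        if path.is_file():
            with open(path, "rb") as fh:
                yield path, email.message_from_binary_file(
                    fh, policy=policy.default
                )


def header_has(msg, name, needle):
    """Case-insensitive substring test against a single header."""
    return needle.lower() in str(msg.get(name, "")).lower()


def body_has(msg, needle):
    """Case-insensitive substring test against the plain-text body."""
    part = msg.get_body(preferencelist=("plain",))
    if part is None:
        return False
    return needle.lower() in part.get_content().lower()


def matches(msg):
    """(SUBJECT "procmail" and FROM "lars") or BODY "index"."""
    return (
        header_has(msg, "Subject", "procmail")
        and header_has(msg, "From", "lars")
    ) or body_has(msg, "index")
```

Scanning 10,000 files this way on every search would be slow, which is why
I am after a real indexer, but it shows the header/body distinction the
index would need to record.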

Glimpse supports a form of structured queries, but rather than adapting
to the structure of your documents, it requires the documents to conform
to its own specific format, so that's not much help.

'swish'[2] almost does what I want, but for HTML files.  Additionally, it
seems to be far slower than glimpse at indexing everything.

Has anyone out there tackled a similar problem?  Alternatively, do you
have any pointers for more information?

Thanks,

  -- Lars

References:

[1] Glimpse home page: http://glimpse.cs.arizona.edu/index.html
[2] Swish-E home page: http://sunsite.berkeley.edu/SWISH-E/

--
Lars Kellogg-Stedman * lars(_at_)bu(_dot_)edu * (617)353-8277
Office of Information Technology, Boston University

