Greetings,
I've written a search engine for MHonArc archives which is/will soon be
freely available. I'll be sending Earl a copy shortly for possible
inclusion in a future release, but anyone who drops me a note
is welcome to a copy under the same GPL that MHonArc uses.
You're welcome to try out the search engine on any of the lists
archived at http://eee.uci.edu/w3m3/. I'd recommend using
http://eee.uci.edu/w3m3/uci-www/ since you already know what keywords
to use there. :-) For backwards compatibility with
an older (and poorly implemented) search script, the archives that've
been around for a while use <FORM> buttons to initiate a search; newer
archives use a simple hyperlink to accomplish the same thing.
Check out http://eee.uci.edu/flasc-l/ for an example of the old style.
My script, marc-search, supports Subject, Author, Date, and Full Message Body searches.
Boolean AND, OR, and literal phrase matching are supported, as are
perl5 regular expressions. Regexps shouldn't be necessary for most
users, however, as there are a number of radio-button-controlled options
that will satisfy most people's needs.
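To make the search modes concrete, here is a minimal sketch of how AND, OR, literal phrase, and regexp matching against a single field could work. It is written in Python for illustration and is not the perl5 source of marc-search; the function name `matches`, the mode names, and the case-insensitive default are all assumptions, and Python's `re` syntax is similar to but not identical with perl5 regular expressions.

```python
import re

def matches(text, query, mode="and", ignore_case=True):
    """Return True if `text` satisfies `query` under the given mode.

    `mode` is one of "and", "or", "phrase", "regex". These names are
    hypothetical and chosen only for this sketch.
    """
    flags = re.IGNORECASE if ignore_case else 0
    if mode == "regex":
        # The query is used as a regular expression as-is.
        return re.search(query, text, flags) is not None
    if mode == "phrase":
        # The whole query must appear literally, in order.
        return re.search(re.escape(query), text, flags) is not None
    if mode not in ("and", "or"):
        raise ValueError("unknown mode: %r" % mode)
    words = query.split()
    if not words:
        return False
    hits = [re.search(re.escape(w), text, flags) is not None for w in words]
    # AND needs every word present; OR needs at least one.
    return all(hits) if mode == "and" else any(hits)
```

For example, `matches("Re: MHonArc search script", "search mhonarc", "and")` is true because both words occur, while the same query in "phrase" mode is false because the words do not appear in that order.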
Because Glimpse searches against a pre-existing index, it will of course
be faster than marc-search. However, marc-search does have its
advantages:
1) Boolean AND searches in message bodies look into all of the lines in
a message. In Glimpse, all of the words have to be present on the same
line;
2) marc-search lets the user set a ceiling of results to return per
page, and supplies a button with which she can continue searching
from that point on. New searches can also be initiated at any point;
3) Body searches return one line on either side of the line in which
the match occurred, which is usually sufficient context. In AND
searches (where matches might occur at different lines in the file),
the line number is prepended to the result;
4) marc-search has user-oriented documentation on composing effective
searches :-).
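Points 1 through 3 above can be sketched roughly as follows. This is an illustrative Python sketch, not the actual perl5 code: the function names `and_search_body` and `page`, the 1-based line numbers, and the plain substring matching are assumptions made for the example.

```python
def and_search_body(lines, words):
    """Whole-message AND search over a list of body lines.

    Returns a list of (line_number, context_lines) pairs, or an empty
    list unless every word occurs somewhere in the message. The words
    may be on different lines.
    """
    lowered = [ln.lower() for ln in lines]
    words = [w.lower() for w in words]
    if not words or not all(any(w in ln for ln in lowered) for w in words):
        return []
    hits = []
    for i, ln in enumerate(lowered):
        if any(w in ln for w in words):
            # One line of context on either side of the matching line.
            start, end = max(0, i - 1), min(len(lines), i + 2)
            hits.append((i + 1, lines[start:end]))
    return hits

def page(results, start, ceiling):
    """Return one page of results and the offset at which to continue.

    The continuation offset is None when no results remain.
    """
    chunk = results[start:start + ceiling]
    nxt = start + ceiling
    return chunk, (nxt if nxt < len(results) else None)
```

A caller would show `chunk` to the user and, when the second value is not None, pass it back as `start` to continue the search from that point.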
Some limitations:
1) You must be running perl5. It may be that 5.002 or later is
required; I haven't checked, but since 5.003 is now the standard,
this should rarely matter.
2) Your RCfile must put the following comments first in each message file:
<!--X-Subject: -->
<!--X-From: -->
<!--X-Date: -->
I don't know if this is something that differs widely for MHonArc
users, so it may not even be an issue.
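To show why the position of these comments matters, here is a small Python sketch of reading them from the top of a message file. It is not taken from marc-search; the name `read_headers`, the regular expression, and the assumption that each comment sits on its own line with its value between the colon and the closing `-->` are illustrative guesses.

```python
import re

# Matches one of the three header comments and captures its value.
HEADER_RE = re.compile(r"<!--X-(Subject|From|Date):\s*(.*?)\s*-->")

def read_headers(lines):
    """Collect the X-Subject, X-From, and X-Date comments at the top of a file.

    Reading stops at the first line that is not one of these comments,
    which is why they have to come first in each message file.
    """
    headers = {}
    for line in lines:
        m = HEADER_RE.match(line.strip())
        if m is None:
            break
        headers[m.group(1)] = m.group(2)
    return headers
```

Because the reader stops at the first non-matching line, a search on Subject, Author, or Date only has to look at the first few lines of each file rather than the whole message.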
Again, drop me a note if you want a copy, and thanks to Earl for
MHonArc!
Eric D. Friedman
friedman(_at_)uci(_dot_)edu