In message <199512262153.PAA05415@imagine.convex.com>,
Earl Hood writes:
I've been planning to add something like this to mhonarc. The failing of
mhonarc, hypermail, and similar programs is that they are not well
suited for large archives. Allowing the user to specify the message
links, one can utilize a known database system for retrieval and use
mhonarc just as a dynamic message->html filter.
Sure. I've been considering several storage mechanisms:
1. MH folders
2. berkeley mail files
3. a berkeley mail file, plus an "index" file of just
the headers (stripping out Received and such),
and a DBM index that maps message-ids to offsets
in the raw file and the index file
5. sybase, oracle, ...
For searching, grep on #3 should be mighty fast. Appending is fast.
Rebuilding the database is fast.
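A rough sketch of the offset half of #3, to make the idea concrete. It assumes a well-formed berkeley mail file (body lines starting with "From " are already escaped), unfolded Message-ID headers, and Python's standard dbm module as the DBM; the function names build_offset_index and fetch_raw are made up for illustration. The separate header-only "index" file is left out.

```python
import dbm


def build_offset_index(mbox_path, index_path):
    """Record, for each Message-ID, the byte offset of its 'From ' line."""
    with dbm.open(index_path, "n") as index, open(mbox_path, "rb") as raw:
        start = None          # offset of the message currently being read
        in_headers = False
        offset = raw.tell()
        line = raw.readline()
        while line:
            if line.startswith(b"From "):
                start, in_headers = offset, True
            elif in_headers:
                if line in (b"\n", b"\r\n"):
                    in_headers = False      # blank line ends the headers
                elif line.lower().startswith(b"message-id:"):
                    msgid = line.split(b":", 1)[1].strip()
                    index[msgid] = str(start).encode("ascii")
            offset = raw.tell()
            line = raw.readline()


def fetch_raw(mbox_path, index_path, msgid):
    """Seek straight to one message and return its raw bytes."""
    with dbm.open(index_path, "r") as index, open(mbox_path, "rb") as raw:
        raw.seek(int(index[msgid.encode("ascii")].decode("ascii")))
        lines = [raw.readline()]            # the 'From ' separator itself
        for line in raw:
            if line.startswith(b"From "):   # next message begins
                break
            lines.append(line)
    return b"".join(lines)
```

Rebuilding is just one more call to build_offset_index. Appending would only need the offset of the new message written into the DBM file, though that is not shown here.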
A relational interface can be implemented on top of any of those,
with various tradeoffs. One reason why I started this discussion
was to discuss that interface. By the way... does anybody have
details of the M$ ODBC API handy?
It might be something like:
def select(self, query_clauses, sort_keys, target): ...
def insert(self, fields): ...
def update(self, clauses, fields): ...
def delete(self, clauses): ...
def addMessage(self, message_stream): ...
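To show how those five methods might hang together, here is a toy in-memory version. Everything beyond the method names is an assumption made for the sketch: the class name MemoryArchive, clauses as (field, predicate) pairs, sort_keys as a list of field names, and target as the list of fields to return. Sorting is plain string order, so sorting by date is only chronological if the dates are stored in a sortable form.

```python
import email


class MemoryArchive:
    """Toy archive table held in a list of dicts (illustration only)."""

    FIELDS = ("message-id", "from", "date", "subject")

    def __init__(self):
        self.rows = []

    def _matches(self, row, clauses):
        # Assumed clause form: (field_name, predicate_on_value) pairs.
        return all(test(row.get(field, "")) for field, test in clauses)

    def select(self, query_clauses, sort_keys, target):
        hits = [r for r in self.rows if self._matches(r, query_clauses)]
        hits.sort(key=lambda r: tuple(r.get(k, "") for k in sort_keys))
        return [tuple(r.get(f, "") for f in target) for r in hits]

    def insert(self, fields):
        self.rows.append(dict(fields))

    def update(self, clauses, fields):
        count = 0
        for row in self.rows:
            if self._matches(row, clauses):
                row.update(fields)
                count += 1
        return count            # number of rows changed

    def delete(self, clauses):
        before = len(self.rows)
        self.rows = [r for r in self.rows if not self._matches(r, clauses)]
        return before - len(self.rows)   # number of rows removed

    def addMessage(self, message_stream):
        # Parse the RFC 822 headers and store the indexed fields as a row.
        msg = email.message_from_file(message_stream)
        self.insert({f: str(msg.get(f, "")) for f in self.FIELDS})
```

The same interface could sit on top of MH folders, a mail file with a DBM index, or a real database; only the bodies of the methods would change.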
Then you need the auxiliary tables for building links: references,
in-reply-to, heuristic subject-based threads. Converting message/rfc822
to HTML is an almost completely separate issue.
def message2html(message_stream, linkbase): ...
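A minimal sketch of what such a converter could look like, just to show how little it needs to know about storage. The assumptions are mine: linkbase is a URL prefix to which the (URL-quoted) message-id is appended, only single-part text bodies are rendered, and the page layout is arbitrary. It is not how mhonarc does it.

```python
import email
import html
import re
from urllib.parse import quote

# Message-ids as they appear in In-Reply-To and References headers.
MSGID = re.compile(r"<([^<>\s]+)>")


def message2html(message_stream, linkbase):
    """Render one message/rfc822 stream as an HTML page with thread links."""
    msg = email.message_from_file(message_stream)
    subject = html.escape(str(msg.get("subject", "(no subject)")))
    out = ["<html><head><title>%s</title></head><body>" % subject,
           "<h1>%s</h1>" % subject]
    for name in ("From", "Date"):
        if msg.get(name):
            out.append("<p>%s: %s</p>" % (name, html.escape(str(msg[name]))))
    # Link to every message named in In-Reply-To or References, once each.
    refs = "%s %s" % (msg.get("in-reply-to", ""), msg.get("references", ""))
    for mid in dict.fromkeys(MSGID.findall(refs)):
        out.append('<p>See also <a href="%s%s">%s</a></p>'
                   % (html.escape(linkbase), quote(mid, safe="@"),
                      html.escape(mid)))
    if msg.is_multipart():
        body = "(multipart body omitted in this sketch)"
    else:
        body = msg.get_payload()
    out.append("<pre>%s</pre></body></html>" % html.escape(body))
    return "\n".join(out)
```

Whether the target of a link actually exists in the archive is a question for the auxiliary tables, not for the converter.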
2. Support format negotiation. Make the original message/rfc822 data
available as well as the enhanced-with-links html format -- at the
same address. This _should_ allow clients to treat the message as a
message, i.e. reply to it, etc. by specifying:
A reasonable request. Will be very useful when clients can process
MIME data correctly.
That reminds me: we should be sure that the HTML has a link to the original
<link rel=enhancement href="mid:2o3423o4u2o3i4u2o34@foo.com">
As for search engines, those can be hooked in independently; which some
have done with mhonarc. It is a waste of my time, and probably other
developers of mail processors, to write search engines when one can
already utilize well developed ones like Lycos, Glimpse, etc.
For full-text searching, this is true. But I was talking about
4. Allow relational queries: by date, author, subject, message-id,
keywords, or any combination. Essentially, treat the archive as a
relational database table with fields message-id, from, date, subject,
keywords, and body.
This is best done by utilizing an existing database system (eg Oracle),
and using mhonarc (or other preferred mail->html filter) to convert
retrieved messages to html on-the-fly.
You can do a pretty good imitation of oracle with flat-files if you
know what the queries are likely to look like. See the msgarchive.py
stuff in the grail sources, for example.
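For comparison, here is roughly what a combined query looks like against an SQL engine. This is a sketch only: sqlite3 stands in for oracle or sybase, and the table layout, the column names (sender rather than from, which is an SQL keyword), the ISO-style date strings, and the find() helper are all assumptions of mine, not anything taken from msgarchive.py.

```python
import sqlite3


def open_archive():
    """Create an in-memory archive table with the fields discussed above."""
    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE messages (
                      message_id TEXT PRIMARY KEY,
                      sender     TEXT,
                      date       TEXT,   -- assumed YYYY-MM-DD so it sorts
                      subject    TEXT,
                      keywords   TEXT,
                      body       TEXT)""")
    return db


def find(db, sender=None, since=None, subject_like=None):
    """Return message-ids matching any combination of the given criteria."""
    clauses, params = [], []
    if sender is not None:
        clauses.append("sender = ?")
        params.append(sender)
    if since is not None:
        clauses.append("date >= ?")
        params.append(since)
    if subject_like is not None:
        clauses.append("subject LIKE ?")
        params.append(subject_like)
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    sql = "SELECT message_id FROM messages" + where + " ORDER BY date"
    return [row[0] for row in db.execute(sql, params)]
```

If the likely queries are no richer than this, a flat file with a couple of indexes covers them.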
Update the index in real-time, as messages arrive, not in batch.
Right. MHonarc already allows this.
However, I see many of the tasks can be done by a collection of tools
and not a single tool.
Whatever. I just stated my requirements. I'm pleased with the discussion
Trying to develop a single software program to
do everything may be wasted effort, and it does not make the best use of
existing software that can do the job better (i.e. I'm lazy and do not
want to reinvent the wheel :-).
Right. Reusable software is good. But unix pipes are not the building
blocks I'm interested in any more. I'm interested in objects, modules,
As long as mhonarc can be
invoked just as a message/rfc822->HTML converter, then others have the
ability to use that capability in whatever WWW mail archiving system
that suits their needs.
As I said: I like the way mhonarc does a lot of things. I just don't
like the API to it (unix pipes).
I'd like to remind people that many of the WWW tools/filters people use
are developed in various individuals' spare time. As one's problems
become more sophisticated, one should not hold his/her breath waiting
for a free, ready-made, solution. Many times it will take the
integration of several programs to come up with the desired solution
because free software developers cannot solve everyone's problems. The
solution to Dan's problem may best be reached by an intelligent
integration of several programs and not a single program.
The solution might also come from a collaboration between the folks
in this forum, and other forums. I'm hardly waiting for a ready-made
solution. I've spent quite a bit of time poring over the available
tools and starting to develop new ones. But as long as I'm developing
something, I'd like to get ideas from other folks who have been down
And I'd like to see mhonarc, hypernews, HURL, and such tools converge
and share code.
I'd also like to see it form the basis of a _better_ communications
facility, perhaps built on something like KQML.