On May 31, 2001 at 20:54, J C Lawrence wrote:
Good point. As the ultimate goal is to shove the entire message
base into an SQL DB (I've got users begging for things like
thread-bounded searches and the ability to gen meta views of an
archive), I'll probably head that way.
While it's a gruesome hack, I'm ultimately looking to use MHonArc as
a front end processor which writes scripts as output which are then
executed to insert the message and all its particulars into an
SQL DB. What I haven't figured out yet is how to properly extract
the thread linkings for input into the DB, as well as how to
effectively (i.e. scalably) provide the thread database to MHonArc
when archiving a message (we're talking hundreds of thousands of
messages, possibly low millions).
It's on my TODO list to allow callback hooks during MHonArc processing.
The problem is that to allow a decent callback API, some of the internal
functions need changing. Something for probably a 2.5 release (whenever
With a hook, you can store the message-ids and references/in-reply-to
data in a DB, and then compute the threads from that. This is
basically what MHonArc does.
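
As a rough illustration of computing threads from stored message-ids and
references, here is a minimal sketch. It assumes the hook has already
saved, for each message-id, the list of ids from its References and
In-Reply-To headers (oldest first). The %refs_of hash, the sample ids and
the compute_parents() name are all made up for the example, and an
in-memory hash stands in for the DB. It only follows explicit references
and does no subject-based matching.

```perl
use strict;
use warnings;

# Hypothetical stored data: message-id => referenced message-ids,
# oldest first. In practice this would come out of the DB.
my %refs_of = (
    'a@example' => [],
    'b@example' => ['a@example'],
    'c@example' => ['a@example', 'b@example'],
    'd@example' => ['missing@example'],    # referenced message never archived
);

# For each message, pick as parent the most recent reference that is
# actually present in the archive. No such reference means a thread root.
sub compute_parents {
    my ($refs_of) = @_;
    my %parent;
    for my $id (keys %$refs_of) {
        my ($p) = grep { exists $refs_of->{$_} && $_ ne $id }
                  reverse @{ $refs_of->{$id} };
        $parent{$id} = $p;    # undef marks a root
    }
    return \%parent;
}

my $parent = compute_parents(\%refs_of);
for my $id (sort keys %$parent) {
    printf "%s <- %s\n", $id,
        defined $parent->{$id} ? $parent->{$id} : '(root)';
}
```

Storing only the parent link per message keeps the DB side simple; the
full tree can be rebuilt from those links whenever it is needed.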
At that point my main interests in MHonArc are its excellent MIME
and charset handling (damned fine job BTW). I'd like to also use it
to build the thread graph rather than building it dynamically off
the References/In-Reply-To headers, as MHonArc properly
handles the matching-subject thread hits.
With the current code base, you can access the thread listing order.
There are multiple approaches, but one is to create a custom mhonarc
that dumps the thread data, in whatever format you need, after an
archive update. Two main variables are created when generating the thread
data: @TListOrder and %Index2TLoc. The first is a list of message
indexes in the order to be rendered on a thread index page. The
second is a hash that maps a message index to its ordinal position in
the thread index (useful in resource variable resolution).
Also generated is the %ThreadLevel hash. This maps a message index
to the thread depth of the message. A depth of 0 means it is a
root-level message. Therefore, with @TListOrder and %ThreadLevel one
can infer the thread tree structure.
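
To make that inference concrete, here is a minimal sketch. It assumes
@TListOrder and %ThreadLevel have the shapes described above; the sample
values and the parents_from_order() name are invented for the example,
and the index strings are placeholders rather than real MHonArc message
indexes. It recovers each message's parent, which is the form an SQL
table would most likely want.

```perl
use strict;
use warnings;

# Made-up sample data standing in for the real structures.
my @TListOrder  = qw(m1 m2 m3 m4 m5);
my %ThreadLevel = (m1 => 0, m2 => 1, m3 => 2, m4 => 1, m5 => 0);

# Walk the list in rendering order, remembering the most recent message
# seen at each depth. A message's parent is the latest message at
# depth - 1; depth 0 means a root, so it has no parent.
sub parents_from_order {
    my ($order, $level) = @_;
    my (%parent, @last_at);
    for my $idx (@$order) {
        my $depth = $level->{$idx} || 0;
        $parent{$idx} = $depth > 0 ? $last_at[$depth - 1] : undef;
        $last_at[$depth] = $idx;
        $#last_at = $depth;    # drop deeper entries left by earlier branches
    }
    return \%parent;
}

my $tree = parents_from_order(\@TListOrder, \%ThreadLevel);

# Dump one tab-separated row per message: index, parent, depth.
for my $idx (@TListOrder) {
    printf "%s\t%s\t%d\n", $idx,
        defined $tree->{$idx} ? $tree->{$idx} : 'NULL',
        $ThreadLevel{$idx};
}
```

If a depth is skipped (for instance where a missing message is not
listed), the sketch leaves the parent undefined rather than guessing.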
These structures are a sequential way of representing message threads,
but that is conducive to generating the HTML thread index pages, since
those are produced sequentially. Also, in the Perl 4 days, building
complex tree structures was a non-trivial task.
BTW, the following is a snippet from mhinit.pl:
## Following variables used in thread computation
@ThreadList = (); # List of messages visible in thread index
= (); # List of messages not visible in index
%HasRef = (); # Flags if message has references (Keys = indexes)
# (Values = reference message indexes)
%HasRefDepth = (); # Depth of reference from HasRef value
%Replies = (); # Msg-ids of explicit replies (Keys = indexes)
%SReplies = (); # Msg-ids of subject-based replies (Keys = indexes)
%TVisible = (); # Message visible in thread index (Keys = indexes)
$DoMissingMsgs = 0; # Flag is missing messages should be noted in index
Unfortunately, my memory needs refreshing on all the threading stuff,
so I'm probably forgetting something. The multi-page index support
does complicate some of the stuff (hence the visible/non-visible