Re: Determining Threads

Okay, I'm back to work on this (adding thread IDs).

A few questions:
1) Which file/sub makes the initial determination of whether a
message is a response or starts a new thread?

2) For the thread ID, I'd like to store it in MHonArc's own message.db.
 That is, I'd like to be able to use the db so that, for any message ID,
we can retrieve the thread ID.  Are there simple routines to add a new
field to the db, and to retrieve it given a message ID?  (If not, I can
always use another store, like SQLite or something, but I would rather
not do this unless needed.)

3) What is the best way to persist the counter?  Again, can the
message.db do this?  Are there any race conditions to be wary of?
(It doesn't seem like it, since MHonArc works serially and locks
anyway, but I want to double check.)

--ECC


On 4/21/05, Earl Hood <earl(_at_)earlhood(_dot_)com> wrote:

On April 19, 2005 at 19:37, East Coast Coder wrote:

Have you looked at the $TSLICE$ resource variable?  That is, if you
provide more info on what you are trying to do, there may already
be features that do what you need.


I'm experimenting with a format where, instead of rebuilding the
entire tree, new messages are output incrementally.  I think this
would be necessary for using MHonArc for formats other than HTML
archives (RSS feeds, perhaps, or text messages).  I'd like to be able
to process an mbox, take the new messages, and output them, but
tag each message with an ID unique to its thread, so that the
final output device (aggregator, phone, whatever) can associate it
with that thread.


Mhonarc does not assume that it will process messages in the correct
order.  For example, a follow-up to a message may be processed before
its reference(s).  Also, mhonarc was initially coded in Perl 4,
so leveraging complex data structures was not necessarily trivial.
Therefore, mhonarc just recomputes threads after each update to
an archive.  There may be ways to optimize threading computation
with the existing code base, but I have not bothered to look into it.
Some of the main hash structures have been "Perl 5'ed", but in general,
most of mhonarc's data structures are flat.

As for your immediate need, you can write a wrapper program that
invokes mhonarc for the main work and then does some post-processing
to get what you need.  With the minimal API facilities of mhonarc,
you can determine which messages are new.

You will need to maintain your own thread IDs.  A simple map can
be used to maintain the message to thread ID association.  Thread ID
generation can be a simple counter.
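
As a rough illustration of the map-plus-counter idea above, here is a
minimal sketch in Python (not mhonarc's own Perl, and not part of
mhonarc).  The ThreadIdStore class, its method names, and the JSON
state file are all hypothetical; the message IDs and reference lists
are assumed to come from whatever the wrapper extracts for each new
message.  Because follow-ups may be processed before the messages they
reference, the sketch also records unseen references under the same
thread ID.  It does not try to merge two threads that were assigned
different IDs before a connecting message arrived.

```python
import json
import os
import tempfile


class ThreadIdStore:
    """Hypothetical helper: message-ID -> thread-ID map plus a counter."""

    def __init__(self, path):
        self.path = path
        self.next_id = 1          # next unused thread ID
        self.thread_of = {}       # message-ID -> thread ID
        if os.path.exists(path):
            with open(path, encoding="utf-8") as fh:
                state = json.load(fh)
            self.next_id = state["next_id"]
            self.thread_of = state["thread_of"]

    def assign(self, msgid, references=()):
        """Return the thread ID for msgid, allocating a new one if needed."""
        if msgid in self.thread_of:
            return self.thread_of[msgid]
        # Reuse the thread ID of the first reference we already know.
        tid = None
        for ref in references:
            if ref in self.thread_of:
                tid = self.thread_of[ref]
                break
        if tid is None:
            tid = self.next_id
            self.next_id += 1
        self.thread_of[msgid] = tid
        # Remember unseen references so a parent that is processed
        # later lands in the same thread as its follow-up.
        for ref in references:
            self.thread_of.setdefault(ref, tid)
        return tid

    def save(self):
        """Write the state to a temp file, then rename it into place."""
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump({"next_id": self.next_id,
                       "thread_of": self.thread_of}, fh)
        os.replace(tmp, self.path)
```

A wrapper would call assign() once per new message and save() once at
the end of the run.  The rename in save() only protects against a
half-written state file; it is not a substitute for locking if two
runs could overlap.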

You can check out the mha-preview program in the examples/ directory
of the mhonarc distribution for an example of how to develop a
wrapper program.  Your wrapper will require some more knowledge of
the internals of mhonarc that is not documented in the API appendix
of the docs.

You can examine the library mhinit.pl for a list of the internal
data structures used by mhonarc.  The ones under "Message information
variables" will probably be of the most interest to you.  You can
even examine the .mhonarc.db file of an archive to get a clearer
picture of how the various hashes are structured.

Side Note: I like the idea of having thread IDs.  Something to consider
if/when mhonarc is rewritten, and maybe something possible with the
existing code base.  Mhonarc is an old program and various users are
definitely hitting up against its limitations.  Motivation is my main
enemy in doing a complete re-implementation.

--ewh

---------------------------------------------------------------------
To sign-off this list, send email to majordomo(_at_)mhonarc(_dot_)org with the
message text UNSUBSCRIBE MHONARC-DEV

