Re: Using a SQL Database

2005-05-24 10:57:35
On May 23, 2005 at 22:52, East Coast Coder wrote:

> After much thought, it seems my needs of 1) ThreadID's and 2) High
> performance, high capacity archives call for use of a regular SQL
> database, rather than messages.db.
>
> I'd still like to use Mhonarc to parse the mbox, and to parse the
> individual messages and convert them to HTML.  Just I'd like to be
> able to hook it up to a SQL database (at least for the threadid,
> subject, and references - possibly for the to: and from: also).

The least intrusive way is to utilize MHonArc's callback API to
get the information you need to load into your custom database.
For example, you can set the $mhonarc::CBMessageHeadRead callback
and extract the information you want to load into your database.
The API is described in one of the appendices of the documentation.

Some problems may arise depending on what you ultimately want to do.
For example, data in the .mhonarc.db file is needed to expand numerous
resource variables.  Therefore, if you disable the writing of the
.mhonarc.db file, many resource variables will not work.  Depending on
your needs, you can either customize the page layout resources so they
do not require those variables (which means you would have to generate
your own index pages and nav links) or register a resource variable
callback (see the API docs) to handle the expansion yourself (which is
probably a cumbersome task).

I have recently given some thought to what it would take to replace the
flat-file database with something like Berkeley DB, but such a change
would ripple through much of mhonarc's code base.  Berkeley DB would
allow for scalability, but much code would have to be changed (I think
doing tie tricks will not be sufficient).

If you want to minimize work, you could take the following approach:

* Use the callback API to load key information into your SQL database,
  as noted above.

* Use period-based archives (e.g. monthly) to keep archive updates
  manageable.  This is how mharc works, and it is what other
  users have done to make their archives scalable.  The period
  archives are sufficient for date-based navigation since the
  boundaries do not matter (mharc uses a simple CGI program to provide
  nav links between periods).

* Customize page layout resources that provide navigational links
  based on your SQL data.  Since threading is the big issue, you
  can do something like the following:

  - Disable thread index generation in mhonarc, mainly because threads
    will be "broken" at period boundaries.  This is doable via a
    resource setting.

  - Remove mhonarc's built-in thread nav links in message pages (same
    reason as previous item).  This is doable via resource settings.

  - Create your own custom thread indexes based upon your database
    data.  These indexes may be generated via CGI/dynamic programs
    that query your database at runtime.

  - Create your own thread nav links in message pages based upon your
    own database data.  You can modify the page layout resource to
    include markup linking to a CGI (or similar) URL that determines
    the next and previous items by thread and/or shows a summary of
    the entire thread.
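
As a rough sketch of the custom thread index and thread nav lookups
described above, here is what the database side might look like.  It is
written in Python with sqlite3 purely for illustration; a Perl CGI using
DBI could issue the same queries.  The table, column, and function names
are assumptions of this sketch, not anything mhonarc provides.

```python
import sqlite3

# Hypothetical schema: one row per archived message, filled in by your
# own loading code.  None of these names come from mhonarc itself.
SCHEMA = """
CREATE TABLE messages (
    message_id TEXT PRIMARY KEY,   -- Message-ID header
    thread_id  TEXT NOT NULL,      -- assigned by your own threading code
    subject    TEXT,
    sent_at    TEXT NOT NULL,      -- ISO 8601 timestamp, sorts as text
    url        TEXT NOT NULL       -- period directory plus message file
);
CREATE INDEX messages_thread ON messages (thread_id, sent_at);
"""


def thread_messages(conn, thread_id):
    """Return (message_id, subject, url) rows for a thread, oldest first."""
    return conn.execute(
        "SELECT message_id, subject, url FROM messages "
        "WHERE thread_id = ? ORDER BY sent_at, message_id",
        (thread_id,),
    ).fetchall()


def thread_neighbors(conn, message_id):
    """Return (prev_url, next_url) within the message's thread.

    Either element is None when the message is first or last in its
    thread, and both are None when the message is unknown.
    """
    row = conn.execute(
        "SELECT thread_id FROM messages WHERE message_id = ?",
        (message_id,),
    ).fetchone()
    if row is None:
        return (None, None)
    rows = thread_messages(conn, row[0])
    ids = [r[0] for r in rows]
    i = ids.index(message_id)
    prev_url = rows[i - 1][2] if i > 0 else None
    next_url = rows[i + 1][2] if i + 1 < len(rows) else None
    return (prev_url, next_url)
```

Because the lookup is keyed on the thread rather than on a single
archive directory, the previous and next links can cross period
boundaries, which is the case mhonarc's built-in thread links cannot
handle in a period-based layout.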

A subtle technical issue is resolving your database data to the
correct message file.  That is, your threading code needs to know how
to map each record to the correct filename/URL, and it needs to deal
with the fact that message files may span multiple directories
(due to the period-based archive layout).

In your callback, you can use the $mhonarc::OUTDIR variable to get
the path specified to mhonarc when invoked.  Therefore, when calling
mhonarc, make sure OUTDIR is set to a value that you can map to
valid URLs.
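
To illustrate the OUTDIR-to-URL mapping, here is a minimal sketch in
Python (chosen only for illustration; the same logic is a few lines of
Perl).  It assumes every OUTDIR lives under a single directory that the
web server exposes.  The names archive_root and base_url are
hypothetical parameters of this sketch, not mhonarc settings.

```python
import posixpath


def message_url(outdir, filename, archive_root, base_url):
    """Map a message file under archive_root to its public URL.

    outdir       -- value recorded from $mhonarc::OUTDIR for the message
    filename     -- message file name, e.g. msg00042.html
    archive_root -- hypothetical directory that the web server exposes
    base_url     -- hypothetical URL at which archive_root is served
    """
    root = posixpath.normpath(archive_root)
    out = posixpath.normpath(outdir)
    # Refuse directories outside the served tree instead of guessing.
    if posixpath.commonpath([root, out]) != root:
        raise ValueError(f"{outdir!r} is not under {archive_root!r}")
    rel = posixpath.relpath(out, root)  # e.g. "list/2005-05", or "."
    parts = [] if rel == "." else [rel]
    return "/".join([base_url.rstrip("/"), *parts, filename])
```

Storing the resulting URL (or the relative directory plus filename) with
each database row means the threading code never has to reconstruct the
period directory later.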

As for the base filename of the message, you will need the message
number mhonarc will assign the message (remember, message file names
are based upon the message number).  Unfortunately, the message number
is not explicitly provided to you when $mhonarc::CBMessageHeadRead
is invoked.  To get the assigned message number, you can use
the following in your callback routine:

  $msg_num = $mhonarc::LastMsgNum+1;

Assuming you are using the default resource values for message prefix
and suffix, you can get the filename with:

  $msg_base_filename = sprintf('msg%05d.html', $msg_num);

The above approach does not remove the use of the .mhonarc.db files,
but it should deal with the scalability problem along with addressing
the functionality you desire.

If the above approach is not sufficient for your needs, then a more
detailed technical discussion is required, along with the consideration
of a major redesign/upgrade of the mhonarc code base.

