Re: conversion question

2003-07-24 22:58:52
On July 24, 2003 at 18:08, "Scott Noone" wrote:

      I have a directory full of messages from an NNTP server, each file 
corresponding to one post. What I need to do is generate one HTML file per 
thread, ending up with something that looks like what you get in Google 
Groups. This is for a specific purpose and not something that readers of the 
archive are ever going to see, so I'm not worried about navigation or making 
it pretty. I have a plan on how to do it already, but it seems like overkill 
and I feel like I'm missing something obvious. Ideas?

I actually did a custom contract job to develop a program that does
exactly this.  Of course, there is more than one way to do it, and
the right solution depends on how much you know about MHonArc's
internals.

If you know nothing about them, you can always run a post-processing
step over the archive files themselves to generate your thread pages
(in the program I did, I called them discussion pages to avoid
confusion with thread index pages).  The easiest approach I can think
of is to use the OTHERINDEXES resource to create a special file that
lists all the threads in an easily parsable format to facilitate
post-processing.
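A minimal sketch of the post-processing side, assuming the special index has been set up to emit one line per message as `<thread-id> <message-file>`, already in thread order.  That listing format is hypothetical: you choose it when you write the OTHERINDEXES resource.  The `read_body` callback and the `discussionN.html` file names are also illustrative.

```python
from pathlib import Path


def parse_thread_listing(text):
    """Group message files by thread, preserving listing order.

    Assumed (hypothetical) listing format: one line per message,
    "<thread-id> <message-file>", with messages already in thread order.
    """
    threads = {}  # dicts keep insertion order, so threads stay in listing order
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        thread_id, msg_file = line.split(None, 1)
        threads.setdefault(thread_id, []).append(msg_file)
    return threads


def build_discussion_page(bodies):
    """Concatenate per-message HTML fragments into one page."""
    return "<html><body>\n" + "\n<hr>\n".join(bodies) + "\n</body></html>\n"


def write_discussion_pages(threads, read_body, out_dir):
    """Write one discussion page per thread.

    read_body is a hypothetical callback mapping a message file to the
    HTML fragment that should appear for it on the discussion page.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for thread_id, files in threads.items():
        page = build_discussion_page([read_body(f) for f in files])
        (out / f"discussion{thread_id}.html").write_text(page, encoding="utf-8")
```

Since this only reads the listing file, it needs no knowledge of how MHonArc stores threads internally.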

The alternative is to use some of MHonArc's internals for better
performance (along with custom resource settings).  My first idea
was to use SSIs.  Each message page layout would be configured so
that it could be included via an SSI.  Then a post-processing step
would create the discussion pages, using an SSI for each message of
the thread, so the HTTP server would generate the complete page when
requested.  This is basically an extension of the blog.mrc example
provided in the MHonArc docs.
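To illustrate the SSI variant, here is a rough sketch of a generator whose discussion page is just a list of server-side include directives.  It assumes each message's layout has been written to its own includable fragment (the `msg00001.inc` names are hypothetical, and this is not the blog.mrc setup itself).  The server must have SSI enabled (for Apache, mod_include) for the directives to be expanded.

```python
import html


def ssi_discussion_page(title, message_fragments):
    """Build a discussion page whose body is a series of SSI includes.

    message_fragments are hypothetical per-message fragment URLs, one per
    message in thread order; the HTTP server expands each directive at
    request time.
    """
    includes = "\n<hr>\n".join(
        f'<!--#include virtual="{frag}" -->' for frag in message_fragments
    )
    return (
        "<html><head><title>" + html.escape(title) + "</title></head>\n"
        "<body>\n" + includes + "\n</body></html>\n"
    )
```

The trade-off is that the page is cheap to regenerate but costs SSI processing on every request, which is the concern raised next.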

However, the client wanted the normal single-message pages along
with the discussion pages and was concerned about the overhead of
SSI processing.  Therefore, the post-processing step extracts the
"meat" of each message in a thread to generate the discussion page.
I used the page layout resources to insert markers that delimit what
the script should extract.
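A small sketch of the extraction step, assuming the page layout resources wrap the message content in two HTML comments.  The marker strings below are hypothetical placeholders; use whatever text your own layout settings emit.

```python
import re

# Hypothetical marker comments written around the message "meat" by the
# page layout resources; substitute the markers you actually configure.
START_MARK = "<!--DISCUSSION-START-->"
END_MARK = "<!--DISCUSSION-END-->"

# Non-greedy match with DOTALL so the content may span many lines.
_MEAT_RE = re.compile(
    re.escape(START_MARK) + r"(.*?)" + re.escape(END_MARK), re.DOTALL
)


def extract_meat(page_html):
    """Return the text between the markers, or None if they are missing."""
    m = _MEAT_RE.search(page_html)
    return m.group(1).strip() if m else None
```

Using comments as markers keeps them invisible on the normal single-message pages.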

I used some of MHonArc's internals to quickly walk the threads when
generating the discussion pages, instead of having to peek at a bunch
of message pages.  Using the internals also let me optimize discussion
page updates: when messages are added, only the pages that need to be
updated are regenerated, instead of blindly recreating every
discussion page each time.
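The incremental-update idea can be sketched as follows.  This is not MHonArc's internal API: it assumes a plain in-memory mapping from each message ID to its parent's ID (or `None`), which you would have to obtain from the archive somehow.  Only the threads touched by new messages are returned for regeneration.

```python
def thread_root(msg_id, parent_of):
    """Walk parent links up to the first known message of the thread.

    parent_of is a hypothetical dict: message id -> parent id (or None).
    A parent that is not in the archive is treated as absent, so the
    message itself becomes the root; a visited set guards against cycles.
    """
    seen = {msg_id}
    parent = parent_of.get(msg_id)
    while parent in parent_of and parent not in seen:
        seen.add(parent)
        msg_id = parent
        parent = parent_of.get(msg_id)
    return msg_id


def pages_to_update(new_msgs, parent_of):
    """Return the set of thread roots whose discussion pages need rebuilding."""
    return {thread_root(m, parent_of) for m in new_msgs}
```

With the set of roots in hand, the script regenerates only those discussion pages and leaves the rest alone.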

BTW, things like navigation were important, so the script provided
customization features for including navigational links.
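One way such customization could look (a hypothetical sketch, not the actual script): a user-supplied page template with placeholders for the navigation bar and the page body, filled in with links to the previous and next discussion pages.  The placeholder names and link labels are illustrative only.

```python
from string import Template

# Hypothetical user-customizable layout; $nav and $body are illustrative
# placeholder names, not options of any existing tool.
DEFAULT_LAYOUT = Template("<html><body>\n$nav\n$body\n$nav\n</body></html>\n")


def nav_links(prev_page, next_page):
    """Build the navigation bar, omitting links that do not exist."""
    links = []
    if prev_page:
        links.append(f'<a href="{prev_page}">Prev Thread</a>')
    if next_page:
        links.append(f'<a href="{next_page}">Next Thread</a>')
    return " | ".join(links)


def render_discussion(pages, index, body, layout=DEFAULT_LAYOUT):
    """Render discussion page pages[index] with prev/next links."""
    prev_page = pages[index - 1] if index > 0 else None
    next_page = pages[index + 1] if index + 1 < len(pages) else None
    return layout.substitute(nav=nav_links(prev_page, next_page), body=body)
```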