mharc-users

[approved] Re: [approved] NEWBE: Immediate rebuilt of archive

2003-05-09 07:51:12
On Thu, 8 May 2003, Earl Hood wrote:

On May 8, 2003 at 14:20, Steffen Kaiser wrote:

The default mhonarc lacks a nice user interface and a way to break the
archive into monthly periods, so I picked up mharc, too, ran web-archive and
finally had some nice archive lists.

However, I now have the problem of doing a nice replacement of pipermail
with respect to archiving and cataloguing the mails (which mharc does
nicely), but I have not reached the point where I can pull MHonArc and
mharc together:

When mailman delivers a mail to the archive, the posts say to call
"mhonarc -add xyz", but in this way I cannot use the mrc files supplied
with mharc. How can I add the arriving post to just the mailing list
archive whose name (and therefore directory) I already know?

You do not want to use this approach with mharc.  Mharc is designed to
work independently of the list management software.

The model mharc uses is to have a special user account that you
subscribe to the lists you want archived.  Mharc, via cron jobs, then
processes the incoming mail and filters it according to the
information you provide in <mharc-root>/lib/lists.def.

Ah, I suspected this behaviour is just _one_ way.
I didn't actually like it, because it requires me to maintain two list
definitions: one in mailman and one for filtering using mharc.

Of course, I understand that using a subscribed user makes you independent
of the mailing list software and, furthermore, you needn't run the archiving on
the same host as the list processor.

Now, it is possible to support alternative "input" methods into mharc.
The ORGMAIL variable in <mharc-root>/lib/config.sh allows you to specify any
"incoming" mailbox file.   Also, there are the mh-month-pack and
mbox-month-pack scripts that can be used.  If using these techniques,
you would not use the read-mail and filter-spool script components, but
just the web-archive script.

OK. I used the web-archive script to build the initial HTML archive (along
with the nice search interface etc.) from the mailboxes created by
mailman.

For your case, you could have a script that mailman invokes for
each message it receives to append a copy to a mailbox file of your
choosing (you can use procmail to ensure safe delivery).  Then, set
the ORGMAIL config.sh variable to that spool file (also make sure to
set IS_MAIL_SPOOL to the proper value).  The cron scripts will do
the rest as long as you define lists.def properly.
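
A minimal sketch of such an append script follows, for illustration only.
It assumes the message arrives on stdin as a raw message without an
envelope "From " line. The function name append_to_spool, the
MAILER-DAEMON sender and the ".lockdir" suffix are hypothetical and not
part of mharc or mailman. The mkdir lock only serializes concurrent runs
of this script; it does not coordinate with whatever locking the mharc
cron scripts use, which is one reason procmail may be the safer choice.

```shell
#!/bin/sh
# Sketch: append one message read from stdin to an mbox-style spool file.
# All names here are illustrative assumptions, not mharc components.

append_to_spool() {
    spool=$1
    lock="$spool.lockdir"
    tmp=$(mktemp) || return 1
    cat > "$tmp"                      # buffer the whole message first

    # mkdir is atomic, so the directory acts as a simple lock.
    tries=0
    until mkdir "$lock" 2>/dev/null; do
        tries=$((tries + 1))
        [ "$tries" -ge 30 ] && { rm -f "$tmp"; return 1; }
        sleep 1
    done

    {
        # mbox separator line: "From <sender> <date>"
        printf 'From MAILER-DAEMON %s\n' \
            "$(LC_ALL=C date '+%a %b %e %H:%M:%S %Y')"
        # Quote body lines starting with "From " so they are not
        # mistaken for message separators.
        sed 's/^From />From /' "$tmp"
        printf '\n'
    } >> "$spool"
    status=$?

    rmdir "$lock"
    rm -f "$tmp"
    return "$status"
}
```

The list software would pipe each message into something like
append_to_spool /path/to/spool, and ORGMAIL in config.sh would name the
same file.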

Do I understand the process correctly that I have to:

a) mimic filter-spool by filing (aka appending) the new message into
~mharc/mbox/listname/$( ~mharc/bin/extract-mesg-date -fmt '%Y-%m' )

in order to create the "raw" mailboxes.
(mharc way: read-mail -> filter-spool -> procmail ->
~mharc/procmailrc.mharc)

Perhaps I can even use the "arriving" (aka current system) time bypassing
extract-mesg-date?
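
To make the system-time variant concrete, here is a small sketch that
builds the per-month mailbox path from the current date instead of
calling extract-mesg-date. The function name month_mbox_path and the
MHARC_MBOX_ROOT variable are made up for this example; the mbox/listname
layout and the %Y-%m format are taken from the path above.

```shell
#!/bin/sh
# Sketch: per-month raw mailbox path based on arrival (system) time.
# MHARC_MBOX_ROOT is a hypothetical override; the default mirrors the
# ~mharc/mbox layout when run as the mharc user.

month_mbox_path() {
    list=$1
    root=${MHARC_MBOX_ROOT:-$HOME/mbox}
    printf '%s/%s/%s\n' "$root" "$list" "$(date '+%Y-%m')"
}
```

One consequence of using arrival time: a message written late on the
last day of a month but delivered after midnight lands in the next
month's file, whereas filing by the message's own date would keep it in
the month it was written.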

b) and finally run ~mharc/bin/web-archive e.g. once an hour to propagate
the changes made to the raw mailboxes into the HTML archive?

(mharc way: via cron: read-mail -> web-archive )

Do I really need to keep all posts of a mailing list in the flat file UNIX
mailbox and have the mharc "view" of this mailbox rebuilt via cron job?? I
was actually assuming that the mail is in the archive, hence, I do not
need the unix mailbox anymore; and when I lose interest in the posts of
1995, I just remove the subdirs, re-create the search index, and I'm done
with it.

mharc maintains the raw mailbox files to facilitate HTML archive
recovery and rebuilds.  The crontab is set up to gzip compress mailbox
files that have not been touched in a long time.

There are some remarks in the man page of some commands about not using
compressed archives, because they won't support it. The idea is then to
hope that no further mail arrives in this "compressed" period, right?

You are correct that if you do not want the old data, you can remove
the unix mailbox files that you no longer want.

Here you are referring to the broken up raw mailboxes
~mharc/mbox/listname/period, right?
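
If the answer is yes, picking out the old periods could look roughly like
the sketch below. It assumes the period files are named YYYY-MM, possibly
with a suffix such as .gz once the crontab has compressed them. The
function name old_period_files is hypothetical. It only prints
candidates and deletes nothing.

```shell
#!/bin/sh
# Sketch: list raw mailbox files for periods before a cutoff year.
# Assumes file names of the form YYYY-MM with an optional suffix.

old_period_files() {
    dir=$1
    cutoff=$2
    for f in "$dir"/[0-9][0-9][0-9][0-9]-[0-9][0-9]*; do
        [ -e "$f" ] || continue       # glob matched nothing
        name=${f##*/}                 # strip the directory part
        year=${name%%-*}              # text before the first "-"
        [ "$year" -lt "$cutoff" ] && printf '%s\n' "$f"
    done
    return 0
}
```

For example, old_period_files ~mharc/mbox/listname 1996 would print the
1995 and earlier files for review before removing them by hand.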

Many thanks!

-- 

Steffen Kaiser

FH Bonn-Rhein-Sieg        | e-mail: Steffen(_dot_)Kaiser(_at_)FH-Rhein-Sieg(_dot_)DE
FB Angewandte Informatik  |
Grantham-Allee 20         | phone : +49 2241/865-203
53757 Sankt Augustin      |
Germany - Deutschland     | fax   : +49 2241/865-8203

