mharc-users

Re: editing archives

2003-04-21 11:03:51
On April 21, 2003 at 12:40, Scott Lipcon wrote:

> I run an archive for exmh using mharc/mhonarc -
> http://mercea.net/~exmh/  and I like it very much.  I received an
> email from a company that would like me to remove references to their
> domain from the archive, for spam prevention purposes.   A quick
> search shows only 3 affected messages, all from 1995.

> I've enabled SPAMMODE in mhonarc, and that works fine for new mail.

The link to the original raw message is still enabled, which basically
defeats the purpose of setting SPAMMODE.  In your <mharc-root>/lib/lists.def
file, you need to add the following option for each list:

    No-Raw-Link: 1

This disables the "[Original]" link on each message page.  Have
a look at the <mharc-root>/lib/mrc/_nospam.mrc resource file for
how to remove the [Original] link from message pages.

> Ideally, I'd like a way to do one of the following:
> - run spammode type filter over the entire archive, without trashing
>   the archive and re-importing all the mboxes (I have them, but it
>   would take hours to organize over 8 years of mbox folders for 3 lists)
> - edit the individual html files for those three messages and manually
>   xxxx out the domains in question.
> - delete the three messages entirely.

> I tried to use mhonarc -rmm on the message id from within the yyyy-mm
> directory, and it told me it couldn't get the lock - that doesn't make a
> lot of sense, since as I said, the message was from 1995, so nothing
> else should be locking that archive.

Running mhonarc manually does not work well since you would need to
specify all the options set by the mharc scripts.  Also, mharc uses
the flock locking method when invoking mhonarc, so you are encountering
a locking method conflict.

As for addressing your problem, the quickest hack would be to manually
edit the messages in question since there are only 3 of them.
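If you take the manual-edit route, the "xxxx out the domain" step can be scripted across the three pages. A minimal sketch, assuming Python 3.9+ and that you pass the domain and the paths of the affected message pages on the command line (the script name and the example domain are hypothetical). It only touches the files you name; the raw mailbox files still contain the domain, and an address that appears in a page only in HTML-entity-encoded form would not be matched, so a quick grep afterwards is worthwhile.

```python
#!/usr/bin/env python3
"""Mask a domain name in already-generated MHonArc message pages (sketch)."""
import re
import shutil
import sys
from pathlib import Path


def mask_domain(text: str, domain: str, mask: str = "xxxx") -> tuple[str, int]:
    """Replace each occurrence of `domain` in `text` with `mask`.

    Matching is ASCII case-insensitive. A subdomain such as mail.<domain>
    keeps its prefix ("mail.xxxx"), but a longer name that merely ends with
    the same letters (e.g. "not<domain>") is left alone.
    Returns the new text and the number of replacements.
    """
    pattern = re.compile(
        r"(?<![A-Za-z0-9-])" + re.escape(domain) + r"(?![A-Za-z0-9-])",
        re.IGNORECASE | re.ASCII,
    )
    # A function replacement keeps backslashes in `mask` from being
    # interpreted as regex group references.
    return pattern.subn(lambda _m: mask, text)


def mask_file(path: Path, domain: str, mask: str = "xxxx") -> int:
    """Mask `domain` in one file in place, saving the original as <name>.bak.

    Returns the number of replacements (0 means the file was not modified).
    """
    # Latin-1 maps every byte to exactly one character, so decoding and
    # re-encoding round-trips the file byte-for-byte (line endings included)
    # whatever its real charset; the ASCII domain matches the same way.
    original = path.read_bytes().decode("latin-1")
    masked, count = mask_domain(original, domain, mask)
    if count:
        shutil.copy2(path, path.with_name(path.name + ".bak"))
        path.write_bytes(masked.encode("latin-1"))
    return count


def main(argv: list[str]) -> int:
    if len(argv) < 3:
        print("usage: mask_domain.py DOMAIN FILE.html [FILE.html ...]",
              file=sys.stderr)
        return 2
    domain, *files = argv[1:]
    for name in files:
        count = mask_file(Path(name), domain)
        print(f"{name}: {count} replacement(s)")
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Usage would be something like `python3 mask_domain.py example.com msg00012.html msg00345.html`, run from the relevant archive directory; the `.bak` copies let you diff the result before deleting them.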

If you want to delete the messages, you would have to delete them
from the raw mailbox files along with removing them from HTML archives.
Unfortunately, mharc provides no simple method to synchronize message
deletions in an efficient manner (it's on the TODO list).  The simple
manual approach would be to delete the messages from the raw mailbox
files and then do a 'make rebuild' to recreate the HTML archives
from scratch.  If you want SPAMMODE to apply to all older existing
messages, along with message headers and bodies, a rebuild is
unavoidable.
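For the deletion route, removing the messages from the raw mailbox files can be scripted rather than done by hand in an editor. A minimal sketch using Python's standard `mailbox` module, assuming the raw archive files are ordinary mbox files and that you identify the three messages by Message-ID; the script name is hypothetical. It is best run with mail processing stopped ('make disable', described below), since Python's own mbox locking is not guaranteed to coordinate with whatever locking mharc uses.

```python
#!/usr/bin/env python3
"""Remove messages from an mbox file by Message-ID (sketch)."""
import mailbox
import sys


def _normalize(message_id: str) -> str:
    """Strip whitespace and the surrounding <...> so IDs compare reliably."""
    return message_id.strip().strip("<>")


def remove_by_message_id(mbox_path: str, message_ids: set[str]) -> int:
    """Delete every message in `mbox_path` whose Message-ID is listed.

    Returns the number of messages removed.
    """
    wanted = {_normalize(mid) for mid in message_ids}
    box = mailbox.mbox(mbox_path, create=False)
    box.lock()
    try:
        # Collect keys first so the mailbox is not modified while iterating.
        doomed = [
            key
            for key, msg in box.iteritems()
            if _normalize(str(msg.get("Message-ID", ""))) in wanted
        ]
        for key in doomed:
            box.remove(key)
        box.flush()  # rewrite the file without the removed messages
    finally:
        box.close()  # also releases the lock taken above
    return len(doomed)


def main(argv: list[str]) -> int:
    if len(argv) < 3:
        print("usage: remove_msgs.py MBOX MESSAGE-ID [MESSAGE-ID ...]",
              file=sys.stderr)
        return 2
    removed = remove_by_message_id(argv[1], set(argv[2:]))
    print(f"{argv[1]}: removed {removed} message(s)")
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Keeping a copy of each mbox file before running this is cheap insurance; after the raw files are cleaned, the HTML archives still need the rebuild described above.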

Resource usage can be mitigated during the rebuild by invoking
the web-archive script directly instead of calling 'make rebuild'.  By calling
web-archive directly, you can rebuild the HTML archives but leave
the search index in-place:

    <mharc-root>/bin/web-archive -rebuild -keepsearch

Note, if you delete the messages from the raw mailbox files, you
will need to rebuild the search indexes.

Before doing a rebuild by calling web-archive directly, you can
run 'make disable' first to disable the cron scripts from processing
any mail during the rebuild.  Once the rebuild is done, you can
run 'make enable'.
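The disable / rebuild / enable sequence above can be wrapped in a small script. A minimal sketch in Python, assuming the make targets are run from <mharc-root> (where mharc's Makefile lives); the script name is hypothetical, and the runner is injectable only so the command sequence can be checked without a real archive. Because -keepsearch leaves the search index in place, this fits the SPAMMODE-only case; if you also deleted messages from the raw mailbox files, the search indexes need rebuilding as well.

```python
#!/usr/bin/env python3
"""Rebuild mharc HTML archives while keeping the search index (sketch)."""
import subprocess
import sys
from pathlib import Path
from typing import Callable

# Hypothetical hook so the command sequence can be exercised without mharc.
Runner = Callable[..., object]


def rebuild_keep_search(mharc_root: str, run: Runner = subprocess.run) -> None:
    """Disable cron processing, rebuild the HTML archives, then re-enable.

    check=True makes any failing step raise CalledProcessError, so a failed
    rebuild leaves cron processing disabled for manual inspection.
    """
    root = Path(mharc_root)
    run(["make", "disable"], cwd=root, check=True)
    run([str(root / "bin" / "web-archive"), "-rebuild", "-keepsearch"],
        cwd=root, check=True)
    run(["make", "enable"], cwd=root, check=True)


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: rebuild_keep_search.py MHARC_ROOT")
    rebuild_keep_search(sys.argv[1])
```

The design choice here is to stop rather than re-enable on failure: if web-archive exits non-zero, 'make enable' is never reached, so you can look at what went wrong before running 'make enable' yourself.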

NOTE: If using SPAMMODE, you should remove the "From [<username>]"
link below the subject header on each message page.

Remember, after making any changes to lists.def, make sure to invoke
'make'.  Anytime you make changes to a .in file, make sure to invoke
'make configure'.

--ewh

---------------------------------------------------------------------
To sign-off this list, send email to majordomo(_at_)mhonarc(_dot_)org with the
message text UNSUBSCRIBE MHARC-USERS
