mharc-users

Re: Best practice for removing virus messages...

2003-08-20 11:35:16
FWIW, I figured I'd do some googling for a script to dump the messages out
of the Mbox files and came up with this mbox purge script by Roderick
Schertler...

http://www.argon.org/~roderick/mbox-purge

It seems to be fairly handy with some good filtering options...  Here
are some examples from his perldoc:

# Delete old messages from all your folders.
mbox-purge -before 5/1/2000 ~/Mail/*

# Delete messages from April 2000.
mbox-purge -before 5/1/2000 -after 3/31/2000 file

# Delete a chain letter from all user's mailboxes.
mbox-purge \
    -head-pattern '^Subject: (Re: )?GOOD LUCK TOTEM( \(fwd\))?$' \
    /var/spool/mail/*

# Delete messages larger than 1M.
mbox-purge -eval 'length ${ $_[2] } > 1_000_000' file

Unfortunately, it depends on some subs from his own little utility
module, which also has to be installed.  It can be found here...

http://www.argon.org/~roderick/RS%3A%3AHandy.pm

Anyway, I tried it against a copy of one mbox file, using the subjects
listed for Sobig here:

http://us.mcafee.com/virusInfo/default.asp?id=description&virus_k=100561

It cut the size of the file from roughly 65MB to 161KB (1753 messages to
38), and the most lengthy part of the process was typing the various
subject lines into the commandline.
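To save on the typing, the subject lines could be kept in a file and
folded into a single -head-pattern regex.  This is only a rough sketch:
build_subject_pattern and the one-subject-per-line file are my own
inventions, not part of mbox-purge, and the command is only printed for
review rather than run:

```shell
#!/bin/sh
# Hypothetical helper: turn a file of subject lines (one per line) into
# one regex suitable for mbox-purge's -head-pattern option.
build_subject_pattern() {
    # Drop blank lines, backslash-escape regex metacharacters, then
    # join the subjects with '|' so they become alternatives.
    alternatives=$(sed -e '/^[[:space:]]*$/d' \
                       -e 's/[][\\.*^$()+?{}|]/\\&/g' "$1" | paste -sd'|' -)
    # Allow any number of leading "Re: " prefixes.
    printf '^Subject: (Re: )*(%s)$\n' "$alternatives"
}

# Usage: script <subjects-file> <mbox-file>
# Prints the mbox-purge command instead of running it.
if [ "$#" -ge 2 ]; then
    pattern=$(build_subject_pattern "$1")
    printf 'mbox-purge -head-pattern %s %s\n' "'$pattern'" "$2"
fi
```

As before, try it on a copy of the mbox file first.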

On Wed, 2003-08-20 at 13:18, Earl Hood wrote:
On August 20, 2003 at 10:09, "Sean M. Alderman" wrote:

I'm curious what your collective opinions are on the best way to remove
multiple virus messages from multiple archives.  We've had a storm of
messages from the W32/Sobig.1@MM worm archived over the past few days
or so.

My first thought was to write a script to yank the messages out of the
raw mbox files and follow that up with a make rebuild.  I'm not sure how
long that might take, so I thought I'd see if there might be better
alternatives.

Message deletion is a known problem with mharc, and I have been
contemplating how a deletion system could be implemented, mainly
how one would designate which messages should be deleted.

For now, things have to be done manually.  You should delete the
messages from the raw mbox files.  The tricky part is what to do
about the HTML archives.

As for a rebuild, it depends on the size of your archives and
the capabilities of your system.  At a minimum, you can delete the
attachment files from the archives so users cannot download them
from the archives.

You could try to get mhonarc to remove the messages from the archives
without doing a rebuild, but that would require getting all the command-line
options right so the archives are updated properly.  This is where
having mharc doing the deletion itself would be helpful.  A possible
hack (in order to avoid a complete rebuild) is to do the following:

  1. Run 'make disable'.  This way no auto-cron-based processing will
     be done while making edits to the archive.

  2. Run 'mhonarc -rmm' directly on each period sub-archive of each
     archive with the viral messages to delete the messages.  Note,
     this will cause some pages to revert to default layout settings,
     but we will fix that in the next step.

  3. Run 'make editidx'.  This will re-edit the pages of each archive
     so the proper layout settings are applied.  You can explicitly
     state which archives to re-edit by doing the following instead:

      <mharc-root>/bin/web-archive -verbose -editidx <arch1> <arch2> ....

    The -verbose will tell you what is going on.

  4. Run 'make enable'.  This will re-enable auto updates to the
     archives.

Editidx can take some time depending on the size of the archive, but
it is faster than doing a complete rebuild.
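The four steps above could be strung together roughly as follows.  This
is only a sketch under assumptions: the html/<archive>/<period> layout
under the mharc root, the use of 'make -C', and the function names are
guesses for illustration, and mhonarc will likely need more options
than just -rmm and -outdir.  It defaults to a dry run that only prints
the commands:

```shell
#!/bin/sh
# Dry run by default: set DRY_RUN=0 to actually execute the commands.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the four manual steps.
#   $1 = mharc root, $2 = archive name, $3 = period sub-archive,
#   remaining arguments = message numbers to remove.
remove_viral_messages() {
    root=$1; archive=$2; period=$3
    shift 3
    # 1. Stop cron-based processing while the archive is edited.
    run make -C "$root" disable || return 1
    # 2. Remove the messages from one period sub-archive
    #    (assumed to live under <root>/html/<archive>/<period>).
    run mhonarc -rmm -outdir "$root/html/$archive/$period" "$@"
    # 3. Re-apply the layout settings for just this archive.
    run "$root/bin/web-archive" -verbose -editidx "$archive"
    # 4. Re-enable automatic updates.
    run make -C "$root" enable
}

# Example (hypothetical paths and message numbers):
#   remove_viral_messages /home/mharc mylist 2003-08 101 102 105
```

Check the printed commands against your own installation before
switching the dry run off.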


BTW, on rebuilds, if not all of your archives have been polluted, you
can just rebuild the ones with viral messages.  You can do this by
invoking the web-archive script directly:

  <mharc-root>/bin/web-archive -verbose -rebuild <archive1> <archive2> ...
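To work out which archives are polluted, one option is to grep the raw
mbox files for the viral subject lines.  A rough sketch, assuming one
sub-directory of uncompressed mbox files per archive (that layout and
the polluted_archives name are assumptions on my part; adjust to your
setup):

```shell
#!/bin/sh
# Hypothetical helper: list archives whose raw mbox files contain a
# line matching the given extended regex.
#   $1 = directory with one sub-directory of mbox files per archive
#   $2 = extended regex, e.g. '^Subject: (Re: )*Thank you!$'
polluted_archives() {
    (
        cd "$1" || exit 1
        # grep -l prints matching file names like ./<archive>/<file>;
        # keep only the archive component and de-duplicate.
        find . -type f -exec grep -lE -e "$2" {} + | cut -d/ -f2 | sort -u
    )
}

# Example (hypothetical path): print the names to pass to
# web-archive -rebuild.
#   polluted_archives /home/mharc/mbox '^Subject: (Re: )*Thank you!$'
```

Note this matches any line in the file, not only real headers, so a
message that merely quotes one of the subjects will also flag its
archive.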


If you are really concerned about the overhead of rebuilds, or of
doing any of the above, you can try to modify the mharc scripts directly
to make things more efficient.

--ewh

---------------------------------------------------------------------
To sign-off this list, send email to majordomo(_at_)mhonarc(_dot_)org with the
message text UNSUBSCRIBE MHARC-USERS
--
Sean M. Alderman
ITRACK Systems Analyst
PACE/NCI - NASA Glenn Research Center
(216) 433-2795

The Macintosh is Xerox technology at its best.
