mhonarc-users

Re: Converting html back to mailbox format?

2000-04-28 14:08:12

You could also start with the 'h2mbx.pl' script

  http://www.albany.net/~anthonyw/archivedemo/script.txt

  http://www.albany.net/~anthonyw/archivedemo/


and modify it to parse your html files.


On 26 Apr 2000, François Pinard wrote:

Louis Proyect <lnp3(_at_)columbia(_dot_)edu> writes:

Has anybody written a perl script to convert mhonarc msg html to
standard Internet RFC mailbox format?  I want to add old archives to
the mail-archive website, but neglected to save the mailbox data that
created them originally.

I made the following script for one particular case, but since MHonArc
is incredibly configurable, there is little chance for the script to
work generally.  But it might help you get started, who knows...

To use it, I called a recursive `wget' on the archives, and from within
the directory, did `unmhonarc * > ../FOLDER' to produce a single big FOLDER
containing all the archives.  Then, I digested that folder from within Gnus,
and had fun for a good while, sorting out all the information!
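
For readers without Gnus at hand, the resulting FOLDER can also be checked
with the `mailbox' module from the Python standard library.  What follows is
only a rough sketch, written for Python 3 rather than the older Python of the
script below; `summarize_mbox' is a made-up helper name, and the two-message
sample with its example.org addresses is invented for the demonstration.

```python
import mailbox
import os
import tempfile


def summarize_mbox(path):
    """Return (message count, list of Subject headers) for an mbox file."""
    # create=False: fail instead of silently creating an empty folder.
    box = mailbox.mbox(path, create=False)
    try:
        subjects = [message.get('Subject', '') for message in box]
        return len(subjects), subjects
    finally:
        box.close()


# Hypothetical two-message folder, in the same shape the script produces:
# a "From " separator line, headers, a blank line, the body, a blank line.
sample = (
    "From nobody@nowhere  Sun Feb 13 06:46:37 2000\n"
    "From: alice@example.org\n"
    "Subject: first\n"
    "\n"
    "Hello.\n"
    "\n"
    "From nobody@nowhere  Sun Feb 13 06:46:37 2000\n"
    "From: bob@example.org\n"
    "Subject: second\n"
    "\n"
    "World.\n"
    "\n"
)

with tempfile.TemporaryDirectory() as directory:
    folder = os.path.join(directory, "FOLDER")
    with open(folder, "w") as handle:
        handle.write(sample)
    count, subjects = summarize_mbox(folder)
    print(count, subjects)
```

If the count is lower than the number of HTML files that were processed,
some messages were probably merged, which usually points at a missing
separator line or blank line between them.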

The following script is put in an executable file named `unmhonarc',
as you guessed already :-).


#!/usr/bin/env python
# Rebuild simple messages from their HTML expression.

import string, sys

def main(*arguments):
    for file in arguments:
        sys.stderr.write("Processing %s ...\n" % file)
        lines = open(file).readlines()
        sys.stdout.write('From nobody@nowhere  Sun Feb 13 06:46:37 2000\n')
        for counter in range(len(lines)):
            if lines[counter][0:4] == '<li>':
                break
        write_clean(lines[counter][4:])
        counter = counter + 1
        write_clean(lines[counter][4:])
        counter = counter + 1
        write_clean(lines[counter][4:])
        counter = counter + 1
        sys.stdout.write('Message-Id: <%s@progiciels-bpi.ca>\n' % file)
        sys.stdout.write('\n')
        while counter < len(lines):
            if lines[counter] == '<PRE>\n':
                break
            counter = counter + 1
        counter = counter + 1
        while counter < len(lines):
            if lines[counter] == '</PRE>\n':
                break
            write_clean(lines[counter])
            counter = counter + 1
        sys.stdout.write('\n')
        sys.stdout.write('\n')

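def write_clean(line):
    # Undo the HTML escaping; '&amp;' goes last so that text such as
    # '&amp;lt;' decodes to '&lt;' and not all the way to '<'.
    line = line.replace('&lt;', '<')
    line = line.replace('&gt;', '>')
    line = line.replace('&amp;', '&')
    sys.stdout.write(line)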

if __name__ == '__main__':
    main(*sys.argv[1:])
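
The script above only decodes three entities and writes body lines as they
are.  Two things may need attention with other archives: further entities
such as `&quot;', and body lines beginning with "From ", which many mbox
readers take for the start of a new message.  A possible variant of the
cleaning step, as a sketch for Python 3 only: `clean_line' is a made-up name,
and the ">From " quoting is the usual mbox convention, not something the
original script does.

```python
import html


def clean_line(line):
    """Decode HTML entities in a body line and quote mbox-like separators."""
    # html.unescape handles named and numeric entities in a single pass,
    # so '&amp;lt;' becomes '&lt;' and is not decoded a second time.
    line = html.unescape(line)
    # A body line starting with "From " would look like a message separator;
    # prefixing it with '>' is the common mbox quoting convention.
    if line.startswith("From "):
        line = ">" + line
    return line


print(clean_line("&lt;foo&gt; &amp; &quot;bar&quot;\n"), end="")
print(clean_line("From here on, things differ.\n"), end="")
```

This is meant for body lines only; header lines such as "From: ..." do not
start with "From " followed by a space and pass through unchanged.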

-- 
François Pinard   http://www.iro.umontreal.ca/~pinard




Regards, 

AnthonyW
