You could also start with the 'h2mbx.pl' script
http://www.albany.net/~anthonyw/archivedemo/script.txt
http://www.albany.net/~anthonyw/archivedemo/
and modify it to parse your html files.
On 26 Apr 2000, François Pinard wrote:
Louis Proyect <lnp3(_at_)columbia(_dot_)edu> writes:
Has anybody written a Perl script to convert MHonArc message HTML to
standard Internet RFC mailbox format? I want to add old archives to
the mail-archive website, but neglected to save the mailbox data that
created them originally.
I made the following script for one particular case, but since MHonArc
is incredibly configurable, there is little chance for the script to
work generally. But it might help you get started, who knows...
To use it, I called a recursive `wget' on the archives, and from within
the directory, did `unmhonarc * > ../FOLDER' to produce a single big FOLDER
containing all the archives. Then, I digested that folder from within Gnus,
and had fun for a good while, sorting out all the information!
The following script is put in an executable file named `unmhonarc',
as you guessed already :-).
#!/usr/bin/env python
# Rebuild simple messages from their HTML expression.

import string, sys

def main(*arguments):
    for file in arguments:
        sys.stderr.write("Processing %s ...\n" % file)
        lines = open(file).readlines()
        # Every message gets the same fixed mbox envelope line.
        sys.stdout.write('From nobody@nowhere Sun Feb 13 06:46:37 2000\n')
        # Find the first header line; headers are rendered as <li> items.
        for counter in range(len(lines)):
            if lines[counter][0:4] == '<li>':
                break
        # Copy three header lines, dropping the 4-character '<li>' prefix.
        write_clean(lines[counter][4:])
        counter = counter + 1
        write_clean(lines[counter][4:])
        counter = counter + 1
        write_clean(lines[counter][4:])
        counter = counter + 1
        sys.stdout.write('Message-Id: <%s@progiciels-bpi.ca>\n' % file)
        sys.stdout.write('\n')
        # Skip ahead to the line that opens the message body.
        while counter < len(lines):
            if lines[counter] == '<PRE>\n':
                break
            counter = counter + 1
        counter = counter + 1
        # Copy the body up to, but not including, the closing tag.
        while counter < len(lines):
            if lines[counter] == '</PRE>\n':
                break
            write_clean(lines[counter])
            counter = counter + 1
        sys.stdout.write('\n')
        sys.stdout.write('\n')

def write_clean(line):
    # Decode the three basic HTML entities; '&amp;' goes last so that
    # text such as '&amp;lt;' is not decoded twice.
    line = string.replace(line, '&lt;', '<')
    line = string.replace(line, '&gt;', '>')
    line = string.replace(line, '&amp;', '&')
    sys.stdout.write(line)

if __name__ == '__main__':
    apply(main, tuple(sys.argv[1:]))
--
François Pinard http://www.iro.umontreal.ca/~pinard
Regards,
AnthonyW
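
P.S. For anyone trying the Python script quoted above on a current
interpreter: string.replace() and apply() no longer exist in Python 3.
Below is a minimal Python 3 sketch of the same steps, not a tested
replacement. It makes the same assumptions about the page layout as the
original (three header lines starting at the first '<li>' line, body
between '<PRE>' and '</PRE>' lines on their own). The names convert()
and FROM_LINE, the 'example.invalid' Message-Id domain and the latin-1
input encoding are all assumptions of this sketch, and html.unescape()
decodes every HTML entity rather than only the three the original handled.

```python
#!/usr/bin/env python3
"""Python 3 sketch of the unmhonarc idea: MHonArc-style HTML to mbox text."""
import html
import sys

# Same fixed envelope line as the original script writes for every message.
FROM_LINE = "From nobody@nowhere Sun Feb 13 06:46:37 2000\n"


def convert(lines, name):
    """Return one mbox message built from the lines of one HTML page."""
    out = [FROM_LINE]
    # Index of the first '<li>' line; if there is none, no headers are copied.
    start = next(
        (i for i, line in enumerate(lines) if line.startswith("<li>")),
        len(lines),
    )
    # Copy three header lines, dropping the 4-character '<li>' prefix.
    for line in lines[start:start + 3]:
        out.append(html.unescape(line[4:]))
    # Placeholder domain (assumption); the original used the author's own.
    out.append("Message-Id: <%s@example.invalid>\n" % name)
    out.append("\n")
    # Copy the lines strictly between '<PRE>' and '</PRE>'.
    in_body = False
    for line in lines[start + 3:]:
        if not in_body:
            in_body = line == "<PRE>\n"
        elif line == "</PRE>\n":
            break
        else:
            out.append(html.unescape(line))
    out.append("\n\n")
    return "".join(out)


def main(paths):
    for path in paths:
        sys.stderr.write("Processing %s ...\n" % path)
        # latin-1 is an assumption; it decodes any byte sequence without error.
        with open(path, encoding="latin-1") as handle:
            sys.stdout.write(convert(handle.readlines(), path))


if __name__ == "__main__":
    main(sys.argv[1:])
```

It is meant to be called the same way as the original, from within the
directory holding the fetched pages, with the output redirected to a
single folder file. Keeping convert() free of I/O makes it easy to try
on a hand-written page before running it over a whole archive.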