mhonarc-users

cleaning up yahoogroups messages?

2007-02-06 07:49:14
Hi Folks,

I've been trying to migrate a collection of messages from yahoogroups to sympa (which uses mhonarc as its archiving engine).

There's a great little script, yahoo2mbox, that pulls messages from yahoogroups and aggregates them into an mbox file - ideal for processing by mhonarc.

Unfortunately, when I run mhonarc on the mbox file, it drops the bodies of many (but not all) of the messages, leaving the headers intact. Messages that originated with MS Outlook seem particularly likely to end up with empty bodies.

Now I've read the archives of this list, and this seems to be a known problem with mhonarc filtering out malformed HTML, but I haven't seen any recent traffic indicating a solution of any sort.

So... has anybody come up with a straightforward way to clean up an mbox file sufficiently for mhonarc to process? (e.g. a way to run the mbox file through HTML Tidy or some such)? Or can anybody offer some suggestions, recipes, recent experiences, etc.?
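
To make the question concrete, here is a minimal sketch of the kind of mbox preprocessing pass being asked about. It assumes that the affected messages carry text/html parts and that converting those parts to plain text before running mhonarc is acceptable; neither assumption has been confirmed as the cause of, or a fix for, the empty bodies. It uses only Python's standard library (mailbox, html.parser) rather than HTML Tidy, and the function names (html_to_text, flatten_html_parts, clean_mbox) are made up for illustration.

```python
import mailbox
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in ("br", "p", "div"):
            self._chunks.append("\n")  # keep rough paragraph breaks

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self._chunks.append(data)

    def text(self):
        return "".join(self._chunks).strip()


def html_to_text(html):
    """Return the visible text of an HTML string (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def flatten_html_parts(msg):
    """Rewrite each text/html part of msg as text/plain, in place.

    Returns the number of parts that were rewritten.
    """
    changed = 0
    for part in msg.walk():
        if part.get_content_type() != "text/html":
            continue
        raw = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "latin-1"
        try:
            html = raw.decode(charset, errors="replace")
        except LookupError:  # unknown charset label in the message
            html = raw.decode("latin-1", errors="replace")
        # Drop the old transfer encoding so set_payload() picks a fresh one.
        del part["Content-Transfer-Encoding"]
        part.set_payload(html_to_text(html), charset="utf-8")
        part.set_type("text/plain")
        changed += 1
    return changed


def clean_mbox(src_path, dst_path):
    """Copy src_path to dst_path, flattening HTML parts along the way."""
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)
    total = 0
    try:
        for msg in src:
            total += flatten_html_parts(msg)
            dst.add(msg)
        dst.flush()
    finally:
        dst.close()
        src.close()
    return total
```

Something like clean_mbox("yahoo.mbox", "cleaned.mbox") (file names are placeholders) would write a second mbox to feed to mhonarc while leaving the original untouched. The conversion is lossy: links, tables, and formatting in the HTML parts are discarded.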

Thanks much,

Miles Fidelman

