MHonArc and multi-byte characters in HTML

1998-04-27 22:19:42
I'm using the wilma-1.3 package to convert and index one of my
Majordomo lists.  Things have been running well, but recently the HTML
stripping utility that comes with Wilma (wilma_striphtml) got terribly
stuck on a certain MHonArc v2.2.0 generated message containing
Japanese characters.

The particular line wilma_striphtml sticks on looks like this:

&gt;&gt; &gt; ESC$B<i2,ESC(B ESC$BCNI'ESC(B / MORIOKA Tomohiko ...

See the unescaped open bracket there?  I don't know enough about
encodings to say whether the bracket is legal there, but it doesn't
look like legal HTML to me.  My
understanding is that the wilma_striphtml program requires legal HTML
for correct operation.
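For what it's worth, a quick Python check suggests the bracket is not markup at all. This is a minimal sketch that assumes each "ESC" in the quoted line stands for the single byte 0x1B and that the text is ISO-2022-JP. ESC $ B switches to two-byte JIS X 0208 and ESC ( B switches back to ASCII. Under those assumptions the "<" (0x3C) is the first byte of a kanji, so changing it to &lt; would alter the bytes of that character.

```python
# Assumption: "ESC" in the archived line is the raw byte 0x1B and the
# message body is ISO-2022-JP (7-bit Japanese mail encoding).
quoted = b"\x1b$B<i2,\x1b(B \x1b$BCNI'\x1b(B / MORIOKA Tomohiko"

# ESC $ B starts a run of two-byte JIS X 0208 characters;
# ESC ( B returns to plain ASCII.
text = quoted.decode("iso2022_jp")
print(text)  # 守岡 知彦 / MORIOKA Tomohiko

# After decoding, no '<' remains: the 0x3C byte was half of a kanji.
print("<" in text)  # False
```

A byte-oriented HTML stripper that does not track these escape sequences will still see a bare "<" in the raw file.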

This seems like it may be a MHonArc bug.  Comments?

I've put both the original text (before MHonArc processing) and the
HTML version that MHonArc generated on my website.  Line 56 of the HTML
file and line 42 of the text file contain the problem line.



Any help would be appreciated.  Please reply directly to me, or CC me
on replies to the list, as I'm not subscribed.  Thanks.

   Jason R. Mastaler                      jason(_at_)mastaler(_dot_)com

