MHonArc and multi-byte characters in HTML

I'm using the wilma-1.3 package to convert and index one of my
Majordomo lists.  Things have been running well, but recently the HTML
stripping utility that comes with Wilma (wilma_striphtml) got badly
stuck on a message generated by MHonArc v2.2.0 that contains
Japanese characters.

The particular line wilma_striphtml sticks on looks like this:

&gt;&gt; &gt; ESC$B<i2,ESC(B ESC$BCNI'ESC(B / MORIOKA Tomohiko ...
                   ^
See the unescaped open angle bracket there?  I don't know enough about
encodings to say whether the bracket is legal in that position, but it
doesn't look like legal HTML to me.  My understanding is that
wilma_striphtml requires legal HTML to work correctly.
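One possible explanation, offered as a hedged sketch rather than a diagnosis: in ISO-2022-JP (RFC 1468), ESC $ B switches to JIS X 0208, where every character is two bytes, each in the range 0x21 to 0x7E, and ESC ( B switches back to ASCII.  A `<` (0x3C) can therefore legitimately appear as half of a kanji, while a parser that ignores the encoding sees `<i` and may take it for the start of a tag.  The Python below illustrates this under the assumption that the "ESC" shown in the archived line stands for the raw byte 0x1B; the function names are hypothetical and not part of MHonArc or Wilma.  It reports every `<` byte that falls inside a two-byte run.

```python
import re

# ISO-2022-JP designators (RFC 1468): ESC $ @ / ESC $ B select two-byte
# JIS X 0208; ESC ( B / ESC ( J select single-byte ASCII / JIS-Roman.
DESIGNATOR = re.compile(rb"\x1b(?:\$[@B]|\([BJ])")


def two_byte_runs(line: bytes):
    """Yield (start, end) offsets of each JIS X 0208 run in a raw line."""
    in_two_byte = False
    run_start = 0
    for m in DESIGNATOR.finditer(line):
        if in_two_byte:
            yield run_start, m.start()
        # The byte after ESC is '$' only for the two-byte designators.
        in_two_byte = m.group(0)[1:2] == b"$"
        run_start = m.end()
    if in_two_byte:
        # Unterminated run: treat the rest of the line as two-byte text.
        yield run_start, len(line)


def brackets_inside_kanji(line: bytes):
    """Return offsets of '<' bytes that are halves of two-byte characters."""
    return [
        i
        for start, end in two_byte_runs(line)
        for i in range(start, end)
        if line[i] == ord("<")
    ]


# The problem line, with the literal "ESC" from the archive replaced by
# the escape byte 0x1B (an assumption about the raw message bytes).
problem = b"&gt;&gt; &gt; \x1b$B<i2,\x1b(B \x1b$BCNI'\x1b(B / MORIOKA Tomohiko"
print(brackets_inside_kanji(problem))  # -> [17]
```

Replacing such a byte with &lt; inside the run would corrupt the character it belongs to, which may be why it was left unescaped.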

This looks like it may be an MHonArc bug.  Comments?

I've put both files on my website: the original text (before MHonArc
processed it) and the HTML that MHonArc generated from it.  The problem
line is line 56 of the HTML file and line 42 of the text file.

Text: http://www.mastaler.com/tmp/msg00089.txt

HTML: http://www.mastaler.com/tmp/msg00089.html

Any help would be appreciated.  Please reply to me directly, or CC me
on replies to the list, as I'm not subscribed.  Thanks.

   Jason R. Mastaler                      jason(_at_)mastaler(_dot_)com
