Re: XML, control characters and MHonArc

2007-10-05 15:23:40
On October 5, 2007 at 08:45, Chris Hastie wrote:

> Mostly this is working fine, but I occasionally have problems with
> control characters in badly formatted emails. Specifically, with a QP
> email containing the string =12, MHonArc outputs the corresponding
> control character (0x12) to the XML. These characters are not valid
> in XML, and the XML parser chokes on them.
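
A minimal Python sketch of how the problem arises. It is only an illustration, not MHonArc code (MHonArc itself is written in Perl). Quoted-printable decoding turns "=12" into the raw byte 0x12, and a conforming XML parser rejects the result. The sample message text and the helper names decode_qp and is_well_formed are hypothetical.

```python
import quopri
import xml.etree.ElementTree as ET


def decode_qp(body: bytes) -> bytes:
    """Decode a quoted-printable body; "=12" becomes the raw byte 0x12."""
    return quopri.decodestring(body)


def is_well_formed(xml_text: str) -> bool:
    """Return True if the string parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Hypothetical badly formatted QP fragment containing "=12".
raw = b"status =12 report"
decoded = decode_qp(raw)
print(repr(decoded))

# Wrap the decoded text in an element, roughly as an XML layout would.
doc = "<message>" + decoded.decode("ascii") + "</message>"
print(is_well_formed(doc))
```

The decoded control character passes through unchanged. Wrapping it in markup is enough to make the document ill-formed.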

Have you tried out the TEXTENCODE resource to see how the
control characters are handled?  If you are generating XML, you may
want to use TEXTENCODE to normalize all character data to UTF-8.
See the manual for examples.

> I see a quick mention of a similar problem back in 2000:
>
> Have things changed? Is there any way, short of writing a custom filter
> or hacking/patching an existing one, to persuade MHonArc to strip out
> control characters that are illegal in XML?

Check the minimal API documented in an appendix of the manual.  It
includes a callback you can register that is invoked after a message
has been converted; your callback can check for invalid characters
and remove them.
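
The check such a callback would perform can be sketched in Python. This illustrates the logic only. It is not MHonArc's Perl callback API, whose name and signature are given in the manual's appendix. The function name strip_xml_illegal is hypothetical. It removes every character outside the XML 1.0 "Char" production (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]) and keeps tab, newline, carriage return and ordinary Unicode text.

```python
import re

# Matches any character NOT allowed by the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_XML_ILLEGAL = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_xml_illegal(text: str, replacement: str = "") -> str:
    """Replace each XML-illegal character in already-decoded text.

    By default the characters are simply dropped. Pass a legal
    replacement such as "\uFFFD" to leave a visible marker instead.
    """
    return _XML_ILLEGAL.sub(replacement, text)


# Usage: the 0x12 from a decoded "=12" is removed, so the text is now
# safe to place inside an XML element.
print(repr(strip_xml_illegal("status \x12 report")))
print(repr(strip_xml_illegal("status \x12 report", "\uFFFD")))
```

Dropping the characters silently is the simplest choice. Substituting U+FFFD (which is legal in XML) instead shows readers that something was removed from the original message.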


P.S. Please post your resource settings for creating XML.  Others
may be interested, and it may be something to include in the docs.
