I've recently been looking at revamping an archive by having MHonArc
output XML, which is then pulled into a PHP-based application using
Mostly this is working fine, but I have the occasional problem with
control characters in badly formatted emails. Specifically, when a
quoted-printable (QP) email contains the string =12, MHonArc decodes it
and writes the corresponding control character (0x12) to the XML. Such
characters are not valid in XML, and the XML parser chokes on them.
I see a quick mention of a similar problem back in 2000:
Have things changed? Is there any way short of writing a custom filter,
or hacking/patching an existing one, that I can persuade MHonArc to
strip out XML-illegal control characters?
If not, any hints on where to start hacking?
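
For reference, here is a rough sketch of the kind of stripping I mean.
It is written in Python purely for illustration and is not part of
MHonArc; the function name strip_xml_illegal is made up, and it assumes
the XML has already been decoded to a Unicode string. It removes
everything outside the XML 1.0 Char range (tab, LF and CR are kept).

```python
import re

# Characters NOT allowed by the XML 1.0 "Char" production:
# C0 controls other than tab (0x09), LF (0x0A) and CR (0x0D),
# plus surrogates and the non-characters U+FFFE / U+FFFF.
_XML_ILLEGAL = re.compile(
    r"[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]"
)


def strip_xml_illegal(text: str) -> str:
    """Return text with XML 1.0 illegal characters removed."""
    return _XML_ILLEGAL.sub("", text)


if __name__ == "__main__":
    # 0x12 is what a stray "=12" in a QP body decodes to.
    sample = "<body>bad\x12char</body>"
    print(strip_xml_illegal(sample))
```

Running something like this over the generated files before they reach
the parser would be a workaround at best; I would much rather have
MHonArc not emit the characters in the first place.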