mhonarc-users

XML, control characters and MHonArc

2007-10-05 02:45:41
I've recently been looking at revamping an archive by having MHonArc
output XML, which is then pulled into a PHP-based application using
XML_Unserialize.

Mostly this is working fine, but I have the occasional problem with
control characters in badly formatted emails. Specifically, given a
quoted-printable (QP) email containing the string =12, MHonArc writes
the decoded control character (0x12) to the XML. Such characters are
not valid in XML, and the XML parser chokes on them.

I see a quick mention of a similar problem back in 2000:
http://www.mhonarc.org/archive/html/mhonarc-users/2000-07/msg00040.html

Have things changed? Is there any way, short of writing a custom filter
or hacking/patching an existing one, to persuade MHonArc to strip out
control characters that are illegal in XML?
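
To illustrate the kind of stripping meant here, below is a minimal
sketch in Python. It is not MHonArc code and not an existing MHonArc
option; it assumes the XML text is post-processed after MHonArc has
written it, and the function name strip_xml_illegal is hypothetical.
The character ranges come from the Char production in the XML 1.0
specification.

```python
import quopri
import re

# XML 1.0 allows only these characters:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# The negated class below matches everything outside that set.
_XML_ILLEGAL = re.compile(
    "[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_xml_illegal(text: str) -> str:
    """Remove characters that may not appear in an XML 1.0 document.

    Hypothetical helper for illustration; tab, LF and CR are kept.
    """
    return _XML_ILLEGAL.sub("", text)


if __name__ == "__main__":
    # Reproduce the problem: QP-decoding "=12" yields the byte 0x12.
    decoded = quopri.decodestring(b"foo=12bar").decode("latin-1")
    print(repr(decoded))
    # The same text with the control character removed.
    print(repr(strip_xml_illegal(decoded)))
```

Dropping the characters silently is only one choice; replacing them
with a space or a visible placeholder would work the same way by
changing the replacement string passed to sub().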

If not, any hints on where to start hacking?

Thanks

-- 
Chris Hastie
