On Thu, September 13, 13:56:49 +1000 (EST), Deborah Pickett wrote:
> This error:
> Error
> org.xml.sax.SAXParseException: illegal XML character U+18:illegal
> XML character U+18
> says that you have a character U+18 (i.e., ASCII CAN, decimal 24,
> Ctrl-X) in your file. That character isn't allowed in XML. See:
> http://www.w3.org/TR/REC-xml/#charsets
> Whatever is generating the "XML" file is putting that character in,
> erroneously.

I have a program that is receiving text-only e-mails and logging the
messages to XML. For various reasons (including troubleshooting), I would
like to log the content of the e-mails exactly. It sounds like that's
simply not possible in XML, at least to the extent that "text-only" can
include characters not allowed in XML.

> You will have to either tell the generator to not do that, or you
> will have to insert a pipeline stage that converts U+18 into some
> other character so that the document is actually XML and can be
> parsed.

I guess I have to go with the pipelining strategy.
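
A stage along those lines could be sketched as below, in Python. This is only a minimal sketch under my own assumptions, not something Saxon provides: the function name mark_illegal_xml_chars and the "[U+0018]" marker format are hypothetical, chosen so the log still shows which character was in the message. The character class follows the Char production in the XML 1.0 Recommendation linked above.

```python
import re

# Everything NOT matched by the XML 1.0 Char production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def mark_illegal_xml_chars(text):
    """Replace each character XML 1.0 forbids with a visible marker.

    The marker format "[U+XXXX]" is a hypothetical convention, not a
    standard; it keeps a record of the original code point in the log.
    """
    return _ILLEGAL_XML_CHARS.sub(
        lambda match: "[U+%04X]" % ord(match.group()), text
    )
```

Running the e-mail body through this before writing it into the XML log would turn Ctrl-X into the visible text [U+0018]. The mapping is not strictly reversible if a message already happens to contain such a marker literally.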

> To add to the conformance woes of whatever is producing your input,
> U+18 is not a printable character in ISO 8859-1, nor are smart
> quotes part of true ISO 8859-1 (they are in Windows-1252), so if it
> is producing the XML declaration you quoted then it is doubly wrong.

I inserted the ISO 8859-1 encoding declaration myself. Apparently, Saxon
6.3 doesn't support windows-1252 encoding. Saxon 8.9J, which I just now
installed, does appear to support that encoding. However, it still
(correctly) flags the U+18 character as illegal.
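
For a processor that lacks windows-1252 support, another option might be to transcode the logged bytes to UTF-8 before parsing. The sketch below assumes the raw log really is Windows-1252; the function name transcode_to_utf8 is hypothetical. The encoding declaration would then have to say UTF-8 (or be omitted), and this step does nothing about U+18, which passes through unchanged and stays illegal.

```python
def transcode_to_utf8(raw, source_encoding="cp1252"):
    """Decode bytes from the assumed source encoding and re-encode as UTF-8.

    Bytes that have no mapping in the source encoding are replaced with
    U+FFFD rather than raising an error.
    """
    text = raw.decode(source_encoding, errors="replace")
    return text.encode("utf-8")
```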
--
Roger L. Cauvin
Cauvin, Inc.
Product Management/Market Research