xsl-list

[xsl] RE: Smart Quote Encoding

2007-09-13 08:08:14
On Thu, September 13, 13:56:49 +1000 (EST), Deborah Pickett wrote:

> This error:
>
>   Error
>     org.xml.sax.SAXParseException: illegal XML character U+18:illegal XML character U+18
>
> says that you have a character U+18 (i.e., ASCII CAN, decimal 24,
> Ctrl-X) in your file.  That character isn't allowed in XML.  See:
> http://www.w3.org/TR/REC-xml/#charsets
>
> Whatever is generating the "XML" file is putting that character in,
> erroneously.
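To find out exactly where such characters occur, you can check each code point against the XML 1.0 `Char` production in the spec section linked above. Here is a minimal Python sketch. The function names `is_xml10_char` and `find_illegal_chars` are hypothetical, and the check covers XML 1.0 only, since XML 1.1 treats control characters differently:

```python
def is_xml10_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 Char production."""
    # Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def find_illegal_chars(text: str) -> list[tuple[int, str]]:
    """Return (offset, 'U+XXXX') for each character XML 1.0 forbids."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if not is_xml10_char(ord(ch))
    ]


if __name__ == "__main__":
    # Hypothetical sample text with a stray Ctrl-X (U+18) in it.
    print(find_illegal_chars("Subject: hi\x18 there"))  # [(11, 'U+0018')]
```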

I have a program that is receiving text-only e-mails and logging the
messages to XML.  For various reasons (including troubleshooting), I would
like to log the content of the e-mails exactly.  It sounds like that's
simply not possible in XML, at least to the extent that "text-only" can
include characters not allowed in XML.
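A literal U+18 cannot appear in XML 1.0 character data. Neither can the character reference `&#x18;`, because the spec requires referenced characters to match `Char` as well. If the goal is an exact log, one common workaround is to store the raw message bytes base64-encoded: any byte value survives and can be decoded back. This technique is an assumption and was not proposed in the thread. Below is a minimal Python sketch using only the standard library. The element names `message` and `raw-body` and the `encoding` attribute are hypothetical:

```python
import base64
import xml.etree.ElementTree as ET


def log_entry(raw_body: bytes) -> str:
    """Wrap raw e-mail bytes as base64 so every byte, even 0x18, survives."""
    entry = ET.Element("message")  # hypothetical element name
    body = ET.SubElement(entry, "raw-body", encoding="base64")
    body.text = base64.b64encode(raw_body).decode("ascii")
    return ET.tostring(entry, encoding="unicode")


def recover_body(xml_text: str) -> bytes:
    """Reverse log_entry: parse the XML and decode the base64 payload."""
    body = ET.fromstring(xml_text).find("raw-body")
    # An empty body serializes as an empty element whose .text is None.
    return base64.b64decode(body.text or "")
```

The trade-off is that the log is no longer human-readable without a decoding step. You could store a scrubbed, readable copy alongside it.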

> You will have to either tell the generator to not do that, or you
> will have to insert a pipeline stage that converts U+18 into some
> other character so that the document is actually XML and can be
> parsed.
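A pipeline stage of this kind can be a simple filter. It rewrites every character outside the XML 1.0 `Char` production before the parser sees the text. Here is a minimal Python sketch, assuming the input has already been decoded to text. The substitutes are arbitrary choices: U+FFFD by default, or a visible `[U+0018]` marker that leaves a trace for troubleshooting. The name `scrub` is hypothetical:

```python
import re

# Matches any character NOT in the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML10 = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def scrub(text: str, visible: bool = False) -> str:
    """Replace characters XML 1.0 forbids so the result can be parsed."""
    if visible:
        # Keep a readable record of what was removed, e.g. "[U+0018]".
        return _ILLEGAL_XML10.sub(lambda m: f"[U+{ord(m.group()):04X}]", text)
    # U+FFFD (REPLACEMENT CHARACTER) is itself a legal XML character.
    return _ILLEGAL_XML10.sub("\uFFFD", text)
```

As a stand-alone stage it could be driven by something like `sys.stdout.write(scrub(sys.stdin.read()))`. The substitution is lossy, so it gives up the exact logging asked about above in exchange for well-formedness.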

I guess I have to go with the pipelining strategy.

> To add to the conformance woes of whatever is producing your input,
> U+18 is not a printable character in ISO 8859-1, nor are smart
> quotes part of true ISO 8859-1 (they are in Windows-1252), so if it
> is producing the XML declaration you quoted then it is doubly wrong.
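You can see the encoding mix-up by decoding the same bytes both ways. Under ISO-8859-1, bytes 0x93 and 0x94 become the C1 control characters U+0093 and U+0094. XML 1.0 permits these, so they cause no parse error, but they are not quotation marks. Under Windows-1252, the same bytes are U+201C and U+201D. Here is a minimal Python sketch, assuming the source really is Windows-1252. After re-encoding to UTF-8, the XML declaration can truthfully say `encoding="UTF-8"`. The function names are hypothetical, and U+18 passes through unchanged, so it still needs its own filtering step:

```python
def decode_both_ways(raw: bytes) -> tuple[str, str]:
    """Show how the same bytes read under ISO-8859-1 versus Windows-1252."""
    # Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined;
    # errors="replace" turns those into U+FFFD instead of raising.
    return raw.decode("iso-8859-1"), raw.decode("cp1252", errors="replace")


def transcode_to_utf8(raw: bytes) -> bytes:
    """Re-encode Windows-1252 input as UTF-8; control bytes like 0x18 survive."""
    return raw.decode("cp1252", errors="replace").encode("utf-8")


if __name__ == "__main__":
    latin1, cp1252 = decode_both_ways(b"\x93Hi\x94")
    print([f"U+{ord(c):04X}" for c in latin1])  # ['U+0093', 'U+0048', 'U+0069', 'U+0094']
    print([f"U+{ord(c):04X}" for c in cp1252])  # ['U+201C', 'U+0048', 'U+0069', 'U+201D']
```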

I inserted the ISO 8859-1 encoding declaration myself.  Apparently, Saxon
6.3 doesn't support windows-1252 encoding.  Saxon 8.9J, which I just now
installed, does appear to support that encoding.  However, it still
(correctly) flags the U+18 character as illegal.

--
Roger L. Cauvin
Cauvin, Inc.
Product Management/Market Research


--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--
