On 28/07/2010 15:39, Ben Stover wrote:
Assume I have an XML doc file which starts with:
<?xml version="1.0" encoding="UTF-16"?>
<foobar>....
But the XML doc file is NOT UTF-16 encoded; it is ANSI or ISO-8859-1 or whatever.
Does it matter?
I mean, does an XSLT processor like Saxon (or another) treat this as nice-to-have
info and rely on the real encoding instead?
XSLT processors don't care. They pass off the work to an XML parser.
Which is why, when a failure occurs, Saxon is careful to tell you that
the error comes from the XML parser, not from Saxon itself.
Error on line 1 column 40 of in.xml:
SXXP0003: Error reported by XML parser: Content is not allowed in prolog.
Transformation failed: Run-time errors were reported
That's another way of saying: you can choose from a wide range of
parsers to run with Saxon, and if you choose one that has poor error
messages, that's your problem, not mine. (The one I generally recommend
is the Xerces parser from Apache, but the one that most people use is
the Xerces-derivative contained in the Sun/Oracle JDK; Sun's main
contribution was to add bugs.)
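
To see that the rejection happens in the parser, before any XSLT processor gets involved, here is a minimal sketch. It assumes Python's bundled expat parser rather than Xerces or the JDK parser, so the wording of the error will differ from the Saxon output above; the names try_parse, mislabelled and labelled are made up for illustration.

```python
import xml.parsers.expat

# The declaration claims UTF-16, but the bytes are single-byte ISO-8859-1.
mislabelled = (
    '<?xml version="1.0" encoding="UTF-16"?>\n<foobar>caf\u00e9</foobar>'
).encode("iso-8859-1")

# The same content with a declaration that matches the bytes.
labelled = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>\n<foobar>caf\u00e9</foobar>'
).encode("iso-8859-1")


def try_parse(data: bytes) -> str:
    """Feed raw bytes to the XML parser and report whether it accepted them."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError as err:
        return f"rejected: {err}"
    return "accepted"


print(try_parse(mislabelled))
print(try_parse(labelled))
```

The point of the sketch is only that the declared encoding is not treated as a hint: the parser acts on it, and a declaration that contradicts the bytes makes the parse fail.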
In practice "Content is not allowed in prolog" is a very generic way of
reporting that the parser can't make sense of the bytes at the start of
the file, and an incorrect encoding is one possible reason for that failure.
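
When that generic message turns up, one quick check is whether the first bytes of the file are even compatible with the declared encoding. The following is a rough sketch along the lines of the encoding auto-detection described in Appendix F of the XML 1.0 Recommendation. It is a simplification: it ignores UTF-32 and EBCDIC, and only flags the case from the question. The function names are hypothetical.

```python
import re

# Matches the encoding pseudo-attribute in an ASCII-compatible XML declaration.
DECL = re.compile(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._\-]*)["\']')


def sniff_family(data: bytes) -> str:
    """Guess the encoding family from the leading bytes of the file."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"                      # UTF-8 byte order mark
    if data.startswith((b"\xfe\xff", b"\xff\xfe")):
        return "utf-16"                     # UTF-16 byte order mark
    if data.startswith((b"\x00<\x00?", b"<\x00?\x00")):
        return "utf-16"                     # UTF-16 without a byte order mark
    if data.startswith(b"<?xm"):
        return "8-bit"                      # ASCII-compatible: UTF-8, ISO-8859-x, ...
    return "unknown"


def declaration_mismatch(data: bytes) -> bool:
    """True when an ASCII-compatible file declares a multi-byte-unit encoding."""
    if sniff_family(data) != "8-bit":
        return False                        # other cases are outside this sketch
    match = DECL.match(data)
    if match is None:
        return False                        # no encoding declared at all
    name = match.group(1).decode("ascii").upper()
    return name.startswith(("UTF-16", "UTF-32", "UCS-2", "UCS-4", "ISO-10646-UCS"))


bad = b'<?xml version="1.0" encoding="UTF-16"?>\n<foobar/>'
good = b'<?xml version="1.0" encoding="ISO-8859-1"?>\n<foobar/>'
real_utf16 = '<?xml version="1.0" encoding="UTF-16"?>\n<foobar/>'.encode("utf-16")

print(declaration_mismatch(bad), declaration_mismatch(good), declaration_mismatch(real_utf16))
```

A check like this cannot tell ISO-8859-1 from UTF-8 or from a Windows code page; it only catches a declaration that cannot possibly match the byte layout.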
Michael Kay
Saxonica