The spec is very strict that characters not allowed in XML cause an error.
This has changed since the book was written.
However, the spec is very loose about how URIs are resolved. So a conformant
product could take the URI
thing.txt?substitute-illegal-chars=FFFD
as a reference to "the document formed by taking thing.txt and substituting
illegal characters with xFFFD."
Perhaps I'll do that.
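To make the idea concrete, here is a minimal Python sketch of how a resolver could interpret such a query parameter. It is entirely hypothetical: the parameter name comes from the example URI, but `resolve_text_uri`, the ISO-8859-1 default encoding, and the error text are assumptions, not something Saxon or the spec provides. Without the parameter it fails, as the spec requires.

```python
from urllib.parse import parse_qs, urlsplit


def is_xml10_char(cp: int) -> bool:
    # The Char production of XML 1.0
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def resolve_text_uri(uri: str, encoding: str = "iso-8859-1") -> str:
    """Hypothetical resolver: read the file named by the URI path and, only if
    the query asks for it, replace XML-illegal characters with the given code point."""
    parts = urlsplit(uri)
    params = parse_qs(parts.query)
    # newline="" keeps CR and LF exactly as they are in the file
    with open(parts.path, encoding=encoding, newline="") as f:
        text = f.read()
    requested = params.get("substitute-illegal-chars")
    if requested is None:
        # Default behaviour the spec requires: any illegal character is an error
        bad = next((ch for ch in text if not is_xml10_char(ord(ch))), None)
        if bad is not None:
            raise ValueError(f"XTDE1190: illegal character x{ord(bad):X}")
        return text
    replacement = chr(int(requested[0], 16))  # "FFFD" -> U+FFFD
    return "".join(ch if is_xml10_char(ord(ch)) else replacement for ch in text)
```

With this, `resolve_text_uri("thing.txt?substitute-illegal-chars=FFFD")` returns the file with each illegal character replaced by xFFFD, while a plain `thing.txt` still raises an error.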
Michael Kay
http://www.saxonica.com/
-----Original Message-----
From: Abel Braaksma Online [mailto:abel(_dot_)online(_at_)xs4all(_dot_)nl]
Sent: 27 July 2006 19:10
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] unparsed-text() and illegal characters
Dear List,
Trying to "import" a non-XML file of an undefined encoding, I
received the following error when using Saxon8: "The unparsed
text file contains a character illegal in XML (line=1
column=4 value=hex 11)". I found only one reference to
this error
(http://www.stylusstudio.com/xsllist/200510/post90470.html),
which is actually a post about illegal characters inside the
XSLT document.
Michael Kay points out in that post that this error is merged
into XTDE1190 (see
http://www.w3.org/TR/xslt20/#err-XTDE1190). The spec states
that byte sequences that cannot be decoded, or characters
that are not allowed in XML, must result in this
non-recoverable dynamic error.
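Saxon's message pinpoints the offending character by line and column. A minimal sketch of that kind of check, assuming XML 1.0 character rules, is below. The name `first_illegal_char` and the choice to treat only LF as a line break are assumptions for illustration; this is not Saxon's actual code.

```python
from typing import Optional, Tuple


def is_xml10_char(cp: int) -> bool:
    # The Char production of XML 1.0
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def first_illegal_char(text: str) -> Optional[Tuple[int, int, int]]:
    """Return (line, column, code point) of the first character not allowed
    in XML 1.0, or None if the text is clean. Lines and columns are 1-based."""
    line, column = 1, 0
    for ch in text:
        column += 1
        cp = ord(ch)
        if not is_xml10_char(cp):
            return line, column, cp
        if ch == "\n":  # assumption: only LF starts a new line
            line, column = line + 1, 0
    return None
```

For example, `first_illegal_char("abc\x11")` returns `(1, 4, 17)`, the same kind of position (line 1, column 4, hex 11) as in the error above.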
In his indispensable book, the XSLT 2.0 Programmer's
Reference, he states the following:
"Some processors will provide configuration options that pass
this choice on to the user. If the file contains characters that
are invalid in XML (this applies to most control characters
in the range x00 to x1F under XML 1.0, but only to the null
character x00 under XML 1.1) then the invalid characters are
substituted by the special Unicode character xFFFD, which is
specifically intended for such purposes."
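The substitution described in the book can be sketched in Python as follows, with the XML version as a parameter. The names `is_xml_char` and `substitute_illegal` are illustrative only. Note that surrogate code points and xFFFE/xFFFF are excluded by both versions, which the quoted summary does not mention.

```python
def is_xml_char(cp: int, xml_version: str = "1.0") -> bool:
    if xml_version == "1.1":
        # XML 1.1 Char: [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
        low_ok = 0x1 <= cp <= 0xD7FF
    else:
        # XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | ...
        low_ok = cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
    return low_ok or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF


def substitute_illegal(text: str, xml_version: str = "1.0") -> str:
    # Replace every character that is illegal in the chosen XML version with U+FFFD
    return "".join(ch if is_xml_char(ord(ch), xml_version) else "\uFFFD"
                   for ch in text)
```

Under XML 1.0, `substitute_illegal("a\x11\x00b")` gives `"a\ufffd\ufffdb"`; under XML 1.1 only the null is replaced, giving `"a\x11\ufffdb"`.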
I understand that the book was written before XSLT 2.0 was
finalized (it is still a Candidate Recommendation), but I wonder
if a treatment like the above is still possible somehow. The
contents of the file are ISO-8859-1, apart from the start and
end headers, which contain control characters. I only need the
part that is parsable as text; the rest can be dismissed.
Am I asking too much from XSLT, or is this somehow possible?
It would really extend what is possible, and it would mean I
don't need an extra filter or preparse step.
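If no processor option is available, the preparse step can be quite small. A minimal sketch, assuming the file really is ISO-8859-1 and that dropping (rather than substituting) the illegal characters is acceptable; `clean_latin1` and `preparse` are hypothetical names. Because ISO-8859-1 maps every byte to U+0000..U+00FF, decoding cannot fail, and within that range XML 1.0 only forbids the C0 controls other than tab, LF and CR.

```python
def clean_latin1(data: bytes) -> str:
    # Every byte decodes under ISO-8859-1, so only the XML character check matters
    text = data.decode("iso-8859-1")
    # Keep tab, LF, CR and everything from x20 up; drop the other C0 controls
    return "".join(ch for ch in text if ch in "\t\n\r" or ord(ch) >= 0x20)


def preparse(src_path: str, dst_path: str) -> None:
    # Read raw bytes, clean them, and write the result as UTF-8
    with open(src_path, "rb") as f:
        cleaned = clean_latin1(f.read())
    with open(dst_path, "w", encoding="utf-8", newline="") as f:
        f.write(cleaned)
```

The cleaned file could then be read with `unparsed-text('clean.txt', 'UTF-8')` (file name hypothetical). Since only illegal characters are removed, any printable bytes in the start and end headers survive; trimming those would need knowledge of the header layout.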
Cheers,
Abel Braaksma
www.nuntia.nl
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--