"Michael Leung" <mmhleung(_at_)bikerider(_dot_)com> wrote in message
news:20030629043817(_dot_)55987(_dot_)qmail(_at_)mail(_dot_)com(_dot_)(_dot_)(_dot_)
Hi,
I am trying to transform an XML document:
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="bin.xsl"?>
<doc>
<binary>
�
</binary>
</doc>
into a binary file whose contents are the value of
the binary element above, using the following
XSLT stylesheet:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output encoding="utf-8" />
<xsl:template match="binary">
<xsl:value-of select="."/>
</xsl:template>
</xsl:stylesheet>
I tried using MSXSL from Microsoft and I got an "Invalid unicode
character" error.
I also tried using Saxon and I got an "illegal XML character " error.
In IE, those characters are displayed as rectangles.
I wonder why the XSLT processors are complaining about these
characters, and whether it is possible to carry out such a
transformation.
It is not the XSLT processors that are complaining; it is the XML parsers,
which reject the source document before any transformation takes place.
The XML 1.0 specification strictly defines which characters are allowed in
an XML document:
    Character Range
    [2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
                 | [#x10000-#x10FFFF]
    /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
http://www.w3.org/TR/REC-xml#charsets
Therefore, the only characters with a code point below #x20 that may appear
in an XML 1.0 document are #x9 (tab), #xA (line feed), and #xD (carriage
return).
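The Char production translates directly into a predicate. Here is a minimal Python sketch. The helper name is_xml10_char is hypothetical and not part of any library. Enumerating all code points below #x20 confirms that only the three control characters listed above pass:

```python
# Sketch of XML 1.0 production [2] (Char) as a code-point predicate.
# is_xml10_char is a hypothetical helper, not a library function.

def is_xml10_char(cp: int) -> bool:
    """Return True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)           # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF         # rest of the BMP below the surrogates
        or 0xE000 <= cp <= 0xFFFD       # BMP above the surrogates, minus FFFE/FFFF
        or 0x10000 <= cp <= 0x10FFFF    # supplementary planes
    )

# Collect every code point below #x20 that the production allows.
allowed_controls = [hex(cp) for cp in range(0x20) if is_xml10_char(cp)]
print(allowed_controls)  # ['0x9', '0xa', '0xd']
```

Surrogates (for example #xD800) and #xFFFE/#xFFFF also fail the predicate, matching the comment in the production.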
The XML parsers you used implement this spec and correctly report an error
for any character not matched by the above definition, which is exactly the
case you describe.
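The following sketch shows that the rejection happens at parse time, before any stylesheet is involved. It uses Python's expat-based xml.etree.ElementTree, which is a different parser from the ones inside MSXSL and Saxon and is used here only for illustration. The helper is_well_formed is hypothetical. A character reference such as &#x1; is rejected as well, because XML 1.0 requires characters referred to by character references to match the Char production too:

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Hypothetical helper: True if expat (via ElementTree) parses text cleanly."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

samples = {
    "plain text": "<doc><binary>ok</binary></doc>",
    "raw U+0001": "<doc><binary>\x01</binary></doc>",    # control char outside Char
    "ref &#x1;": "<doc><binary>&#x1;</binary></doc>",     # reference to an illegal char
    "tab and newline": "<doc><binary>\t\n</binary></doc>",  # #x9 and #xA are allowed
}

for label, text in samples.items():
    print(f"{label}: {'accepted' if is_well_formed(text) else 'rejected'}")
```

The plain-text and tab/newline samples should be accepted. The raw U+0001 and &#x1; samples should be rejected.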
=====
Cheers,
Dimitre Novatchev.
http://fxsl.sourceforge.net/ -- the home of FXSL
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list