xsl-list
Re: Binary characters in XML

2003-06-29 00:16:41

"Michael Leung" <mmhleung(_at_)bikerider(_dot_)com> wrote in message
news:20030629043817(_dot_)55987(_dot_)qmail(_at_)mail(_dot_)com(_dot_)(_dot_)(_dot_)
Hi,

I am trying to transform an XML document:

<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="bin.xsl"?>
<doc>
<binary>
&#x01;&#x02;&#x03;&#x04;&#x05;&#x06;&#x07;&#x08;&#x00;
</binary>
</doc>

into a binary file with the contents being the value of
the binary element in the above using the following
XSLT stylesheet:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output encoding="utf-8" />

<xsl:template match="binary">
<xsl:value-of select="."/>
</xsl:template>
</xsl:stylesheet>

I tried using MSXSL from Microsoft and I got an "Invalid unicode
character" error.

I also tried using Saxon and I got an "illegal XML character &#x1;" error.
In IE, those characters are displayed as rectangles.

I wonder why the XSLT processors are complaining about these
characters and I wonder if it is possible to carry out such a
transformation.



It is not the XSLT processors that are complaining -- it is the XML parsers.

The XML 1.0 Spec strictly defines which characters are allowed in an XML
document:

"Character Range
[2]    Char    ::=    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
                      [#x10000-#x10FFFF]
       /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */"

http://www.w3.org/TR/REC-xml#charsets



Therefore, the only allowed characters with a code below #x20 are
#x9 (tab), #xA (line feed) and #xD (carriage return).

The cited XML parsers are implementing this spec and are correctly producing
error messages for any character not included in the above definition -- and
this is exactly the case you describe.
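
To make the rule concrete, here is a minimal sketch in Python (not part of
any XSLT processor) that tests code points against the Char production
quoted above. The names XML10_CHAR_RANGES, is_xml10_char and
illegal_code_points are hypothetical and chosen for illustration only; the
ranges themselves are copied from production [2].

```python
# Ranges copied from production [2] (Char) of the XML 1.0 Spec.
# The names below are hypothetical, chosen for illustration only.
XML10_CHAR_RANGES = (
    (0x9, 0xA),            # tab, line feed
    (0xD, 0xD),            # carriage return
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
)


def is_xml10_char(code_point):
    """Return True if code_point matches the XML 1.0 Char production."""
    return any(low <= code_point <= high for low, high in XML10_CHAR_RANGES)


def illegal_code_points(code_points):
    """Return the sorted, distinct code points that Char does not allow."""
    return sorted({cp for cp in code_points if not is_xml10_char(cp)})


# The code points referenced inside the <binary> element of the question.
payload = [0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x00]

print([hex(cp) for cp in illegal_code_points(payload)])
# prints ['0x0', '0x1', '0x2', '0x3', '0x4', '0x5', '0x6', '0x7', '0x8']

print([hex(cp) for cp in range(0x20) if is_xml10_char(cp)])
# prints ['0x9', '0xa', '0xd']
```

Every code point in the sample document is rejected by this check, which
matches the errors reported by both parsers.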



=====
Cheers,

Dimitre Novatchev.
http://fxsl.sourceforge.net/ -- the home of FXSL







 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list