On 4/21/06, Michael Kay <mike(_at_)saxonica(_dot_)com> wrote:
Reading around a bit, 150 is a control character... so does
that mean it shouldn't appear in a source XML document
(unresolved) where the encoding is specified as ISO-8859-1?
I believe that in the ISO standard ISO 8859/1, the control blocks C0 and C1
(the latter includes 150) are unused - they are not part of the character set.
However, according to Wikipedia [1], "the character map ISO_8859-1:1987,
more commonly known by its preferred MIME name of ISO-8859-1 ... assigns the
C0 and C1 control characters to the code values 00-1F, 7F, and 80-9F."
The XML recommendation defines encodings in terms of their IANA definitions
not their ISO definitions, so on that basis ISO-8859-1 does include the
control character 150.
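
To make the distinction concrete, here is a minimal Python sketch (not part of the original exchange) that decodes the single byte 150 under both labels. It assumes Python's built-in "iso-8859-1" codec, which follows the IANA-style mapping of 80-9F to the C1 controls, and its "cp1252" codec for Windows-1252.

```python
import unicodedata

raw = bytes([150])  # the single byte 0x96

# Under the IANA definition of ISO-8859-1 the byte maps straight to U+0096,
# a C1 control character (general category "Cc").
as_latin1 = raw.decode("iso-8859-1")

# Under Windows-1252 the same byte is a printable en dash.
as_cp1252 = raw.decode("cp1252")

print(f"U+{ord(as_latin1):04X}", unicodedata.category(as_latin1))  # U+0096 Cc
print(f"U+{ord(as_cp1252):04X}", unicodedata.name(as_cp1252))      # U+2013 EN DASH
```

So the same byte is a control character or a dash depending only on the encoding label.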
In XML 1.1, there is a requirement that C0 and C1 characters (with obvious
exceptions such as TAB) must be represented as character references. This is
primarily to catch the common error where a Windows-1252 file is mislabelled
as ISO-8859-1.
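
As a rough illustration of that rule, the Python sketch below rewrites control characters as numeric character references. The ranges are taken from the RestrictedChar production of XML 1.1 as I read it (C0 and C1 minus TAB, LF, CR and NEL); the function name escape_restricted is hypothetical and this is not how any particular serializer is implemented.

```python
import re

# XML 1.1 RestrictedChar: C0 and C1 controls, excluding TAB (09), LF (0A),
# CR (0D) and NEL (85), which may appear literally.
RESTRICTED = re.compile(r"[\x01-\x08\x0b\x0c\x0e-\x1f\x7f-\x84\x86-\x9f]")

def escape_restricted(text: str) -> str:
    """Replace each restricted control character with a decimal character reference."""
    return RESTRICTED.sub(lambda m: f"&#{ord(m.group())};", text)

# Character 150 (U+0096) is escaped; the TAB is left as it is.
print(escape_restricted("a\x96b\tc"))
```

A parser that sees a literal byte 150 in a document labelled ISO-8859-1 can then report it as an error, which is how the mislabelling gets caught.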
[1] http://en.wikipedia.org/wiki/ISO_8859-1
Thanks for the info. Based on that, given this stylesheet:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output encoding="ISO-8859-1" method="xml"/>
<xsl:template match="/">
<foo>&#150;</foo>
</xsl:template>
</xsl:stylesheet>
The output differs between MSXML 3/4, Saxon 6.5.4 and Saxon 8.7.1.
The latter escapes the character back to &#150;, while the three XSLT 1.0
processors all output the character itself.
I'm guessing this is due to XML 1.1 support in Saxon 8.7?
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list