"andrew welch" <andrew(_dot_)j(_dot_)welch(_at_)gmail(_dot_)com> writes:
On 4/21/06, Michael Kay <mike(_at_)saxonica(_dot_)com> wrote:
Why is it that #150 gets escaped when using Windows-1252
output encoding when it should contain that character?
Because there is no character in the Windows-1252 character set that
corresponds to the Unicode character with codepoint 150.
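Michael's point can be checked with a small Python sketch. Python is used here only for illustration; the thread itself is about an XSLT processor. The sketch assumes that Python's built-in `cp1252` codec matches the Windows-1252 table a serializer would use. `can_encode` is a hypothetical helper written for this example.

```python
# Sketch: how Python's built-in cp1252 codec treats byte 0x96 and
# code point 150 (U+0096). Standard library only.

def can_encode(ch: str, encoding: str) -> bool:
    """Hypothetical helper: True if `ch` has a byte representation in `encoding`."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# Byte 0x96 in Windows-1252 is EN DASH, U+2013 (decimal 8211).
en_dash = b"\x96".decode("cp1252")
print(hex(ord(en_dash)), ord(en_dash))   # 0x2013 8211

# Code point 150 (U+0096) is a C1 control character. Windows-1252 uses
# byte 0x96 for the en dash instead, so U+0096 cannot be encoded at all.
print(can_encode(chr(150), "cp1252"))    # False
print(can_encode("\u2013", "cp1252"))    # True
```

A serializer writing Windows-1252 therefore has no byte to emit for U+0096 and has to fall back to a character reference.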
Yes, thanks. That makes sense now. The thing I'm struggling with now is
this:
This source XML:
<?xml version="1.0" encoding="Windows-1252" ?>
<foo>&#150;–</foo>
With this stylesheet:
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output encoding="US-ASCII"/>
  <xsl:template match="/">
    <xsl:copy-of select="."/>
  </xsl:template>
</xsl:stylesheet>
Gives this result:
<foo>&#150;&#8211;</foo>
I've checked the input file with a hex editor to make sure the
un-escaped dash really is 0x96. Somehow the two characters are
treated differently, which is something I didn't expect.
I think that 0x96 in the input XML read using Windows-1252 should
become #8211 when output using any encoding other than Windows-1252,
which is what is happening for the actual character 0x96, but the
character reference #150 gets serialised back as #150...
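The behaviour Andrew describes can be approximated in Python. This is not how any particular XSLT serializer is implemented. It assumes that Python's `xmlcharrefreplace` error handler, which writes unencodable characters as decimal character references, is a fair stand-in for what the processor does on output.

```python
# Sketch: the two dashes from the example as Unicode characters, then
# written out the way a serializer escapes characters that the target
# encoding cannot represent (as decimal character references).

raw_dash = b"\x96".decode("cp1252")  # the literal 0x96 byte, read as Windows-1252
char_ref = chr(150)                   # what the reference &#150; denotes: U+0096

text = char_ref + raw_dash
print([ord(c) for c in text])         # [150, 8211]

# US-ASCII output: both characters are unencodable, so both are escaped,
# each by its own Unicode code point.
ascii_out = text.encode("ascii", errors="xmlcharrefreplace")
print(ascii_out.decode("ascii"))      # &#150;&#8211;

# Windows-1252 output: U+2013 becomes the byte 0x96 again, but U+0096
# has no byte in Windows-1252 and stays escaped.
cp1252_out = text.encode("cp1252", errors="xmlcharrefreplace")
print(cp1252_out)                     # b'&#150;\x96'
```

The escaping step only sees code points, never the bytes the input was read from, which is why the two dashes come out differently.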
Isn't this because &#150; is a Unicode character reference? It's not a Windows-1252 one. In other words, a character reference always denotes the same Unicode code point (here U+0096), whatever the input encoding.
Nic Ferrier
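Nic's point can be checked with Python's standard `xml.etree.ElementTree` parser. This is an illustrative choice; any conforming XML parser should agree. The sketch assumes the parser accepts a `Windows-1252` encoding declaration, which CPython's expat-based parser normally supports through Python's codecs.

```python
import xml.etree.ElementTree as ET

# The same markup declared in three encodings. "&#150;" is plain ASCII,
# so the bytes differ only in the XML declaration.
docs = {
    enc: f'<?xml version="1.0" encoding="{enc}"?><foo>&#150;</foo>'.encode("ascii")
    for enc in ("UTF-8", "ISO-8859-1", "Windows-1252")
}

for enc, data in docs.items():
    text = ET.fromstring(data).text
    # A character reference names a Unicode code point directly,
    # so the declared encoding does not change its meaning.
    print(enc, [ord(c) for c in text])  # [150] for every encoding

# By contrast, the meaning of a raw byte 0x96 depends on the declared encoding.
raw = b'<?xml version="1.0" encoding="Windows-1252"?><foo>\x96</foo>'
print(ord(ET.fromstring(raw).text))     # 8211 (U+2013 EN DASH)
```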
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list