xsl-list

Re: [xsl] Character 150 with Windows-1252 output

2006-04-21 06:11:35
"andrew welch" <andrew(_dot_)j(_dot_)welch(_at_)gmail(_dot_)com> writes:

On 4/21/06, Michael Kay <mike(_at_)saxonica(_dot_)com> wrote:
Why is it that #150 gets escaped when using Windows-1252
output encoding when it should contain that character?

Because there is no character in the Windows-1252 character set that
corresponds to the Unicode character with codepoint 150.
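A small Python sketch can confirm this point. It uses Python's built-in `cp1252` codec as a stand-in for the processor's Windows-1252 table. That is an assumption: XSLT processors ship their own encoders, but both follow Microsoft's mapping. The helper `try_encode` is hypothetical, written only for this illustration:

```python
from typing import Optional


def try_encode(ch: str, encoding: str = "cp1252") -> Optional[bytes]:
    """Return ch encoded in the given encoding, or None if it has no mapping."""
    try:
        return ch.encode(encoding)
    except UnicodeEncodeError:
        return None


# U+2013 EN DASH is the character Windows-1252 stores as byte 0x96 ...
print(try_encode("\u2013"))  # b'\x96'
# ... but U+0096 (code point 150, a C1 control) has no Windows-1252 byte at all.
print(try_encode("\u0096"))  # None

# With no byte available, an encoder can only fall back to a numeric
# character reference, which is the escaping the question asked about.
print("\u0096".encode("cp1252", errors="xmlcharrefreplace"))  # b'&#150;'
```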

Yes, thanks.  That makes sense now.  What I'm still struggling with is
this:

This source XML:

<?xml version="1.0" encoding="Windows-1252" ?>
<foo>&#150;–</foo>

With this stylesheet:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output encoding="US-ASCII"/>
<xsl:template match="/">
  <xsl:copy-of select="."/>
</xsl:template>
</xsl:stylesheet>

Gives this result:

<foo>&#150;&#8211;</foo>

I've checked the input file with a hex editor to make sure the
un-escaped dash really is 0x96.  Somehow the two characters are
treated differently, which is something I didn't expect.
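The behaviour can be reproduced outside XSLT. The sketch below assumes Python's `xml.etree.ElementTree` as a stand-in for the identity copy and its serializer. It parses the same bytes (the reference `&#150;` followed by the raw byte 0x96) and writes the tree as US-ASCII, escaping anything ASCII cannot hold. `SOURCE` and `copy_to_ascii` are illustrative names, not part of the original post:

```python
import xml.etree.ElementTree as ET

# The source document from the post, as raw bytes: the character reference
# &#150; followed by the single byte 0x96 (what the hex editor showed).
SOURCE = (
    b'<?xml version="1.0" encoding="Windows-1252" ?>\n'
    b"<foo>&#150;\x96</foo>"
)


def copy_to_ascii(source: bytes) -> bytes:
    """Parse the document and re-serialize it as US-ASCII.

    ElementTree stands in for the XSLT identity copy here; like an XSLT
    serializer, it writes characters the output encoding lacks as numeric
    character references.
    """
    root = ET.fromstring(source)
    return ET.tostring(root, encoding="us-ascii")


print(copy_to_ascii(SOURCE).decode("ascii"))  # <foo>&#150;&#8211;</foo>
```

The output matches the processor's, which suggests the two characters already differ in the parsed tree, before any serialization happens.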

I think that byte 0x96 in input XML read as Windows-1252 should
become &#8211; when output using any encoding other than Windows-1252,
which is what is happening for the literal byte 0x96, but the
character reference &#150; gets serialised back as &#150;...

Isn't this because &#150; is a Unicode character reference? It's not a
Windows-1252 one. In other words, a character reference never changes
its meaning according to the input encoding.
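One way to check this is to parse the same bytes under two different encoding declarations. The sketch below uses Python's `xml.etree.ElementTree` again, with ISO-8859-1 chosen only as a contrast; `code_points` is a hypothetical helper. The raw byte 0x96 is decoded through whatever encoding the document declares, while `&#150;` always yields code point 150:

```python
import xml.etree.ElementTree as ET
from typing import List


def code_points(declared_encoding: str) -> List[int]:
    """Parse <foo>&#150;[byte 0x96]</foo> under the given encoding
    declaration and return the code points of the element's text."""
    doc = (
        f'<?xml version="1.0" encoding="{declared_encoding}"?>\n'.encode("ascii")
        + b"<foo>&#150;\x96</foo>"
    )
    return [ord(ch) for ch in ET.fromstring(doc).text]


# The raw byte 0x96 is decoded through the declared encoding, so its meaning
# changes; the character reference &#150; always means Unicode U+0096.
print(code_points("Windows-1252"))  # [150, 8211]
print(code_points("ISO-8859-1"))    # [150, 150]
```

Only the literal byte picks up the Windows-1252 meaning (U+2013, code point 8211). The reference names U+0096, which has no Windows-1252 byte, so the serializer has to escape it.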


Nic Ferrier
