Re: [xsl] Character 150 withs Windows-1252 output

Gives this result:

<foo>&#150;&#8211;</foo>

I've checked the input file with a hex editor to make sure the
un-escaped dash really is 0x96.  Somehow the two characters are
treated differently, which is something I didn't expect.

I think that 0x96 in the input XML read using Windows-1252 should
become #8211 when output using any encoding other than Windows-1252,
which is what is happening for the actual character 0x96, but the
character reference #150 gets serialised back as #150...


Isn't this because &#150; is a Unicode character reference? It's not a
Windows-1252 one. In other words, a character reference never changes
according to the input encoding.


Ahh of course, that makes sense.  The character for #150 is worked out
after the bytes in the document have been parsed using the encoding
specified in the prolog....

So 0x96 becomes #8211 through the mapping defined in Windows-1252, and
#150 remains as #150 because it's a character reference and character
references are always Unicode.
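
For anyone wanting to check this outside an XSLT processor, here is a
rough sketch using Python's standard xml.etree.ElementTree. The sample
document is my own guess at the input (the reference &#150; followed by
the raw byte 0x96), not the original file, and other parsers or
serialisers may behave differently.

```python
import xml.etree.ElementTree as ET

# Assumed input: a Windows-1252 document containing the character
# reference &#150; followed by the raw byte 0x96.
doc = b'<?xml version="1.0" encoding="windows-1252"?><foo>&#150;\x96</foo>'

root = ET.fromstring(doc)

# The reference is resolved as a Unicode code point (150), while the raw
# byte goes through the Windows-1252 mapping and becomes U+2013 (8211).
code_points = [ord(ch) for ch in root.text]
print(code_points)

# ASCII output: neither character is encodable, so both are written as
# numeric character references.
ascii_out = ET.tostring(root, encoding="us-ascii")
print(ascii_out)

# Windows-1252 output: U+2013 is written as the byte 0x96 again, but
# U+0096 has no Windows-1252 byte, so it stays a character reference.
cp1252_out = ET.tostring(root, encoding="windows-1252")
print(cp1252_out)
```

The ASCII output should contain the same <foo>&#150;&#8211;</foo> as
reported at the top of this message, which fits the explanation.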

Thanks Nic!
