
Re: [xsl] Problem: XSLT, attribute value, Unicode supplementary characters

2010-04-18 02:29:56
I'm afraid I can't help, other than to confirm that I see no problem
when running
 1. OS X 10.5.8
 2. Saxon-HE 9.2.0.3J
with the XML input and XSLT you provided. My only changes were adding
an XML declaration that states encoding=UTF-8 and changing the match
pattern "pleft/text/case" to "/case" :-) I used XSLT 2.0.

The output has U+1041C, U+10430, and U+1043B for both the value-of on
the attribute and the value-of on the element content.


I've got a problem with XSLT transformation of attribute values
consisting of Unicode supplementary characters.

Background:

1.  OS X  10.6.3
2.  saxonhe9-2-0-6j
3.  The task:  transforming an XML document into XeTeX (specifying 
<xsl:output method="text" encoding="UTF-8"/> )
4.  The XML document is well-formed and also validates against a Relax NG 
schema.
5.  The XML document is designated as <?xml version="1.0" encoding="UTF-8"?>
6.  The locale of the operating system is UTF-8


Typical Data:

XML: <case correctda="𐐜𐐰𐐻">𐐜𐐰𐐻</case>

The value of the attribute named correctda is here a short string
of three Deseret Alphabet letters, from the Unicode supplementary
area.
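
For readers who want to see what "supplementary" means at the byte level, here is a minimal sketch in Python (not part of the original setup, which used Saxon). It assumes the three letters are U+1041C, U+10430, and U+1043B, as reported in the reply above, and prints how each one is encoded in UTF-8 and UTF-16.

```python
# The three Deseret letters, written by code point (assumed from the
# code points quoted in the reply: U+1041C, U+10430, U+1043B).
text = "\U0001041C\U00010430\U0001043B"

for ch in text:
    utf8 = ch.encode("utf-8")        # four bytes per supplementary character
    utf16 = ch.encode("utf-16-be")   # one surrogate pair (four bytes)
    print(f"U+{ord(ch):05X}  utf-8={utf8.hex()}  utf-16={utf16.hex()}")
```

Each letter takes four bytes in UTF-8 and a surrogate pair in UTF-16, so any stage that treats the data as single bytes or as individual 16-bit units can garble it.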

Matching XSLT template:
  
<xsl:template match="pleft/text/case">{\da <xsl:value-of
select="@correctda"/>\endnote{\rom Case correction: {\da
<xsl:value-of select="."/>} $\rightarrow$ {\da <xsl:value-of
select="@correctda"/>}}}</xsl:template>

Behavior:

1. The key problem is the output of the attribute value, via
<xsl:value-of select="@correctda"/>. Instead of outputting the
value 𐐜𐐰𐐻, as expected, the output is
instead a long string of unrelated Deseret Alphabet characters.
It's as if the value-of instruction is being confused by the Unicode
supplementary characters.

2. This XSLT script was working a couple of months ago. Since then,
I did upgrade to OS X 10.6 (Snow Leopard), and in trying to fix the
current problem, I upgraded to saxonhe9-2-0-6j as well. The problem
persists.
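
One way to narrow this kind of symptom down is to compare code points rather than glyphs. The sketch below is only an illustration in Python, outside the XSLT toolchain: the helper names `codepoints` and `first_difference` are made up for this example, and `output_bytes` is a stand-in for whatever the transform actually wrote. Inside a stylesheet, the XPath 2.0 function string-to-codepoints() gives a similar view of the attribute value.

```python
def codepoints(s):
    """Return the code points of s as U+XXXX labels."""
    return [f"U+{ord(ch):04X}" for ch in s]


def first_difference(expected, actual):
    """Index of the first differing code point, or None if identical."""
    for i, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            return i
    if len(expected) != len(actual):
        return min(len(expected), len(actual))
    return None


# Expected value, assumed from the code points quoted in the reply.
expected = "\U0001041C\U00010430\U0001043B"

# Hypothetical stand-in for the bytes of the transform's output file.
output_bytes = expected.encode("utf-8")
actual = output_bytes.decode("utf-8")

print(codepoints(actual))
print(first_difference(expected, actual))
```

If the code points already differ when the attribute is read, the input or parser stage is suspect; if they only differ in the written file, the serialization or the viewing tool is the more likely place to look.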

Question:

Does anyone know what's happening and how I can fix it? Has
something changed in the handling of Unicode supplementary
characters?
