
RE: [xsl] Problem: XSLT, attribute value, Unicode supplementary characters

2010-04-18 04:09:45
Check that you are not using the XML parser built into JDK 1.6; use the 
Xerces parser from Apache instead. The JDK 1.6 parser has some nasty bugs 
and often corrupts attribute values.
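
One way to act on this from Java code is sketched below. It is only an illustration, not part of Saxon: the class name ParserSelection is made up, and the sketch assumes that Saxon picks up its source parser through the standard JAXP lookup and that the Apache Xerces jar (xercesImpl.jar) has been added to the classpath. It switches to the Xerces factory when that class is present and reports which factory is actually in use.

```java
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper: prefer Apache Xerces over the JDK's built-in parser.
class ParserSelection {
    // JAXP factory class shipped in Apache Xerces-J (xercesImpl.jar).
    static final String XERCES_FACTORY = "org.apache.xerces.jaxp.SAXParserFactoryImpl";

    /** Returns true if the named class can be loaded from the classpath. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Selects Xerces when available; returns the factory class actually in use. */
    static String selectParser() {
        if (isOnClasspath(XERCES_FACTORY)) {
            // Standard JAXP system property consulted by SAXParserFactory.newInstance().
            System.setProperty("javax.xml.parsers.SAXParserFactory", XERCES_FACTORY);
        }
        return SAXParserFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        String inUse = selectParser();
        System.out.println("SAX parser factory: " + inUse);
        if (inUse.startsWith("com.sun.org.apache.xerces.internal")) {
            System.out.println("This is the JDK's built-in parser, not Apache Xerces.");
        }
    }
}
```

When running Saxon from the command line, the -x:classname option (for example -x:org.apache.xerces.parsers.SAXParser) should have the same effect for the source document, again assuming the Xerces jar is on the classpath.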

Regards,

Michael Kay
http://www.saxonica.com/
http://twitter.com/michaelhkay  

-----Original Message-----
From: Kenneth Reid Beesley [mailto:krbeesley(_at_)gmail(_dot_)com] 
Sent: 18 April 2010 06:25
To: xslt
Subject: [xsl] Problem: XSLT, attribute value, Unicode 
supplementary characters


I've got a problem with XSLT transformation of attribute 
values consisting of Unicode supplementary characters.

Background:

1.  OS X 10.6.3
2.  saxonhe9-2-0-6j
3.  The task:  transforming an XML document into XeTeX 
(specifying <xsl:output method="text" encoding="UTF-8"/>)
4.  The XML document is well-formed and also validates against a 
Relax NG schema.
5.  The XML document is designated as <?xml version="1.0" 
encoding="UTF-8"?>
6.  The locale of the operating system is UTF-8


Typical Data:

XML:      <case correctda="𐐜𐐰𐐻">𐑄𐐰𐐻</case>

The value of the attribute named correctda is here a short 
string of three Deseret Alphabet letters, from the Unicode 
supplementary area.

Matching XSLT template:
  
<xsl:template match="pleft/text/case">{\da <xsl:value-of 
select="@correctda"/>\endnote{\rom Case correction: {\da 
<xsl:value-of select="."/>} $\rightarrow$ {\da <xsl:value-of 
select="@correctda"/>}}}</xsl:template>

Behavior:

1.  The key problem is the output of the attribute value, via 
<xsl:value-of select="@correctda"/>.  Instead of outputting 
the value 𐐜𐐰𐐻, as expected, the output is a long 
string of unrelated Deseret Alphabet characters.  It's as if 
the xsl:value-of instruction is being confused by the Unicode 
supplementary characters.

2.  This XSLT script was working a couple of months ago.  
Since then, I upgraded to OS X 10.6 (Snow Leopard), and in 
trying to fix the current problem, I upgraded to 
saxonhe9-2-0-6j as well.  The problem persists.
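
To narrow down whether the corruption described in item 1 happens in the XML parser rather than in the stylesheet, a small check along the following lines may help. It is a sketch only: the class AttributeCodePoints and its methods are invented for illustration, the three code points U+1041C, U+10430 and U+1043B are assumed to correspond to the sample attribute value, and it needs Java 8 or later. It parses a one-element document from UTF-8 bytes with whatever JAXP parser is configured and prints the attribute value as code points. A tiny document like this may not trigger a parser bug that shows up on a large file, so the same check would ideally be run against the real input.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical diagnostic: does the parser return the attribute value intact?
class AttributeCodePoints {
    /** Formats a string as space-separated Unicode code points, e.g. "U+1041C". */
    static String codePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    /** Parses xml (encoded as UTF-8) and returns the first value found for attrName. */
    static String readAttribute(String xml, String attrName) {
        final String[] found = new String[1];
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes atts) {
                        String value = atts.getValue(attrName);
                        if (value != null && found[0] == null) {
                            found[0] = value;
                        }
                    }
                });
        } catch (Exception e) {
            throw new RuntimeException("Parsing failed", e);
        }
        return found[0];
    }

    public static void main(String[] args) {
        // Three supplementary-plane code points from the Deseret block (assumed sample value).
        String expected = new StringBuilder()
                .appendCodePoint(0x1041C).appendCodePoint(0x10430).appendCodePoint(0x1043B)
                .toString();
        String xml = "<case correctda=\"" + expected + "\">text</case>";
        String actual = readAttribute(xml, "correctda");
        System.out.println("expected: " + codePoints(expected));
        System.out.println("parsed:   " + codePoints(actual));
        System.out.println(expected.equals(actual) ? "attribute intact" : "attribute corrupted");
    }
}
```

If the parsed code points differ from the expected ones, the damage occurs before the stylesheet ever sees the value, which would point at the parser rather than at xsl:value-of.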

Question:

Does anyone know what's happening and how I can fix it?  Has 
something changed in the handling of Unicode supplementary characters?

Thanks,

Ken


******************************
Kenneth R. Beesley, D.Phil.
P.O. Box 540475
North Salt Lake, UT
84054  USA






--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--


