Re: [xsl] How to copy attribute value to text? (Suspected bug involving supplementary characters)

From: Kenneth Reid Beesley <krbeesley(_at_)gmail(_dot_)com>
Subject: RE: [xsl] How to copy attribute value to text? (Suspected bug 
involving supplementary characters)
Date: 7 July 2016 at 12:23:29 MDT
To: xslt <xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com>

*****  Suspected bug involving supplementary characters *****

But my real task involves an input XML document, in UTF-8 encoding, that 
consists of Deseret Alphabet characters, which are encoded in the 
supplementary area.  In such a case, the resulting text content in the <word> 
element, copied from an original attribute value, is corrupted.  I saw such 
corruption in my own attempts, and couldn’t understand what was happening.

Using the following input document (the Deseret Alphabet characters may not 
display correctly for you)

<?xml version="1.0" encoding="UTF-8"?>

<foo>
  <bar>𐑄𐐮𐑅 𐐮𐑆 𐐾𐐲𐑅𐐻 <word correction="𐐻𐐭">𐑂𐐯𐑉𐐮</word> 𐑁𐐲𐑌𐐮</bar>
</foo>

the output, using your script, is corrupted.  The text() value in the output 
is not the same as the original @correction value.  Extra characters (just 
one in this case) are inserted.  The longer the original attribute value, the 
more extra characters are inserted.

<?xml version="1.0" encoding="UTF-8"?>
<foo>
  <bar>𐑄𐐮𐑅 𐐮𐑆 𐐾𐐲𐑅𐐻 <word origerror="𐑂𐐯𐑉𐐮">𐐻𐐻𐐭</word> 𐑁𐐲𐑌𐐮</bar>
</foo>

This kind of corruption is exactly what I was seeing using my own scripts, 
leading me to bother the group.  

I suspect a bug in the XSLT engine involving supplementary characters.  
Again, I’m using SaxonHE9-7-0-6J.

What’s my next step?

Thanks,

Ken
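
One way to narrow down corruption like this is to compare the attribute value and the output text by Unicode code point rather than by eye. The following is a minimal Java sketch, not part of the original thread; the class name `CodePointDump` and its methods are hypothetical, and the two code points used in `main` (U+1043B, U+1042D) are an assumption about what the sample @correction value contains. Deseret letters lie outside the Basic Multilingual Plane, so each one occupies two UTF-16 code units (a surrogate pair) in a Java string, and a code-point dump makes a duplicated or split character easy to spot.

```java
import java.util.stream.Collectors;

// Hypothetical helper (not from the thread): lists the code points of a string
// so the input attribute value and the output text can be compared exactly.
public class CodePointDump {

    // Number of Unicode code points, as opposed to UTF-16 code units.
    public static int countCodePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Renders each code point as U+XXXX, separated by single spaces.
    public static String dump(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Assumed code points for the sample @correction value; check them
        // against your own data before relying on this.
        String correction = new StringBuilder()
                .appendCodePoint(0x1043B)
                .appendCodePoint(0x1042D)
                .toString();

        System.out.println("UTF-16 code units: " + correction.length());
        System.out.println("Code points:       " + countCodePoints(correction));
        System.out.println("Dump:              " + dump(correction));
    }
}
```

Running the same dump on the text of the output <word> element would show whether a whole character was repeated or whether a surrogate pair was broken apart, which is useful detail for a bug report.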

From: Michael Müller-Hillebrand <mmh(_at_)docufy(_dot_)de>
Subject: Re: [xsl] How to copy attribute value to text? (Suspected bug 
involving supplementary characters)
Date: 7 July 2016 at 14:20:30 MDT
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com

When copying the data and stylesheet into OxygenXML and also enabling bidi 
support, the XSLT processing works fine. 

<?xml version="1.0" encoding="UTF-8"?>
<foo>
  <bar>𐑄𐐮𐑅 𐐮𐑆 𐐾𐐲𐑅𐐻 <word origerror="𐑂𐐯𐑉𐐮">𐐻𐐭</word> 𐑁𐐲𐑌𐐮</bar>
</foo>

So your problems may come from some details in your setup. How are you 
running the transform?

BTW, interesting letters!

- Michael



I _was_ running the transform with the default JDK XML parser (Java 1.8).  I’m 
using SaxonHE9-7-0-6J.  This default JDK parser is reputed to be buggy.

From: Michael Kay <mike(_at_)saxonica(_dot_)com>
Subject: Re: [xsl] How to copy attribute value to text? (Suspected bug 
involving supplementary characters)

More likely to be a bug in the JDK parser. Try it using Apache Xerces, which 
is much more reliable than the JDK parser. I think some of the long-standing 
bugs in the JDK parser have finally been fixed in Java 8, so you could also 
try it with a different JDK.

Michael Kay
Saxonica



Michael Kay is right.  I changed to using the Xerces-J parser and now 
everything works as expected.

By the way, I found it a little difficult to figure out how to use Saxon and 
specify the Xerces parser.  I had to hunt around a bit.  I finally found the 
following incantation (as coded in my Makefile).  


# using Saxon XSLT with the Xerces-J parser
BoMDA1869c.xml: BoMDA1869.xml BoMDA1869c.xsl
	java -Djavax.xml.parsers.DocumentBuilderFactory=org.apache.xerces.jaxp.DocumentBuilderFactoryImpl \
	  -Djavax.xml.parsers.SAXParserFactory=org.apache.xerces.jaxp.SAXParserFactoryImpl \
	  net.sf.saxon.Transform -o:$@  $<  BoMDA1869c.xsl


I have saxon9he.jar and xercesImpl.jar on my CLASSPATH.  It all seems to work.  
Am I missing anything?

Many thanks to all who responded to my question.

Ken

********************************
Kenneth R. Beesley, D.Phil.
PO Box 540475
North Salt Lake UT 84054
USA
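
A quick way to confirm that the -D settings in the Makefile rule above actually take effect is to ask JAXP which factory classes it resolves. This is a minimal sketch, not something posted in the thread; the class name `ParserCheck` is hypothetical, and it assumes the parser is picked up through the standard JAXP factory lookup, as the system-property approach implies.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical diagnostic (not from the thread): reports which JAXP factory
// implementations are selected under the current classpath and -D settings.
public class ParserCheck {

    // Class name of the SAX parser factory that JAXP resolves.
    public static String saxFactoryName() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    // Class name of the DOM document builder factory that JAXP resolves.
    public static String domFactoryName() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("SAX factory: " + saxFactoryName());
        System.out.println("DOM factory: " + domFactoryName());
    }
}
```

Run it with the same CLASSPATH and the same two -D options as the transform. Names beginning with org.apache.xerces.jaxp indicate that Xerces-J is in use; names beginning with com.sun.org.apache.xerces.internal indicate that the parser bundled with the JDK is still being selected.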



