Re: [xsl] Saxon and ZWNJ

2013-06-10 01:58:51
Yes, I think it's a bug -- but not in Saxon.

Saxon's implementation of XdmItem.getStringValue() relies on calling 
textNode.getNodeValue() in the underlying DOM, and my suspicion is that this 
method is returning the value of the text node in escaped form.

What exactly is this "HTML cleaned DOM" that you are passing to the DOMSource 
constructor? If my suspicion is correct, it doesn't implement the DOM spec 
correctly.
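
One way to test this suspicion is to compare what getNodeValue() returns in a DOM known to follow the spec with what your cleaned DOM returns. The sketch below is only an illustration: it uses the JDK's built-in parser as the reference DOM, and the names DomTextCheck, parse and firstTextValue are hypothetical helpers, not part of Saxon or of any HTML cleaner. A conforming DOM should hand back the character U+200C itself, never the characters of a character reference.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class DomTextCheck {

    /** Returns getNodeValue() of the first text-node child of the document element, or null. */
    static String firstTextValue(Document doc) {
        Node child = doc.getDocumentElement().getFirstChild();
        while (child != null && child.getNodeType() != Node.TEXT_NODE) {
            child = child.getNextSibling();
        }
        return child == null ? null : child.getNodeValue();
    }

    /** Parses XML with the JDK parser, used here only as a reference DOM (an assumption of this sketch). */
    static Document parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            doc.normalize(); // merge adjacent text nodes so there is a single text child
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The markup uses a character reference; the DOM must expose the character itself.
        String value = firstTextValue(parse("<p>a&#x200C;b</p>"));
        System.out.println("length = " + value.length());
        System.out.println("contains U+200C = " + (value.indexOf('\u200C') >= 0));
        System.out.println("contains \"&#\" = " + value.contains("&#"));
    }
}
```

Calling firstTextValue (or an equivalent walk to the relevant text node) on the document you pass to DOMSource should show whether the escaping is already present before Saxon ever sees the data.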

Michael Kay
Saxonica

PS: this question is very product specific. Product-specific questions are 
better addressed to a product-specific forum rather than to the xsl-list. For 
Saxon, you can use the forums at saxonica.plan.io




On 9 Jun 2013, at 22:42, Mohsen Saboorian wrote:

Hi,
I'm trying to evaluate an XPath expression with saxon-9.1.0.8 using
the following code snippet:

 Configuration conf = new Configuration();
 conf.setValidation(false);
 Processor p = new Processor(false);
 DocumentBuilder documentBuilder = p.newDocumentBuilder();
 XPathCompiler xpathCompiler = p.newXPathCompiler();

 XPathExecutable xpe = xpathCompiler.compile(expression);
 XPathSelector xpath = xpe.load();
 xpath.setContextItem(
     documentBuilder.build(new DOMSource(cleanHtml.document)));

 XdmItem result = xpath.evaluateSingle();

The HTML is in Persian script and contains unescaped ZWNJ (U+200C)
characters; its cleaned DOM is passed as cleanHtml.document in the
code above.

The matched XdmItem has ZWNJ (U+200C) (non-escaped) but when obtaining
result.getStringValue(), the result has escaped ZWNJ as (‌) which
doesn't seem to be correct because I'm getting node 'string' value.
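
To tell a real U+200C apart from the literal characters of an escape sequence, it may help to print the code points of the returned string. This is a minimal sketch, assuming Java 8 or later; CodePointDump and dump are hypothetical names, and "&#8204;" is used only as an assumed example of what an escaped form could look like.

```java
final class CodePointDump {

    /** Formats each code point of s as U+XXXX, separated by single spaces. */
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // A Persian letter followed by a real ZWNJ character.
        System.out.println(dump("\u0645\u200C"));
        // The same ZWNJ written out as an escape: seven ordinary ASCII characters.
        System.out.println(dump("&#8204;"));
    }
}
```

If the output for result.getStringValue() lists U+0026 U+0023 and digits where U+200C was expected, the string really does hold an escape sequence rather than the character.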

Is this a bug, or is there a flag to disable the escaping of special
Unicode characters in Saxon?

Regards,
Mohsen

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--


