Sorry, this turned out to be caused by my underlying HTML cleaner engine
(which converts HTML into a valid DOM 3 document). The escaping issue
appeared after I upgraded from htmlcleaner-2.2 to htmlcleaner-2.5;
downgrading to 2.2 resolved it.
Thanks,
Mohsen
On Mon, Jun 10, 2013 at 11:28 AM, Michael Kay <mike(_at_)saxonica(_dot_)com>
wrote:
Yes, I think it's a bug -- but not in Saxon.
Saxon's implementation of XdmItem.getStringValue() relies on calling
textNode.getNodeValue() in the underlying DOM, and my suspicion is that this
method is returning the value of the text node in escaped form.
What exactly is this "HTML cleaned DOM" that you are passing to the DOMSource
constructor? If my suspicion is correct, it doesn't implement the DOM spec
correctly.
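One way to test this suspicion independently of Saxon is to look at what getNodeValue() returns from a DOM that is known to follow the spec. The sketch below uses only the JDK's built-in parser; the class and method names (DomTextValueCheck, firstTextValue) are made up for illustration and are not part of Saxon or htmlcleaner.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomTextValueCheck {

    // Hypothetical helper: parses a small XML string with the JDK's DOM parser
    // and returns the node value of the first child of the root element.
    static String firstTextValue(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Node text = doc.getDocumentElement().getFirstChild();
        return text.getNodeValue();
    }

    public static void main(String[] args) throws Exception {
        // The source text uses a character reference for ZWNJ (U+200C).
        String value = firstTextValue("<p>a&#8204;b</p>");

        // Print what the DOM exposes: the length, the position of the real
        // U+200C character, and whether any "&#" markup survived.
        System.out.println("length          = " + value.length());
        System.out.println("index of U+200C = " + value.indexOf('\u200C'));
        System.out.println("contains \"&#\"   = " + value.contains("&#"));
    }
}
```

Running the same three checks against a text node taken from the cleaned DOM should show whether that DOM hands back the character itself or its escaped form.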
Michael Kay
Saxonica
PS: this question is very product specific. Product-specific questions are
better addressed to a product-specific forum rather than to the xsl-list. For
Saxon, you can use the forums at saxonica.plan.io
On 9 Jun 2013, at 22:42, Mohsen Saboorian wrote:
Hi,
I'm trying to evaluate an XPath expression with saxon-9.1.0.8 using
the following code snippet:
Configuration conf = new Configuration();
conf.setValidation(false);
Processor p = new Processor(false);
DocumentBuilder documentBuilder = p.newDocumentBuilder();
XPathCompiler xpathCompiler = p.newXPathCompiler();
XPathExecutable xpe = xpathCompiler.compile(expression);
XPathSelector xpath = xpe.load();
xpath.setContextItem(documentBuilder.build(new DOMSource(cleanHtml.document)));
XdmItem result = xpath.evaluateSingle();
The HTML is in Persian script and contains unescaped ZWNJ (U+200C)
characters; its cleaned DOM is passed as cleanHtml.document in the
code above.
The matched XdmItem contains the ZWNJ (U+200C) unescaped, but
result.getStringValue() returns it escaped as (‌), which doesn't
seem correct, since I'm asking for the node's string value.
Is this a bug, or is there a flag to disable escaping of special
Unicode characters in Saxon?
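To narrow down where the escaping happens, it can help to dump the code points of the string at each stage (the DOM text node, then the getStringValue() result). This is only a diagnostic sketch in plain Java; the class ZwnjDiagnostic and its methods are hypothetical names, and the escaped spellings it looks for are assumptions about what an HTML cleaner might emit.

```java
import java.util.Locale;
import java.util.stream.Collectors;

public class ZwnjDiagnostic {

    // Classifies a string as holding a real ZWNJ, an escaped one, or neither.
    // The escaped forms checked here are assumed spellings, not a full list.
    static String describe(String s) {
        if (s.indexOf('\u200C') >= 0) {
            return "literal";
        }
        String lower = s.toLowerCase(Locale.ROOT);
        if (lower.contains("&#8204;") || lower.contains("&#x200c;")
                || lower.contains("&zwnj;")) {
            return "escaped";
        }
        return "absent";
    }

    // Renders every code point as U+XXXX so invisible characters show up.
    static String codePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String literal = "a\u200Cb";
        String escaped = "a&#8204;b";
        System.out.println(describe(literal) + ": " + codePoints(literal));
        System.out.println(describe(escaped) + ": " + codePoints(escaped));
    }
}
```

Calling describe() on the text node value and again on result.getStringValue() should show which layer introduces the escaped form.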
Regards,
Mohsen
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail:
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--