xsl-list

RE: [xsl] Escaped characters being duplicated

2007-12-11 16:23:35
Perplexing indeed.

I'd be less surprised if the output came out as "&amp;lt;" rather than
"&lt;&lt;". That's much more common, and could be caused by processing text
twice when it should only be processed once. 

The conversion from "<" to "&lt;" is done by the XML serializer. The fact
that you're using the Saxon XSLT processor doesn't necessarily mean that
you're using the Saxon serializer (the Saxon output could be sent to a DOM
which is then serialized using the DOM serializer); it would be a good idea
to find out what serializer is actually being used. The easiest way to find
out is to see whether the serialization is affected by xsl:output
declarations in the stylesheet.
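
As a rough illustration of that check, here is a minimal sketch using only
the standard JAXP API. The class name, the inline stylesheet and the sample
input are all made up for the example; the real pipeline would use its own
stylesheets. The stylesheet asks for omit-xml-declaration="yes": if the
processor's own serializer writes the output, the XML declaration should
disappear, whereas if the result goes to a DOM that is serialized separately,
the xsl:output declaration is not consulted.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; stylesheet and input are invented for illustration.
final class SerializerCheck {
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'><out><xsl:value-of select='/doc'/></out></xsl:template>"
      + "</xsl:stylesheet>";
    static final String XML = "<doc>a &lt; b</doc>";

    static Transformer compiled() throws Exception {
        return TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
    }

    // Path 1: the XSLT processor serializes directly, so xsl:output applies.
    static String viaProcessorSerializer() throws Exception {
        StringWriter out = new StringWriter();
        compiled().transform(new StreamSource(new StringReader(XML)),
                             new StreamResult(out));
        return out.toString();
    }

    // Path 2: the result tree is built as a DOM, then serialized by a separate
    // identity transformer that knows nothing about the stylesheet's xsl:output.
    static String viaDomThenSerialize() throws Exception {
        DOMResult dom = new DOMResult();
        compiled().transform(new StreamSource(new StringReader(XML)), dom);
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
            .transform(new DOMSource(dom.getNode()), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("processor serializer: " + viaProcessorSerializer());
        System.out.println("DOM then serialize:   " + viaDomThenSerialize());
    }
}
```

If the real output behaves like the second path, something other than the
XSLT processor's serializer is producing the file, and that is where I would
look next.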

How did you satisfy yourself that both the successful and the unsuccessful
runs are using Saxon 6.5.5? JAXP is a wonderful beast, and ensures that many
people are running a different XSLT processor from the one they thought they
were using.
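
One way to check, sketched here under the assumption that you can run a small
class inside the same JVM and classpath as the failing run (for example from
the web application as well as from the command line). The class name is
hypothetical. It reports the TransformerFactory class that JAXP actually
loads, and the value of the standard XSLT function
system-property('xsl:vendor') as seen from inside a stylesheet.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical diagnostic class; run it in each environment being compared.
final class WhichProcessor {
    // Tiny stylesheet that prints the vendor string of whatever processor runs it.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:value-of select=\"system-property('xsl:vendor')\"/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // The concrete factory class chosen by the JAXP lookup mechanism.
    static String factoryClassName() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    // The vendor string reported by the processor that executes the stylesheet.
    static String vendor() throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader("<dummy/>")),
                    new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("factory class:   " + factoryClassName());
        System.out.println("xsl:vendor:      " + vendor());
        // Null here means the factory was found some other way (services file, default).
        System.out.println("system property: "
            + System.getProperty("javax.xml.transform.TransformerFactory"));
    }
}
```

If the factory class or the vendor string differs between the Tomcat run and
the command-line run, the two are not using the same processor, whatever the
build scripts say.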

Michael Kay
http://www.saxonica.com/

-----Original Message-----
From: Anderson, Paul [mailto:Paul(_dot_)Anderson(_at_)compuware(_dot_)com] 
Sent: 11 December 2007 23:07
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] Escaped characters being duplicated

Greetings All,

We have a bunch of DITA XML content and we're using the 
open-source DITA Open Toolkit to transform it into a variety 
of outputs. The DITA Open Toolkit is a collection of Java 
classes, XSL stylesheets, and ANT scripts that transform the 
content and create the output. 

To shield our users from the command-line invocation of the 
publishing scripts, we deployed a simple web application 
running on Tomcat 5.5 that takes input from a JSP page and 
invokes the necessary ANT script to generate the desired 
output for the user. This methodology has been working quite 
nicely for nearly a year.

Over that time, a few of our users have had a problem where 
characters escaped in the XML content (for example, angle 
brackets and ampersands) are duplicated in the output. For 
example, in the place of one angle-bracket (&lt;), we end up 
with two or sometimes four escaped angle brackets (&lt;&lt;&lt;&lt;).

I've been troubleshooting the problem and the duplication 
always appears in the output files generated by one of the 
XSL stylesheets in the DITA Open Toolkit. If the input file 
contained an escaped character, the output file contains two 
of those escaped characters. The most interesting discovery 
so far is this: For each user that has the problem, the 
problem goes away if they invoke the ANT script via the 
command line; the duplication only occurs when the ANT script 
is invoked from the JSP page running on Tomcat 5.5. Having 
said that, the problem only exists for a few users; most 
users never see this problem when they use the JSP page to 
invoke the ANT script and publish the exact same XML content.

Perplexing.

Given all this background, my plea to this list is simple: 
What sort of conditions cause an XSL transformation to 
duplicate an escaped character? 

Would the system locale have an impact?
Would the Java version (1.5 versus 1.6) have an impact?
All source files use UTF-8 encoding.
All users are using the same XSL processor: Saxon 6.5.5.
I don't think the problem is in the XSL stylesheet or any 
other part of the DITA Open Toolkit because all users are 
using the same code and it works for most users.

Any ideas about this issue are appreciated.

Best regards,

Paul Anderson
Information Developer - Codex Administrator
Compuware Corporation

The contents of this e-mail are intended for the named addressee 
only. It contains information that may be confidential. Unless 
you are the named addressee or an authorized designee, you may 
not copy or use it, or disclose it to anyone else. If you 
received it in error please notify us immediately and then 
destroy it.

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--



