xsl-list

Re: [xsl] (Re-)Escaping entities in input text

2008-08-20 09:39:34
Hi,

Thanks for the answer, but I think you misunderstood my question. The point is
that I'm not writing to a file on disk, so I'm not using any XML writer here.
I'm simply saving my result in an xs:string and passing it (without writing it
out) to another program. For example, given this input file:
<goo>
  <foo>a &lt; b</foo>
</goo>

Saxon (I'm using version 9.0.0.4J) will read it and, in a template that
matches "foo", <xsl:value-of select="."/> will return "a < b", which is
correct.

Now assume I have this piece of code (I'm writing it on the fly, so please be
lenient :-) ):
<xsl:template match="foo">
  <xsl:variable name="my_xml">
    <xsl:text>&lt;bar&gt;</xsl:text>
    <xsl:value-of select="." />
    <xsl:text>&lt;/bar&gt;</xsl:text>
  </xsl:variable>
  <xsl:value-of select="java_class:function($my_xml)" />
</xsl:template>

The point of this template is to build a pseudo-XML document in a string
(my_xml) and pass it to a Java function (java_class:function), which will
process it. However, done this way, my_xml ends up with the following content:
<bar>a < b</bar>
which is not well-formed and hence cannot be parsed by the XML parser in my
Java class.
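
The failure is easy to reproduce on the Java side. The following is only a sketch: it assumes the Java class uses the JDK's built-in DOM parser (the actual parser is not stated here), and the class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of the original stylesheet or Java class.
final class WellFormedCheck {
    /** Returns true if the string parses as a well-formed XML document. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // A raw '<' inside character data is a fatal parse error.
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O problems are not what we test here.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<bar>a < b</bar>"));    // unescaped
        System.out.println(isWellFormed("<bar>a &lt; b</bar>")); // escaped
    }
}
```

The first call should be rejected and the second accepted, which is exactly the difference between the string I have and the string I need.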

So what I'm looking for is a way of getting "a &lt; b" instead of "a < b"
*in my internal string*.

I don't think this is bad practice, is it? There are surely cases where XSLT
cannot handle everything, and the processing of a piece of XML has to be
handed over to some other processor :-).

On a related note: could it be that Saxon uses ISO-8859-1 instead of UTF-8
internally? My source file is definitely UTF-8, but when I pass a string
containing special characters (in this case German umlauts) to my Java class,
I get '?' (question marks) instead of the 2-byte UTF-8 sequences. Any idea
why this is happening, or how to avoid it?
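
One way to narrow this down is to check which encodings can produce a '?' at all. The sketch below says nothing about Saxon itself; it only relies on the fact that Java strings are UTF-16 internally and that encoding a string into a charset that cannot represent a character substitutes '?'. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class UmlautCheck {
    /** Encodes the text with the given charset and decodes it again. */
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // a-umlaut, o-umlaut, u-umlaut, written as escapes so the source
        // file's own encoding cannot interfere with the experiment.
        String umlauts = "\u00e4\u00f6\u00fc";
        System.out.println(roundTrip(umlauts, StandardCharsets.UTF_8).equals(umlauts));
        System.out.println(roundTrip(umlauts, StandardCharsets.ISO_8859_1).equals(umlauts));
        System.out.println(roundTrip(umlauts, StandardCharsets.US_ASCII));
    }
}
```

Both UTF-8 and ISO-8859-1 can represent German umlauts, so neither of them turns the characters into question marks; US-ASCII does. That suggests looking at the place where the string is printed or converted to bytes (for example a console or a platform default charset) rather than at the string itself.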

David


----- Original Message ----
From: Andrew Welch <andrew(_dot_)j(_dot_)welch(_at_)gmail(_dot_)com>
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Sent: Wednesday, August 20, 2008 4:21:18 PM
Subject: Re: [xsl] (Re-)Escaping entities in input text

> Of course, there is the possibility of replacing all 5 entities "by hand" by
> calling a transform function, however this might not be very efficient when
> the string is getting big. Is there either:

_never_ do that... it's the first step down the wrong road which is
long and painful.

> - a way of disabling entity interpretation with xsl:value-of (actually
> getting "<" when it's written like this in the input file)

xsl:value-of simply creates text nodes in the result tree, there is no
interpretation going on - that only happens during
parsing/serialisation

> - a function to "reescape" a piece of text so that it's usable in an XML
> file/string?

that happens during serialisation... for example:

<foo> a &lt; b </foo>

When that's parsed you will get a node "foo" with a single text node
child "a < b".  If you do xsl:value-of on that text node, it will be
added to the result tree.  It's still "a < b" at this point.  Then the
serializer operates on the result tree, and it knows that "<" in a text
node must be escaped, so after that step it becomes "a &lt; b"...
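
The same parse/serialize round trip can be watched from Java. This is only a sketch using the JDK's own DOM parser and identity transformer rather than Saxon's internals, and the class and method names are invented for the example:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not taken from the thread.
final class RoundTrip {
    /** Parsing: the entity reference becomes a plain '<' in the text node. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Serializing: the '<' in the text node is escaped again on output. */
    static String serialize(Document doc) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<foo> a &lt; b </foo>");
        System.out.println(doc.getDocumentElement().getTextContent()); // in-memory value
        System.out.println(serialize(doc));                            // serialized form
    }
}
```

The in-memory text holds the bare "<"; the escaped form only reappears once the tree goes through a serializer.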

It sounds like you might be skipping the serialization step - perhaps
you're constructing a String and just writing that to disk?  E.g.

String xml = "<foo>" + someValue + "</foo>";

...which would give you:

<foo> a < b </foo>.

...hence the question?  Doing it that way is A Bad Thing - the golden
rule is to always read and write XML using proper XML readers and
writers.
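
As a rough sketch of what "a proper XML writer" could look like when the string is assembled in Java: this assumes the JDK's StAX API (javax.xml.stream) is acceptable, and the class and method names are made up for the example.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper, shown instead of "<foo>" + someValue + "</foo>".
final class FooWriter {
    /** Wraps the value in a foo element and lets the writer do the escaping. */
    static String wrapInFoo(String someValue) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement("foo");
            writer.writeCharacters(someValue); // special characters are escaped here
            writer.writeEndElement();
            writer.flush();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(wrapInFoo("a < b"));
    }
}
```

The result is still just a String in memory, but because it went through a writer the "<" comes out as "&lt;" and the string can be parsed again.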

-- 
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--


      
