
RE: [xsl] Replace content of element, then transform it...

2012-09-03 02:37:13
Wendell,

Thanks for explaining the difference between this XSLT and d-o-e.
I have referenced this XSLT code in my transformation, and with it in place 
the transformation works.
However, these tests are controlled (the test files were made by developers, 
so they know what to expect from them), so I am waiting for some files from 
real production to see whether this really works or not.

Trond Husø



-----Original Message-----
From: Wendell Piez [mailto:wapiez(_at_)mulberrytech(_dot_)com] 
Sent: 31. august 2012 18:26
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: Re: [xsl] Replace content of element, then transform it...

Trond,

Thanks for the reference.

To be clear, what this XSLT does (or appears to do: I haven't run it) is 
convert XML represented as a string literal into a tree structure. This isn't 
really the same as d-o-e, which as we discussed is a feature of a serializer, 
not a transformation (in the XSLT sense) at all.

As such, the stylesheet does implement something like an XML parser, although 
not formally one as it doesn't follow XML's rules (such as that a processor 
must stop on non-well-formed input). Thus, it isn't that it "does d-o-e" but 
that it does what a parser does after one has written literal XML from escaped 
pseudo-XML (with d-o-e).

It's worth noting that the same logic could probably be expressed in about half 
the code using XSLT 2.0. (But if it's to work inside Firefox it must be in XSLT 
1.0.)

Its author, Scott Trenda, has been a frequent contributor to this list.

Cheers,
Wendell

On 8/30/2012 5:04 PM, trond(_dot_)huso(_at_)ntb(_dot_)no wrote:
Wendell and others,

Just in case others end up in the same situation:
On this page there is a reference to an XSL file that emulates 
disable-output-escaping.

https://bugzilla.mozilla.org/show_bug.cgi?id=98168

Trond Husø

-----Original Message-----
From: trond(_dot_)huso(_at_)ntb(_dot_)no [mailto:trond(_dot_)huso(_at_)ntb(_dot_)no]
Sent: 30. august 2012 18:17
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: RE: [xsl] Replace content of element, then transform it...

Wendell,

Thanks for a very informative answer.
Yes, the XML I am receiving is poor by design, but it was chosen because 
the input system allows non-valid XHTML/XML.
As I am trying to influence those who made this decision, I sort of 
have option 0.

I found a bug report on Firefox not supporting d-o-e where they have a 
doe.xsl file, so I am following that route at the moment. It looks sort of OK, 
but since I am currently playing with controlled test data and not real-life 
examples, who knows when this will break...

I will look at the saxon:parse() function that you are referring to... And 
yes, at whether the XML source is valid or not...


Thanks
Trond Husø



Trond,

Indeed, your case illustrates exactly why disable-output-escaping is a trap.

One might think that if you have

<e>&lt;f>g&lt;/f></e>

and then

<xsl:template match="e">
    <xsl:copy>
      <xsl:value-of select="." disable-output-escaping="yes"/>
    </xsl:copy>
</xsl:template>

you then get <e><f>g</f></e>.

And this might even be true, in some cases -- but not in all. The reason? 
"<e><f>g</f></e>" is a sequence of characters, and XSLT does not generate 
sequences of characters, but nodes arranged in trees. You get a sequence of 
characters only when some subsequent process creates one from a tree 
generated by the transformation.

In this case, the tree would be shaped like this:

- element 'e'
    - text '<f>g</f>'

The process that turns this into an XML representation (tags and text:
"<e><f>g</f></e>") is called "serialization".

In your case, however, having generated such a tree, your XSLT submits it to 
be processed with a set of templates. No markup is created at all until the 
*final* result of your processing pipeline is written in the form of 
characters.

There is no 'f' element to be matched and processed in this second step.

Among other consequences of working with trees (by and large beneficial 
ones), this means that the design pattern of "hiding" markup inside XML by 
escaping it is a poor one, which is bound to create problems. At best it 
works only if it is carefully managed.

How to manage it (assuming you can't change your input data)? You have two 
choices:

1. Serialize your intermediate results before processing them again.
(Split your stylesheet into two, and run the second on the serialized 
output of the first.) This can work, but it doesn't scale well, as it 
requires both serializing and then parsing again in the middle of 
every transformation. Even with solid-state storage devices, this 
tends to be time- and resource-intensive. And it locks you down to 
certain kinds of transformation architecture (namely on a file system).

2. Find another way of parsing the pseudo-markup hidden in your data.
For example, use an extension such as saxon:parse(), which will turn a string 
into a tree of nodes (assuming it's well-formed XML).
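
As a rough sketch of option 2, assuming Saxon with the saxon:parse() 
extension available (that depends on the Saxon edition and version), the 
escaped content can be parsed inside the template itself. The <wrap> element 
is a hypothetical helper, not part of the original data; it is there because 
the escaped content may not have a single root element.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="saxon">

  <!-- Parse the escaped markup held in body and leadtext into real nodes.
       <wrap> is a hypothetical helper element: saxon:parse() expects a
       well-formed document, and the escaped content may lack a single root. -->
  <xsl:template match="body | leadtext">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:copy-of
          select="saxon:parse(concat('&lt;wrap&gt;', ., '&lt;/wrap&gt;'))/wrap/node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Identity template for everything else -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

If the escaped content is not well-formed, the parse fails with an error, so 
this only helps when the hidden markup really is well-formed XML.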

Cheers,
Wendell

On 8/30/2012 9:01 AM, trond(_dot_)huso(_at_)ntb(_dot_)no wrote:
Right. I take that note.

I also notice that disable-output-escaping is deprecated in XSLT 2.0, so 
I guess I shall try to figure out another way of doing this...

Trond


-----Original Message-----
From: Michael Kay [mailto:mike(_at_)saxonica(_dot_)com]
Sent: 30. august 2012 14:37
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: Re: [xsl] Replace content of element, then transform it...

One of the main reasons that we've been telling people for 12 years not to 
use disable-output-escaping is that it couples the transformation too 
closely to the serialization, meaning it's difficult and inefficient to 
reuse your code as part of a pipeline. A lot depends on what this d-o-e 
stuff is really doing. Having said that, from the information supplied I 
don't know why you are getting the error you are.

Michael Kay
Saxonica


On 30/08/2012 12:08, trond(_dot_)huso(_at_)ntb(_dot_)no wrote:
Hi,

I have the following XSLT.
<xsl:template match="body">
    <body>
        <xsl:value-of select="." disable-output-escaping="yes"/>
    </body>
</xsl:template>

<xsl:template match="leadtext">
    <leadtext>
        <xsl:value-of select="." disable-output-escaping="yes"/>
    </leadtext>
</xsl:template>

<xsl:template match="node()|@*">
    <xsl:copy>
        <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
</xsl:template>

Which does what is intended. Just that I want to not output it, but start 
working on it in a phase-two process.
So I tried this:

<xsl:template match="body">
    <body>
        <xsl:value-of select="." disable-output-escaping="yes"/>
    </body>
</xsl:template>

<xsl:template match="leadtext">
    <leadtext>
        <xsl:value-of select="." disable-output-escaping="yes"/>
    </leadtext>
</xsl:template>

<xsl:template match="node()|@*">
    <xsl:variable name="foo">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:variable>
    <xsl:apply-templates select="$foo" mode="phase2" />
</xsl:template>
<!-- Error message:
Description: Cannot create an attribute node (id) whose parent is a 
document node
-->

After reading about how this works, I now understand why I am getting the 
error. Is there another alternative to make this possible in one 
document, or do I have to send the output to a new document?
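
For readers following along, here is a minimal sketch of a two-phase 
arrangement in a single stylesheet, assuming an XSLT 2.0 processor. The error 
above most likely arises because the catch-all template also matches 
attributes: when it matches the id attribute, the xsl:copy inside the variable 
tries to attach an attribute directly to the variable's document node. 
Building the variable once, from the root, avoids that. The mode names phase1 
and phase2 and the variable name are illustrative only.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Runs once for the document node: phase 1 builds a complete
       temporary tree, and phase 2 is then applied to that tree. -->
  <xsl:template match="/">
    <xsl:variable name="phase1">
      <xsl:apply-templates select="node()" mode="phase1"/>
    </xsl:variable>
    <xsl:apply-templates select="$phase1/node()" mode="phase2"/>
  </xsl:template>

  <!-- Phase 1: identity copy. Attributes are always copied inside their
       parent element here, never directly under a document node. -->
  <xsl:template match="node() | @*" mode="phase1">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" mode="phase1"/>
    </xsl:copy>
  </xsl:template>

  <!-- Phase 2: placeholder identity copy; the real second-stage
       templates would go in this mode. -->
  <xsl:template match="node() | @*" mode="phase2">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" mode="phase2"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

This only removes the attribute error. The escaped markup in body and leadtext 
still reaches phase 2 as plain text, because nothing in this sketch parses it.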

Best regards,

Trond Husø
System Developer
Mobile : +47 450 35 715
E-mail : trond(_dot_)huso(_at_)ntb(_dot_)no
www.ntb.no

--
======================================================================
Wendell Piez                 mailto:wapiez(_at_)mulberrytech(_dot_)com
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
    Mulberry Technologies: A Consultancy Specializing in SGML and XML 
======================================================================

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--



