xsl-list

Re: Streamlining XSL and transform performance (how does it actually work?)

2004-03-19 10:36:58
Hi Julian,

At 05:00 AM 3/19/2004, you wrote:
I am trying to do a sort of mail merge for creating wills and have been
advised that an XSL transform is the best route to go down.

It works.

The data is in XML format and I am just starting to convert the massive
(200+ pages) html template into an XSLT document.

The XML data is formatted as follows:
<AnswerSet title = "Test File">
  <Answer name = "ABGRbefore">
    <RptValue>
      <TFValue>true</TFValue>
      <TFValue>false</TFValue>
      <TFValue>false</TFValue>
    </RptValue>
  </Answer>
  <Answer name = "Female">
    <TFValue>false</TFValue>
  </Answer>
  <Answer name = "ticdesc">
    <TextValue>my collection of teapots</TextValue>
  </Answer>
</AnswerSet>

There are around 3,000 elements in the XML file in total.

I have so far worked out that at a simplistic level I can use the
following XSL for extracting the data:
<xsl:for-each select="AnswerSet/Answer[@name='Female']">
    <xsl:if test="TFValue = 'true'">
     <p>the user is female.</p>
    </xsl:if>
    <xsl:if test="TFValue = 'false'">
     <p>the user is male.</p>
    </xsl:if>
</xsl:for-each>
<xsl:for-each select="AnswerSet/Answer[@name='ticdesc']">
  <p><xsl:value-of select="TextValue"/></p>
</xsl:for-each>

Is this the most efficient way of extracting the data?

It is not inefficient. Whether it is the most efficient depends on what you mean by "extract".

Each time I want to extract a value, is the Processor having to loop
through the XML file or does it do it in a single pass?

Generally, the latter. That is, since an XSLT processor usually works with an entire tree of data already parsed in memory, it doesn't have to "loop through the file" in the way you might think of it. But actually, how the processor does it need not concern you. You only need to understand (a) that the processor is optimized for exactly this kind of work, and (b) what the general XSLT processing model is and how it can be applied to your situation.
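
As a hedged aside on the efficiency question: if the same lookup by name is repeated many times, XSLT 1.0's xsl:key lets the processor index the Answer elements once. Here is a minimal sketch against your sample data. The key name "answer-by-name" is invented for illustration, and whether it is measurably faster on 3,000 elements depends on the processor:

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Index every Answer element by its name attribute.
       "answer-by-name" is a made-up key name. -->
  <xsl:key name="answer-by-name" match="Answer" use="@name"/>

  <xsl:template match="/">
    <html>
      <body>
        <!-- key() returns the Answer element(s) whose @name is 'ticdesc' -->
        <p><xsl:value-of select="key('answer-by-name', 'ticdesc')/TextValue"/></p>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```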

In effect, this is exactly what you are asking:

I could break the template down into more manageable chunks, but am not
sure how to import one template into another.

Which is exactly the point. (And keep an eye on other ongoing threads: several people are asking related questions.)
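
On the literal question of combining stylesheets, XSLT 1.0 provides xsl:import and xsl:include. A minimal sketch follows; the file names common.xsl and bequests.xsl are hypothetical, standing in for whatever chunks you split the big template into:

```xml
<!-- main.xsl: combines two smaller stylesheets (hypothetical file names) -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- xsl:import must come before any other top-level element;
       imported templates have lower import precedence than those here -->
  <xsl:import href="common.xsl"/>

  <!-- xsl:include behaves as if the included templates were written
       directly in this file -->
  <xsl:include href="bequests.xsl"/>

  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

Use xsl:include when you only want to split a file up, and xsl:import when the main stylesheet should be able to override templates from the imported one.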

How a stylesheet is architected in XSLT depends primarily on the relation between the structure of the source, and the structure of the result. If the structure of the result mainly mirrors that of the source (as an XML-encoded document may be transformed into an HTML "styled" version that pretty much presents the same information organized in the same way), the XSLT engine can be put to work by a stylesheet very straightforwardly -- by default (without your having to do anything) it works by traversing the input tree and building output as it goes. This is done by your mainly staying out of the way; stylesheets of this kind have nothing but templates written to match nodes from the input to be processed as they are encountered, which can be very simple and elegant even in cases where source documents vary widely in the particulars of their organization. (You wouldn't ordinarily expect a set of technical manuals all to have exactly the same organization; with this method, one stylesheet can cope with the whole range). These are called "push" stylesheets in the business.
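
To make "push" concrete, here is a minimal sketch. The source vocabulary (manual, section, title, para) is invented for illustration and is not your data:

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Each template only says what to do with one kind of node;
       apply-templates hands control back to the processor, which
       carries on down the tree in document order -->
  <xsl:template match="manual">
    <html>
      <body><xsl:apply-templates/></body>
    </html>
  </xsl:template>

  <xsl:template match="section">
    <div><xsl:apply-templates/></div>
  </xsl:template>

  <xsl:template match="section/title">
    <h2><xsl:apply-templates/></h2>
  </xsl:template>

  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```

Nothing here says how many sections there are or in what order titles and paragraphs appear; the input decides that.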

If your data has to be rearranged significantly, however, its content not merely presented and embellished but funnelled into an entirely unrelated organization, the simple push technique doesn't work. At this point templates are used not just to catch things as they come and mark them as they go, but actually to step in and rearrange things. They can become like miniature queries into the source, breaking things out, performing tests, wrapping up the data in different ways, or even directing the processor where to go next. This is what is described as the "pull" model.

Most actual working stylesheets include a combination of pull and push. They'll have pull where they need to rearrange the data into some known structure, but they'll use push (characterized by template matches and apply-templates instructions) where their output's structure mirrors their input. Often templates that match (handle) particular pieces of the input document will have miniature pulls inside of them.
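
A rough sketch of that combination, using the Answer elements from your sample. The wrapping div and the empty catch-all template are my assumptions about what you would want, not something your data requires:

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Push: let the processor walk the Answer elements -->
  <xsl:template match="AnswerSet">
    <div><xsl:apply-templates select="Answer"/></div>
  </xsl:template>

  <!-- A template that handles one particular Answer, with a small
       pull (the tests on TFValue) inside it -->
  <xsl:template match="Answer[@name='Female']">
    <xsl:choose>
      <xsl:when test="TFValue = 'true'"><p>the user is female.</p></xsl:when>
      <xsl:when test="TFValue = 'false'"><p>the user is male.</p></xsl:when>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="Answer[@name='ticdesc']">
    <p><xsl:value-of select="TextValue"/></p>
  </xsl:template>

  <!-- Answers without a more specific template produce nothing -->
  <xsl:template match="Answer"/>

</xsl:stylesheet>
```

The templates with a predicate have a higher default priority than the plain match on Answer, so they win for the two named answers.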

Your code above, with the tests, the for-each and the XPath in select attributes, is characteristic of "pull" code, and seems to come naturally to people who are experienced with database querying technologies (since that's similar to what you're doing). The best XSLT practitioners also let the processor do plenty of pushing, however. (I actually think of it as being like tai chi, the Chinese martial art, but that's another topic.)

(Interestingly, what is often left out of the discussion about "push" and "pull", particularly when we're singing the virtues of push, is that the entire stylesheet by default is a big "pull", which is why push works so nicely. When you start pulling, you're beginning to mess with what the stylesheet does by itself, so you can easily get into trouble by pulling when you could just allow it to push.)

Now, the interesting thing about a merge-type application as compared to the "classic" or plain-vanilla XSLT transform is that you have two input documents (or input streams), not just one, in addition to your stylesheet. This raises the questions: pull or push? and if you're going to rely on push, which source document does the pushing? (You could actually let both do some pushing, but let's not go into that. :-)

The best answer to this is prompted by seeing what the different documents' roles are in your architecture, as well as long-term maintenance concerns (how and whether this stylesheet and its sources may need to evolve in future).

A pure "pull" approach might work for you if you are literally doing nothing but populating a known document with values snagged from another one. Look up the little-used feature "literal result element as stylesheet" if you want to see a shortcut into this approach. (Eventually you will also need an extension function to create multiple output files ... but worry about that later.)
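
A minimal sketch of that shortcut (the XSLT 1.0 Recommendation's name for it is "Literal Result Element as Stylesheet"). The whole stylesheet is just the result document, with xsl:version on its root element and value-of instructions where data is wanted. The boilerplate wording is invented; the XPath assumes your sample AnswerSet is the source document:

```xml
<html xsl:version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <body>
    <!-- Invented boilerplate; only the select expression is taken
         from the sample data -->
    <p>I leave
      <xsl:value-of select="/AnswerSet/Answer[@name='ticdesc']/TextValue"/>
      to the beneficiary named below.</p>
  </body>
</html>
```

The processor treats this as if the html element were the body of a single template matching the root, so there is no room for further template rules; that is the point at which you would switch to a full xsl:stylesheet.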

If things get at all complex, however, you may find you need to do some pushing, at which point you have to contrive things more flexibly. Since you have one document serving as a "template" (non-XSLT sense of that term), another as a kind of little database, it may make sense to let the processor push the template through, querying the data in the other document only where it needs to (pulling it).

In this case you would have a set of templates that match nodes in your boilerplate document (which you might want to make your main source file). Mostly these templates just copy the boilerplate through. (You will want to look up "identity transforms" to see how this is done.) Occasionally, however, they query into the resource document to snag particular bits of information. (Look up the document() function for this.)
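
Roughly, and with loud assumptions: the file name answers.xml and the placeholder element insert are both made up here, standing for your data file and for whatever marker you put in the boilerplate where a value belongs.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Load the answers once. 'answers.xml' is a placeholder file name,
       resolved relative to this stylesheet. -->
  <xsl:variable name="answers" select="document('answers.xml')/AnswerSet"/>

  <!-- Identity transform: copy the boilerplate through unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Where the boilerplate has <insert name="..."/> (a made-up
       placeholder element), pull the matching text from the answers -->
  <xsl:template match="insert">
    <xsl:value-of select="$answers/Answer[@name = current()/@name]/TextValue"/>
  </xsl:template>

</xsl:stylesheet>
```

Run with the boilerplate as the source document, this copies everything except the insert elements, which are replaced by the TextValue of the Answer with the same name.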

Longer-term, there is still a problem with this approach in that you have to run the processor once per output document, which can be a chore if you have a big pile of names. This can be handled as well by wrapping your logic in a routine that iterates over the set of names (again either by pulling or letting them be pushed); but you need to implement the per-document processing first.

Before I embark on a massive conversion process, I'm just wondering if
I am going down the right route.

What alternatives would you consider? XML lets you take control of your own data at every level all the way down. This can be considered a good thing for very many reasons.

A good book or two on XSLT, plus searching the net for such keywords as "XSL processing model" would be a help. In particular, you want to understand how templates match, how apply-templates works, what the processor does when you don't tell it anything else (there are built-in templates), and what role XPath plays (look at the difference between "select" and "match").
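
For reference, the built-in template rules of XSLT 1.0 behave as if the following had been written in every stylesheet (shown here for the default mode only). This is why a stylesheet with no templates at all still emits the text of the source document:

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Elements and the root: keep going down the tree -->
  <xsl:template match="*|/">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Text and attribute nodes: copy their string value to the output -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Comments and processing instructions: do nothing -->
  <xsl:template match="processing-instruction()|comment()"/>

</xsl:stylesheet>
```

Note the two attributes at work: match is a pattern that says which nodes a template is willing to handle, while select is an expression that says which nodes to go and fetch.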

Also, keep in mind that building a merge routine is not really a beginner application. Though it's not all that hard to do, it raises architectural issues that can be hard to understand when you don't yet know how XSLT works with a single input document.

Good luck,
Wendell


___&&__&_&___&_&__&&&__&_&__&__&&____&&_&___&__&_&&_____&__&__&&_____&_&&_
    "Thus I make my own use of the telegraph, without consulting
     the directors, like the sparrows, which I perceive use it
     extensively for a perch." -- Thoreau

XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


