Re: Streamlining XSL and transform performance (how does it actually work?)

Hi Julian,

At 05:00 AM 3/19/2004, you wrote:

I am trying to do a sort of mail merge for creating wills and have been
advised that an XSL transform is the best route to go down.


It works.

The data is in XML format and I am just starting to convert the massive
(200+ pages) html template into an XSLT document.

The XML data is formatted as follows:
<AnswerSet title = "Test File">
  <Answer name = "ABGRbefore">
    <RptValue>
      <TFValue>true</TFValue>
      <TFValue>false</TFValue>
      <TFValue>false</TFValue>
    </RptValue>
  </Answer>
  <Answer name = "Female">
    <TFValue>false</TFValue>
  </Answer>
  <Answer name = "ticdesc">
    <TextValue>my collection of teapots</TextValue>
  </Answer>
</AnswerSet>

There are around 3,000 elements in the XML file in total.

I have so far worked out that at a simplistic level I can use the
following XSL for extracting the data:
<xsl:for-each select="AnswerSet/Answer[@name='Female']">
    <xsl:if test="TFValue = 'true'">
     <p>the user is female.</p>
    </xsl:if>
    <xsl:if test="TFValue = 'false'">
     <p>the user is male.</p>
    </xsl:if>
</xsl:for-each>
<xsl:for-each select="AnswerSet/Answer[@name='ticdesc']">
  <p><xsl:value-of select="TextValue"/></p>
</xsl:for-each>

Is this the most efficient way of extracting the data?

It is not inefficient. Whether it is the most efficient depends on what you mean by "extract".

Each time I want to extract a value, is the Processor having to loop
through the XML file or does it do it in a single pass?

Generally, the latter. That is, since an XSLT processor usually works with an entire tree of data already parsed in memory, it doesn't have to "loop through the file" in the way you might think of it. But actually, how the processor does it need not concern you. You only need to understand (a) that processing is exactly optimized for this kind of stuff, and (b) what the general XSLT processing model is and how it can be applied to your situation.


In effect, this is exactly what you are asking:

I could break the template down into more manageable chunks, but am not
sure how to import one template into another.

Which is exactly the point. (And keep an eye on other ongoing threads: several people are asking related questions.)

How a stylesheet is architected in XSLT depends primarily on the relation between the structure of the source, and the structure of the result. If the structure of the result mainly mirrors that of the source (as an XML-encoded document may be transformed into an HTML "styled" version that pretty much presents the same information organized in the same way), the XSLT engine can be put to work by a stylesheet very straightforwardly -- by default (without your having to do anything) it works by traversing the input tree and building output as it goes. This is done by your mainly staying out of the way; stylesheets of this kind have nothing but templates written to match nodes from the input to be processed as they are encountered, which can be very simple and elegant even in cases where source documents vary widely in the particulars of their organization. (You wouldn't ordinarily expect a set of technical manuals all to have exactly the same organization; with this method, one stylesheet can cope with the whole range). These are called "push" stylesheets in the business.
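
To make that concrete, here is a minimal sketch of a pure push stylesheet. It assumes a hypothetical technical-manual source built from manual, section, title and para elements; none of those names come from your data, they are only there for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- Hypothetical document element: wrap the result in an HTML shell,
       then let the processor walk the children in document order. -->
  <xsl:template match="/manual">
    <html>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <!-- A section becomes a div wherever it occurs, however deeply nested. -->
  <xsl:template match="section">
    <div class="section">
      <xsl:apply-templates/>
    </div>
  </xsl:template>

  <!-- The title of a section becomes a heading. -->
  <xsl:template match="section/title">
    <h2><xsl:apply-templates/></h2>
  </xsl:template>

  <!-- A paragraph becomes a p. -->
  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- No template is needed for text nodes: the built-in template
       copies their content to the result. -->
</xsl:stylesheet>
```

Nothing here says where sections or paragraphs must appear; each template only says what to do with a node when the traversal reaches it, which is why one such stylesheet can cope with differently organized sources.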

If your data has to be rearranged significantly, however, its content not merely presented and embellished but funnelled into an entirely unrelated organization, the simple push technique doesn't work. At this point templates are used not just to catch things as they come and mark them as they go, but actually to step in and rearrange things. They can become like miniature queries into the source, breaking things out, performing tests, wrapping up the data in different ways, or even directing the processor where to go next. This is what is described as the "pull" model.

Most actual working stylesheets include a combination of pull and push. They'll have pull where they need to rearrange the data into some known structure, but they'll use push (characterized by template matches and apply-templates instructions) where their output's structure mirrors their input. Often templates that match (handle) particular pieces of the input document will have miniature pulls inside of them.
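
As a rough sketch of that mix, here are the same two extractions from your sample AnswerSet written as matching templates, each with a small pull inside. The surrounding html/body/h1 shell and the decision to suppress all other Answer elements are assumptions made for the example.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <xsl:template match="/AnswerSet">
    <html>
      <body>
        <!-- Miniature pull: fetch one known value. -->
        <h1><xsl:value-of select="@title"/></h1>
        <!-- Push: hand each Answer to whichever template matches it. -->
        <xsl:apply-templates select="Answer"/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="Answer[@name='Female']">
    <xsl:choose>
      <xsl:when test="TFValue = 'true'">
        <p>the user is female.</p>
      </xsl:when>
      <xsl:when test="TFValue = 'false'">
        <p>the user is male.</p>
      </xsl:when>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="Answer[@name='ticdesc']">
    <!-- Miniature pull inside a matching template. -->
    <p><xsl:value-of select="TextValue"/></p>
  </xsl:template>

  <!-- Assumption for this sketch: any other Answer produces no output.
       The more specific patterns above take priority over this one. -->
  <xsl:template match="Answer"/>

</xsl:stylesheet>
```

Note that the output now follows the order of the Answer elements in the source, not the order of the templates in the stylesheet. That is fine for a document-like source, but it is exactly why a pure push over the answer data is not a good fit for a merge.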

Your code above, with the tests, the for-each and the XPath in select attributes, is characteristic of "pull" code, and seems to come naturally to people who are experienced with database querying technologies (since that's similar to what you're doing). The best XSLT practitioners also let the processor do plenty of pushing, however. (I actually think of it as being like tai chi, the Chinese martial art, but that's another topic.)

(Interestingly, what is often left out of the discussion about "push" and "pull", particularly when we're singing the virtues of push, is that the entire stylesheet by default is a big "pull", which is why push works so nicely. When you start pulling, you're beginning to mess with what the stylesheet does by itself, so you can easily get into trouble by pulling when you could just allow it to push.)

Now, the interesting thing about a merge-type application as compared to the "classic" or plain-vanilla XSLT transform is that you have two input documents (or input streams), not just one, in addition to your stylesheet. This raises the questions: pull or push? and if you're going to rely on push, which source document does the pushing? (You could actually let both do some pushing, but let's not go into that. :-)

The best answer to this is prompted by seeing what the different documents' roles are in your architecture, as well as long-term maintenance concerns (how and whether this stylesheet and its sources may need to evolve in future).

A pure "pull" approach might work for you if you are literally doing nothing but populating a known document with values snagged from another one. Look up the little-used feature "literal result element as stylesheet" if you want to see a shortcut into this approach. (Eventually you will also need an extension function to create multiple output files ... but worry about that later.)
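
For a feel of what that shortcut looks like, here is a small sketch of a simplified stylesheet run against your answer file. The will wording is invented for the example; only the element names and the name values come from your sample data.

```xml
<?xml version="1.0"?>
<!-- The whole stylesheet is one literal result element. The xsl:version
     attribute is what tells the processor to treat it as a stylesheet. -->
<html xsl:version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <body>
    <!-- Hypothetical boilerplate text with one pulled value. -->
    <p>
      <xsl:text>I give </xsl:text>
      <xsl:value-of select="/AnswerSet/Answer[@name='ticdesc']/TextValue"/>
      <xsl:text> to my executor.</xsl:text>
    </p>
    <!-- Conditional boilerplate, tested against the answer data. -->
    <xsl:if test="/AnswerSet/Answer[@name='Female']/TFValue = 'true'">
      <p>the user is female.</p>
    </xsl:if>
  </body>
</html>
```

This form has no xsl:template elements at all, so it cannot push anything; it is the boilerplate with holes in it, and nothing more.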

If things get at all complex, however, you may find you need to do some pushing, at which point you have to contrive things more flexibly. Since you have one document serving as a "template" (non-XSLT sense of that term), another as a kind of little database, it may make sense to let the processor push the template through, querying the data in the other document only where it needs to (pulling it).

In this case you would have a set of templates that match nodes in your boilerplate document (which you might want to make your main source file). Mostly these templates just copy the boilerplate through. (You will want to look up "identity transforms" to see how this is done.) Occasionally, however, they query into the resource document to snag particular bits of information. (Look up the document() function for this.)
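
A minimal sketch of that arrangement follows. Two things in it are assumptions: the answer data lives in a file called answers.xml next to the stylesheet, and the boilerplate marks each hole with a hypothetical placeholder element such as <insert name="ticdesc"/>.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed file name. With a single string argument, document()
       resolves a relative URI against the stylesheet's own location. -->
  <xsl:variable name="answers"
                select="document('answers.xml')/AnswerSet"/>

  <!-- Identity template: copy every node and attribute of the
       boilerplate through unchanged (push). -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Hypothetical placeholder: replace it with the text answer of
       the same name from the answer document (pull). -->
  <xsl:template match="insert">
    <xsl:value-of
        select="$answers/Answer[@name = current()/@name]/TextValue"/>
  </xsl:template>

</xsl:stylesheet>
```

With the boilerplate as the main source, a fragment such as <p>I give <insert name="ticdesc"/> to my niece.</p> would come out as <p>I give my collection of teapots to my niece.</p> for your sample data. True/false answers would need a second kind of placeholder that wraps conditional text, handled the same way.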

Longer-term, there is still a problem with this approach in that you have to run the processor once per output document, which can be a chore if you have a big pile of names. This can be handled as well by wrapping your logic in a routine that iterates over the set of names (again either by pulling or letting them be pushed); but you need to implement the per-document processing first.

Before I embark on a massive conversion process, I'm just wondering if
I am going down the right route.

What alternatives would you consider? XML lets you take control of your own data at every level all the way down. This can be considered a good thing for very many reasons.

A good book or two on XSLT, plus searching the net for such keywords as "XSL processing model" would be a help. In particular, you want to understand how templates match, how apply-templates works, what the processor does when you don't tell it anything else (there are built-in templates), and what role XPath plays (look at the difference between "select" and "match").

Also, keep in mind that building a merge routine is not really a beginner application. Though it's not all that hard to do, it raises architectural issues that can be hard to understand when you don't yet know how XSLT works with a single input document.


Good luck,
Wendell


___&&__&_&___&_&__&&&__&_&__&__&&____&&_&___&__&_&&_____&__&__&&_____&_&&_
    "Thus I make my own use of the telegraph, without consulting
     the directors, like the sparrows, which I perceive use it
     extensively for a perch." -- Thoreau


XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
