Hi Eliot,

I'm interested to know more about how you came to these conclusions without
considering XPath in apply-templates/@select to do the bulk of your node
selection. I definitely have my own opinions on this, but to make sure I
wasn't completely off base I ran a quick test. To me there are two
priorities that need to be accounted for in every program, no matter what
language it is written in: 1) total cycles to completion and 2) peak memory
consumption during the process. There is obviously more to it than this,
but to me these are the most important numbers for predicting the overall
performance of any application or software solution.

Looking first at total cycles (defining a cycle as one logic step in the
Trace output), the following three scenarios give some interesting results.
For test data I used an XML document three (3) levels of child nodes deep
with 96 elements in total. A simple boolean test of the XPath showed that 20
of these nodes matched the expression "foo[@bar = '1']".
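
For anyone who wants to try the scenarios below, here is a tiny stand-in for
that kind of input. It is not the 96-element document behind the numbers;
the root element name "data" and the text values are made up purely for
illustration.

```xml
<?xml version="1.0"?>
<!-- Hypothetical sample input; NOT the data set used for the cycle counts. -->
<data>
  <foo bar="1">alpha</foo>
  <foo bar="2">beta</foo>
  <foo bar="1">gamma</foo>
  <foo>delta</foo>
</data>
```

Of the four foo children here, the expression foo[@bar = '1'] matches two
(alpha and gamma) when evaluated with the data element as the context node.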

Scenario One: Xalan 2.5.1: 126 cycles

  <xsl:apply-templates select="foo[@bar = '1']"/>

  <xsl:template match="foo">
    ...
  </xsl:template>

Scenario Two: Xalan 2.5.1: 199 cycles

  <xsl:apply-templates select="foo"/>

  <xsl:template match="foo[@bar = '1']">
    ...
  </xsl:template>
  <xsl:template match="foo"/>

Scenario Three: Xalan 2.5.1: 329 cycles

  <xsl:apply-templates select="foo"/>

  <xsl:template match="foo">
    <xsl:if test="@bar = '1'">
      ...
    </xsl:if>
  </xsl:template>

All three scenarios output the exact same data.
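
To make scenario one something you can actually run, it might be fleshed out
roughly like this. This is only a sketch: the root element name "data" and
the output elements "hits" and "hit" are my assumptions for illustration and
stand in for the "..." above.

```xml
<?xml version="1.0"?>
<!-- Scenario one as a complete XSLT 1.0 stylesheet (sketch).
     "data", "hits" and "hit" are hypothetical names. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/data">
    <hits>
      <!-- Filter at selection time: only foo children whose bar
           attribute equals '1' are handed to template matching. -->
      <xsl:apply-templates select="foo[@bar = '1']"/>
    </hits>
  </xsl:template>

  <!-- No predicate needed here; everything that arrives has
       already passed the test in the select expression. -->
  <xsl:template match="foo">
    <hit><xsl:value-of select="."/></hit>
  </xsl:template>

</xsl:stylesheet>
```

Because the filtering happens in the select expression, no second template
is needed to swallow the foo elements that fail the test.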

It seems fairly obvious to me which gives the best performance as far as
minimum cycles to complete the transformation is concerned. Scenarios one
and two use exactly the same boolean test and end up with exactly the same
subset of data. The difference in cycles comes from the fall-through
template you are referring to, which has to be in place to catch all the
elements that don't meet the criteria in the match attribute of the first
template. Without it, the processor's built-in template rules take over for
the unmatched elements, so their text content gets dumped to the output and
processing continues. The select attribute of apply-templates decides which
elements are handed to template matching at all: scenario one's select is
refined further than scenario two's, so the subset in scenario one (which is
then matched to its corresponding template) is much smaller, leaving fewer
nodes to test against the pattern in the match attribute.
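
The fall-through behaviour can be seen in a complete version of scenario
two, sketched below. As before, the root element name "data" and the output
elements "hits" and "hit" are assumptions of mine, not part of the test I
ran.

```xml
<?xml version="1.0"?>
<!-- Scenario two as a complete XSLT 1.0 stylesheet (sketch).
     "data", "hits" and "hit" are hypothetical names. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/data">
    <hits>
      <!-- Every foo child is selected, matching or not. -->
      <xsl:apply-templates select="foo"/>
    </hits>
  </xsl:template>

  <!-- A pattern with a predicate has default priority 0.5,
       so this rule wins for foo elements with bar="1". -->
  <xsl:template match="foo[@bar = '1']">
    <hit><xsl:value-of select="."/></hit>
  </xsl:template>

  <!-- Fall-through rule (default priority 0) that suppresses every
       other foo. Remove it and the built-in rules apply instead:
       the element's children are processed and its text is copied
       to the output. -->
  <xsl:template match="foo"/>

</xsl:stylesheet>
```

Deleting the empty template is a quick way to see the leak for yourself: the
text of the non-matching foo elements shows up between the hit elements.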

In the first two scenarios, cycles are saved by cutting down the number of
elements to be processed as early as possible, using an XPath expression
that tests an attribute/value pair in either the select attribute of
xsl:apply-templates or the match attribute of xsl:template. Scenario three
adds one more step before it gets down to the attribute/value pair, which
means there is one extra step for every element that makes it through the
select and match attributes. (The totals climb from 126 to 199 to 329,
roughly 1 : 1.6 : 2.6.)

While I am not suggesting there is no place for the conditional logic
elements of XSL, I am suggesting that their use should be reserved for
fine-tuning the results of your transformation and not for processing the
bulk of your XPath expressions. In fact, IMHO ;) there are very few cases
where conditional logic elements should be used to process raw elements and
attributes. Their best use is in taking the string value of an element or
attribute and processing it further with a combination of conditional logic
and string functions. As a general rule, I believe that templates are for
processing elements and attributes by matching on their values, or on
combinations of those values, while the conditional elements xsl:if and
xsl:choose (with xsl:when and xsl:otherwise) are for further processing the
non-XML values of the nodes that the template matching has passed into the
template.
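
As a rough illustration of that division of labour, the sketch below lets
the select expression pick the nodes and uses xsl:choose only on the string
value of each one. The "ID-" prefix, the root element "data", and the
output element names are all invented for the example.

```xml
<?xml version="1.0"?>
<!-- Sketch: node selection by XPath, string handling by xsl:choose.
     "data", the "ID-" prefix and all output names are hypothetical. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/data">
    <results>
      <xsl:apply-templates select="foo[@bar = '1']"/>
    </results>
  </xsl:template>

  <xsl:template match="foo">
    <!-- From here on we work with a plain string, not with nodes. -->
    <xsl:variable name="text" select="normalize-space(.)"/>
    <xsl:choose>
      <xsl:when test="string-length($text) = 0">
        <empty/>
      </xsl:when>
      <xsl:when test="starts-with($text, 'ID-')">
        <id><xsl:value-of select="substring-after($text, 'ID-')"/></id>
      </xsl:when>
      <xsl:otherwise>
        <text><xsl:value-of select="$text"/></text>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

The conditional never decides whether a node gets processed; it only decides
how the string value of an already selected node is written out.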

With all of this in mind, I see both scenario one and scenario two as
necessary methods for bulk transformation of XML data, depending on one
simple factor: will the node-set produced by the XPath in the select
attribute of xsl:apply-templates contain elements that need to be matched to
more than one template? If "yes", use scenario two; if "no", use scenario
one. This question should be qualified further by asking whether the result
of the XPath expression can match several cases of which only one needs to
be transformed, or of which all need to be transformed in exactly the same
way. If either is true, then every effort should be made to qualify the
elements in the XPath in the select attribute and, if necessary, to use
unions in your match attribute to send multiple elements, element[values],
element[@attribute values], or simply [@attribute values] to the same
template for further processing.
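
A union in the match attribute could look something like the sketch below.
The second element name "baz", the root element "data", and the output
element names are hypothetical; the point is only that two differently named
elements are qualified in the select expression and then share one template.

```xml
<?xml version="1.0"?>
<!-- Sketch: one template serving several element names via a union.
     "data", "baz", "hits" and "hit" are hypothetical names. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/data">
    <hits>
      <!-- Qualify as far as possible at selection time. -->
      <xsl:apply-templates select="foo[@bar = '1'] | baz[@bar = '1']"/>
    </hits>
  </xsl:template>

  <!-- Both element types are transformed exactly the same way. -->
  <xsl:template match="foo | baz">
    <hit name="{name()}"><xsl:value-of select="."/></hit>
  </xsl:template>

</xsl:stylesheet>
```

The selected nodes are processed in document order, so foo and baz hits
appear interleaved in the same order as in the source.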

Taking peak memory into consideration (I have no data to evaluate at this
point, so I'm speculating), my guess is that scenario one will cause the
smallest peak and scenario three the largest. This is based on the simple
fact that the further you step into a logic tree, the more memory is
required to store the data that got you in and the data needed to get you
back out. I realize this takes nothing else into account, and there are many
more things to consider, but without data to back me up I don't want to get
too deep into speculation.

I hope that I have in no way caused you to take offense at any of my
comments. My intention wasn't to put down your comments but to try to
qualify with data what is actually taking place inside our transformations.
Actual data is the most important thing we can have at our disposal when
evaluating performance, and I believe the numbers above, from a general
perspective, show quite well which solution is best for each particular
development scenario.

If you have data that suggests anything contrary to what I am saying, please
let me and all of us know. Like you and everybody else here, my ultimate
goal is to write the best possible code for any given situation, and the
more data there is to lean our code one way or another, the better off we
will all be in our development efforts.

Best regards,
<M:D/>
----- Original Message -----
From: "Eliot Kimber" <ekimber(_at_)innodata-isogen(_dot_)com>
To: <xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com>
Sent: Wednesday, June 02, 2004 2:10 PM
Subject: Re: [xsl] Performance Question: Expensive Functions in Predicates
My question is where, in general, is the best place to use these
functions:
- In apply-templates specifications?
- In match specifications?
- As IF blocks within templates?
I just stumbled onto a subtle (at least to me) difference between these
two nominally equivalent forms:
<xsl:template match="foo[util:is_applicable()]">
and
<xsl:template match="foo">
<xsl:if test="util:is_applicable()">
Which is that in the first case all *inapplicable* foo elements fall
through to the default template, which if there's no explicit template
for "foo", means that the content of foo will likely flow to the output,
therefore failing to suppress inapplicable foo elements. Doh!
Given that, it suggests that putting the check in the match= value is
the least attractive as it requires at least a single separate template
with a lower priority to catch all elements that fail the applicability
check, while doing the check at select time ensures that only applicable
elements will be processed at all.
Cheers,
Eliot
--
W. Eliot Kimber
Professional Services
Innodata Isogen
9030 Research Blvd, #410
Austin, TX 78758
(512) 372-8122
eliot(_at_)innodata-isogen(_dot_)com
www.innodata-isogen.com