Re: Performance Question: Expensive Functions in Predicates

Hi Eliot,

I'm interested to know more about how you came to these conclusions while 
also ignoring the use of XPath within apply-templates/@select to do the bulk 
of your node selection.  I definitely have my own opinions on this, but to 
make sure I wasn't completely off base I did a quick test.  To me there are 
two priorities that need to be accounted for in every program (no matter 
what language it is written in): 1) total cycles to completion and 2) peak 
memory consumption during the process.  There's obviously more than this, 
but to me these are the most important numbers to help predict the overall 
performance of any application/software solution.

If we first look at total cycles (defining a cycle as one logic step in the 
Trace output) and take the following three scenarios, we get some 
interesting results.  For test XML I used a data set with a depth of three 
(3) child nodes and 96 total elements.  A simple boolean test of the XPath 
revealed that 20 of these nodes matched the expression 
"foo[@bar = '1']".

Scenario One: Xalan 2.5.1: 126 cycles

<xsl:apply-templates select="foo[@bar = '1']"/>

<xsl:template match="foo">
    ...
</xsl:template>



Scenario Two: Xalan 2.5.1: 199 cycles

<xsl:apply-templates select="foo"/>

<xsl:template match="foo[@bar = '1']">
    ...
</xsl:template>

<xsl:template match="foo"/>



Scenario Three: Xalan 2.5.1: 329 cycles

<xsl:apply-templates select="foo"/>

<xsl:template match="foo">
    <xsl:if test="@bar = '1'">
        ...
    </xsl:if>
</xsl:template>


All three scenarios output the exact same data.
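
For anyone wanting to reproduce the comparison, here is a minimal, 
self-contained sketch of scenario one.  The document element name "root", 
the sample input shown in the comment, and the template body (which stands 
in for the elided "..." above) are assumptions for illustration only, not 
the test data or templates actually used for the numbers in this post.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!--
    Hypothetical input:
      <root>
        <foo bar="1">kept</foo>
        <foo bar="0">dropped</foo>
        <foo>dropped too</foo>
      </root>
  -->

  <xsl:output method="text"/>

  <!-- Assumed document element "root" (hypothetical name). -->
  <xsl:template match="/root">
    <!-- Scenario one: filter at selection time, so only the
         foo children with bar="1" are ever handed to a template. -->
    <xsl:apply-templates select="foo[@bar = '1']"/>
  </xsl:template>

  <!-- Placeholder body: write out the string value of each selected foo,
       followed by a newline. -->
  <xsl:template match="foo">
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Swapping the select and match expressions as in scenarios two and three 
should give the same output from this sketch, which makes it a reasonable 
starting point for comparing trace counts on your own processor.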


It seems fairly obvious to me where the greatest performance is as far as 
minimum cycles to complete the transformation is concerned.  Scenarios one 
and two actually use the exact same boolean test, which results in the exact 
same subset of data.  The difference in cycles comes from the fallthrough 
template you are referring to, which has to be put in place to catch all the 
elements that don't match the criteria in the match attribute of the 
template.  When no explicit template matches an element, the processor's 
built-in rules apply, so the text content of the element gets dumped to the 
output and processing continues.  In both cases apply-templates passes on 
only the nodes picked by its select attribute, but the select in scenario 
one is refined further than in scenario two.  As such, the node set in 
scenario one (which is then matched to its corresponding template) is much 
smaller than in scenario two, resulting in fewer nodes to test against the 
expression in the match attribute.
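
The fallthrough behaviour comes from the built-in template rules that XSLT 
1.0 defines for nodes no explicit template matches.  Written out as ordinary 
templates they look roughly like the sketch below.  This is for reference 
only: a real processor applies these rules implicitly, with lower import 
precedence than anything in your stylesheet, so you would not normally write 
them yourself.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Elements and the root node: produce nothing themselves,
       just keep walking down into the children. -->
  <xsl:template match="*|/">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Text and attribute nodes: copy the string value to the output.
       This is why unmatched content "leaks" into the result. -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Comments and processing instructions: do nothing. -->
  <xsl:template match="processing-instruction()|comment()"/>

</xsl:stylesheet>
```

The empty <xsl:template match="foo"/> in scenario two exists to override the 
first of these rules for the foo elements that fail the predicate.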

In the first two scenarios cycles are saved by reducing the number of 
elements that are processed as far as possible, using an XPath expression 
that tests an attribute/value pair in either the @select of 
xsl:apply-templates or the @match of xsl:template.  Scenario three adds one 
more step before it gets down to the attribute/value pair, which means 
there's going to be one more step for every element that makes it through 
the @select and @match attributes.  (There is an interesting near 1 : 2 : 3 
relationship in the total cycles for the three scenarios: 126 : 199 : 329.)

While I am not suggesting there is no place for the conditional logic 
elements of XSL, I am suggesting that their use should be reserved for fine 
tuning the results of your transformation and not for processing the bulk of 
your XPath statements.  In fact, IMHO ;) there are very few cases where 
conditional logic elements should be used to process raw elements and 
attributes.  Their best use is when taking the string value of either an 
element or attribute and processing it further using a combination of 
conditional logic and string functions.  As a general rule, I believe that 
templates are for processing elements and attributes by matching their 
values, or combinations of their values, while the conditional elements 
xsl:if and xsl:choose (with xsl:when and xsl:otherwise) should be used to 
further process the non-XML values of the nodes that have been passed into 
the template by that match processing.
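
As a sketch of that division of labour: the match pattern below does the 
node-level filtering, and xsl:choose is only used on the string value of the 
node once it is inside the template.  The "ERR" prefix and the <error> and 
<item> output elements are hypothetical, made up for this example.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Node selection happens in the match pattern, not in an xsl:if. -->
  <xsl:template match="foo[@bar = '1']">
    <!-- Work on the whitespace-normalized string value from here on. -->
    <xsl:variable name="value" select="normalize-space(.)"/>
    <xsl:choose>
      <!-- Hypothetical rule: values starting with "ERR" are errors. -->
      <xsl:when test="starts-with($value, 'ERR')">
        <error>
          <xsl:value-of select="substring-after($value, 'ERR')"/>
        </error>
      </xsl:when>
      <xsl:otherwise>
        <item>
          <xsl:value-of select="$value"/>
        </item>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Fallthrough: suppress foo elements that fail the predicate. -->
  <xsl:template match="foo"/>

</xsl:stylesheet>
```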

With all of this in mind I see both scenario one and two as necessary 
methods for bulk transformation of XML data, depending on one simple factor: 
will the node set selected by the XPath in the select attribute of 
xsl:apply-templates contain elements that need to be further matched to more 
than one template?  If "yes", use scenario two, and if "no", use scenario 
one.  Actually, this question should be further qualified by asking whether 
the selected nodes can match multiple cases of which only one needs to be 
transformed, or of which all need to be transformed in exactly the same way.  
If so, then every effort should be made to qualify the elements in the XPath 
contained in the select attribute and, if necessary, use unions in your 
match attribute to match multiple patterns (element, element[value], 
element[@attribute = value], or simply *[@attribute = value]) to the same 
template to be further processed.
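
A minimal sketch of the union approach, assuming hypothetical element and 
attribute names "baz" and "qux" alongside the foo/@bar used above.  The 
select is qualified as tightly as possible, and one template handles every 
pattern that needs the same treatment.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Qualify the selection up front: only foo and baz children
       carrying bar="1" are passed on for template matching. -->
  <xsl:template match="/*">
    <xsl:apply-templates select="foo[@bar = '1'] | baz[@bar = '1']"/>
  </xsl:template>

  <!-- One template for several patterns, joined with the union operator.
       *[@qux = '1'] matches any element with qux="1", whatever its name. -->
  <xsl:template match="foo | baz | *[@qux = '1']">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```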

Taking peak memory into consideration (I have no data at this point to 
evaluate, so I'm speculating), my guess is that scenario one will cause the 
smallest peak and scenario three the largest.  This is based on the simple 
fact that the further you step into a logic tree, the more memory is 
required to store the data that got you in and the data needed to get you 
back out.  I realize this takes nothing else into account (and there are 
many more things to consider), but without data to back me up I don't want 
to get too deep into speculating about anything.

I hope that I have in no way caused you to take offense at any of my 
comments.  My intention wasn't to put down your comments but instead to 
attempt to qualify with data what is actually taking place inside our 
transformations.  Actual data is the single most important thing we can 
have at our disposal when evaluating performance, and I believe that the 
above numbers, from a general perspective, showcase quite well which 
solution is best for each particular development scenario.

If you have data that suggests anything contrary to what I am saying please 
let me/all of us know as, like you and everybody else in here, my ultimate 
goal is to write the best possible code for any given situation.  And the 
more data there is that helps lean our code one way or another the better 
off we are all going to be in our development efforts.

Best regards,

<M:D/>



----- Original Message ----- 
From: "Eliot Kimber" <ekimber(_at_)innodata-isogen(_dot_)com>
To: <xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com>
Sent: Wednesday, June 02, 2004 2:10 PM
Subject: Re: [xsl] Performance Question: Expensive Functions in Predicates

My question is where, in general, is the best place to use these
functions:

- In apply-templates specifications?

- In match specifications?

- As IF blocks within templates?


I just stumbled onto a subtle (at least to me) difference between these
two nominally equivalent forms:

<xsl:template match="foo[util:is_applicable()]">

and

<xsl:template match="foo">
   <xsl:if test="util:is_applicable()">


Which is that in the first case all *inapplicable* foo elements fall
through to the default template, which if there's no explicit template
for "foo", means that the content of foo will likely flow to the output,
therefore failing to suppress inapplicable foo elements. Doh!

Given that, it suggests that putting the check in the match= value is
the least attractive as it requires at least a single separate template
with a lower priority to catch all elements that fail the applicability
check, while doing the check at select time ensures that only applicable
elements will be processed at all.

Cheers,

Eliot
-- 
W. Eliot Kimber
Professional Services
Innodata Isogen
9030 Research Blvd, #410
Austin, TX 78758
(512) 372-8122

eliot(_at_)innodata-isogen(_dot_)com
www.innodata-isogen.com

