Unfortunately it's very difficult to give good performance advice that
applies across all processors.
There are several things Saxon does with leading // that are relevant.
Firstly, if you're running schema-aware and the type of the context node is
known, Saxon-SA will rewrite //z as /a/b/c/d/z if it can. It can't always,
of course, for example if the structure is recursive. (As it happens this
isn't always a good optimization - it's good when the z elements are few and
localized, but bad when they are many and can appear anywhere.)
Secondly, //z is rewritten as /descendant::z if there's no positional
predicate.
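To see why the positional predicate matters, here is a minimal sketch. The input document and its id values are invented for illustration; only the two path expressions come from the discussion above. Since //z is shorthand for /descendant-or-self::node()/child::z, a predicate such as [1] counts position among the z children of each parent, whereas in /descendant::z[1] it counts position among all z descendants of the root.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical input:
       <a><b><z id="1"/><z id="2"/></b><c><z id="3"/></c></a> -->
  <xsl:template match="/">
    <!-- //z[1]: every z that is the first z child of its parent
         (ids 1 and 3 for the input above). -->
    <xsl:value-of select="//z[1]/@id" separator=","/>
    <xsl:text>&#10;</xsl:text>
    <!-- /descendant::z[1]: only the first z in document order (id 1). -->
    <xsl:value-of select="/descendant::z[1]/@id" separator=","/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Without a predicate the two forms select the same nodes, which is what makes the rewrite safe in that case.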
Finally, for any given document, /descendant::z is implemented as a memo
function: the first time you execute it (for a particular choice of z and a
particular document) the document is scanned, but the result is retained and
is reused if you use the same expression again.
It's also worth pointing out that /descendant::z is very fast on the
tinytree anyway. Even if you've got 500,000 nodes in your document, it
doesn't take very long to scan an array of 500,000 integers and test each
one for equality to some constant. Sure, it's linear in the size of the
document, but the actual search speed per megabyte is probably 1000 times
faster than parsing or serializing.
Michael Kay
http://www.saxonica.com/
-----Original Message-----
From: Lars Huttar [mailto:huttarl(_at_)gmail(_dot_)com]
Sent: 31 October 2008 14:28
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] XPath "//", speed, and Saxon
Hello,
I was recently trying to solve performance problems in an
XSLT-heavy web application, and came up against results that
puzzled me with regard to XSLT optimization.
We have a Cocoon pipeline in which about 5MB of XML data is
being fed through a particular XSLT stylesheet (one in a
series). And I thought that this stylesheet was the reason
for the pipeline taking forever to run. I looked in it and
found several uses of XPaths containing an initial
double-slash, e.g. select="//foo", some of them being invoked
multiple times.
I figured that for a simple XSLT processor, each "//foo"
expression could mean traversing the whole input DOM again,
which would be expensive for a big input.
So I went through and converted the "//foo" expressions to use keys.
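For illustration only, a conversion of this kind might look like the sketch below. The original stylesheet is not shown in this message, so the key name elements-by-name and the choice to index elements by local-name() are assumptions, not the actual code.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical key: index every element by its local name. -->
  <xsl:key name="elements-by-name" match="*" use="local-name()"/>

  <xsl:template match="/">
    <!-- Before: a naive processor may walk the whole tree each time. -->
    <xsl:value-of select="count(//foo)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- After: the same foo elements, fetched through the key. -->
    <xsl:value-of select="count(key('elements-by-name', 'foo'))"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Both expressions should report the same count for an input with no namespaces; whether the key version is any faster depends on the processor.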
Excited at how much faster I expected the stylesheet to run,
I ran some tests ... pretty fast. The process completed in
just under 2 seconds.
But then I ran an apples-to-apples test on the old version of
the stylesheet, the one with lots of "//foo" in it. And to my
surprise, the old version ran just as fast. After several
test runs I could see no appreciable difference in speed.
Obviously the performance problem was elsewhere. But the
question I wanted to ask here is, what does this imply
regarding good practices for writing efficient stylesheets?
Saxon of course is not a dumb XSLT processor. Maybe it
compiles the "//foo"-like XPath expressions into something
like keys without being told to... e.g. it indexes the DOM
tree by element name... and so you get good performance with
those expressions even on large inputs.
If so, does that optimization rely on the name of the
element, so that it would not apply to expressions like
"//*[...]"? That would suggest that for "//foo"-like
expressions, you're in good shape, but for expressions like
"//*" you should use a key for efficiency.
Thanks for any help and advice.
Lars
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail:
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--