xsl-list
[Top] [All Lists]

Re: [xsl] Sibling axis: All of kind A up to first of kind B

2008-03-26 14:04:30
Michael Kay schrieb:
No, implementations are not required to be smart. I think that any
respectable implementation(*) is likely to implement the predicate [1]
with a modicum of intelligence, that is, to stop scanning when the
first node is reached. Anything else is going to vary widely from one
product to another.

(*) Actually, implementations are not required to be respectable
either.

Thanks a lot for your answer. I submitted various respectable
implementations to some naive benchmarking on my laptop equipped
with a Intel(R) Core(TM)2 CPU T5500 @ 1.66GHz and running Linux.

* xsltproc (C): Using libxml 20627, libxslt 10120 and libexslt 813
* Xalan (C++): Xalan version 1.10.0, Xerces version 2.7.0
* Saxon (Java): /usr/local/jlib/saxon9-0-0-2/saxon9.jar
  Java HotSpot(TM) Client VM (build 1.6.0-b105, mixed mode, sharing)

The benchmark consisted of timing processing speed (using the Bash
function "time") for relatively large input documents using each of
the templates featured in my original post in this thread, the first
one being identified by "upto", the second one by "siblrec".

The input documents have the structure "/doc/h1/h2/p". I tested several
versions. In each case, there are 25 h1 elements. Then:

* in case (1): 25 h2 elements per h1, 25 p elements per h2;
* in case (2): 50 h2 elements per h1, 50 p elements per h2;
* in case (3): 75 h2 elements per h1, 75 p elements per h2.
* in case (4): 100 h2 elements per h1, 100 p elements per h2.
* in case (5): 1000 h2 elements per h1, 10 p elements per h2.

The document sizes are:

(1) 2.1 MB document, 16276 element nodes, 32551 text nodes
(2) 8.4 MB, 63776 element nodes, 127551 text nodes
(3) 19 MB, 142526 element nodes, 285051 text nodes
(4) 34 MB, 252526 element nodes, 505051 text nodes
(5) 34 MB, 275026 element nodes, 550051 text nodes

XML-wise, the output was the same regardless of processor or stylesheet.
Just small variations with regard to indenting.

The timing results are:

    processor  template first second third run
    ---------  -------- ----- ------ ---------
(1) xsltproc   upto      0.29   0.29   0.29
    xalan      upto      5.44   5.40   5.43
    saxon      upto      1.19   1.20   1.31
    xsltproc   siblrec   0.57   0.59   0.58
    xalan      siblrec   0.95   0.94   0.96
    saxon      siblrec   1.11   1.11   1.12
    ---------  -------- ----- ------ ---------
(2) xsltproc   upto      1.38   1.39   1.42
    xalan      upto     45.04  44.40  44.49
    saxon      upto      2.45   2.42   2.56
    xsltproc   siblrec   4.31   4.46   4.16
    xalan      siblrec   6.30   6.35   6.33
    saxon      siblrec   2.09   2.01   1.97
    ---------  -------- ----- ------ ---------
(3) xsltproc   upto      3.65   3.71   3.73
    xalan      upto    149.44 151.50 150.25
    saxon      upto      4.77   4.75   4.87
    xsltproc   siblrec  13.38  13.39  12.92
    xalan      siblrec  20.07  20.08  19.94
    saxon      siblrec   3.39   3.36   3.43
    ---------  -------- ----- ------ ---------
(4) xsltproc   upto      8.76   8.82   8.74
    xalan      upto         -      -      - not viable
    saxon      upto      8.52   8.40   8.46
    xsltproc   siblrec  30.31  30.46  29.19
    xalan      siblrec  45.25  46.95  45.71
    saxon      siblrec   5.45   5.49   5.48
    ---------  -------- ----- ------ ---------
(5) * see below *

I didn't really look at memory consumption. But for document (4),
I had to allow "java -Xmx96M" in order for Saxon to avoid the
"java.lang.OutOfMemoryError". LibXSLT used up to 160 MB, Xalan
only about 30 MB, and Java/Saxon about 115 MB. So memory-wise,
Xalan appears to do best.

It seems, xsltproc/LibXSLT prefers the "upto" template whereas Xalan
and Saxon perform better using sibling recursion. Xalan even seems to
dislike the "upto" approach. Saxon performs most evenly.

Consider that the timings for Saxon include JVM startup! And despite
that overhead with regard to the C/C++ binaries of xsltproc and xalan,
Saxon outperforms even LibXSLT for large inputs. Terrific! Saxon seems
to scale best with regard to both input size and programming variations.
(Well, at least for the two programs I fed it.)

[... sibling recursion ...]

I think this is likely to be more efficient. But with large input, it
will run out of stack space (typically after 500 or so calls) unless
the processor implements tail-call optimization, which not all do.

Michael Kay

Processing document (5) is difficult. It weighs 34 MB, has 275026
element nodes and 550051 text nodes. Having 1000 h2 nodes means the
context size is much higher where most of the processing happens.

Running out of stack space didn't happen. It may happen, but I
terminated some of the processes because they took too long.

(5) xsltproc   upto     67.30      -      -
    xalan      upto         -      -      - not viable
    saxon      upto     33.17      -      -
    xsltproc   siblrec      -      -      - not viable
    xalan      siblrec      -      -      - not viable
    saxon      siblrec   5.80   5.76   5.81

It's interesting to see how processing speed and memory consumption do
indeed - as you said - vary widely based on processor and programming
method. I think these are all very good processors. From my tests,
overall, your Saxon processor seems to perform best. But as stated,
my tests are likely to be pretty naive.

Michael Ludwig

PS: Here's the program I used to generate the input documents for
testing - in case someone wants to follow up ... Simply pass it an
input document like "<Urmel/>" - the input document isn't actually
needed. You can parametrize document generation tweaking the @count
attributes.

<xsl:transform version="1.0"
  xmlns:exsl="http://exslt.org/common";
  extension-element-prefixes="exsl"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform";>

  <xsl:output indent="yes"/>

  <xsl:variable name="bla-rtf">
    <h1 count="3">Eins</h1>
    <h2 count="3">Zwei</h2>
<p count="3">eins zwei drei vier fünf sechs sieben acht neun zehn elf zwölf</p>
  </xsl:variable>
  <xsl:variable name="bla" select="exsl:node-set($bla-rtf)"/>

  <xsl:template match="/">
    <doc>
      <xsl:variable name="start-elm" select="$bla/*[1]"/>
      <xsl:call-template name="dodo">
        <xsl:with-param name="elm" select="$start-elm"/>
        <xsl:with-param name="count" select="$start-elm/@count"/>
      </xsl:call-template>
    </doc>
  </xsl:template>

  <xsl:template name="dodo">
    <xsl:param name="elm"/>
    <xsl:param name="count"/>
    <xsl:variable name="txt" select="$bla/*[local-name() = name($elm)]"/>
    <xsl:element name="{name($elm)}">
      <xsl:value-of select="$txt"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="$txt"/>
    </xsl:element>
    <xsl:variable name="next-elm" select="$elm/following-sibling::*[1]"/>
    <xsl:if test="$next-elm">
      <xsl:call-template name="dodo">
        <xsl:with-param name="elm" select="$next-elm"/>
        <xsl:with-param name="count" select="$next-elm/@count"/>
      </xsl:call-template>
    </xsl:if>
    <xsl:if test="$count > 1">
      <xsl:call-template name="dodo">
        <xsl:with-param name="elm" select="$elm"/>
        <xsl:with-param name="count" select="$count - 1"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

</xsl:transform>

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--