"Gupta, Raman K [CI]" <raman(_dot_)k(_dot_)gupta(_at_)citigroup(_dot_)com>
wrote in message
news:C0CB45C72DDE2A4FA32370AB65CAEC8F059673(_at_)EXNJMB26(_dot_)nam(_dot_)nsroot(_dot_)net(_dot_)(_dot_)(_dot_)
[Nice timing results omitted]
So all of the methods except recursion exhibit exponential
behavior with an increase in continuation records except
recursion.
So if there is a way to do solve this problem recursively
that would avoid the stack overflow errors (2.5.2 doesn't
even do you the courtesy of throwing an exception -- it
just dies in the middle of the transformation), that would
still be by far the best solution.
Hi Raman,
I have both bad and good news for you.
The Bad News:
The recursive algorithm isn't the fastest.
The Good News:
I am enclosing here the code implementing the fastest (that I
know of) algorithm, anf it is non-recursive.
The idea is to get a string with the positions of all "record" nodes with
type="normal" among all "record nodes", then for a record with type="normal"
find its position in the string and also the position of the next record
with type="normal" The difference in these two positions - 1 is the number
of intermediate record nodes with type="continuation".
Here's the XSLT code:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:variable name="vposArray">
<xsl:value-of select="'|'"/>
<xsl:for-each select="/*/record">
<xsl:if test="@type = 'normal'">
<xsl:value-of select="concat(position(), '|')"/>
</xsl:if>
</xsl:for-each>
</xsl:variable>
<xsl:template match="@* | node()" name="identity">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="records">
<records>
<xsl:apply-templates select="record"/>
</records>
</xsl:template>
<xsl:template match="record">
<xsl:choose>
<xsl:when test="not(@type='normal')">
<xsl:call-template name="identity"/>
</xsl:when>
<xsl:otherwise>
<xsl:variable name="vPos" select="position()"/>
<xsl:variable name="vposNext"
select="substring-before(
substring-after($vposArray,
concat('|',
position(),
'|'
)
),
'|'
)"/>
<xsl:variable name="vNumNested"
select="$vposNext - position() - 1"/>
<xsl:copy>
<xsl:copy-of select="@* | node()"/>
<xsl:if test="$vNumNested > 0">
<xsl:copy-of select=
"following-sibling::record
[position() <= $vNumNested]"/>
</xsl:if>
</xsl:copy>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template match="record[not(@type='normal')]"/>
</xsl:stylesheet>
I also measured the time it takes with the three different algorithms
(recursive, with keys, non-recursive) with two different sorce xml
documents -- one having 1000 consecutive record nodes with
type="continuation", the other with 2000 such nodes. They look like this:
<records>
<record n="1" type="normal"/>
<record n="2" type="normal"/>
<record n="3" type="continuation"/>
<record n="4" type="continuation"/>
<record n="5" type="continuation"/>
<record n="6" type="continuation"/>
<record n="7" type="continuation"/>
..............................................................
<record n="1000" type="continuation"/>
<record n="1001" type="continuation"/>
<record n="1002" type="continuation"/>
<record n="1003" type="normal"/>
<record n="1004" type="normal"/>
</records>
and
<records>
<record n="1" type="normal"/>
<record n="2" type="normal"/>
<record n="3" type="continuation"/>
<record n="4" type="continuation"/>
<record n="5" type="continuation"/>
<record n="6" type="continuation"/>
<record n="7" type="continuation"/>
..............................................................
<record n="2000" type="continuation"/>
<record n="2001" type="continuation"/>
<record n="2002" type="continuation"/>
<record n="2003" type="normal"/>
<record n="2004" type="normal"/>
</records>
I tested the following XSLT processors: MSXML4, Saxon6.5.3, JD1.5.2,
XalanJ2.4.1, XalanC1.5
I also wanted to test a forth algorithm using generate-id(). It took MSXML4
233157 milliseconds on the 1000 nodes document and I had to kill it after 30
minutes with the 2000 nodes document. I did not attempt to test this with
the rest of the XSLT processors.
Here are the results:
Method MSXML4 Saxon JD
XalanJ XalanC
=============================================================
Recursive:
1000 nodes 37 Stack Ovfl. 150
Stack Ovfl. 170
2000 nodes 519 Stack Ovfl. Stack Ovfl. Stack
Ovfl. 621
Keys:
1000 nodes 44 1783 791
4677 1472
2000 nodes 2211 6890 2654
17466 5818
Non-Recursive:
1000 nodes 5.9 50 100
1152 10
2000 nodes 12.09 101 120
1342 40
The non-recursive algorithm exhibits linear behaviour with MSXML4 and Saxon
and sub-linear! one with JD, XalanJ and XalanC.
The recursive and key-based algorithm exhibit quadratic behaviour.
XalanJ is clearly not in the same company as the rest of the tested
processors.
I could try still speeding up the non-recursive algorithm, by using a faster
search than linear to find the position of a record node in the string with
positions -- this will require that all positions must have the same (some
maximum) length. Or I could record the positions in a node-set, for which
binary search is straight-forward.
In case you are still not satisfied with the speed of the non-recursive
algorithm, just let me know :o)
Hope this helped.
=====
Cheers,
Dimitre Novatchev.
http://fxsl.sourceforge.net/ -- the home of FXSL
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list