xsl-list
[Top] [All Lists]

[xsl] Efficiency problems while maintaining state

2007-05-23 14:49:57
I am exploring aspects of maintaining state in an XSLT script using a
technique based on examples and insight from Dimitre Novatchev[0].  In
order to maintain state as I have implemented it, each template must
return the latest value of the state to the dispatcher of the template
along with any "normal" results of the template, and any dispatcher is
responsible for passing the state from any previous template call to
the next template call.  This works, but I am concerned about a
performance issue.  Until a top-level template finally extracts the
"normal" results from the final return value, the results from
descendant template calls are recursively copied to parent calls (in a
depth-first manner), which can cause the same results to be copied
many times.  Is there a better way to think about this problem that
can avoid this overhead?

To help visualize the problem, consider the toy example that follows.
The input file has the format:

<root>
 <a>
   <dup times="2">
     <a>
       <dup times="3">
         <a/>
       </dup>
       <dup times="2">
         <a>
           <dup times="2">
             <a/>
           </dup>
         </a>
       </dup>
       <a/>
     </a>
   </dup>
   <dup times="3">
     <a/>
   </dup>
 </a>
</root>

The desired set of semantics of the example XSLT script that will
process such an input file has two elements.  First, the script should
replace each <dup/> element with N copies of the resulting subtree of
the <dup/> element, where N is the value of the "times" attribute.
Second, each <a/> element should have an "n" attribute containing the
count of <a/> elements that have been added so far (including the
current one).  Based on the example input document above, these
semantics would indicate the following output document:

<root>
 <a n="1">
   <a n="2">
     <a n="3"/>
     <a n="4"/>
     <a n="5"/>
     <a n="6">
       <a n="7"/>
       <a n="8"/>
     </a>
     <a n="9">
       <a n="10"/>
       <a n="11"/>
     </a>
     <a n="12"/>
   </a>
   <a n="13">
     <a n="14"/>
     <a n="15"/>
     <a n="16"/>
     <a n="17">
       <a n="18"/>
       <a n="19"/>
     </a>
     <a n="20">
       <a n="21"/>
       <a n="22"/>
     </a>
     <a n="23"/>
   </a>
   <a n="24"/>
   <a n="25"/>
   <a n="26"/>
 </a>
</root>

One way to implement this is by keeping a running count (that is,
maintaining the count as state), and this is analogous to an actual
problem that I am trying to solve.  One possible implementation
follows below.  In this implementation, the
"count-and-process.contents" template is critical, as it provides
generic processing for a node-set by processing the first node in that
node-set, obtaining the resulting state from processing that node, and
then passing the state to a recursive processing of the remainder of
the node-set.

<xsl:stylesheet
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform";
   xmlns:exsl="http://exslt.org/common";
   xmlns:s="tag:jlc6(_at_)po(_dot_)cwru(_dot_)edu,2007-05-23:state_container_NS"
   extension-element-prefixes="exsl"
   exclude-result-prefixes="s"
   version="1.0">
 <xsl:output indent="yes"/>

 <xsl:template match="/root">
   <xsl:variable name="_m">
     <xsl:call-template name="count-and-process">
       <xsl:with-param name="node-set" select="*"/>
     </xsl:call-template>
   </xsl:variable>

   <xsl:variable name="m" select="exsl:node-set($_m)/node()"/>

   <xsl:copy>
     <xsl:copy-of select="$m/s:result/node()"/>
   </xsl:copy>
 </xsl:template>

 <xsl:template name="count-and-process">
   <xsl:param name="m" select="document('')/*/s:monad[s:count]"/>
   <xsl:param name="node-set" select="/.."/>

   <xsl:variable name="_contents">
     <xsl:call-template name="count-and-process.contents">
       <xsl:with-param name="m" select="$m"/>
       <xsl:with-param name="node-set" select="$node-set"/>
     </xsl:call-template>
   </xsl:variable>

   <xsl:variable name="contents"
                 select="exsl:node-set($_contents)"/>
   <s:monad>
     <xsl:copy-of select="$contents/s:count"/>
     <s:result>
       <xsl:copy-of select="$contents/node()[not(self::s:count)]"/>
     </s:result>
   </s:monad>
 </xsl:template>

 <xsl:template name="count-and-process.contents">
   <xsl:param name="m" select="document('')/*/s:monad[s:count]"/>
   <xsl:param name="node-set" select="/.."/>

   <xsl:choose>
     <xsl:when test="not($node-set)">
       <xsl:copy-of select="$m/s:count"/>
     </xsl:when>
     <xsl:otherwise>
       <xsl:variable name="_n">
         <xsl:apply-templates select="$node-set[1]">
           <xsl:with-param name="m" select="$m"/>
         </xsl:apply-templates>
       </xsl:variable>
       <xsl:variable name="n"
                     select="exsl:node-set($_n)/node()"/>

       <xsl:copy-of select="$n/s:result/node()"/>

       <xsl:variable name="p">
         <s:monad>
           <xsl:copy-of select="$n/s:count"/>
           <s:result/>
         </s:monad>
       </xsl:variable>

       <xsl:call-template name="count-and-process.contents">
         <xsl:with-param name="m" select="exsl:node-set($p)/node()"/>
         <xsl:with-param name="node-set"
                         select="$node-set[position() &gt; 1]"/>
       </xsl:call-template>
     </xsl:otherwise>
   </xsl:choose>
 </xsl:template>

 <xsl:template name="increment">
   <xsl:param name="m" select="document('')/*/s:monad[s:count]"/>

   <s:monad>
     <s:count><xsl:value-of select="$m/s:count + 1"/></s:count>
     <xsl:copy-of select="$m/s:result"/>
   </s:monad>
 </xsl:template>

 <xsl:template match="a">
   <xsl:param name="m" select="/.."/>

   <xsl:variable name="_i">
     <xsl:call-template name="increment">
       <xsl:with-param name="m" select="$m"/>
     </xsl:call-template>
   </xsl:variable>
   <xsl:variable name="i" select="exsl:node-set($_i)/node()"/>
   <xsl:variable name="_n">
     <xsl:call-template name="count-and-process">
       <xsl:with-param name="m" select="$i"/>
       <xsl:with-param name="node-set" select="*"/>
     </xsl:call-template>
   </xsl:variable>
   <xsl:variable name="n" select="exsl:node-set($_n)/node()"/>

   <s:monad>
     <xsl:copy-of select="$n/s:count"/>
     <s:result>
       <a n="{$i/s:count}">
         <xsl:copy-of select="$n/s:result/node()"/>
       </a>
     </s:result>
   </s:monad>
 </xsl:template>

 <xsl:template match="dup">
   <xsl:param name="m" select="/.."/>
   <xsl:param name="times" select="@times"/>

   <xsl:variable name="_c">
     <s:monad><xsl:copy-of select="$m/s:count"/><s:result/></s:monad>
   </xsl:variable>
   <xsl:variable name="c" select="exsl:node-set($_c)/node()"/>

   <xsl:variable name="_p">
     <xsl:call-template name="count-and-process">
       <xsl:with-param name="m" select="$c"/>
       <xsl:with-param name="node-set" select="*"/>
     </xsl:call-template>
   </xsl:variable>
   <xsl:variable name="p" select="exsl:node-set($_p)/node()"/>

   <xsl:variable name="_n">
     <s:monad>
       <xsl:copy-of select="$p/s:count"/>
       <s:result>
         <xsl:copy-of select="$m/s:result/node()"/>
         <xsl:copy-of select="$p/s:result/node()"/>
       </s:result>
     </s:monad>
   </xsl:variable>
   <xsl:variable name="n" select="exsl:node-set($_n)/node()"/>

   <xsl:choose>
     <xsl:when test="$times &gt; 1">
       <xsl:apply-templates select=".">
         <xsl:with-param name="m" select="$n"/>
         <xsl:with-param name="times" select="$times - 1"/>
       </xsl:apply-templates>
     </xsl:when>
     <xsl:otherwise>
       <xsl:message><xsl:value-of select="$n/s:count"/></xsl:message>
       <xsl:copy-of select="$n"/>
     </xsl:otherwise>
   </xsl:choose>
 </xsl:template>

 <s:monad>
   <s:count>0</s:count>
   <s:result/>
 </s:monad>
</xsl:stylesheet>

In this example, the immutable results that are passed from leaves of
the tree to the root are small, and the path is short, but these
results could easily be much larger and might need to be copied a
large number of times.  In my investigation, much time is spent in
processing <xsl:copy-of/>, but many of these copies are repeated over
subtrees that will not change before they are added to the final
result tree.  The challenge is to modify and pass on the state
component without needing to touch the entire result each time the
state changes contexts.

Clearly this implementation is very awkward, as it must explicitly
convert between result tree fragments and node sets at many places,
but I don't see any way around that.

[0] http://groups.google.co.uk/group/comp.text.xml/msg/fccecacb602aab27?hl=en

Take care,

   John L. Clark

--
PLEASE NOTE that this message is not digitally signed.  As a result,
you have no strong evidence that this message was actually sent by me.
Upon request I can provide a digitally signed receipt for this
message or other evidence validating its contents if you need such
evidence.

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--

<Prev in Thread] Current Thread [Next in Thread>