xsl-list
[Top] [All Lists]

Re: [xsl] Processing based on number - alternatives to recursion?

2008-03-04 22:39:57
Lots of stack space is what I thought. Does tail recursion mean there
is nothing left to process in the current round when the template
recursively invokes itself?

more or less. googling for that phrase will yield arbitrary amounts of
detail:-) Basically there are theorems that tell you that certain
classes of problem can be written in tail recursive style (and certain
classes can not) and that those that are tail recursive can essentially
be implemented as a loop. this means that compilers can "compete" in how
good a job they can do at optimising tail recursive calls by rewriting
functions in tail recursive style even if the end user has not coded it
that way explictly.


Not all XSLT processors recognize and optimize tail-recursion.
Moreover, sometimes writing in the tail-recursive style may require
quite a bit of additional effort.

A technique, which doesn't rely on support from individual XSLT
processors, and which does not eliminate recursion entirely, but
allows for a much more shallow recursion is called the Divide and
Conquer recursion. In its simplest form it requires that:

  0. A set of one or two items is processed without recursion,
according to the rules of the problem being solved.

For sets of more than 2 items:

  1. All items to be processed are divided into two (almost equal in
size) sets Set1 and Set2.

  2. Then Set1 and Set2 are independently processed. This is the
recursion step, but notice that each recursion step is performed on
half as less items than the size of the current set of items.

  3. The results of processing Set1 and Set2 are combined, if needed.


As can be easily seen, when processing a set of N items the maximum
recursion depth is Log2(N).

This means, that to process a set of 1000000 (1M) items a maximum
stack depth of only 19 will be needed.


DVC is best illustrated with a simple example. Let's find the maximum
of a sequence of integers:

This stylesheet:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform";
 xmlns:xs="http://www.w3.org/2001/XMLSchema";
 xmlns:f="http://fxsl.sf.net/";
 exclude-result-prefixes="f xs"


 <xsl:output method="text"/>

 <xsl:template match="/">
   <xsl:sequence select="f:dvcMax(1 to 100000)"/>
 </xsl:template>

 <xsl:function name="f:dvcMax" as="xs:integer">
   <xsl:param name="pSeq" as="xs:integer+"/>

   <xsl:sequence select=
    "for $vSize in count($pSeq)
       return
         if($vSize eq 1) then $pSeq[1]
         else
           for $vhalfSize in $vSize idiv 2,
               $vMax1 in f:dvcMax(subsequence($pSeq,1,$vhalfSize)),
               $vMax2 in f:dvcMax(subsequence($pSeq,$vhalfSize+1))
              return
                 if($vMax1 gt $vMax2) then $vMax1
                    else $vMax2
    "
    />
 </xsl:function>


</xsl:stylesheet>

when applied on any xml document (not needed and ignored),

produces the wanted result:

100000

in 735 milliseconds.

Notice that the recursive processing is specified just in 6 lines.

Also notice what a neat parallel processing oportunity presents the
split into two sets and their processing:

               $vMax1 in f:dvcMax(subsequence($pSeq,1,$vhalfSize)),
               $vMax2 in f:dvcMax(subsequence($pSeq,$vhalfSize+1))


These facts show that in many cases using DVC recursion might be more
efficient than (simple) tail recursion.

Another example, which demonstrates calculating integer power in a
very fast way (the two sets in this case are identical and we process
only one of them!):

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform";
 xmlns:xs="http://www.w3.org/2001/XMLSchema";
 xmlns:f="http://fxsl.sf.net/";
 exclude-result-prefixes="f xs"

 <xsl:output method="text"/>

 <xsl:template match="/">
   <xsl:sequence select="f:nPow(1.000001, 1000000)"/>
 </xsl:template>

 <xsl:function name="f:nPow" as="xs:double">
   <xsl:param name="pBase" as="xs:double"/>
   <xsl:param name="pPower" as="xs:integer"/>

   <xsl:sequence select=
    "if($pPower eq 0) then 1
      else if($pPower ge 1)
      then
        for $vhalfPower in $pPower idiv 2
            return
                                if($pPower mod 2 eq 0)
                                      then
                                        for $vresPower1 in f:nPow($pBase, 
$vhalfPower)
                                            return $vresPower1 * $vresPower1
                                      else
                                        $pBase * f:nPow($pBase, $pPower -1)
      else
         error()
    "
    />
 </xsl:function>

</xsl:stylesheet>

When this transformation is applied (again on any xml document, which
will be ignored), the result:

2.7182804691319364

is obtained in just anywhere between 0 and 16 milliseconds. (of the
three timings one was 0 and the other 15 milliseconds).


-- 
Cheers,
Dimitre Novatchev
---------------------------------------
Truly great madness cannot be achieved without significant intelligence.
---------------------------------------
To invent, you need a good imagination and a pile of junk
-------------------------------------
Never fight an inanimate object
-------------------------------------
You've achieved success in your field when you don't know whether what
you're doing is work or play

On Tue, Mar 4, 2008 at 9:19 AM, David Carlisle 
<davidc(_at_)nag(_dot_)co(_dot_)uk> wrote:



 Supplying the stylesheet file name
as a parameter to the stylesheet and calling document() on it?

 <xsl:variable name="here" select="."/>
<xsl:for-each select="(document('')//*)[position()&lt; 10]">
 <xsl:variable name="p" select="position()"/>
      code using $here and $p as needed...

note this looks short but can easily be less efficient than the
recursive template version as it implies loading the stylesheet and
parsing the entire thing just to get a bunch of nodes to iterate over.
(For various reasons the system will probably need to reparse the
stylesheet as an input doc, not use the in-memory tree resulting from
the initial parse of the stylesheet). If on the other hand you know your
input doc will be big enough, you already have those nodes to hand so
<xsl:for-each select="(//*)[position()&lt; 10]">
is probably fairly cheap.



Lots of stack space is what I thought. Does tail recursion mean there
is nothing left to process in the current round when the template
recursively invokes itself?

more or less. googling for that phrase will yield arbitrary amounts of
detail:-) Basically there are theorems that tell you that certain
classes of problem can be written in tail recursive style (and certain
classes can not) and that those that are tail recursive can essentially
be implemented as a loop. this means that compilers can "compete" in how
good a job they can do at optimising tail recursive calls by rewriting
functions in tail recursive style even if the end user has not coded it
that way explictly. The factorial function is (always;) the standard
example.

5! is 5*4*3*2*1

if you code it as

fac(n) = n * fac(n-1) if n > 1 else 1

then that is not tail recursive and the  fac(5) ends up with a stack of
all 5 function calls being partially evaluated until you finally get down
to fac(1) and you pop the stack (or run out of stack:-)

if instead you do

fac(n)= fac(n,1)
fac(n,r)=fac(n-1,n*r) if n>1 else r

then the intermediate results are built up in the parameter list and so
you don't need to preserve the function call stack you can just
reuse the same space at each iteration (in otherwords it's essentially a
loop).




Does the call have to be placed at the end of the recursive template
in order for this to happen?

yes for some definition of "end" for example the recursive call inside a
conditional block ought to count as "end" (you normally need that to
terminate recursion). so in xslt the recursive call might not actually be
the last line, there may be various /xsl:when/xsl:choose etc.


David

________________________________________________________________________
The Numerical Algorithms Group Ltd is a company registered in England
and Wales with company number 1249803. The registered office is:
Wilkinson House, Jordan Hill Road, Oxford OX2 8DR, United Kingdom.

This e-mail has been scanned for all viruses by Star. The service is
powered by MessageLabs.
________________________________________________________________________

--~------------------------------------------------------------------



XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--



--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--