xsl-list
[Top] [All Lists]

RE: Counting nodes efficiently

2004-02-19 03:19:32
 Greetings.
 
 I've been using Jeni's method from the XSLT FAQ to assign 
 unique id's to 
 nodes. In order to speed things up, can anyone think of a way 
 that I could 
 store the running totals for the different nodes, rather than 
 having to 
 call the count() function repeatedly? A generalized method 
 would obviously 
 be the best, so that it could be applied to any arbitrary set 
 of nodes, 
 but I don't know if this is even possible.
 
 <xsl:template match="*">
    <xsl:variable name="name" select="name()" />
    <xsl:element name="{name()}">
      <xsl:attribute name="id">
        <xsl:value-of select="concat($name, '-',
        count(preceding::*[name()= $name]) +
        count(ancestor::*[name()= $name]))" />
      </xsl:attribute>
      <xsl:apply-templates />
    </xsl:element>
 </xsl:template>

3 ways:

1.  Create a node-set by selecting all the elements you wish to count
and numbering them using position().  You can then query into this
node-set using the generate-id() function to get the correct number for
the element you're processing.  This only requies one pass of the data
so its quite efficient.

2.  Write a SAX Filter in java that numbers the elements on their way
into the transform.  You can then select this number as if it was
already in the data.

3.  If you are using saxon, you can substring the value returned from
generate-id() after the 'e', as the generated id's take form 'dxxeyy'
where d is the document number and e is the element number.

As pointed out earlier, just using generate-id() probably solves the
original problem.

I will show here one solution to the problem, because it is interesting,
simple and uses and important xslt design pattern.

In this and similar problems one can re-use the identity transformation,
which processes strictly one node at a time (would a native English
speaker, please, suggest a good name? The best I could arrive at was "one
node at a time identity transformation"):

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform";>
 <xsl:output omit-xml-declaration="yes"/>
 
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="node()[1]"/>
    </xsl:copy>
    <xsl:apply-templates select="following-sibling::node()[1]"/>
  </xsl:template>

</xsl:stylesheet>

This transformation produces the same results as the more well-known
identity rule.

The difference is that we now have the finest possible grain-level of
controll as every xsl:apply-templates instruction above always selects at
most one node.

It is trivial to add parameters and to override the one-at-a-time identity
for elements. Thus we finally have:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform";>
 <xsl:output omit-xml-declaration="yes"/>
 
  <xsl:template match="@* | node()">
    <xsl:param name="pnAncestors" select="0"/>
    <xsl:param name="pnPreceding" select="0"/>
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="node()[1]">
        <xsl:with-param name="pnAncestors" 
                        select="$pnAncestors+1"/>
        <xsl:with-param name="pnPreceding"
                        select="$pnPreceding"/>
      </xsl:apply-templates>
    </xsl:copy>
    <xsl:apply-templates select="following-sibling::node()[1]">
      <xsl:with-param name="pnAncestors" 
                      select="$pnAncestors"/>
      <xsl:with-param name="pnPreceding"
                      select="$pnPreceding+1"/>
          
    </xsl:apply-templates>
  </xsl:template>
  
  <xsl:template match="*">
    <xsl:param name="pnAncestors" select="0"/>
    <xsl:param name="pnPreceding" select="0"/>
    
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:attribute name="_id">
        <xsl:value-of select=
                    "concat(name(), '_',
                            $pnAncestors, '_', 
                            $pnPreceding 
                            )"/>
      </xsl:attribute>
    </xsl:copy>
    <xsl:apply-templates select="node()[1]">
      <xsl:with-param name="pnAncestors" 
                      select="$pnAncestors+1"/>
      <xsl:with-param name="pnPreceding"
                      select="$pnPreceding"/>
    </xsl:apply-templates>
    <xsl:apply-templates select="following-sibling::node()[1]">
      <xsl:with-param name="pnAncestors" 
                      select="$pnAncestors"/>
      <xsl:with-param name="pnPreceding"
                      select="$pnPreceding+1"/>
          
    </xsl:apply-templates>
    
  </xsl:template>

</xsl:stylesheet>

This transformation is not long, it can be written almost mechanically, it
makes exactly one pass over the tree and it has a linear time complexity
(checked with inputs of different size).


Cheers,

Dimitre Novatchev 
FXSL developer,

http://fxsl.sourceforge.net/ -- the home of FXSL
Resume: http://fxsl.sf.net/DNovatchev/Resume/Res.html




__________________________________
Do you Yahoo!?
Yahoo! Mail SpamGuard - Read only the mail you want.
http://antispam.yahoo.com/tools

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list