xsl-list
[Top] [All Lists]

RE: Spotting "cousin marriages" in a tree

2004-07-28 15:57:33
You may be hoping for too much. Graph algorithms such as looking for cycles
often have complexity of O(n^2) or worse, whatever language they are
implemented in.

Michael Kay 

-----Original Message-----
From: Phil Endecott [mailto:spam_from_xslt_list(_at_)chezphil(_dot_)org] 
Sent: 28 July 2004 22:19
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] Spotting "cousin marriages" in a tree

Dear XSLT experts,

I have a flat input like this:

<things>
   <thing id="1"> <child idref="2"/> <child idref="3"/> </thing>
   <thing id="2"/>
   <thing id="3"/>
</things>

I want to match up the ids with the idrefs to build a tree that looks 
like this:

<tree id="1">
   <tree id="2"/>
   <tree id="3"/>
</tree>

I could do it like this:

<xsl:template match="thing">
   <tree>
     <xsl:for-each select="child">
       <xsl:apply-templates
            select="/things/thing[(_at_)id=current()/@idref]"/>
     </xsl:for-each>
   </tree>
</xsl:template>

But that is O(n^2), so I'd instead do it like this:

<xsl:key name="things-by-id" match="thing" use="@id"/>
<xsl:template match="thing">
   <tree>
     <xsl:for-each select="child">
       <xsl:apply-templates select="key('things-by-id',@idref)"/>
     </xsl:for-each>
   </tree>
</xsl:template>

which is O(n log n).

This is all fine.  But very rarely my input may not describe a pure 
tree; it may contain "cousin marriages" or more "incestuous" 
relationships like this:

<things>
   <thing id="1"> <child idref="2"/> <child idref="3"/> </thing>
   <thing id="2"> <child idref="4"/> </thing>
   <thing id="3"> <child idref="4"/> </thing>
   <thing id="4"/>
</things>

If you apply either of the above templates to this, thing 4 
appears in 
the output twice:

<tree id="1">
   <tree id="2">
     <tree id="4"/>
   </tree>
   <tree id="3">
     <tree id="4">
   </tree>
</tree>

which is very bad in my application.  What I need is 
something like this:

<tree id="1">
   <tree id="2">
     <tree id="4"/>
   </tree>
   <tree id="3">
     <error/>
   </tree>
</tree>

I can't afford for this to become an O(n^2) operation.

The obvious thought is that I need to check at the point of recursion 
whether the thing I am about to process has already been output.  But 
there is no "preceding-in-the-output" axis or other way I can 
think of 
to note which things have already been output.

An alternative is to apply a test to the other input elements 
to see if 
other things have the same thing as this one as a child, but 
that fails 
because my input may legitimately contain redundant nodes like this:

<things>
   <thing id="1"> <child idref="2"/> <child idref="3"/> </thing>
   <thing id="2"/>
   <thing id="3"/>
   <thing id="9"> <child idref="2"/> </thing>
</things>

In this case I want the same output as the very first example, i.e. 
thing 9 doesn't appear in the output at all.

I can do a multi-pass process using exsl:node-set(), 
generating the tree 
with the duplicates in it and then deleting them if they are 
duplicates, 
i.e. something like this:

<xsl:template match="/">
   <xsl:variable name="n">
     <xsl:apply-templates select="things/thing[1]"/>  (as above)
   </xsl:variable>
   <xsl:apply-templates select="exsl:node-set($n)" 
mode="remove-dupes"/>
</xsl:template>

<xsl:template match="tree" mode="remove-dupes">
   <xsl:choose>
     <xsl:when test="preceding::tree[(_at_)id=current/@id]">
       <error/>
     </xsl:when>
     <xsl:otherwise>
       <tree id="{(_at_)id}">
         <xsl:apply-templates mode="remove-dupes"/>
       </tree>
     </xsl:otherwise>
   </xsl:choose>
</xsl:template>

There are two problems with this.  First, generating the intermediate 
result could run forever if the input is seriously malformed (e.g. a 
loop, child refers to parent); checking as it is generated 
would avoid 
this.  Second, the test "preceding::tree[(_at_)id=current/@id]" is O(n^2). 
Can I use a key to avoid this?  How can a key be applied to an 
exsl:node-set() result?

XSLT often needs some lateral thinking to come up with a good 
solution. 
  I hope that someone who's read this far has a trick to 
solve this for me.

Thanks in advance!

--Phil.


--+------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--+--