xsl-list

RE: Comparing two xml documents

2003-03-12 21:25:33
Ragulf Pickaxe wrote:

> I don't know about the limit on 200 KB yet. I will check out the
> downloaded version at a later time (perhaps tomorrow).

By the way, xmlDiff is written in C# and requires .NET (even to download
the source).  :-(
But it is a cool app.  (Judging from the demo page, the xmldiff
language definition and the xmlpatch app.)

> On the sort: I can see how you would do something like this when you
> are able to delete the elements that you have already compared, but I
> can't see how you would be able to avoid going through all elements in
> A and B another time when searching B for elements not found in A.
> Would this be in any way possible in pure XSL?

Maybe not.
My best attempt at it so far requires the exslt:node-set() function
to turn the sorted copies of A and B (result tree fragments) into
node-sets so we can process them.
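For comparison, here is a sketch that stays in pure XSLT 1.0 (no node-set()) by skipping the sort entirely and indexing elements by name with xsl:key. It does not avoid the second pass Ragulf asked about: it still walks A once and B once. However, each "is this name in the other document?" check goes through the key, which processors usually implement as an index rather than a scan. It assumes the same second file (try-compare2.xml) and RootElement structure as the stylesheet below. It also reports a name once per occurrence, so repeated names are not paired up the way the sorted walk pairs them.

```xslt
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:variable name="docA" select="/"/>
  <!-- Assumption: same second file as in the stylesheet below. -->
  <xsl:variable name="docB" select="document('try-compare2.xml')"/>

  <!-- Index every element under RootElement by its name.
       key() only searches the document that holds the context node. -->
  <xsl:key name="byName" match="RootElement//*" use="name()"/>

  <xsl:template match="/">
    <!-- Elements of A whose name never occurs in B. -->
    <xsl:for-each select="$docA/RootElement//*">
      <xsl:variable name="n" select="name()"/>
      <!-- Move the context node into B so that key() searches B. -->
      <xsl:for-each select="$docB">
        <xsl:if test="not(key('byName', $n))">
          <p><xsl:value-of select="$n"/> is only in doc A.</p>
        </xsl:if>
      </xsl:for-each>
    </xsl:for-each>

    <!-- Elements of B whose name never occurs in A. -->
    <xsl:for-each select="$docB/RootElement//*">
      <xsl:variable name="n" select="name()"/>
      <xsl:for-each select="$docA">
        <xsl:if test="not(key('byName', $n))">
          <p><xsl:value-of select="$n"/> is only in doc B.</p>
        </xsl:if>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```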

I'm also not at all sure this sorted approach will perform better than the
previous solution... to recurse through the node-sets, I use
        <xsl:call-template name="recurse">
          <xsl:with-param name="nodesA" select="$nodesA[position()>1]" />
          <xsl:with-param name="nodesB" select="$nodesB" />
        </xsl:call-template>
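so, even assuming we have tail-recursion optimization, we have to create
a new node-set $nodesA[position()>1] for every recursive step!  If building
that node-set costs time proportional to the number of nodes left, the
whole walk could take O(n*n + m*m) time for n nodes in A and m in B.  :-(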
Oh, for an equivalent of cdr!  (Maybe there is one in the extensions...
for now I'm not going to venture far from pure v1.0.)
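A common stand-in for cdr in XSLT 1.0 is to pass the node-set through unchanged, along with an integer position, and to pick out $nodes[$i] at each step. This avoids building a new node-set on every call. Whether $nodes[$i] is actually constant-time depends on the processor, so this is something to measure rather than a guaranteed win. The sketch below just walks /RootElement//* and prints each element's name. The template name walk and the parameters i and n are made up for illustration. Applying the same idea to recurse would mean giving it two counters, one for A and one for B.

```xslt
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:variable name="all" select="/RootElement//*"/>
    <xsl:call-template name="walk">
      <xsl:with-param name="nodes" select="$all"/>
      <xsl:with-param name="n" select="count($all)"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Hypothetical helper: visits $nodes[$i], then recurses with $i + 1.
       $nodes itself is passed along unchanged, never sliced. -->
  <xsl:template name="walk">
    <xsl:param name="nodes"/>
    <xsl:param name="n"/>            <!-- count($nodes), computed once -->
    <xsl:param name="i" select="1"/>
    <xsl:if test="$i &lt;= $n">
      <xsl:value-of select="name($nodes[$i])"/>
      <xsl:text>&#10;</xsl:text>
      <!-- Last instruction, so a processor may treat this as a tail call. -->
      <xsl:call-template name="walk">
        <xsl:with-param name="nodes" select="$nodes"/>
        <xsl:with-param name="n" select="$n"/>
        <xsl:with-param name="i" select="$i + 1"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```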

In case anyone is interested, here's the code I have so far.
It runs under Saxon, but doesn't generate correct results;
I don't know how to ask whether stringA comes before stringB
alphabetically.
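The comparisons fail because XPath 1.0 has no string ordering. The operators <, <=, > and >= convert both operands to numbers. An element name converts to NaN, and any comparison involving NaN is false, so a test like $nameB > $nameA can never succeed. The following tiny stylesheet, which ignores its input document, demonstrates this.

```xslt
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- A non-numeric string converts to NaN. -->
    <xsl:value-of select="number('apple')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Both sides become NaN, so each comparison is false either way round. -->
    <xsl:value-of select="'banana' > 'apple'"/>
    <xsl:text>&#10;</xsl:text>
    <xsl:value-of select="'apple' > 'banana'"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

It prints NaN, then false twice. Note that = and != do compare strings as strings, which is why the $nameA = '' test works.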

Suggestions are welcome.  As far as I'm concerned, this is just
for curiosity's sake.

Lars


<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:exslt="http://exslt.org/common" version="1.0">

  <!-- Replace // with / everywhere if we're only interested
        in immediate children of /RootElement. -->

  <xsl:variable name="docA" select="/" />
  <xsl:variable name="docB" select="document('try-compare2.xml')"/>

  <!-- This produces a whole nother copy of both docs!
       So, is the performance cost worth it?? -->

  <xsl:variable name="sortedNodesA">
    <!-- produce a sorted, flattened RTF of A's nodes -->
    <xsl:for-each select="$docA/RootElement//*">
      <xsl:sort select="name()" />
      <xsl:copy-of select="." />
    </xsl:for-each>
  </xsl:variable>

  <xsl:variable name="sortedNodesB">
    <!-- produce a sorted, flattened RTF of B's nodes -->
    <xsl:for-each select="$docB/RootElement//*">
      <xsl:sort select="name()" />
      <xsl:copy-of select="." />
    </xsl:for-each>
  </xsl:variable>

  <xsl:template match="/">
    <xsl:call-template name="recurse">
      <xsl:with-param name="nodesA"
        select="exslt:node-set($sortedNodesA)/*" />
      <xsl:with-param name="nodesB"
        select="exslt:node-set($sortedNodesB)/*" />
    </xsl:call-template>
  </xsl:template>
 
  <xsl:template name="recurse">
    <xsl:param name="nodesA" />
    <xsl:param name="nodesB" />
    <xsl:variable name="nameA" select="name($nodesA[1])" />
    <xsl:variable name="nameB" select="name($nodesB[1])" />

    <xsl:choose>
      <!-- Hopefully name(emptyNodeSet) returns ''?  Seems to work. -->
      <xsl:when test="$nameA = '' and $nameB = ''">
        <!-- end recursion -->
        <p>Debug message: end of recursion.</p>
      </xsl:when>
      <!-- Oops, this doesn't work.  I don't know how to
           do string comparison. -->
      <xsl:when test="$nameB > $nameA">
        <p><xsl:value-of select="$nameA" /> is only in doc A.</p>
        <xsl:call-template name="recurse">
          <xsl:with-param name="nodesA" select="$nodesA[position()>1]" />
          <xsl:with-param name="nodesB" select="$nodesB" />
        </xsl:call-template>
      </xsl:when>
      <!-- Oops, this doesn't work.  I don't know how to
           do string comparison. -->
      <xsl:when test="$nameA > $nameB">
        <p><xsl:value-of select="$nameB" /> is only in doc B.</p>
        <xsl:call-template name="recurse">
          <xsl:with-param name="nodesA" select="$nodesA" />
          <xsl:with-param name="nodesB" select="$nodesB[position()>1]" />
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <p><xsl:value-of select="$nameB" /> is in both documents.
          <!-- Do I need string(text(...))? -->
          <xsl:if
            test="string($nodesA[1]/text()) != string($nodesB[1]/text())">
            But their contents differ:
            '<xsl:value-of select="$nodesA[1]/text()" />' !=
            '<xsl:value-of select="$nodesB[1]/text()" />'.
          </xsl:if>
        </p>
        <xsl:call-template name="recurse">
          <xsl:with-param name="nodesA" select="$nodesA[position()>1]" />
          <xsl:with-param name="nodesB" select="$nodesB[position()>1]" />
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
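Since XPath 1.0 cannot order two strings directly, one workaround, still using XSLT 1.0 plus exslt:node-set(), is to let xsl:sort decide. The idea is to put the two head nodes into one node-set, sort that node-set by name(), and see whose name comes out first. A side benefit is that the walk then uses the same collation as the xsl:sort that built the sorted copies, so the sort and the walk agree on the order. The sketch below is the stylesheet with recurse rewritten that way. The variable name first is illustrative. The sketch assumes the processor's default collation never ranks two distinct names as equal.

```xslt
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:exslt="http://exslt.org/common" version="1.0">

  <xsl:variable name="docA" select="/"/>
  <xsl:variable name="docB" select="document('try-compare2.xml')"/>

  <!-- Sorted, flattened copies of each document's elements, as before. -->
  <xsl:variable name="sortedNodesA">
    <xsl:for-each select="$docA/RootElement//*">
      <xsl:sort select="name()"/>
      <xsl:copy-of select="."/>
    </xsl:for-each>
  </xsl:variable>
  <xsl:variable name="sortedNodesB">
    <xsl:for-each select="$docB/RootElement//*">
      <xsl:sort select="name()"/>
      <xsl:copy-of select="."/>
    </xsl:for-each>
  </xsl:variable>

  <xsl:template match="/">
    <xsl:call-template name="recurse">
      <xsl:with-param name="nodesA" select="exslt:node-set($sortedNodesA)/*"/>
      <xsl:with-param name="nodesB" select="exslt:node-set($sortedNodesB)/*"/>
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="recurse">
    <xsl:param name="nodesA"/>
    <xsl:param name="nodesB"/>
    <xsl:variable name="nameA" select="name($nodesA[1])"/>
    <xsl:variable name="nameB" select="name($nodesB[1])"/>
    <!-- Name of whichever head node xsl:sort puts first. If one list is
         empty, the union holds only the other list's head node. -->
    <xsl:variable name="first">
      <xsl:for-each select="$nodesA[1] | $nodesB[1]">
        <xsl:sort select="name()"/>
        <xsl:if test="position() = 1">
          <xsl:value-of select="name()"/>
        </xsl:if>
      </xsl:for-each>
    </xsl:variable>

    <xsl:choose>
      <xsl:when test="$nameA = '' and $nameB = ''">
        <!-- Both lists exhausted: stop. -->
      </xsl:when>
      <xsl:when test="$nameA = $nameB">
        <p><xsl:value-of select="$nameA"/> is in both documents.
          <xsl:if test="string($nodesA[1]/text()) != string($nodesB[1]/text())">
            But their contents differ:
            '<xsl:value-of select="$nodesA[1]/text()"/>' !=
            '<xsl:value-of select="$nodesB[1]/text()"/>'.
          </xsl:if>
        </p>
        <xsl:call-template name="recurse">
          <xsl:with-param name="nodesA" select="$nodesA[position() > 1]"/>
          <xsl:with-param name="nodesB" select="$nodesB[position() > 1]"/>
        </xsl:call-template>
      </xsl:when>
      <!-- Names differ and A's head sorts first: it cannot occur in B. -->
      <xsl:when test="$first = $nameA">
        <p><xsl:value-of select="$nameA"/> is only in doc A.</p>
        <xsl:call-template name="recurse">
          <xsl:with-param name="nodesA" select="$nodesA[position() > 1]"/>
          <xsl:with-param name="nodesB" select="$nodesB"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <p><xsl:value-of select="$nameB"/> is only in doc B.</p>
        <xsl:call-template name="recurse">
          <xsl:with-param name="nodesA" select="$nodesA"/>
          <xsl:with-param name="nodesB" select="$nodesB[position() > 1]"/>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

This only fixes the ordering test. The recursion still slices $nodesA[position() > 1] on each call, so the O(n*n + m*m) concern is unchanged.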


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list