xsl-list

RE: Testing 2 XML documents for equality - a solution

2005-03-30 09:13:40
Hi,
      Here is a link I came across on normalizing whitespace in XML
files. Perhaps the two files could be normalized before they are compared,
which would obviate the need to normalize in the comparison stylesheet.
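
As a rough sketch of that kind of pre-normalization done in XSLT itself
(not the SAX approach from the linked tutorial), each file could first be run
through an identity transform with xsl:strip-space. This assumes that
whitespace-only text nodes carry no meaning anywhere in the documents; mixed
content may need xsl:preserve-space exceptions.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- assumption: whitespace-only text nodes are insignificant everywhere -->
  <xsl:strip-space elements="*" />

  <!-- no re-indentation, so the output gains no new whitespace -->
  <xsl:output method="xml" indent="no" />

  <!-- identity template: copy every remaining node and attribute unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The two normalized outputs can then be compared without any whitespace
handling in the comparison step.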

Cheers,
Omprakash.V

http://www.oracle.com/technology/sample_code/tutorials/parser/saxnorm/toc.htm




-----Original Message-----
From: Jim Neff [mailto:jneff(_at_)blockvision(_dot_)com]
Sent: Wednesday, March 30, 2005 9:05 PM
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: RE: [xsl] Testing 2 XML documents for equality - a solution


What if the nodes are not in the same order?  From my understanding, XSLT
doesn't care about (nor should it) the order of sibling nodes (nodes at the
same level).

Maybe we could do some pre-processing to massage our source documents into
some order if you're going to do just one equals test.
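
One possible shape for such a pre-processing pass, purely as a sketch: an
identity transform that copies attributes first and then emits child nodes
sorted by name, with the string value as a tie-breaker. It assumes
data-oriented documents in which sibling order really is insignificant; in
mixed content it would move text nodes (whose name() is empty) ahead of the
elements.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:output method="xml" indent="no" />

  <xsl:template match="@*|node()">
    <xsl:copy>
      <!-- attributes must be copied before any child nodes -->
      <xsl:apply-templates select="@*" />
      <!-- assumption: sibling order carries no meaning, so impose one -->
      <xsl:apply-templates select="node()">
        <xsl:sort select="name()" />
        <!-- tie-breaker for siblings that share a name -->
        <xsl:sort select="." />
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Running both source documents through this before the equality test puts
their siblings into the same canonical order.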

Something else to consider: if this were a very large document, would it
be faster to look for obvious differences first, such as whether there is the
same number of nodes at each level? (If there weren't a lot of levels it
wouldn't take much time, though that depends on the structure of the expected
documents.)  I guess we are trying to avoid going through each and every node
in one document, reading its name and value, then comparing it with every
other node in the second document.
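
A minimal sketch of such a cheap first check, assuming the same file1.xml and
file2.xml names used in the stylesheet quoted below. It compares only the
overall element and attribute counts, not the counts per level, and
whether it actually saves time will depend on the processor, since counting
still walks both trees.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:output method="text" />

  <xsl:variable name="doc1" select="document('file1.xml')" />
  <xsl:variable name="doc2" select="document('file2.xml')" />

  <xsl:template match="/">
    <xsl:choose>
      <!-- differing counts prove the documents differ -->
      <xsl:when test="count($doc1//*) != count($doc2//*)
                      or count($doc1//@*) != count($doc2//@*)">
        <xsl:text>Not equal (node counts differ)</xsl:text>
      </xsl:when>
      <!-- equal counts prove nothing; the full comparison is still needed -->
      <xsl:otherwise>
        <xsl:text>Counts match, full comparison required</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```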

Great idea though.

--Jim


-----Original Message-----
From: Mukul Gandhi [mailto:mukul_gandhi(_at_)yahoo(_dot_)com]
Sent: Wednesday, March 30, 2005 10:29 AM
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] Testing 2 XML documents for equality - a solution

Hello,
  I was playing with XSLT and wondered whether there could be a nice
way (with XSLT 1.0) to test 2 XML documents for equality. Two
XML documents will be considered equal if all their nodes are
identical (i.e. element, text, attribute, namespace etc).

I found a few approaches for this in the FAQ (URL -
http://www.dpawson.co.uk/xsl/sect2/N1777.html).
They are indeed good work, but I was able to come up with an
elegant way of my own. It uses no extension functions. Below is the XSLT:

<?xml version="1.0"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">

 <xsl:output method="text" />

 <!-- parameter for "ignoring white-space only text nodes"
      during comparison -->
 <!-- if iws='y', "white-space only text nodes" will not be
      considered during comparison -->
 <xsl:param name="iws" />

 <xsl:variable name="doc1" select="document('file1.xml')" />
 <xsl:variable name="doc2" select="document('file2.xml')" />

 <xsl:template match="/">

    <!-- store hash of 1st document into a variable;
    it is concatenation of name and values of all nodes -->
    <xsl:variable name="one">
      <xsl:for-each select="$doc1//@*">
        <xsl:value-of select="name()" /><xsl:value-of select="." />
      </xsl:for-each>
      <xsl:choose>
        <xsl:when test="$iws='y'">
          <!-- skip only whitespace-only text nodes; the test
          normalize-space(self::text()) = '' is also true for every
          non-text node, so it would drop elements as well -->
          <xsl:for-each select="$doc1//node()[not(self::text()[normalize-space() = ''])]">
            <xsl:value-of select="name()" />
            <!-- an element's string value includes whitespace-only
            descendants, so emit values for non-element nodes only -->
            <xsl:if test="not(self::*)"><xsl:value-of select="." /></xsl:if>
          </xsl:for-each>
        </xsl:when>
        <xsl:otherwise>
          <xsl:for-each select="$doc1//node()">
            <xsl:value-of select="name()" /><xsl:value-of select="." />
          </xsl:for-each>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:variable>

    <!-- store hash of 2nd document into a variable;
    it is concatenation of name and values of all nodes -->
    <xsl:variable name="two">
      <xsl:for-each select="$doc2//@*">
        <xsl:value-of select="name()" /><xsl:value-of select="." />
      </xsl:for-each>
      <xsl:choose>
        <xsl:when test="$iws='y'">
          <!-- same selection as for the 1st document -->
          <xsl:for-each select="$doc2//node()[not(self::text()[normalize-space() = ''])]">
            <xsl:value-of select="name()" />
            <xsl:if test="not(self::*)"><xsl:value-of select="." /></xsl:if>
          </xsl:for-each>
        </xsl:when>
        <xsl:otherwise>
          <xsl:for-each select="$doc2//node()">
            <xsl:value-of select="name()" /><xsl:value-of select="." />
          </xsl:for-each>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:variable>
    <xsl:choose>
      <xsl:when test="$one = $two">
        Equal
      </xsl:when>
      <xsl:otherwise>
        Not equal
      </xsl:otherwise>
    </xsl:choose>
 </xsl:template>

</xsl:stylesheet>

In this stylesheet, I am relying on 2 features -
the node() node test and @* . In //node(), node() selects any node
other than attribute, namespace and root nodes (i.e. element,
text, comment and processing-instruction nodes),
while @* selects any attribute. So I guess this XSLT can
cater to most cases ;) (namespace nodes are not compared).
I have done limited testing with
"element, text and attribute nodes only"
and have got favourable results.
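
To see which nodes the two expressions actually reach, a small illustrative
stylesheet like the following could be run against any sample document (it is
not part of the comparison itself). It prints one line per node, labelled
with its kind.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:output method="text" />

  <xsl:template match="/">
    <!-- //node() reaches elements, text, comments and processing instructions -->
    <xsl:for-each select="//node()">
      <xsl:choose>
        <xsl:when test="self::*">element: </xsl:when>
        <xsl:when test="self::text()">text: </xsl:when>
        <xsl:when test="self::comment()">comment: </xsl:when>
        <xsl:when test="self::processing-instruction()">pi: </xsl:when>
      </xsl:choose>
      <xsl:value-of select="name()" />
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
    <!-- attributes are not children of their element, so they need their own step -->
    <xsl:for-each select="//@*">
      <xsl:text>attribute: </xsl:text>
      <xsl:value-of select="name()" />
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Text and comment nodes have an empty name(), so only their label is printed.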

Another feature that I have incorporated in the stylesheet
is control over whether whitespace-only text nodes should
be considered during comparison.
This is done with a stylesheet parameter iws. If it is "y",
whitespace-only text nodes will be ignored during
comparison. If it is other than "y" or is not supplied,
whitespace-only text nodes will make a difference to the 2 documents.

If anybody cares to test this stylesheet and report any
observations, I'll be happy!

Regards,
Mukul





--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--



