xsl-list

Re: Testing 2 XML documents for equality - a solution

2005-03-31 10:32:05
Hi David,
  I have fixed the bug you pointed out below. The
modified stylesheet follows.

The new features in this version are:
1) A named template calculates the XPath expression
(as a string) for element nodes, so the XPath of each
node is now included in the document hash. This helps
ensure that each node's entry in the hash is unique.

2) I am also concatenating the count of
"ancestor-or-self & preceding" nodes (i.e. the union of
the two axes) into the hash. This adds a further
distinguishing value. It was necessary because:
 2.1 The XPath expression alone was not sufficient, and
 2.2 Counting along the ancestor-or-self axis alone was
not sufficient.

My algorithm is not namespace-aware. I'll look at that
case later.

I tested with these XML documents (which you posted
below):
<x>
 <y a="2"/>
 <y/>
</x>

<x>
 <y/>
 <y a="2"/>
</x>

They are reported as
Not equal

while identical documents are reported as
Equal

I hope this version is better (and probably
bug-free).

<?xml version="1.0"?> 
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
 
 <xsl:output method="text" />  
 
 <!-- Parameter controlling whether white-space only text
      nodes are ignored during comparison. -->
 <!-- If iws='y', white-space only text nodes are not
      considered during comparison. -->
 <xsl:param name="iws" />
 
 <xsl:variable name="doc1"
select="document('file1.xml')" />
 <xsl:variable name="doc2"
select="document('file2.xml')" />
 
 <xsl:template match="/">
 
    <!-- store the hash of the 1st document in a variable;
    it is the concatenation of the names and values of all
    nodes -->
    <xsl:variable name="one">
      <xsl:for-each select="$doc1//@*">
        <xsl:sort select="name()" />
        <xsl:variable name="expr">
          <xsl:call-template
name="constructXPathExpr">
            <xsl:with-param name="node" select=".." />
            <xsl:with-param name="xpath" select="name(..)" />
          </xsl:call-template>
        </xsl:variable>
        <xsl:value-of
select="concat($expr,'/@',name(),':',.)"
/>:<xsl:value-of
select="count(../ancestor-or-self::node() |
../preceding::node())" /> 
      </xsl:for-each>
      <xsl:choose>
         <xsl:when test="$iws='y'">
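           <!-- keep every node except white-space only text
                nodes; the earlier predicate also dropped all
                non-text nodes -->
           <xsl:for-each
select="$doc1//node()[not(self::text()[normalize-space() = ''])]">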
             <xsl:variable name="expr">
               <xsl:call-template
name="constructXPathExpr">
                 <xsl:with-param name="node"
select="ancestor-or-self::*[1]" />
                 <xsl:with-param name="xpath"
select="name(ancestor-or-self::*[1])" />
               </xsl:call-template>
             </xsl:variable>
             <xsl:value-of
select="concat($expr,'/',name(),':',.)"
/>:<xsl:value-of
select="count(ancestor-or-self::node() |
preceding::node())" />  
           </xsl:for-each>
         </xsl:when>
         <xsl:otherwise>
           <xsl:for-each select="$doc1//node()">
              <xsl:variable name="expr">
                <xsl:call-template
name="constructXPathExpr">
                  <xsl:with-param name="node"
select="ancestor-or-self::*[1]" />
                  <xsl:with-param name="xpath"
select="name(ancestor-or-self::*[1])" />
                </xsl:call-template>
              </xsl:variable>
              <xsl:value-of
select="concat($expr,'/',name(),':',.)"
/>:<xsl:value-of
select="count(ancestor-or-self::node() |
preceding::node())" />  
           </xsl:for-each>
         </xsl:otherwise>
      </xsl:choose>
    </xsl:variable>  
    
    <!-- store the hash of the 2nd document in a variable;
    it is the concatenation of the names and values of all
    nodes -->
    <xsl:variable name="two">
      <xsl:for-each select="$doc2//@*">
        <xsl:sort select="name()" />
        <xsl:variable name="expr">
          <xsl:call-template name="constructXPathExpr">
            <xsl:with-param name="node" select=".." />
            <xsl:with-param name="xpath" select="name(..)" />
          </xsl:call-template>
        </xsl:variable>
        <xsl:value-of
select="concat($expr,'/@',name(),':',.)"
/>:<xsl:value-of
select="count(../ancestor-or-self::node() |
../preceding::node())" />  
      </xsl:for-each>
      <xsl:choose>
         <xsl:when test="$iws='y'">
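           <!-- keep every node except white-space only text
                nodes; the earlier predicate also dropped all
                non-text nodes -->
           <xsl:for-each
select="$doc2//node()[not(self::text()[normalize-space() = ''])]">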
             <xsl:variable name="expr">
               <xsl:call-template name="constructXPathExpr">
                 <xsl:with-param name="node"
select="ancestor-or-self::*[1]" />
                 <xsl:with-param name="xpath"
select="name(ancestor-or-self::*[1])" />
               </xsl:call-template>
             </xsl:variable>
             <xsl:value-of
select="concat($expr,'/',name(),':',.)"
/>:<xsl:value-of
select="count(ancestor-or-self::node() |
preceding::node())" />  
           </xsl:for-each>
         </xsl:when>
         <xsl:otherwise>
           <xsl:for-each select="$doc2//node()">
             <xsl:variable name="expr">
               <xsl:call-template name="constructXPathExpr">
                 <xsl:with-param name="node"
select="ancestor-or-self::*[1]" />
                 <xsl:with-param name="xpath"
select="name(ancestor-or-self::*[1])" />
               </xsl:call-template>
             </xsl:variable>
             <xsl:value-of
select="concat($expr,'/',name(),':',.)"
/>:<xsl:value-of
select="count(ancestor-or-self::node() |
preceding::node())" />  
           </xsl:for-each>
         </xsl:otherwise>
      </xsl:choose>
    </xsl:variable>  
    <xsl:choose>
      <xsl:when test="$one = $two">
        Equal
      </xsl:when>
      <xsl:otherwise>
        Not equal        
      </xsl:otherwise>
    </xsl:choose>
 </xsl:template>
 
 <!-- a template to construct an XPath expression, for
a given element node -->
 <xsl:template name="constructXPathExpr">
   <xsl:param name="node" />
   <xsl:param name="xpath" />
      
   <xsl:choose>       
     <xsl:when test="$node/parent::*">
       <xsl:call-template name="constructXPathExpr">
          <xsl:with-param name="node"
select="$node/parent::*" />
          <xsl:with-param name="xpath"
select="concat(name($node/parent::*),'/',$xpath)" />
       </xsl:call-template>
     </xsl:when>
     <xsl:otherwise>
       <xsl:value-of select="concat('/',$xpath)" />
     </xsl:otherwise>
   </xsl:choose>
 </xsl:template>

</xsl:stylesheet>

I'll also explain a few other things that should make
the algorithm easier to understand.

1) For attribute nodes, I construct the XPath
expression of their parent element.

2) The named template constructXPathExpr accepts 2
parameters: "node" (the element node itself) and
"xpath" (initially the element node's name).

 2.1 For non-attribute nodes, the parameters are
written like this -
     <xsl:with-param name="node"
select="ancestor-or-self::*[1]" />
     <xsl:with-param name="xpath"
select="name(ancestor-or-self::*[1])" />

     (So we get the nearest element node along the
ancestor-or-self axis)
 
 2.2 And for attribute nodes, they are written like this -
     <xsl:with-param name="node" select=".." />
     <xsl:with-param name="xpath" select="name(..)" />

     (This points to the attribute's parent element)

I'd be happy if you (or others!) could test the
stylesheet further and report any defects. I'd be
obliged.

Regards,
Mukul


--- David Carlisle <davidc(_at_)nag(_dot_)co(_dot_)uk> wrote:

      <xsl:for-each select="$doc1//@*">
        <xsl:sort select="name()" />
        <xsl:value-of select="name()" />:<xsl:value-of
          select="." />:<xsl:value-of select="name(..)" />:<xsl:value-of
          select="count(../ancestor-or-self::node())" />
      </xsl:for-each>

No.  You can't use //@* for this at all.
You have to normalise the attributes for each
element separately, i.e. inline the string from each
attribute along with the string for each element.
<x>
 <y a="2"/>
 <y/>
</x>
is equal to
<x>
 <y/>
 <y a="2"/>
</x>

by the above, as you only record that the a
attribute is on a level 2 y
element, you don't record which element it is on.
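
One way to read this suggestion, as a rough sketch and not David's actual code: walk the tree recursively and write each element's sorted attributes directly next to that element's own string, so an attribute can never be attributed to a sibling. The mode name "hash" and the output layout are assumptions made for illustration. The sketch is not namespace-aware and does not escape delimiter characters inside values.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>

  <!-- Elements: emit the name, then this element's own attributes in
       sorted order, then recurse into the children in document order.
       The "hash" mode name is hypothetical. -->
  <xsl:template match="*" mode="hash">
    <xsl:value-of select="concat('&lt;', name())"/>
    <xsl:for-each select="@*">
      <xsl:sort select="name()"/>
      <xsl:value-of select="concat(' ', name(), '=', .)"/>
    </xsl:for-each>
    <xsl:text>&gt;</xsl:text>
    <xsl:apply-templates select="node()" mode="hash"/>
    <xsl:value-of select="concat('&lt;/', name(), '&gt;')"/>
  </xsl:template>

  <!-- Text nodes: emit the value as-is -->
  <xsl:template match="text()" mode="hash">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Comments and processing instructions are skipped in this sketch -->
  <xsl:template match="comment() | processing-instruction()" mode="hash"/>

  <!-- Entry point: serialise the source document's root element -->
  <xsl:template match="/">
    <xsl:apply-templates select="*" mode="hash"/>
  </xsl:template>
</xsl:stylesheet>
```

Run against each of the two example documents in turn, this produces different strings, because the a attribute is written inside the string of the particular y element that carries it.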

What is your definition of equality that you are
trying to implement?
This definition (even if corrected) is not namespace
aware so
<x:foo xmlns:x="a"/> would be different from
<y:foo xmlns:y="a"/>
but equal to <x:foo xmlns:x="b"/>
so the definition of equality wouldn't be much use
for any XPath use,
two "equal" inputs would behave differently as input
to a stylesheet.
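
A namespace-aware comparison would presumably have to compare expanded names, not the prefixed names that name() returns. As a minimal sketch, namespace-uri() and local-name() can be combined into a prefix-independent key. The template name expandedName and the {uri}local layout are assumptions for illustration, not part of the stylesheet under discussion.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>

  <!-- Hypothetical helper: emits {namespace-uri}local-name for an element
       or attribute node, so the chosen prefix no longer affects the key -->
  <xsl:template name="expandedName">
    <xsl:param name="node" select="."/>
    <xsl:value-of
      select="concat('{', namespace-uri($node), '}', local-name($node))"/>
  </xsl:template>

  <!-- Demonstration: print the expanded name of every element,
       one per line -->
  <xsl:template match="/">
    <xsl:for-each select="//*">
      <xsl:call-template name="expandedName"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With this key, <x:foo xmlns:x="a"/> and <y:foo xmlns:y="a"/> both yield {a}foo, while <x:foo xmlns:x="b"/> yields {b}foo. Replacing name() with such a key in the hash would be one way to address the mismatch described above.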

David






--~------------------------------------------------------------------
XSL-List info and archive: 
http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to:
http://lists.mulberrytech.com/xsl-list/
or e-mail:

<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--




              