xsl-list

Re: Testing 2 XML documents for equality - a solution

2005-04-04 09:19:15

--- David Carlisle <davidc(_at_)nag(_dot_)co(_dot_)uk> wrote:

> For the vast majority of nodes this is still a) a very expensive way of
> comparing them and b) doesn't help with the comparison.

I agree! I understand that generating a string hash of the
entire XML document is an expensive operation. On further
reflection, I suspect that two different XML documents may
produce the same concatenated string representation, so my
algorithm will probably fail in some cases. I have no proof
of this new view yet: the XML examples I ran through my
stylesheet all gave the answers I expected. I'll test more
to see whether it fails in some cases.

> For a given element node, if you calculate an XPath to the current node
> and then use that XPath to find a node in the other document, you have
> two nodes. You then need to compare whether they are equal, but that is
> _exactly_ the problem you are trying to solve. The earlier stylesheet
> just took the string value of the node, but that is just the
> concatenation of all the element content, so it loses most of the
> markup information.

I think you are right! (as always :) )
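To make the point concrete, here is a small Python sketch. Python's standard `xml.etree.ElementTree` stands in for the XSLT processor, and joining `itertext()` approximates the XPath string value of the document element. The helpers `string_value` and `fingerprint` are hypothetical, and `fingerprint` assumes the hash is computed over that concatenated text. The two documents have different markup but the same string value, so a text-only comparison, hashed or not, cannot tell them apart.

```python
import hashlib
import xml.etree.ElementTree as ET

def string_value(xml_text):
    """Hypothetical helper: approximate the XPath string value of the
    document element, i.e. all descendant text joined in document order."""
    root = ET.fromstring(xml_text)
    return "".join(root.itertext())

def fingerprint(xml_text):
    """Hypothetical helper: a hash of the concatenated text only."""
    return hashlib.sha256(string_value(xml_text).encode("utf-8")).hexdigest()

doc1 = "<a><b>x</b>y</a>"
doc2 = "<a>x<c>y</c></a>"

print(doc1 == doc2)                              # False: the markup differs
print(string_value(doc1) == string_value(doc2))  # True: both are 'xy'
print(fingerprint(doc1) == fingerprint(doc2))    # True: the hash cannot help
```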

> What is wrong with the much simpler alternative of just writing out the
> string corresponding to a specific "canonical" linearisation, and then
> just comparing those two strings?

I think I should explore this option. But I believe that
converting an XML document to canonical form is not a
trivial task. For example, documents must be converted to
UTF-8: if an XML document is encoded in ISO-8859-1, its
canonical representation will be encoded in UTF-8. (I
think this cannot easily be done with XSLT; in fact, is it
even possible with XSLT?) I think there are also other
canonicalization rules that cannot easily be implemented
in XSLT.
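Outside XSLT, the canonical-linearisation approach can be tried without writing a canonicalizer from scratch. This sketch assumes Python 3.8 or later, whose standard library provides `xml.etree.ElementTree.canonicalize()`. Note that it implements Canonical XML 2.0 (C14N 2.0), not the 1.0 recommendation, and drops comments by default. The parser decodes the declared ISO-8859-1 input, so UTF-8 output is only a matter of encoding the resulting string. In XSLT, too, the output encoding is a serializer setting (`<xsl:output encoding="UTF-8"/>`). The sample documents and the helper `canonical_utf8` are made up for illustration. The documents differ in encoding, attribute order, quoting, character references and empty-element syntax.

```python
import xml.etree.ElementTree as ET

# Two made-up inputs that differ only in serialisation details.
doc_latin1 = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>\n'
    '<item b="2" a="1"><name>caf\u00e9</name><empty/></item>'
).encode("iso-8859-1")  # raw bytes, as if read from an ISO-8859-1 file
doc_utf8 = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<item a='1' b='2'><name>caf&#233;</name><empty></empty></item>"
).encode("utf-8")

def canonical_utf8(xml_bytes):
    """Hypothetical helper: the C14N 2.0 form of a document as UTF-8 bytes."""
    # canonicalize() returns a str; the parser has already dealt with the
    # input encoding, so encoding to UTF-8 is the final step.
    return ET.canonicalize(xml_bytes).encode("utf-8")

c1 = canonical_utf8(doc_latin1)
c2 = canonical_utf8(doc_utf8)
print(c1 == c2)           # True: same canonical form despite the differences
print(c1.decode("utf-8"))
```

Comparing `c1` and `c2` (or hashes of them) is then a plain byte comparison.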

I think it is probably easier to convert XML to canonical
form with a SAX parser (of course, one must know all the
rules as well!).
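Following the SAX idea, here is a deliberately minimal sketch using Python's `xml.sax` module. It is not a conforming Canonical XML implementation. It assumes documents without namespaces, ignores comments and processing instructions, and applies only a few of the rules: sorted attributes, start/end tag pairs for empty elements, and C14N-style escaping of text and attribute values. The class `MiniCanonicalizer` and the functions `linearise`, `_escape_text` and `_escape_attr` are hypothetical names invented for this example.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler

def _escape_text(s):
    # Escaping for character content, in the style of Canonical XML.
    return (s.replace("&", "&amp;").replace("<", "&lt;")
             .replace(">", "&gt;").replace("\r", "&#xD;"))

def _escape_attr(s):
    # Escaping for attribute values, in the style of Canonical XML.
    return (s.replace("&", "&amp;").replace("<", "&lt;")
             .replace('"', "&quot;").replace("\t", "&#x9;")
             .replace("\n", "&#xA;").replace("\r", "&#xD;"))

class MiniCanonicalizer(ContentHandler):
    """Hypothetical SAX handler that writes a simplified canonical form
    (no namespace support, comments and PIs dropped)."""

    def __init__(self):
        super().__init__()
        self.out = io.StringIO()

    def startElement(self, name, attrs):
        self.out.write("<" + name)
        # Attributes are written in sorted order, whatever the source order.
        for key in sorted(attrs.getNames()):
            self.out.write(f' {key}="{_escape_attr(attrs.getValue(key))}"')
        self.out.write(">")

    def endElement(self, name):
        # Empty elements always come out as a start/end tag pair.
        self.out.write(f"</{name}>")

    def characters(self, content):
        # SAX may deliver text in several chunks; escaping each is fine.
        self.out.write(_escape_text(content))

def linearise(xml_bytes):
    handler = MiniCanonicalizer()
    xml.sax.parseString(xml_bytes, handler)
    return handler.out.getvalue()

first = linearise(b'<r y="2" x="1"><e/>a &amp; b</r>')
second = linearise(b"<r x='1' y='2'><e></e>a &#38; b</r>")
print(first == second)  # True
print(first)            # <r x="1" y="2"><e></e>a &amp; b</r>
```

A real canonicalizer would also need namespace handling, attribute-value normalisation per DTD and the full set of C14N rules, which is exactly where knowing "all the rules" matters.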

Regards,
Mukul

> David




--~------------------------------------------------------------------
XSL-List info and archive: 
http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to:
http://lists.mulberrytech.com/xsl-list/
or e-mail:
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--