xsl-list

RE: Testing 2 XML documents for equality - a solution

2005-03-31 15:09:18
You're still struggling a bit.

Let's start with requirements. What is this for? This is part of the
difficulty: there are many reasons for wanting to compare two XML documents,
and the different requirements don't necessarily lead to the same
specification. If you describe some use cases this will help you on the way.
For example, it will tell you whether it's enough to give a boolean answer,
or whether you need to pinpoint where the two trees differ.
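
To illustrate the difference between those two kinds of answer, here is a minimal Python sketch. It is not from the original thread: the function name `first_difference` and the comparison rules (element name, attributes, text, child count) are assumptions picked for the example. It reports where two trees first differ, and the boolean answer falls out as the case where nothing is reported.

```python
import xml.etree.ElementTree as ET


def first_difference(a, b, here=None):
    """Return a string locating the first difference, or None if none is found.

    Hypothetical helper: compares element names, attributes, text and
    child counts only. Comments and processing instructions are ignored.
    """
    if here is None:
        here = "/" + a.tag
    if a.tag != b.tag:
        return f"{here} (element name)"
    if a.attrib != b.attrib:
        return f"{here} (attributes)"
    if (a.text or "") != (b.text or ""):
        return f"{here} (text)"
    if (a.tail or "") != (b.tail or ""):
        return f"{here} (following text)"
    if len(a) != len(b):
        return f"{here} (child count)"
    # Recurse pairwise; children are addressed by position, XPath style.
    for i, (child_a, child_b) in enumerate(zip(a, b), start=1):
        diff = first_difference(child_a, child_b, f"{here}/*[{i}]")
        if diff is not None:
            return diff
    return None


left = ET.fromstring("<doc><a x='1'>hi</a><b/></doc>")
right = ET.fromstring("<doc><a x='1'>hi</a><b>text</b></doc>")

print(first_difference(left, right))         # /doc/*[2] (text)
print(first_difference(left, left) is None)  # True
```

A use case that only needs yes/no can test the result against `None`; one that needs to pinpoint the difference uses the returned location.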

The next step is specification. This doesn't have to be mathematical, but it
does have to be rigorous. Specifying it in terms of a comparison of two
drawings of the trees being alike isn't going to be helpful. I know what
you're getting at: you're trying to say that there's a one-to-one
correspondence between the nodes and arcs in one tree and the nodes and arcs
in the other. But you haven't said which properties of the nodes are
important (namespace prefix? base uri? type annotation?), you haven't said
how you will compare values (string comparison, with or without Unicode
normalization? Collations? typed value comparison?), and you haven't said
how you will handle the significance of ordering.
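
To make those open questions concrete, the following Python sketch (again not part of the original exchange) turns three of them into explicit options. The names `CompareOptions` and `nodes_equal`, and the default values, are hypothetical. Namespace prefixes, base URIs, type annotations and collations are deliberately left out, so this covers only a small part of what a real specification would have to settle.

```python
import unicodedata
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class CompareOptions:
    normalize_unicode: bool = False  # apply NFC before comparing strings
    strip_whitespace: bool = False   # ignore leading/trailing whitespace
    ordered_children: bool = True    # is sibling order significant?


def _value(s, opts):
    """String value as it should be compared under the given options."""
    s = s or ""
    if opts.strip_whitespace:
        s = s.strip()
    if opts.normalize_unicode:
        s = unicodedata.normalize("NFC", s)
    return s


def _key(e, opts):
    """Canonical nested-tuple form of an element under the given options."""
    attrs = tuple(sorted((k, _value(v, opts)) for k, v in e.attrib.items()))
    kids = [_key(c, opts) for c in e]
    if not opts.ordered_children:
        kids.sort()  # text following a child (its tail) moves with that child
    return (e.tag, attrs, _value(e.text, opts), _value(e.tail, opts), tuple(kids))


def nodes_equal(a, b, opts=CompareOptions()):
    return _key(a, opts) == _key(b, opts)


# Same characters in two Unicode forms, with the siblings in a different order.
a = ET.fromstring("<r><x>caf\u00e9</x><y/></r>")
b = ET.fromstring("<r><y/><x>cafe\u0301</x></r>")

print(nodes_equal(a, b))  # False
print(nodes_equal(a, b, CompareOptions(normalize_unicode=True,
                                       ordered_children=False)))  # True
```

The same pair of documents is "equal" or "not equal" depending on the options chosen, which is why the choices need to be written down before the implementation can be judged.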

Finally, implementation (which is where you started). Before you embark on
an implementation you should have an idea of the use cases (see above) and
their performance requirements. For example, is the algorithm to be
optimized for comparing trees that are probably the same or very similar, or
for comparing trees that are likely to be wildly different?

Sorry if this is a bit severe: but you did ask for help. 

Michael Kay
http://www.saxonica.com/



-----Original Message-----
From: Mukul Gandhi [mailto:mukul_gandhi(_at_)yahoo(_dot_)com] 
Sent: 31 March 2005 22:49
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: Re: [xsl] Testing 2 XML documents for equality - a solution

Hi Dimitre,
  Below is the "scope" of my solution. My definition
of equality of XML documents consists of 2 parts:

Part 1) Node types that the stylesheet compares
-------
The XPath 1.0 data model defines 7 kinds of nodes. These are
listed below. I have marked yes or no against each node
type, indicating whether my stylesheet has logic to
compare nodes of that kind. If the XML documents contain
nodes of a kind marked "no", then my stylesheet may
give a wrong result (I have not done any testing for
nodes marked "no").

root nodes - yes
element nodes - yes
text nodes - yes
attribute nodes - yes
namespace nodes - no
processing instruction nodes - no
comment nodes - no

Part 2) My notion of equality of 2 XML documents
-------
Imagine that the XPath trees of the 2 documents are *drawn
on paper*. The diagrams are similar to the XPath
tree diagram in Mike's book (XSLT 2nd Edition,
Programmer's Reference), page 57 (section "The Tree
Model").

If the XPath trees of the 2 XML documents "look the same" on
paper (as on page 57 of Mike's book), the documents
will be considered equal by my stylesheet.

The scope of my stylesheet presently covers only these
2 points.

I don't claim any other capability from my stylesheet.

I have not attempted to define equality of XML documents in
mathematical terms (like relations, as you
mentioned; a subject I don't understand well) or in
canonical terms (as defined by the Canonical XML spec).

So considering the above scope of my work, can my
stylesheet be evaluated for correctness? 

I have deep regard for the people who participated in this
thread. They surely have deep knowledge of the
subject.

Regards,
Mukul

--- Dimitre Novatchev <dnovatchev(_at_)gmail(_dot_)com> wrote:
Hi Mukul,


On Thu, 31 Mar 2005 04:36:32 -0800 (PST), Mukul Gandhi
<mukul_gandhi(_at_)yahoo(_dot_)com> wrote:
Hi Dimitre,
 I am really not good at mathematics at this level. I did study
relations like "symmetric, reflexive and transitive" some time back,
but I did so just to score grades. I had no idea then of their
practical use. It is indeed enlightening for me to know they have real
practical use (in XML & XSLT!). I cannot define my problem in these
terms, as my knowledge is limited.

This confirms the conclusion that here we see attempts at offering a
solution to a problem that is not well defined.

How can we then judge the solution? 


I would be happy if you can define in these precise terms the
problem I am trying to solve (based on my earlier posts to this thread).

Impossible.

 I'll keep it as a reference for future use. I defined the problem (I am
trying to solve) from an average programmer's point of view. And I think
that it is quite understandable to an average programmer ;)

A number of very wise people already explained why this is difficult
to define -- they also found holes in your definition (and
understanding) of the problem. These people obviously are not average
programmers.

Cheers,
Dimitre Novatchev.


--~------------------------------------------------------------------
XSL-List info and archive: 
http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to:
http://lists.mulberrytech.com/xsl-list/
or e-mail:
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--




              