xsl-list

RE: Testing 2 XML documents for equality - a solution

2005-04-01 09:47:18
Hi Mike,
  Thanks for your response. Sorry for the late reply; I was
not sure how to frame it. Handling your and Dimitre's
comments is getting difficult for me!

I am taking up the problem of "equating 2 XML documents"
as a separate assignment! I'll define my problem first,
and then do the XSLT coding for my "equivalence
definition".

I think there will be 3 aspects of the definition:

1) An equivalence definition of trees from a pure
computer science point of view, probably in terms of
sets and relations.

2) An equivalence definition in terms of "XPath trees".
XPath nodes have characteristics like namespace
prefix, base URI, type annotation etc., as you said
below.
I'll try to map definitions 2) with 3) and generate a
mixture :)

3) An equivalence definition from a physical storage
point of view. I'll try to do this, but I am not sure
whether I'll succeed. I am not sure whether different
operating systems store text files (which is what XML
documents are) in different ways. Can I compare a byte
stream on Unix with a byte stream on Windows? And can I
define equivalence at the hardware level, i.e. storage in
memory locations (-:)) ?
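
To make the byte-stream question concrete, here is a minimal sketch in Python (not from this thread; the sample documents and function names are made up for illustration). It assumes the only difference between the two files is the line-ending convention. The raw bytes differ, but an XML parser normalizes CRLF to LF on input, so the parsed trees serialize identically.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents: same content, different line endings.
unix_bytes = b"<doc>\n<item>a</item>\n</doc>"
windows_bytes = unix_bytes.replace(b"\n", b"\r\n")


def same_bytes(a: bytes, b: bytes) -> bool:
    """Physical comparison: every byte must match."""
    return a == b


def same_after_parsing(a: bytes, b: bytes) -> bool:
    """Parse both inputs, then compare the re-serialized trees.

    The parser normalizes line endings, so CRLF and LF inputs
    produce the same tree here.
    """
    return ET.tostring(ET.fromstring(a)) == ET.tostring(ET.fromstring(b))
```

With these inputs, `same_bytes` reports a difference while `same_after_parsing` does not, which suggests that a storage-level definition answers a different question than a tree-level one.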

Studying 1) and 2) will be my priority. 

I'll do this work at my own pace.. I'll get back to
the list when I am ready!

Regards,
Mukul

--- Michael Kay <mike(_at_)saxonica(_dot_)com> wrote:

You're still struggling a bit.

Let's start with requirements. What is this for? This is part of the
difficulty: there are many reasons for wanting to compare two XML
documents, and the different requirements don't necessarily lead to the
same specification. If you describe some use cases this will help you on
the way. For example, it will tell you whether it's enough to give a
boolean answer, or whether you need to pinpoint where the two trees
differ.

The next step is specification. This doesn't have to be mathematical,
but it does have to be rigorous. Specifying it in terms of a comparison
of two drawings of the trees being alike isn't going to be helpful. I
know what you're getting at: you're trying to say that there's a
one-to-one correspondence between the nodes and arcs in one tree and the
nodes and arcs in the other. But you haven't said which properties of
the nodes are important (namespace prefix? base uri? type annotation?),
you haven't said how you will compare values (string comparison, with or
without Unicode normalization? Collations? typed value comparison?), and
you haven't said how you will handle the significance of ordering.
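
The choices listed above can be made explicit in code. The following is only a sketch in Python under stated assumptions, not the stylesheet discussed in this thread: the function names, the `nfc` and `strip_ws` options, and the sample documents are all hypothetical. It treats attribute order as insignificant and child order as significant, compares attribute values as exact strings, and reports where the first difference is rather than giving only a boolean.

```python
import unicodedata
import xml.etree.ElementTree as ET
from typing import Optional


def _norm(text: Optional[str], nfc: bool, strip_ws: bool) -> str:
    """Apply the chosen value-comparison rules to one text value."""
    s = text or ""
    if strip_ws:
        s = s.strip()
    if nfc:
        s = unicodedata.normalize("NFC", s)
    return s


def first_difference(a, b, path=None, *, nfc=True, strip_ws=True):
    """Return a description of the first difference, or None if equal."""
    path = path or f"/{a.tag}"
    if a.tag != b.tag:
        return f"{path}: element name {a.tag!r} != {b.tag!r}"
    if a.attrib != b.attrib:  # dict comparison: attribute order ignored
        return f"{path}: attributes differ"
    if _norm(a.text, nfc, strip_ws) != _norm(b.text, nfc, strip_ws):
        return f"{path}: text differs"
    if len(a) != len(b):
        return f"{path}: child count {len(a)} != {len(b)}"
    # Children are paired in document order, so ordering is significant.
    for i, (ca, cb) in enumerate(zip(a, b), start=1):
        diff = first_difference(ca, cb, f"{path}/*[{i}]",
                                nfc=nfc, strip_ws=strip_ws)
        if diff:
            return diff
        if _norm(ca.tail, nfc, strip_ws) != _norm(cb.tail, nfc, strip_ws):
            return f"{path}: text after child {i} differs"
    return None


# Hypothetical inputs: attribute order, indentation and the Unicode form
# of "e with acute" differ between the two documents.
doc1 = ET.fromstring("<a x='1' y='2'><b>caf\u00e9</b><c/></a>")
doc2 = ET.fromstring("<a y='2' x='1'>\n  <b>cafe\u0301</b>\n  <c/>\n</a>")

print(first_difference(doc1, doc2))             # None
print(first_difference(doc1, doc2, nfc=False))  # /a/*[1]: text differs
```

Changing one option changes the verdict for the same pair of documents, which is the reason the specification has to state each choice.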

Finally, implementation (which is where you started). Before you embark
on an implementation you should have an idea of the use cases (see
above) and their performance requirements. For example, is the algorithm
to be optimized for comparing trees that are probably the same or very
similar, or for comparing trees that are likely to be wildly different?

Sorry if this is a bit severe: but you did ask for help.

Michael Kay
http://www.saxonica.com/



-----Original Message-----
From: Mukul Gandhi [mailto:mukul_gandhi(_at_)yahoo(_dot_)com]
Sent: 31 March 2005 22:49
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: Re: [xsl] Testing 2 XML documents for equality - a solution

Hi Dimitre,
  Below is the "scope" of my solution. My definition of
equality of XML documents consists of 2 parts:

Part 1) Node types that the stylesheet compares
-------
"XPath 1.0" trees define 7 kinds of nodes. These are listed
below. I have marked yes or no against each node type,
indicating whether my stylesheet has logic to compare these
nodes. If the XML documents contain nodes of a kind marked
"no", then my stylesheet may give a wrong result (I have not
done any testing for "no"-marked nodes).

root nodes - yes
element nodes - yes
text nodes - yes
attribute nodes - yes
namespace nodes - no
processing instruction nodes - no
comment nodes - no

Part 2) My notion of equality of 2 XML documents
-------
Imagine that the XPath trees of 2 documents are *drawn on
paper*. The diagram is similar to the XPath tree diagram in
Mike's book (XSLT 2nd Edition, Programmer's Reference),
page 57 (section "The Tree Model").

If the XPath trees of 2 XML documents "look the same" on paper
(as on page 57 of Mike's book), the documents will be
considered equal by my stylesheet.

The scope of my stylesheet presently covers only these 2
points.

I don't claim any other capability for my stylesheet.

I have not attempted to equate the XML documents in
mathematical terms (like relations, as you mentioned; a
subject I don't understand well) or in canonical terms (as
defined by the canonical XML spec).
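
For readers who want to see what a comparison "in canonical terms" could look like, here is a small sketch in Python (not part of the stylesheet; `c14n_equal` and the sample strings are made up). It assumes Python 3.8 or later, whose standard library provides `xml.etree.ElementTree.canonicalize`, an implementation of Canonical XML 2.0, which is a later version of the spec than the one available when this thread was written.

```python
from xml.etree.ElementTree import canonicalize  # Python 3.8+


def c14n_equal(xml_a: str, xml_b: str) -> bool:
    """Two documents are equal if their canonical forms are identical."""
    return canonicalize(xml_a) == canonicalize(xml_b)


# Hypothetical samples. a and b differ only in attribute order, quote
# style and empty-element syntax; c adds a whitespace text node.
a = '<doc b="2" a="1"><empty/></doc>'
b = "<doc a='1' b='2'><empty></empty></doc>"
c = '<doc a="1" b="2"> <empty/></doc>'
```

Here `c14n_equal(a, b)` holds while `c14n_equal(a, c)` does not, because whitespace text is kept by default. Canonicalization also drops comments by default, so this notion of equality is one choice among several.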

So considering the above scope of my work, can my stylesheet
be evaluated for correctness?

I have deep regard for the people who participated in this
thread. They surely have deep knowledge of the subject.

Regards,
Mukul

--- Dimitre Novatchev <dnovatchev(_at_)gmail(_dot_)com> wrote:
Hi Mukul,

On Thu, 31 Mar 2005 04:36:32 -0800 (PST), Mukul Gandhi
<mukul_gandhi(_at_)yahoo(_dot_)com> wrote:
Hi Dimitre,
 I am really not good at mathematics at this level. I did
study relations like "symmetric, reflexive and transitive"
some time back, but I did so just to score grades. I had no
idea then of their practical use. It is indeed enlightening
for me to know they have real practical use (in XML & XSLT!).
I cannot define my problem in these terms, as my knowledge is
limited.

This confirms the conclusion that here we see
attempts at offering a
solution to a problem that is not well defined.

How can we then judge the solution? 


I would be happy if you can define in these precise terms
the problem I am trying to solve (based on my earlier posts
to this thread).

Impossible.

I'll keep it as a reference for future use. I defined the
problem (I am trying to solve) from an average programmer's
point of view.. And I think that it is quite understandable
to an average programmer ;)

A number of very wise people already explained why this is difficult
to define -- they also found holes in your definition (and
understanding) of the problem. These people obviously are not average
programmers.

Cheers,
Dimitre Novatchev.




--~------------------------------------------------------------------
XSL-List info and archive: 
http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to:
http://lists.mulberrytech.com/xsl-list/
or e-mail:

<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--




            