xsl-list

Re: [xsl] Diffing XML

2012-10-24 10:28:54
One of the drawbacks of deep-equal is that it doesn't tell you where the differences are; another is that it doesn't give you any control over how to do the comparison (Saxon's saxon:deep-equal() variant tries to remedy both problems). But it can certainly be used as part of the solution, by quickly eliminating subtrees that don't need to be examined any further.
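
As a rough illustration of that pruning idea, here is a minimal XSLT 2.0 sketch. It assumes the stylesheet is run against the "new" file and that the "old" file is passed in a parameter; the parameter name old-uri, its default 'old.xml' (resolved relative to the stylesheet), and the template name compare are all made up for the example. Children are paired purely by position, so a single inserted element is reported at its parent rather than pinpointed, and a change to an element's own attributes or text goes unreported when one of its children also differs.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <!-- Assumed parameter: location of the previous version of the file -->
  <xsl:param name="old-uri" as="xs:string" select="'old.xml'"/>
  <xsl:variable name="old-doc" select="doc($old-uri)"/>

  <!-- Start by comparing the two document elements -->
  <xsl:template match="/">
    <xsl:call-template name="compare">
      <xsl:with-param name="new" select="*"/>
      <xsl:with-param name="old" select="$old-doc/*"/>
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="compare">
    <xsl:param name="new" as="element()"/>
    <xsl:param name="old" as="element()"/>
    <xsl:choose>
      <!-- Identical subtrees: prune, nothing to report -->
      <xsl:when test="deep-equal($new, $old)"/>
      <!-- Same number of child elements and at least one pair differs:
           descend and compare the children pairwise by position -->
      <xsl:when test="count($new/*) = count($old/*)
                      and (some $i in 1 to count($new/*)
                           satisfies not(deep-equal($new/*[$i], $old/*[$i])))">
        <xsl:for-each select="$new/*">
          <xsl:variable name="pos" select="position()"/>
          <xsl:call-template name="compare">
            <xsl:with-param name="new" select="."/>
            <xsl:with-param name="old" select="$old/*[$pos]"/>
          </xsl:call-template>
        </xsl:for-each>
      </xsl:when>
      <!-- Otherwise report this element as the place where something changed -->
      <xsl:otherwise>
        <xsl:value-of select="concat('Something has changed in &lt;', name($new),
            '&gt; number ',
            count($new/preceding-sibling::*[name() = name($new)]) + 1,
            '&#10;')"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

Each deep-equal() test that succeeds removes a whole subtree from further consideration, which is the quick elimination described above; deciding what to say about the subtrees that remain is the hard part.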

Michael Kay
Saxonica

On 24/10/2012 15:27, Emma Burrows wrote:
Thinking about it further - I'm wondering whether something like deep-equal might suffice. 
What the users apparently really want right now is to know which parts of the document have 
changed so they can concentrate on those when checking the output on a website. In which case, 
starting with top-level elements and iterating my way down through the children, I could in 
theory at the very least output "Something has changed in <p> number 3 in the topic 
entitled 'Topic Title'".

I realise there are many pitfalls ahead and of course the minute they see it, they will 
say "oh, but can't you make it do X?", but if I can convince them they don't 
need to know exactly what has changed (I'm an optimist), that might help. Or is there an 
even better way? (Assuming one were daft enough to take on such a project :)

Emma


-----Original Message-----
From: Emma Burrows [mailto:Emma(_dot_)Burrows(_at_)rpharms(_dot_)com]
Sent: 24 October 2012 14:48
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: RE: [xsl] Diffing XML

Thanks Michael,

Thanks for the response. Yes, I'm thinking doing it entirely myself might be a bit 
too ambitious. The data is relatively stable at this point and gets updated 
once a month which should theoretically reduce the number of things to check 
for each time. But even so, I can tell diffing is an art.

DeltaXML does seem to offer some interesting options and it could probably be 
integrated into our CMS (given a chisel and a mallet - the CMS is getting a bit 
old), but I don't think we have any budget to buy another tool and it sounds as 
if the users have some very specific requirements (like exporting the list of 
user-friendly differences to an Excel spreadsheet!). So I was looking for ideas 
about how to tackle the problem just in case I do indeed need to implement it!

Emma


-----Original Message-----
From: Michael Kay [mailto:mike(_at_)saxonica(_dot_)com]
Sent: 24 October 2012 11:48
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: Re: [xsl] Diffing XML

In general differencing well is quite a challenge, e.g. handling an arbitrary number of 
inserted elements in either document, addition or removal of "div" layers, 
combining/splitting of paragraphs, reformatted indentation, etc. Doing it better than a 
general-purpose product such as DeltaXML could turn out to be a project that will keep 
you busy for a while.

Michael Kay
Saxonica

On 24/10/2012 11:36, Emma Burrows wrote:
I have a requirement to produce an end-user-readable "checklist" of all the places where an XML file has 
changed since the last version, with custom explanations of what each difference is. I'm able to run diffs which are 
fine for my own purposes, but the end users need the differences spelled out more precisely in plain language (eg: 
"there is an extra paragraph here", "the text 'xyz' has changed", "the attribute 'audience' 
has been changed to 'book'" etc).

Being an XSLT developer, I'm thinking of using an XSLT stylesheet to work on the "new" version of the file, 
document() in the "old" version, and then compare the nodes in the "new" version to those in the 
"old" version, generating appropriate messages into an HTML output as I go along.
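
One possible shape for that approach, sketched in XSLT 2.0 for the attribute messages only. The function f:attribute-messages, its namespace urn:example:diff, and the old-uri parameter are invented for this example, and only the two document elements are compared; pairing up deeper elements between the two files is the part that needs real design work.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:diff"
    exclude-result-prefixes="xs f">

  <xsl:output method="html"/>

  <!-- Assumed parameter: location of the "old" version of the file -->
  <xsl:param name="old-uri" as="xs:string" select="'old.xml'"/>
  <xsl:variable name="old-doc" select="document($old-uri)"/>

  <!-- Hypothetical helper: plain-language messages describing how the
       attributes of $new differ from those of $old -->
  <xsl:function name="f:attribute-messages" as="xs:string*">
    <xsl:param name="new" as="element()"/>
    <xsl:param name="old" as="element()"/>
    <!-- Attributes present in the new version: added or changed -->
    <xsl:for-each select="$new/@*">
      <xsl:variable name="a" select="."/>
      <xsl:variable name="before"
          select="$old/@*[node-name(.) = node-name($a)]"/>
      <xsl:choose>
        <xsl:when test="empty($before)">
          <xsl:sequence select="concat('the attribute ''', name($a),
              ''' has been added with the value ''', $a, '''')"/>
        </xsl:when>
        <xsl:when test="string($before) ne string($a)">
          <xsl:sequence select="concat('the attribute ''', name($a),
              ''' has been changed to ''', $a, '''')"/>
        </xsl:when>
      </xsl:choose>
    </xsl:for-each>
    <!-- Attributes that only exist in the old version: removed -->
    <xsl:for-each
        select="$old/@*[not(node-name(.) = $new/@*/node-name(.))]">
      <xsl:sequence select="concat('the attribute ''', name(),
          ''' has been removed')"/>
    </xsl:for-each>
  </xsl:function>

  <!-- Run against the "new" file; emits the messages as an HTML list -->
  <xsl:template match="/">
    <ul>
      <xsl:for-each select="f:attribute-messages(*, $old-doc/*)">
        <li><xsl:value-of select="."/></li>
      </xsl:for-each>
    </ul>
  </xsl:template>

</xsl:stylesheet>
```

Returning the messages as a sequence of strings keeps the comparison separate from the presentation, so the same function could feed an HTML list, a text report, or some other export.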

Does that sound like a reasonable approach? Are there existing tools
or examples that might do what I'm after? Any recommendations on the
best way of comparing individual nodes? I am planning to do this in
Oxygen 14 so the world is my oyster as far as XSLT is concerned. :)

Just looking for general suggestions to point me in the right direction. Thanks!


______________________________________________________________________
This email has been scanned by the Symantec Email Security.cloud service.
For more information please visit http://www.symanteccloud.com
______________________________________________________________________

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--


