This continues my earlier post on the same subject, which unfortunately
received no response. The original post is here:
http://www.biglist.com/lists/xsl-list/archives/200503/msg00332.html
It's bad tactics to suggest that you're 80% of the way to a solution, and
not show us the 80%. No-one wants to redo the work you've already done on
the off-chance that they'll be able to help you with the final 20%.
That's especially true as it's a difficult problem and one has to do a lot
of guesswork about the requirements. What output would you expect if the two
source documents are:
<a><b/><c/></a>
and
<a><c/><b/></a>
?
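To make the question concrete, here is a minimal sketch (in Python rather than XSLT, purely for illustration) of the two readings. It compares only the tag names of the root's children, which is an assumed simplification; the helper name child_tags is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def child_tags(xml_text):
    """Return the tag names of the root element's children, in document order."""
    return [child.tag for child in ET.fromstring(xml_text)]

doc1 = "<a><b/><c/></a>"
doc2 = "<a><c/><b/></a>"

# Order-sensitive reading: the child sequences are compared position by position.
ordered_equal = child_tags(doc1) == child_tags(doc2)

# Order-insensitive reading: the children are compared as a multiset of tags.
unordered_equal = Counter(child_tags(doc1)) == Counter(child_tags(doc2))

print(ordered_equal, unordered_equal)  # False True
```

Under the first reading a diff has to report something (a move, or a delete plus an insert); under the second the documents are the same. Which one is wanted is a requirements decision, not something the algorithm can settle.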
The diff quest came from the following problem: I get periodic XML "feeds"
from a news syndicate; these feeds are parsed, formatted in HTML, and
published on a website. Each feed is an XML file, and contains zero or more
"stories". A story may be exactly like that in the immediately-prior feed,
may be slightly different, or may be completely new. Hence my desire to
"diff" 2 feeds rather than simply regenerate all stories. When only, say,
20 stories change among 1000+ stories, this is a processing win.
This looks a rather easier problem, because order is irrelevant.
It's fairly easy, I would have thought, to identify a <story> in one file
for which there is no corresponding <story> in the other. Identifying
finer-grained differences seems to require making some assumptions: what if
one story in the second file is similar to two stories in the first file,
but not identical to either?
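The coarse-grained step might be sketched as follows, again in Python for illustration. The feed layout (a feed root, story children, an id attribute) and the names fingerprint and unmatched_stories are all assumptions, since the actual feed format has not been shown. Each story is reduced to a hashable fingerprint, and any story in the new feed whose fingerprint is absent from the old feed is reported, regardless of order.

```python
import xml.etree.ElementTree as ET

def fingerprint(elem):
    """Hashable summary of an element's subtree, with surrounding whitespace trimmed."""
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        (elem.text or "").strip(),
        # Include each child's tail text so mixed content is compared too.
        tuple((fingerprint(child), (child.tail or "").strip()) for child in elem),
    )

def unmatched_stories(old_xml, new_xml, story_tag="story"):
    """Return the stories in new_xml that have no identical story in old_xml."""
    old_prints = {fingerprint(s) for s in ET.fromstring(old_xml).iter(story_tag)}
    return [s for s in ET.fromstring(new_xml).iter(story_tag)
            if fingerprint(s) not in old_prints]

# Hypothetical feeds: story 2 is unchanged, story 1 is edited, story 3 is new.
old_feed = "<feed><story id='1'>Rain</story><story id='2'>Sun</story></feed>"
new_feed = ("<feed><story id='2'>Sun</story>"
            "<story id='1'>Rain later</story>"
            "<story id='3'>Snow</story></feed>")

changed = unmatched_stories(old_feed, new_feed)
changed_ids = [s.get("id") for s in changed]
print(changed_ids)  # ['1', '3']
```

This only answers "which stories need regenerating"; it does not distinguish an edited story from a new one, and it says nothing about the similar-but-not-identical case raised above, which needs a matching rule that only the requirements can supply.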
I used an augmented vset:difference from "XSLT Cookbook" ... I can't think
of other algorithmic improvements to make; if anybody else can, please post.
Sorry, but suggesting improvements to code I haven't seen is beyond my
abilities.
Michael Kay
http://www.saxonica.com/