RE: How Do I Generate A Set-Difference With Context - Part A
2005-03-12 17:52:23
This continues my earlier post, unfortunately unresponded to, on the same
subject. The original post is here:
http://www.biglist.com/lists/xsl-list/archives/200503/msg00332.html
I've put on the back burner my initial quest of writing a generic
diff-finding stylesheet. While I know I could do this in a, say, procedural
programming language (walk the trees, compare children, recurse), and since
[I read that] XSLT is Turing Complete, I know I can achieve this in XSLT as
well. Just not right away.
The diff quest came from the following problem: I get periodic XML "feeds"
from a news syndicate; these feeds are parsed, formatted in HTML, and
published on a website. Each feed is an XML file, and contains zero or more
"stories". A story may be exactly like that in the immediately-prior feed,
may be slightly different, or may be completely new. Hence my desire to
"diff" 2 feeds rather than simply regenerate all stories. When only, say,
20 stories change among 1000+ stories, this is a processing win.
I've modified my initial diff quest to one that is content with looking for
differences in known nodes of an input document. In this case I
pass-through everything other than a <story> node; when I encounter a
<story> node I look for a matching story in the prior feed, and pass through
if not found. "matching" means local-names are same, attributes are same in
number, name and value, and all child nodes of <story> are present and
similarly equal to those in the prior <story>, if any. I realize "match"
may be application dependant.
I used an augmented vset:difference from "XSLT Cookbook" to determine
<story> differences. MSXML 4x (?) from a .NET app took 6-something minutes
for a 1200-node transform. Saxon v653, JDK 1.4, from the commandline took
4-something minutes. Then I used keys to "index" stories in the prior,
compared-to document. Now Saxon 653 took 42 seconds for the transform.
Then I modified vset:difference to "short-circuit" at the first difference;
now Saxon 653 took 30 seconds for the transform. I can't think of other
algorithmic improvements to make; if anybody else can, please post.
Now for a question: each <story> has, say, a @date attribute; multiple
stories in 1 feed may have the same @date value. If I find a changed (or
new) story, I want to pass through not just that story, but also all other
stories with that same @date value, no matter they're unchanged.
My first thougth about accomplishing this is to perform some sort of
pipeline processing (it's immaterial if it's done in-situ in the XSLT
processor). The first pass would generate a diff result, and a second pass
would add back all stories of the same [unique] dates as the ones in the
diff result.
It would be nice to do this in one [conceptual] pass:
--for every story check if it should pass through
--if a passthrough also pass through otheries stories with the same @date
Of course, a naive approach would have stories output multiply, unless I
somehow "remember" what [say, dates] has already passed through. I could do
that in a procedural language, but don't know how to [efficiently] in a
functional one such as XSLT. While I read up on functional programming
techniques, if anybody can suggest a way to accomplish this, please post.
Here are 2 sample feeds:
feed1.xml
======
<feed>
<story @date="1" text="a"/>
<story @date="2" text="a"/>
</feed>
feed2.xml
======
<feed>
<story @date="1" text="a"/>
<story @date="1" text="b"/>
<story @date="2" text="a"/>
</feed>
If feed2.xml were the "current" feed, and feed1.xml the propr one, I want to
end up with the following, in 1 pass.
final.xml
=====
<feed>
<story @date="1" text="a"/>
<story @date="1" text="aa"/>
</feed>
Thank you
--A
_________________________________________________________________
Don?t just search. Find. Check out the new MSN Search!
http://search.msn.click-url.com/go/onm00200636ave/direct/01/
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--
|
|