Viral,
At 11:47 AM 4/21/2004, you wrote:
> Thanks for your feedback. The fact is that my XML file is huge and I am
> trying to avoid multiple passes if possible. Also, I am guessing that
> <xsl:key> would be a lot of overhead. That's why I tried to use
> preceding-sibling: in the current context, I check whether the city I am
> processing has the same state as the last city I processed. If yes, then I
> just process that city. If no, then I want to create a new row, output the
> state name, and then process that city.
Ah, I see: that makes sense. Somewhat. If that's the way you go, remember
the "[1]" predicate in preceding-sibling::*[1]. You'll need it: without it
you're comparing against every preceding sibling, not just the one
immediately before. And hope your processor optimizes that lookup.
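To make the "[1]" point concrete, here is the sibling test sketched in Python with the standard library's xml.etree.ElementTree. This is not XSLT and does not model any particular processor. The <city name="..." state="..."/> markup is a hypothetical stand-in, since the real document structure isn't shown, and the sample data are assumed to be in state order in the source already:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, assumed already in state order in the source.
SOURCE = """<cities>
  <city name="Fresno" state="CA"/>
  <city name="Oakland" state="CA"/>
  <city name="Austin" state="TX"/>
  <city name="Dallas" state="TX"/>
</cities>"""

def is_first_of_state(cities, i):
    """Rough analogue of not(preceding-sibling::city[1]/@state = @state).

    Only the immediately preceding sibling is examined (that is the [1]).
    Without the predicate, the XPath comparison would be made against
    every earlier sibling, rescanning more of the list for each city.
    """
    if i == 0:
        return True  # first city: there is no preceding sibling at all
    return cities[i - 1].get("state") != cities[i].get("state")

cities = list(ET.fromstring(SOURCE))
firsts = [c.get("name") for i, c in enumerate(cities) if is_first_of_state(cities, i)]
print(firsts)  # ['Fresno', 'Austin']
```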
You will find, however, that you can't really start and stop a row
conditionally (i.e. "if a new state, end this row and start a new one"
doesn't translate into XSLT). Rather, you'll create a whole row for a state
and then populate it with the cities that belong to it. Think in terms of
node-tree-building, not tag-writing. (Thinking tag-writing will get you
into trouble in XSLT.)
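Here is a rough Python illustration of building each row whole, using ElementTree with hypothetical city/state/name markup. The cities are assumed to be grouped by state in document order. Each <tr> is created exactly once, for the first city of its state, and is then filled. Nothing "closes" one row and "opens" the next:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, assumed already sorted by state in the source.
SOURCE = """<cities>
  <city name="Fresno" state="CA"/>
  <city name="Oakland" state="CA"/>
  <city name="Austin" state="TX"/>
  <city name="Dallas" state="TX"/>
  <city name="Tyler" state="TX"/>
</cities>"""

def build_table(root):
    """Build one complete <tr> node per state (tree-building, not tag-writing).
    Assumes cities of the same state are contiguous."""
    cities = root.findall("city")
    table = ET.Element("table")
    for i, city in enumerate(cities):
        state = city.get("state")
        # Only the first city of a state creates a row (the sibling test).
        if i > 0 and cities[i - 1].get("state") == state:
            continue
        row = ET.SubElement(table, "tr")
        ET.SubElement(row, "th").text = state
        # Populate the new row with this city and the following cities of
        # the same state, stopping at the first city of a different state.
        for member in cities[i:]:
            if member.get("state") != state:
                break
            ET.SubElement(row, "td").text = member.get("name")
    return table

table = build_table(ET.fromstring(SOURCE))
print(ET.tostring(table, encoding="unicode"))
```

In XSLT itself, the inner loop would more likely be a select along the lines of following-sibling::city[@state = current()/@state]. The sketch only shows the shape of the result tree.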
> But unfortunately, it sounds like it's not possible unless I do several
> passes through the XML document? Any other possible suggestions? How
> expensive are xsl:key and the indexing it does?
What David just said: the key takes space but saves a lot of time. (Since my
datasets tend to be small, not to say minuscule by some standards, I tend
to use keys.)
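To give a sense of that space-for-time trade, here is a Python analogue of the usual key-based grouping, often called Muenchian grouping. One pass builds an index from state to cities, which is the space cost. After that, each lookup is cheap, and the input does not need to be sorted. The names, the sample data, and the dictionary standing in for the processor's index are all assumptions; real processors implement xsl:key in their own ways:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample data, deliberately not in state order.
SOURCE = """<cities>
  <city name="Austin" state="TX"/>
  <city name="Fresno" state="CA"/>
  <city name="Dallas" state="TX"/>
  <city name="Oakland" state="CA"/>
</cities>"""

def build_index(root):
    """Rough analogue of <xsl:key name="by-state" match="city" use="@state"/>
    (the key name is hypothetical): one pass over the document, storing each
    city under its state. Memory grows with the number of cities."""
    index = defaultdict(list)
    for city in root.iter("city"):
        index[city.get("state")].append(city)
    return index

def group_by_key(root):
    index = build_index(root)
    rows = []
    for city in root.iter("city"):
        members = index[city.get("state")]
        # Muenchian test: keep a city only if it is the first node the key
        # returns for its state, i.e. generate-id(.) = generate-id(key(...)[1]).
        if members[0] is city:
            rows.append((city.get("state"), [m.get("name") for m in members]))
    return rows

print(group_by_key(ET.fromstring(SOURCE)))
```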
You might experiment with node-set() (an extension function in XSLT 1.0,
e.g. exsl:node-set() or msxsl:node-set()) to turn your sorted results into a
node set you can process. Since preceding-sibling follows document order,
not the order produced by xsl:sort, this is what lets you use the technique
you're proposing for the de-duplication. How efficient it proves to be will
depend on your processor (in some processors the conversion from a result
tree fragment to a node set just means flipping a bit, I hear), but there's
no harm in trying.
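The two-stage idea can be sketched in Python, with hypothetical markup and a freshly built ElementTree standing in for the node-set() result. The first call applies the sibling test in the source's own document order and repeats states. The second call applies the same test to a sorted copy, whose document order is the sorted order:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, deliberately NOT in state order.
SOURCE = """<cities>
  <city name="Austin" state="TX"/>
  <city name="Fresno" state="CA"/>
  <city name="Dallas" state="TX"/>
  <city name="Oakland" state="CA"/>
</cities>"""

def sorted_copy(root):
    """Stage 1: rough analogue of sorting into a variable and applying
    node-set(): a new tree whose document order is the sorted order."""
    tmp = ET.Element("cities")
    for city in sorted(root.iter("city"),
                       key=lambda c: (c.get("state"), c.get("name"))):
        ET.SubElement(tmp, "city", dict(city.attrib))
    return tmp

def group_starts(cities):
    """Stage 2: the preceding-sibling::city[1] comparison, in list order."""
    return [c.get("state") for i, c in enumerate(cities)
            if i == 0 or cities[i - 1].get("state") != c.get("state")]

root = ET.fromstring(SOURCE)
print(group_starts(list(root)))               # ['TX', 'CA', 'TX', 'CA']
print(group_starts(list(sorted_copy(root))))  # ['CA', 'TX']
```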
Because of certain architectural assumptions in its design (notably that a
processor typically builds the whole source document as a tree in memory),
XML/XSLT stored as files isn't always the best tool for working over a large
data set. You might consider pulling a database into service if it gets
really hairy: databases are designed for this kind of thing, and these days
you can wrap the results of a DB query in XML easily enough. (But I'm a bit
out of my depth here; maybe others have ideas for useful approaches with
monster data sets.)
Cheers,
Wendell
======================================================================
Wendell Piez
mailto:wapiez(_at_)mulberrytech(_dot_)com
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9635
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
----------------------------------------------------------------------
Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================