Re: [xsl] efficient traversal of combined collections in XSLT 3.0

2012-11-27 03:58:39
On Sat, Nov 24, 2012 at 03:27:24PM +0000, Michael Kay scripsit:
> The way we do this in maintaining the XSLT/XQuery specs (admittedly
> much smaller than your 4GB) is to maintain a derived document
> containing a list of valid link targets. This is regenerated when
> the base documents change, which is less frequently than the list is
> used. The list of valid anchors is much smaller than the base
> documents, so it can be loaded more quickly, and uses less memory.

That gets saxon:discard-document() to work.  (Well, up until the point
where the transform fails with no error message _and_ closes the outer
loop; something, somewhere, is awful in the input.  Which is not a
surprise, but it is hard to find!)

I _suspect_, but could not take the time to prove, that the use of 

for $x in collection($pathToContent) return
(saxon:discard-document($x)//link,saxon:discard-document($x)//target[not(.//link)])

means that saxon:discard-document() can't tell it is supposed to let go
of the document.  Separating those out into distinct xsl:for-each
instructions made things behave in a much more useful fashion.
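
For the archive, here is a minimal sketch of one way that separation might look. It is a guess at the shape, not the exact code I ran: the collection URI, the output element names, and the @href and @id attributes are all hypothetical stand-ins for whatever the real content uses.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="saxon">

  <!-- Hypothetical collection URI; replace with the real content location -->
  <xsl:param name="pathToContent"
             select="'file:///data/content?select=*.xml'"/>

  <xsl:template name="xsl:initial-template">
    <anchors>
      <!-- First pass: links only. Each document is wrapped in
           saxon:discard-document() exactly once per iteration. -->
      <xsl:for-each select="collection($pathToContent)">
        <xsl:for-each select="saxon:discard-document(.)//link">
          <!-- @href is an assumed attribute name -->
          <link href="{@href}"/>
        </xsl:for-each>
      </xsl:for-each>

      <!-- Second pass: targets that contain no link. -->
      <xsl:for-each select="collection($pathToContent)">
        <xsl:for-each select="saxon:discard-document(.)//target[not(.//link)]">
          <!-- @id is an assumed attribute name -->
          <target id="{@id}"/>
        </xsl:for-each>
      </xsl:for-each>
    </anchors>
  </xsl:template>

</xsl:stylesheet>
```

This reads every document twice, so it trades parse time for a simpler expression in each loop. As far as I know saxon:discard-document() needs Saxon-PE or Saxon-EE, so the sketch will not run on the open-source edition.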

> Also, generating the list of anchors is an operation that can be
> streamed; hopefully the resulting list is small enough that it can
> be held in memory for look-up purposes.

It can; once I've got the list of anchors the compare runs in about
fifteen seconds.
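
A rough sketch of what such a compare could look like, assuming a hypothetical derived file anchors.xml that holds <link href="..."/> and <target id="..."/> elements, and assuming each href carries the bare id of its target (real link syntax will probably need normalising first). The key name and output element names are made up for illustration.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical derived document listing links and targets -->
  <xsl:param name="anchorList" select="'anchors.xml'"/>
  <xsl:variable name="anchors" select="doc($anchorList)"/>

  <!-- Look targets up by id; processors typically build an index
       for a key, so the list is not rescanned for every link -->
  <xsl:key name="target-by-id" match="target" use="@id"/>

  <xsl:template name="xsl:initial-template">
    <broken-links>
      <!-- Report every link whose href matches no target id -->
      <xsl:for-each
          select="$anchors//link[not(key('target-by-id', @href, $anchors))]">
        <broken href="{@href}"/>
      </xsl:for-each>
    </broken-links>
  </xsl:template>

</xsl:stylesheet>
```

The third argument to key() restricts the search to the anchor list, so only that small document has to be in memory for the compare.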

Thank you!

-- Graydon, who keeps getting freaked out by the orders-of-magnitude
run-time differences from apparently small code changes
