RE: [xsl] Processing Memory-Hungry Data Sets with XSLT 2

Almost any performance question is processor-specific to some extent.
However, it's not unlikely that different processors use similar
implementation techniques much of the time.

Given your description of the problem, I would be looking for unnecessary
temporary trees and copy operations. With Saxon it's usually the case that
tree-construction (xsl:variable with content and no "as" attribute) is done
eagerly, whereas sequence construction (xsl:variable with a select
attribute) is done lazily.
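
To illustrate the distinction, here is a minimal sketch. The element names
(map, topicref) and the variable names are assumptions chosen for
illustration; they are not taken from the stylesheet under discussion.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical match pattern and element names, for illustration only. -->
  <xsl:template match="map">

    <!-- Temporary tree: content and no "as" attribute. The selected nodes
         are copied into a new tree under a document node. -->
    <xsl:variable name="refs-as-tree">
      <xsl:copy-of select="topicref"/>
    </xsl:variable>

    <!-- Sequence: select attribute. The variable refers to the original
         nodes; no copy is made. -->
    <xsl:variable name="refs-as-sequence" select="topicref"/>

    <!-- Prefer the sequence form unless a copy is really needed. -->
    <xsl:apply-templates select="$refs-as-sequence"/>
  </xsl:template>

</xsl:stylesheet>
```

The first variable is shown only for contrast. In a stylesheet that handles
many large documents, replacing such variables with the select form (or with
an "as" attribute declaring a sequence type) may remove a copy of each
subtree, though the actual saving depends on the processor.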

But with performance the devil is always in the detail, and sometimes it can
be in quite surprising places in the detail.

Michael Kay
http://www.saxonica.com/

-----Original Message-----
From: Eliot Kimber [mailto:ekimber(_at_)reallysi(_dot_)com] 
Sent: 11 March 2008 19:51
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] Processing Memory-Hungry Data Sets with XSLT 2

I'm implementing some DITA processing that is applied against 
a large tree of maps and topics referenced from the maps in 
order to generate HTML from the maps and the topics. There 
are tens of thousands of maps and topics.

I have two processors: one is essentially an identity 
transform that processes the map tree and copies it to the 
output with a little bit of modification. The other is the 
XML-to-HTML transform. It is still essentially a one-to-one 
file-to-file transform but the result files are HTML instead 
of copies. The process essentially does a top-down traversal of 
the tree of maps, which consist of either links to submaps or 
links to topics. Submaps are loaded and their topic links 
processed. Links to topics result in loading the target 
topics and processing them normally to generate HTML output. 
This obviously results in a lot of source and target 
documents in memory. The business logic is very simple; it's 
just a lot of data.

Using Saxon 9 the first script can process my entire corpus 
but the second one (the HTML generator) fails about halfway 
through with an out-of-memory failure using the largest VM I 
can request under OS X (2 GB).

I tried using Saxon's discard-document() extension function but 
that appeared to have no effect (I didn't really expect it to, 
since I don't think anything referenced ever gets unreferenced).

My question is, are there any XSLT 2 techniques that might 
help avoid this type of memory usage issue that are generic 
(as opposed to Saxon specific)? I can think of several 
multi-pass approaches involving the creation of intermediate 
files that would work but time is short so I'm trying to keep 
this as simple as I can and still have it work, so I was 
hoping there might be some clever way to make an otherwise 
naive top-down process more memory efficient.

If the only answer is Saxon-specific then I'll move my 
question to the Saxon list.

Thanks,

Eliot
--
Eliot Kimber
Senior Solutions Architect
"Bringing Strategy, Content, and Technology Together"
Main: 610.631.6770
www.reallysi.com
www.rsuitecms.com

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--


