xsl-list

[xsl] Processing Memory-Hungry Data Sets with XSLT 2

2008-03-11 15:51:29
I'm implementing some DITA processing that is applied to a large tree of maps, and to the topics referenced from those maps, in order to generate HTML from both. There are tens of thousands of maps and topics.

I have two processors: one is essentially an identity transform that processes the map tree and copies it to the output with a little bit of modification. The other is the XML-to-HTML transform. It is still essentially a one-to-one, file-to-file transform, but the result files are HTML instead of copies. The process is a top-down traversal of the tree of maps, each of which consists of links to submaps or links to topics. Submaps are loaded and their topic links processed. Links to topics result in loading the target topics and processing them normally to generate HTML output. This obviously results in a lot of source and target documents in memory. The business logic is very simple; there is just a lot of data.
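
A minimal sketch of the kind of top-down traversal described above may make the memory behavior easier to discuss. The element and attribute names follow DITA, but the `html` mode, the output file naming, and the trivial HTML mapping are illustrative assumptions, not the actual stylesheet. It also assumes each topic is referenced only once and that `@href` values carry no fragment identifiers.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Root map or submap: visit every reference it contains. -->
  <xsl:template match="/map">
    <xsl:apply-templates select=".//topicref[@href]"/>
  </xsl:template>

  <!-- Link to a submap: load it and process its topic links. -->
  <xsl:template match="topicref[@format = 'ditamap']" priority="2">
    <xsl:apply-templates select="document(@href, .)/map"/>
  </xsl:template>

  <!-- Link to a topic: load it and write one HTML file for it.
       The .html naming rule is an assumption for illustration. -->
  <xsl:template match="topicref[@href]" priority="1">
    <xsl:result-document method="html"
        href="{replace(@href, '\.(xml|dita)$', '.html')}">
      <xsl:apply-templates select="document(@href, .)/*" mode="html"/>
    </xsl:result-document>
  </xsl:template>

  <!-- Placeholder for the real XML-to-HTML logic. -->
  <xsl:template match="/*" mode="html">
    <html>
      <head><title><xsl:value-of select="title"/></title></head>
      <body><xsl:apply-templates select="body//p" mode="html"/></body>
    </html>
  </xsl:template>

  <xsl:template match="p" mode="html">
    <p><xsl:value-of select="."/></p>
  </xsl:template>

</xsl:stylesheet>
```

In a structure like this, every document returned by document() is a candidate for staying in memory for the whole run, because the XSLT rules require repeated calls with the same URI to return the same document.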

Using Saxon 9, the first script can process my entire corpus, but the second one (the HTML generator) fails about halfway through with an out-of-memory error, even with the largest VM I can request under OS X (2 GB).

I tried using Saxon's saxon:discard-document() extension function, but it appeared to have no effect (I didn't really expect it to, since I don't think anything that is referenced ever becomes unreferenced).
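
For reference, a sketch of how saxon:discard-document() is typically wrapped around a document() call. This is Saxon-specific, and the `html` mode and output naming are the same illustrative assumptions as before, not the actual stylesheet. The function returns the document it is given and releases it from Saxon's document pool, so the document can only be reclaimed if nothing else (a global variable or a key index, say) still refers to it.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="saxon">

  <!-- Load each topic, process it, and let Saxon drop it from the pool.
       The document is not bound to a variable, so no reference to it
       outlives this instruction. -->
  <xsl:template match="topicref[@href]">
    <xsl:result-document method="html"
        href="{replace(@href, '\.(xml|dita)$', '.html')}">
      <xsl:apply-templates mode="html"
          select="saxon:discard-document(document(@href, .))/*"/>
    </xsl:result-document>
  </xsl:template>

  <!-- Placeholder for the real XML-to-HTML logic. -->
  <xsl:template match="/*" mode="html">
    <html>
      <body><h1><xsl:value-of select="title"/></h1></body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```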

My question is: are there any generic (as opposed to Saxon-specific) XSLT 2 techniques that might help avoid this type of memory usage issue? I can think of several multi-pass approaches involving intermediate files that would work, but time is short, so I'm trying to keep this as simple as I can and still have it work. I was hoping there might be some clever way to make an otherwise naive top-down process more memory efficient.

If the only answer is Saxon-specific, then I'll move my question to the Saxon list.

Thanks,

Eliot
--
Eliot Kimber
Senior Solutions Architect
"Bringing Strategy, Content, and Technology Together"
Main: 610.631.6770
www.reallysi.com
www.rsuitecms.com

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--
