I think this might be interesting for the discussion:
I developed an extension, jd.bigxml [1], for my XSLT processor jd.xslt [2] to
process arbitrarily large XML documents.
It is implemented at the tree model "layer" using a paging mechanism:
instead of building an object tree of the whole document, a temporary
index file is created and only the currently accessed parts of the
document are loaded into memory.
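To make the paging idea concrete, here is a minimal Java sketch. It is not the jd.bigxml code: the class PagedNodeStore, its record layout (parent index plus an int payload) and the page size are all made up for illustration. Node records are written to a temporary file, and reads go through a small LRU cache of pages, so only the recently touched pages are held in memory.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: node records live in a temp file; only a few pages stay in memory. */
class PagedNodeStore implements AutoCloseable {
    static final int RECORD_BYTES = 8;           // parent index (int) + payload (int)
    static final int RECORDS_PER_PAGE = 1024;    // assumed page size, chosen arbitrarily
    static final int PAGE_BYTES = RECORD_BYTES * RECORDS_PER_PAGE;

    private final File file;
    private final RandomAccessFile raf;
    private final Map<Integer, ByteBuffer> cache;
    int pageLoads = 0;                           // how often a page was read from disk

    PagedNodeStore(final int maxPagesInMemory) throws IOException {
        file = File.createTempFile("nodes", ".idx");
        raf = new RandomAccessFile(file, "rw");
        // An access-ordered LinkedHashMap acts as a small LRU cache of pages.
        cache = new LinkedHashMap<Integer, ByteBuffer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, ByteBuffer> eldest) {
                return size() > maxPagesInMemory;
            }
        };
    }

    /** Writes the record for node `index` straight to the file (the build phase). */
    void put(int index, int parent, int payload) throws IOException {
        ByteBuffer rec = ByteBuffer.allocate(RECORD_BYTES).putInt(parent).putInt(payload);
        raf.seek((long) index * RECORD_BYTES);
        raf.write(rec.array());
        cache.remove(index / RECORDS_PER_PAGE);  // drop a now stale cached page, if any
    }

    int parentOf(int index) throws IOException {
        return page(index).getInt((index % RECORDS_PER_PAGE) * RECORD_BYTES);
    }

    int payloadOf(int index) throws IOException {
        return page(index).getInt((index % RECORDS_PER_PAGE) * RECORD_BYTES + 4);
    }

    int cachedPages() {
        return cache.size();
    }

    /** Returns the page holding node `index`, reading it from the file on a cache miss. */
    private ByteBuffer page(int index) throws IOException {
        int pageNo = index / RECORDS_PER_PAGE;
        ByteBuffer buf = cache.get(pageNo);
        if (buf == null) {
            long pos = (long) pageNo * PAGE_BYTES;
            byte[] bytes = new byte[PAGE_BYTES];
            int len = (int) Math.max(0, Math.min(PAGE_BYTES, raf.length() - pos));
            raf.seek(pos);
            raf.readFully(bytes, 0, len);        // the last page may be shorter than a full page
            buf = ByteBuffer.wrap(bytes);
            cache.put(pageNo, buf);              // may evict the least recently used page
            pageLoads++;
        }
        return buf;
    }

    /** Counts the ancestors of a node by following parent links through the paged store. */
    int depthOf(int index) throws IOException {
        int depth = 0;
        for (int p = parentOf(index); p >= 0; p = parentOf(p)) depth++;
        return depth;
    }

    @Override
    public void close() throws IOException {
        raf.close();
        file.delete();
    }

    /** Builds a 5000-node chain, walks it with 2 cached pages; returns {depth, pageLoads, cachedPages}. */
    static int[] demo() {
        try (PagedNodeStore store = new PagedNodeStore(2)) {
            for (int i = 0; i < 5000; i++) store.put(i, i - 1, i * 10);
            int depth = store.depthOf(4999);
            return new int[] {depth, store.pageLoads, store.cachedPages()};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo()));
    }
}
```

In this toy run the walk touches all 5000 records, but never more than two pages (16 KB) are cached at once. A real tree model would of course also need names, text, attributes and sibling links in its records.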
Advantage: No stylesheet information or analysis is required. In fact, the
core XSLT engine is completely agnostic of the tree model implementation
and has not been changed in any way to support the bigxml version.
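As a rough illustration of what "agnostic of the tree model implementation" can look like, here is a small Java sketch. The names TreeModel, ArrayTreeModel and TreeWalker are hypothetical and are not the actual jd.xslt interfaces. The "engine" code only talks to the interface, so a disk-backed implementation could be swapped in without touching it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical tree-model abstraction; not the real jd.xslt interface. */
interface TreeModel {
    int root();
    int firstChild(int node);    // -1 if the node has no children
    int nextSibling(int node);   // -1 if the node is the last child
    String name(int node);
}

/** Stand-in for a "normal" model that keeps every node in memory. */
final class ArrayTreeModel implements TreeModel {
    private final List<String> names = new ArrayList<>();
    private final List<int[]> links = new ArrayList<>();  // {firstChild, nextSibling, lastChild}

    /** Appends a node under `parent` (use -1 for the root) and returns its id. */
    int add(int parent, String name) {
        int id = names.size();
        names.add(name);
        links.add(new int[] {-1, -1, -1});
        if (parent >= 0) {
            int[] p = links.get(parent);
            if (p[0] < 0) {
                p[0] = id;                    // first child of this parent
            } else {
                links.get(p[2])[1] = id;      // chain after the previous last child
            }
            p[2] = id;
        }
        return id;
    }

    public int root() { return 0; }
    public int firstChild(int node) { return links.get(node)[0]; }
    public int nextSibling(int node) { return links.get(node)[1]; }
    public String name(int node) { return names.get(node); }
}

/** Stand-in for engine code: it sees only TreeModel, never a concrete class. */
final class TreeWalker {
    static int countElements(TreeModel model, int node, String name) {
        int count = model.name(node).equals(name) ? 1 : 0;
        for (int c = model.firstChild(node); c >= 0; c = model.nextSibling(c)) {
            count += countElements(model, c, name);
        }
        return count;
    }

    public static void main(String[] args) {
        ArrayTreeModel m = new ArrayTreeModel();
        int doc = m.add(-1, "doc");
        m.add(doc, "item");
        m.add(doc, "item");
        int other = m.add(doc, "other");
        m.add(other, "item");
        System.out.println(countElements(m, m.root(), "item"));
    }
}
```

A paged model would implement the same four methods by looking up records in its index file instead of in-memory lists; TreeWalker would stay unchanged.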
To give some numbers from early experiments:
Transforming a 35 MB document on my computer (equipped with 256 MB RAM)
gave these results:
- jd.xslt with the normal tree model: done in 26 seconds, using 254 MB peak
memory
- jd.xslt with the big tree model: done in 27 seconds, using 13 MB peak memory
- Xalan and other Java processors aborted with an OutOfMemoryError (when
invoked from their standard command-line interface)
regards,
Johannes Döbler
[1] http://www.aztecrider.com/bigxml/
[2] http://www.aztecrider.com/xslt/
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list