xsl-list

Re: [xsl] Transforming large XML docs in small amounts of memory

2007-04-30 04:30:26
Andrew Welch wrote:
Much can be done, but your available options all depend on the
processor and environment you're running, and how flexible you are -
is it a pure XSLT 1.0/2.0 solution you're after, or can you use
extensions or modify the processing pipeline?

It's purely XSLT 1.0, using Saxon (on Linux and Windows, if that
matters...), although suggestions to change this would not be shunned.
The input XML is the only real fixed quantity, due to the amount of work
that would be required to change the code generating it, given that it
already 'works'.

Also you need to let us know:

- Is the input uniform chunks of data in a single file?  (likely if
it's a "data-centric" XML file) or does the processing require access
to the whole input for the whole transform?

The majority of the XSL draws on data from all over the input document,
which I suspect will be constraining. There are substantial sections of
the input document which could be described as uniform, but I would not
say that the term applies to the document as a whole.

- What is your current memory usage?  What's the limit, what is an
acceptable bound? etc.

The servers we're using have several GB of memory in them, but my
objective is to increase the potential for concurrency by reducing the
resource requirements of each transform. I think that transforming 150MB
of data in 400MB of RAM would be a reasonable target (is this realistic?)

- How are you measuring memory usage?  Is it simply the input XML that
is using up all available memory, or do other parts of the pipeline
use a lot of memory too?

I'm measuring it by increasing the maximum amount of memory available to
Java until the transform runs without throwing OutOfMemoryError (to solve
the immediate problem). The larger transforms (150MB of input) are taking
~1GB of memory to run. I'm not sure how to tell what proportion of the
memory is used for the input DOM, output DOM, etc.
Which reminds me, I should mention that the output document is under 1MB.
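
One rough way to get at the "what proportion" question is to sample the
heap just before and just after the input tree is built. The following is
only a sketch under assumptions: it uses the JDK's built-in DOM parser as a
stand-in (Saxon builds its own internal tree, so its figures will differ),
it feeds in generated sample data rather than the real input, and the names
HeapProbe, usedHeapBytes, sampleXml and parse are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class HeapProbe {

    // Approximate heap currently in use. System.gc() is only a hint to the
    // JVM, so treat the result as a rough figure, not an exact one.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Hypothetical stand-in for the real input: <root> with N <rec> children.
    static byte[] sampleXml(int records) {
        StringBuilder sb = new StringBuilder("<root>");
        for (int i = 0; i < records; i++) {
            sb.append("<rec id=\"").append(i).append("\">value ")
              .append(i).append("</rec>");
        }
        sb.append("</root>");
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Build an in-memory DOM with the JDK parser (not Saxon's tree model).
    static Document parse(byte[] xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
        } catch (Exception e) {
            throw new RuntimeException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = sampleXml(200_000);
        long before = usedHeapBytes();
        Document doc = parse(xml);
        long after = usedHeapBytes();
        // doc is used after the second sample, so it is still reachable
        // when the heap is measured.
        System.out.printf("input: %d bytes, tree: ~%d bytes, records: %d%n",
                xml.length, after - before,
                doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

Comparing the tree figure with the size of the serialized input gives an
expansion factor for the input alone; whatever remains of the ~1GB would
then be down to the stylesheet, temporary trees and the output.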

        # r

-- 
Ronan Klyne
Business Collaborator Developer
Tel: +44 (0)870 163 2555
ronan(_dot_)klyne(_at_)groupbc(_dot_)com
www.groupbc.com
