xsl-list

RE: [xsl] memory usage of xslt processing

2006-04-19 05:59:29
XSLT processors generally read the whole document into memory. Some products
may be able to avoid this under certain circumstances, for example see
http://www.saxonica.com/documentation/sourcedocs/serial.html for Saxon.

Running one transformation per row is certainly feasible in principle, though
there may be a significant start-up overhead; you'll only find out by
measurement.
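A rough way to take that measurement, sketched under assumptions: the JDK's built-in JAXP processor stands in for Xalan, and the class name StartupCost, the one-row document and the CSV stylesheet are all hypothetical. It compares recompiling the stylesheet for every row against compiling it once into a javax.xml.transform.Templates object and creating a fresh Transformer from it per row. The figures will vary with processor, JVM and stylesheet.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class StartupCost {
    // Hypothetical stylesheet: one CSV line per <row>.
    public static final String CSV_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='row'>"
      + "<xsl:for-each select='value'>"
      + "<xsl:value-of select='.'/>"
      + "<xsl:if test='position() != last()'>,</xsl:if>"
      + "</xsl:for-each>"
      + "<xsl:text>&#10;</xsl:text>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Hypothetical one-row input document.
    private static final String ROW_XML = "<row><value>1</value><value>dog</value></row>";

    /** Transforms ROW_XML once with the given Transformer and returns the output. */
    public static String transformOnce(Transformer t) throws Exception {
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(ROW_XML)), new StreamResult(out));
        return out.toString();
    }

    /** Average nanoseconds per row when the stylesheet is parsed and compiled for every row. */
    public static long recompileEachRow(int rows) throws Exception {
        if (rows < 1) throw new IllegalArgumentException("rows must be >= 1");
        TransformerFactory tf = TransformerFactory.newInstance();
        long start = System.nanoTime();
        for (int i = 0; i < rows; i++) {
            transformOnce(tf.newTransformer(new StreamSource(new StringReader(CSV_XSL))));
        }
        return (System.nanoTime() - start) / rows;
    }

    /** Average nanoseconds per row when one compiled Templates is reused (compile time excluded). */
    public static long reuseTemplates(int rows) throws Exception {
        if (rows < 1) throw new IllegalArgumentException("rows must be >= 1");
        Templates templates = TransformerFactory.newInstance()
            .newTemplates(new StreamSource(new StringReader(CSV_XSL)));
        long start = System.nanoTime();
        for (int i = 0; i < rows; i++) {
            transformOnce(templates.newTransformer());
        }
        return (System.nanoTime() - start) / rows;
    }

    public static void main(String[] args) throws Exception {
        reuseTemplates(200);      // warm-up: class loading and JIT
        recompileEachRow(200);
        int rows = 2000;
        System.out.println("recompile per row: " + recompileEachRow(rows) + " ns/row");
        System.out.println("reuse Templates:   " + reuseTemplates(rows) + " ns/row");
    }
}
```

This only times the transformation in isolation; a real run also pays for the database fetch, so the per-row overhead should be judged against that.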

Alternatively, why not retrieve the data from the database in
transformer-sized chunks?
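A minimal sketch of the chunked approach, assuming rows arrive through an Iterator<String[]> (a stand-in for the database access framework) and that the stylesheet only ever needs to see one chunk at a time. ChunkedExport, export, writeRow and CSV_XSL are hypothetical names; the JAXP calls are standard. Each chunk becomes its own <dataset> document fed to a fresh TransformerHandler, while the compiled Templates is shared.

```java
import java.io.StringReader;
import java.io.Writer;
import java.util.Iterator;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

public class ChunkedExport {
    // Hypothetical stylesheet: one CSV line per <row>.
    public static final String CSV_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='row'>"
      + "<xsl:for-each select='value'>"
      + "<xsl:value-of select='.'/>"
      + "<xsl:if test='position() != last()'>,</xsl:if>"
      + "</xsl:for-each>"
      + "<xsl:text>&#10;</xsl:text>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    private static final AttributesImpl NO_ATTRS = new AttributesImpl();

    /** Compiles a stylesheet once so every chunk can reuse it. */
    public static Templates compile(String xsl) throws Exception {
        return TransformerFactory.newInstance().newTemplates(new StreamSource(new StringReader(xsl)));
    }

    /** Emits <row><value>..</value>..</row> for one fetched record. */
    private static void writeRow(TransformerHandler h, String[] values) throws SAXException {
        h.startElement("", "row", "row", NO_ATTRS);
        for (String v : values) {
            h.startElement("", "value", "value", NO_ATTRS);
            h.characters(v.toCharArray(), 0, v.length());
            h.endElement("", "value", "value");
        }
        h.endElement("", "row", "row");
    }

    /**
     * Pulls at most chunkSize rows at a time and transforms each chunk as its own
     * <dataset> document, so roughly one chunk's tree is held in memory at once.
     * Returns the number of chunks written.
     */
    public static int export(Iterator<String[]> rows, int chunkSize, Templates templates, Writer out)
            throws Exception {
        if (chunkSize < 1) throw new IllegalArgumentException("chunkSize must be >= 1");
        // Assumes the factory supports SAX input (the JDK's and Xalan's both do).
        SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();
        int chunks = 0;
        while (rows.hasNext()) {
            TransformerHandler h = stf.newTransformerHandler(templates); // fresh handler per chunk
            h.setResult(new StreamResult(out));
            h.startDocument();
            h.startElement("", "dataset", "dataset", NO_ATTRS);
            for (int n = 0; n < chunkSize && rows.hasNext(); n++) {
                writeRow(h, rows.next());
            }
            h.endElement("", "dataset", "dataset");
            h.endDocument(); // this chunk is transformed and written here
            chunks++;
        }
        out.flush();
        return chunks;
    }
}
```

Two caveats: a stylesheet that needs the whole data set (totals, sorting, grouping across rows) will only see one chunk at a time, and an XML or HTML output method produces one root element, and possibly one XML declaration, per chunk. A text output method such as CSV fits this pattern most naturally.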

Michael Kay
http://www.saxonica.com/ 

-----Original Message-----
From: Thomas Porschberg [mailto:thomas(_dot_)porschberg(_at_)osp-dd(_dot_)de] 
Sent: 19 April 2006 13:36
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] memory usage of xslt processing

Hi,

I have the following task:
Create an arbitrarily formatted file (XML/HTML/CSV, whatever) 
based on a SELECT from a database.

As a constraint, the amount of data fetched from the database 
cannot be held in memory as a whole.
Another constraint is that I cannot use XML functionality in 
the database; I have to implement the functionality on top of 
our database access framework. This framework fetches 
records one after another.
And I have to use Java and Xalan.

My idea was to wrap every row fetched from the database 
in simple, generic XML and feed it to Xalan.

Let's take an example:
If my result set from the database looks like:

ID  Name  Description
--  ----  -----------
1  "dog"  "an animal may be dangerous"
2  "cat"  "an animal likes milk"

I create the following XML:

<?xml version="1.0" encoding="UTF-8"?>
<dataset>
 <row>
  <value>1</value>
  <value>dog</value>
  <value>an animal may be dangerous</value>
 </row>
 <row>
  <value>2</value>
  <value>cat</value>
  <value>an animal likes milk</value>
 </row>
</dataset>

I generate this XML as SAX events in a Java class, 
StringArrayXMLReader, which implements the 
org.xml.sax.XMLReader interface.
It has three methods:

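// ch is presumably the ContentHandler passed to this XMLReader via
// setContentHandler(), i.e. the TransformerHandler; EMPTY_ATTR is
// presumably an empty org.xml.sax.helpers.AttributesImpl.

// Called once, before the first row is fetched.
public void init() throws SAXException {
    ch.startDocument();
    ch.startElement("", "dataset", "dataset", EMPTY_ATTR);
}

// Called once, after the last row has been fetched.
public void close() throws SAXException {
    ch.endElement("", "dataset", "dataset");
    ch.endDocument();
}

// Called once per fetched row: emits <row><value>...</value>...</row>.
public void parse(String[] input) throws SAXException {
    ch.startElement("", "row", "row", EMPTY_ATTR);
    for (int i = 0; i < input.length; ++i) {
        ch.startElement("", "value", "value", EMPTY_ATTR);
        ch.characters(input[i].toCharArray(), 0, input[i].length());
        ch.endElement("", "value", "value");
    }
    ch.endElement("", "row", "row");
}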

The parse method creates the <row>...</row> element for a 
given String array.
The StringArrayXMLReader is connected to a 
TransformerHandler, which uses an XSL stylesheet to transform 
the XML into the desired output.
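For reference, the setup described above can be reduced to a self-contained sketch. It assumes the JDK's built-in JAXP implementation (Xalan's TransformerFactory also extends SAXTransformerFactory); RowStreamDemo, emitDataset, transform and the CSV stylesheet are hypothetical. The same events that init(), parse() and close() produce are sent straight to a TransformerHandler, which writes the result to a StreamResult.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

public class RowStreamDemo {
    // Hypothetical stylesheet: one CSV line per <row>.
    public static final String CSV_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='row'>"
      + "<xsl:for-each select='value'>"
      + "<xsl:value-of select='.'/>"
      + "<xsl:if test='position() != last()'>,</xsl:if>"
      + "</xsl:for-each>"
      + "<xsl:text>&#10;</xsl:text>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    private static final AttributesImpl EMPTY_ATTR = new AttributesImpl();

    /** Same event sequence as init(), parse() per row, then close(). */
    static void emitDataset(ContentHandler ch, Iterable<String[]> rows) throws SAXException {
        ch.startDocument();
        ch.startElement("", "dataset", "dataset", EMPTY_ATTR);
        for (String[] row : rows) {
            ch.startElement("", "row", "row", EMPTY_ATTR);
            for (String v : row) {
                ch.startElement("", "value", "value", EMPTY_ATTR);
                ch.characters(v.toCharArray(), 0, v.length());
                ch.endElement("", "value", "value");
            }
            ch.endElement("", "row", "row");
        }
        ch.endElement("", "dataset", "dataset");
        ch.endDocument();
    }

    /** Feeds the rows as one document to a TransformerHandler and returns the output. */
    public static String transform(String xsl, Iterable<String[]> rows) throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();
        if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("TransformerFactory does not accept SAX input");
        }
        TransformerHandler handler = ((SAXTransformerFactory) tf)
            .newTransformerHandler(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));
        emitDataset(handler, rows);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        List<String[]> rows = Arrays.asList(
            new String[]{"1", "dog", "an animal may be dangerous"},
            new String[]{"2", "cat", "an animal likes milk"});
        System.out.print(transform(CSV_XSL, rows));
    }
}
```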

What happens here is that I call init() (and thus 
startDocument()) when the fetch from the database starts, and 
close() (and thus endDocument()) after the fetch has finished.
I observed that the XSLT processing starts only when 
endDocument() is called.
This is not acceptable for me, because I fear the XSLT 
processor holds all the rows in memory until endDocument() 
is called, in which case I risk running into an OutOfMemoryError.

My second idea was to eliminate the init()/close() methods 
and to treat each <row>...</row> section as a complete 
input document for the processor. This has the disadvantage 
that I have to create the head and tail of the document 
manually (and in my example I get a NullPointerException when 
the transformer is called twice).
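One way the second idea could look, as a sketch only: the head and tail are written by hand, and each row is transformed as its own one-row document with a new TransformerHandler created from a shared Templates. The assumption (a guess, since the test code is not reproduced here) is that the NullPointerException came from reusing one TransformerHandler for a second document; this sketch never does that. PerRowWithHeadTail, ROW_TO_TR_XSL and the <table> wrapper are hypothetical, and omit-xml-declaration keeps each row's output free of an XML declaration.

```java
import java.io.StringReader;
import java.io.Writer;
import java.util.Iterator;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.helpers.AttributesImpl;

public class PerRowWithHeadTail {
    // Hypothetical stylesheet: turns one <row> document into one <tr> fragment.
    public static final String ROW_TO_TR_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='row'>"
      + "<tr><xsl:for-each select='value'><td><xsl:value-of select='.'/></td></xsl:for-each></tr>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    private static final AttributesImpl NO_ATTRS = new AttributesImpl();

    /** Compiles the stylesheet once; every row reuses it. */
    public static Templates compile(String xsl) throws Exception {
        return TransformerFactory.newInstance().newTemplates(new StreamSource(new StringReader(xsl)));
    }

    public static void export(Iterator<String[]> rows, Templates templates, Writer out) throws Exception {
        // Assumes the factory supports SAX input (the JDK's and Xalan's both do).
        SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();
        out.write("<table>");                        // head, written by hand
        while (rows.hasNext()) {
            String[] values = rows.next();
            // New handler for every row; the handler is never reused after endDocument().
            TransformerHandler h = stf.newTransformerHandler(templates);
            h.setResult(new StreamResult(out));
            h.startDocument();
            h.startElement("", "row", "row", NO_ATTRS);
            for (String v : values) {
                h.startElement("", "value", "value", NO_ATTRS);
                h.characters(v.toCharArray(), 0, v.length());
                h.endElement("", "value", "value");
            }
            h.endElement("", "row", "row");
            h.endDocument();                         // this row is transformed and written here
        }
        out.write("</table>");                       // tail, written by hand
        out.flush();
    }
}
```

The stylesheet only ever sees a single row here, so anything that spans rows (headers derived from data, totals) has to move into the hand-written head and tail.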

I have the following questions:
Is it possible to create the output without having all the 
data in memory?
The basic input XML for the XSLT processing
<dataset>
  <row><value>...
  <row><value>...
</dataset>
looks very simple, and the supplied XSL stylesheets will not 
be complex, so I hope to get this working.
I also think the general task, producing formatted output 
from a potentially very large data pool, should be a common one.
Unfortunately I have not done much XSLT processing in the past, 
so I lack experience (only a little libxslt, which I fed a DOM tree). 
If anyone has some useful links, I would be very glad to hear of them. 
My test code is available at:

http://randspringer.de/sax_row.tar and
http://randspringer.de/sax.tar

If someone could have a look at it I would really appreciate it.

Thomas


-- 


--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--


