
[xsl] memory usage of xslt processing

2006-04-19 05:36:02
Hi,

I have the following task:
Create an arbitrarily formatted file (XML, HTML, CSV, whatever) based on a
SELECT from a database.

One constraint is that the data fetched from the database cannot
be held in memory as a whole.
Another constraint is that I cannot use the XML functionality of the
database; I have to implement this on top of our database
access framework, which fetches records one at a time.
I also have to use Java and Xalan.

My idea was to wrap every row fetched from the database in simple
generic XML and feed it to Xalan.

Here is an example.
If my result set from the database looks like this:

ID  Name  Description
--  ----  -----------
1  "dog"  "an animal may be dangerous"
2  "cat"  "an animal likes milk"

I create the following XML:

<?xml version="1.0" encoding="UTF-8"?>
<dataset>
 <row>
  <value>1</value>
  <value>dog</value>
  <value>an animal may be dangerous</value>
 </row>
 <row>
  <value>2</value>
  <value>cat</value>
  <value>an animal likes milk</value>
 </row>
</dataset>

I generate this XML as SAX events fired from a Java
class [StringArrayXMLReader], which implements the org.xml.sax.XMLReader
interface.
It has three methods (ch is the registered ContentHandler):

// Opens the document and the <dataset> root element.
public void init() throws SAXException {
        ch.startDocument();
        ch.startElement("", "dataset", "dataset", EMPTY_ATTR);
}

// Closes the <dataset> root element and the document.
public void close() throws SAXException {
        ch.endElement("", "dataset", "dataset");
        ch.endDocument();
}

// Fires one <row> with one <value> child per array entry.
public void parse(String[] input) throws SAXException {
        ch.startElement("", "row", "row", EMPTY_ATTR);
        for (int i = 0; i < input.length; ++i) {
                ch.startElement("", "value", "value", EMPTY_ATTR);
                ch.characters(input[i].toCharArray(), 0, input[i].length());
                ch.endElement("", "value", "value");
        }
        ch.endElement("", "row", "row");
}

The parse method creates the <row>...</row> entry for the String array
passed to it.
The StringArrayXMLReader is associated with a TransformerHandler, which
uses an XSL stylesheet to transform the XML to the desired output.
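
For readers who want to reproduce the setup without downloading the test code, here is a minimal sketch of the wiring described above. It is not the original StringArrayXMLReader: the class name RowPipelineSketch, the method names and the embedded stylesheet are made up for illustration, the events are fired directly at the TransformerHandler, and it assumes that the default TransformerFactory is a SAXTransformerFactory (true for Xalan and for the processor bundled with the JDK).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

final class RowPipelineSketch {
    private static final AttributesImpl EMPTY_ATTR = new AttributesImpl();

    // Hypothetical stylesheet: prints the second column of every row.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='dataset/row'>"
      + "<xsl:value-of select='value[2]'/><xsl:text>;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Fires one <row> with one <value> child per column, like parse() above.
    static void fireRow(ContentHandler ch, String[] input) throws SAXException {
        ch.startElement("", "row", "row", EMPTY_ATTR);
        for (String value : input) {
            ch.startElement("", "value", "value", EMPTY_ATTR);
            ch.characters(value.toCharArray(), 0, value.length());
            ch.endElement("", "value", "value");
        }
        ch.endElement("", "row", "row");
    }

    // Sends all rows through the stylesheet as ONE document.
    static String transform(String xsl, String[][] rows) throws Exception {
        // Assumption: the configured factory supports the SAX extensions.
        SAXTransformerFactory stf =
            (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler th =
            stf.newTransformerHandler(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        th.setResult(new StreamResult(out));

        th.startDocument();                                     // init()
        th.startElement("", "dataset", "dataset", EMPTY_ATTR);
        for (String[] row : rows) {
            fireRow(th, row);                                   // parse(row)
        }
        th.endElement("", "dataset", "dataset");                // close()
        th.endDocument();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String[][] rows = {
            {"1", "dog", "an animal may be dangerous"},
            {"2", "cat", "an animal likes milk"}
        };
        System.out.println(transform(XSL, rows));
    }
}
```

In a real run the loop over rows would be driven by the database fetch instead of an in-memory array; the array is used here only to keep the sketch self-contained.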

What happens is this: when the fetch from the database starts, I
call init() (and thus startDocument()), and after the fetch has
finished, I call close() (and thus endDocument()).
I observed that the XSLT processing only starts when endDocument() is called.
This is not acceptable for me, because I fear the XSLT processor keeps
all the rows in memory until endDocument() is called, and in that case
I risk running into an OutOfMemoryError.

My second idea was to eliminate the init()/close() methods and to
treat each <row>...</row> section as a complete input document for the
processor. This has the disadvantage that I have to create the head and
tail of the output document manually (and in my example I get a
NullPointerException when the transformer is called a second time).
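
A rough sketch of this second idea follows. It is an illustration only, not the code from the tarballs: the class RowPerDocumentSketch, the export() method and the CSV stylesheet are hypothetical. It assumes that the stylesheet is compiled once into a javax.xml.transform.Templates object and that a fresh TransformerHandler is created for every row rather than reusing one handler across documents; whether reuse is what triggers the NullPointerException in my test code is not confirmed.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.helpers.AttributesImpl;

final class RowPerDocumentSketch {
    private static final AttributesImpl EMPTY_ATTR = new AttributesImpl();

    // Hypothetical stylesheet: turns ONE <row> document into one CSV line.
    static final String ROW_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='row'>"
      + "<xsl:for-each select='value'>"
      + "<xsl:if test='position() &gt; 1'>,</xsl:if>"
      + "<xsl:value-of select='.'/>"
      + "</xsl:for-each>"
      + "<xsl:text>&#10;</xsl:text>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Writes header, then one small transformation per row, then footer.
    static String export(String header, String[][] rows, String footer)
            throws Exception {
        // Assumption: the configured factory supports the SAX extensions.
        SAXTransformerFactory stf =
            (SAXTransformerFactory) TransformerFactory.newInstance();
        // Compile the stylesheet once; Templates can be reused for every row.
        Templates templates =
            stf.newTemplates(new StreamSource(new StringReader(ROW_XSL)));

        StringWriter out = new StringWriter();
        out.write(header);                       // head created manually
        for (String[] row : rows) {
            // A new handler per row: each row is its own tiny document.
            TransformerHandler th = stf.newTransformerHandler(templates);
            th.setResult(new StreamResult(out));
            th.startDocument();
            th.startElement("", "row", "row", EMPTY_ATTR);
            for (String value : row) {
                th.startElement("", "value", "value", EMPTY_ATTR);
                th.characters(value.toCharArray(), 0, value.length());
                th.endElement("", "value", "value");
            }
            th.endElement("", "row", "row");
            th.endDocument();                    // this row is transformed here
        }
        out.write(footer);                       // tail created manually
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String[][] rows = {
            {"1", "dog", "an animal may be dangerous"},
            {"2", "cat", "an animal likes milk"}
        };
        System.out.print(export("ID,Name,Description\n", rows, ""));
    }
}
```

With this layout only one row is in the processor's hands at a time, but the stylesheet can no longer see across rows (no sorting, grouping or totals), and for XML or HTML output the per-row stylesheet would also have to suppress the XML declaration.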

I have the following questions:
Is it possible to create the output without holding all the data in
memory?
The source XML for the XSLT processing,
<dataset>
  <row><value>...
  <row><value>...
</dataset>
looks very simple, and the supplied XSL stylesheets will not be complex,
so I hope to get it working.
I also think that the task in general (producing formatted output from a
potentially very large data pool) should be a common one.
Unfortunately I have not done much XSLT processing in the past, so I lack
the experience (only a bit of libxslt, which I fed a DOM tree).
If someone has some useful links, I would be very glad to
hear of them. My test code is available at:

http://randspringer.de/sax_row.tar and
http://randspringer.de/sax.tar

If someone could have a look at it I would really appreciate it.

Thomas

