xsl-list

RE: use XSLT or XQuery in Saxon?

2005-01-06 05:10:46
I have an extremely large (over 300 MB) XML file and tens
of thousands of small XML files generated by applying
various XSLT stylesheets to the one big XML file.

I don't know whether Mr Kay has tested Saxon with 100+MB files or not, but
we did (6.5.?), and could not get a simple transform to complete within
hours (I think we gave up after ~4 hours on an 80-100MB file), on a machine
with 1GB of RAM.

I've only gone up to about 50MB myself, but I know of users who've gone up
to 200MB.

For one Saxonica client I managed to get the processing time for a 40MB
transformation down from 90 minutes to 45 seconds. Once you've allocated
enough memory, if it still takes hours then it's because there's a
non-linearity in the stylesheet logic, and this can usually be eliminated by
careful use of keys, sorting, or grouping.
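
To illustrate the point about keys: a join written as a predicate such as
//customer[@id = current()/@cust] may rescan the document once per order,
which is quadratic unless the processor happens to optimize it. Declaring an
xsl:key lets the processor build the index once. The sketch below is only an
illustration: it runs on the JDK's built-in JAXP transformer rather than
Saxon, and the element names (customer, order), the key name, and the class
name are all invented for the example.

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

class KeyLookupSketch {

    // Hypothetical stylesheet: each order is joined to its customer through
    // key() instead of a //customer[@id = ...] scan per order.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:key name='cust-by-id' match='customer' use='@id'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='//order'>"
      + "<xsl:value-of select=\"key('cust-by-id', @cust)/@name\"/>"
      + "<xsl:text>:</xsl:text>"
      + "<xsl:value-of select='@total'/>"
      + "<xsl:text>;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet above to an XML string and returns the text output.
    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<data>"
            + "<customer id='c1' name='Ann'/><customer id='c2' name='Bob'/>"
            + "<order cust='c2' total='10'/><order cust='c1' total='7'/>"
            + "</data>";
        System.out.println(transform(xml));
    }
}
```

On a two-order sample the difference is invisible; it only shows up when both
the number of orders and the number of customers grow, which is where a
transformation that "takes hours" usually hides.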

But I do agree with you that there are some problems that are better tackled
with a SAX-based Java application, or sometimes with a SAX filter as a
precursor to an XSLT transformation.
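
A SAX filter used as a precursor might look roughly like the sketch below: an
XMLFilterImpl sits between the parser and the transformer and drops subtrees
the stylesheet does not need, so the tree the transformer builds is smaller.
The assumptions are mine: the dropped element name is a parameter chosen for
the example, an identity transformer stands in for a real stylesheet, and the
JDK's JAXP implementation is used. The filter handles only elements and text,
not comments or processing instructions.

```java
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import java.io.StringReader;
import java.io.StringWriter;

class PruningFilterSketch {

    /** Drops every element with the given name, together with its subtree. */
    static class DropElementFilter extends XMLFilterImpl {
        private final String dropName;
        private int depth = 0; // greater than 0 while inside a dropped subtree

        DropElementFilter(XMLReader parent, String dropName) {
            super(parent);
            this.dropName = dropName;
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            if (depth > 0 || qName.equals(dropName)) { depth++; return; }
            super.startElement(uri, local, qName, atts);
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            if (depth > 0) { depth--; return; }
            super.endElement(uri, local, qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (depth == 0) super.characters(ch, start, length);
        }
    }

    // Parses xml through the filter and hands the surviving events to a transformer.
    static String filterThenTransform(String xml, String dropName) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader filtered =
                new DropElementFilter(spf.newSAXParser().getXMLReader(), dropName);

            // Identity transformer as a stand-in; a real stylesheet would go here.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new SAXSource(filtered, new InputSource(new StringReader(xml))),
                        new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(filterThenTransform(
            "<log><entry>keep</entry><debug><entry>drop</entry></debug></log>", "debug"));
    }
}
```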

Michael Kay
http://www.saxonica.com/
 

I wrote a custom transformer in Java doing exactly what we needed, using:
 *  SAX events
 *  Only keeping one branch/leaf of the XML tree in memory at any time.
 *  Aggregation of content into small mutable value objects, which were
    output and discarded when completed.
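
The three points above can be sketched as follows. This is not the
transformer described in the message, only a minimal illustration of the same
shape: a SAX handler that holds a single small mutable object for the record
currently open, emits it when the record closes, and then drops it. The
element names (record, amount), the Rec class, and the Consumer sink are
hypothetical.

```java
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.SAXParserFactory;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class StreamingAggregatorSketch {

    /** Small mutable value object: filled while one record is open, then emitted. */
    static class Rec {
        String id;
        long total;
        @Override public String toString() { return id + "=" + total; }
    }

    static class RecordHandler extends DefaultHandler {
        private final Consumer<String> sink;
        private final StringBuilder text = new StringBuilder();
        private Rec current; // the only per-record state held in memory

        RecordHandler(Consumer<String> sink) { this.sink = sink; }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            text.setLength(0);
            if (qName.equals("record")) {
                current = new Rec();
                current.id = atts.getValue("id");
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (current != null) text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            if (current == null) return;
            if (qName.equals("amount")) {
                current.total += Long.parseLong(text.toString().trim());
            } else if (qName.equals("record")) {
                sink.accept(current.toString()); // output the finished record
                current = null;                  // and discard it
            }
            text.setLength(0);
        }
    }

    // Collects output in a list for demonstration only; a real run over large
    // files would write each record straight to a file or stream instead.
    static List<String> process(String xml) {
        List<String> out = new ArrayList<>();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new RecordHandler(out::add));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        String xml = "<batch>"
            + "<record id='a'><amount>2</amount><amount>3</amount></record>"
            + "<record id='b'><amount>10</amount></record>"
            + "</batch>";
        process(xml).forEach(System.out::println);
    }
}
```

Because nothing outlives the record that produced it, memory use depends on
the size of the largest record rather than on the size of the file.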

1500 files, varying from ~10MB to 360MB and totalling ~10GB, could be
processed at a linear rate of ~2MB per second, or close to the disk drive
speed, on a dual-CPU workstation.

I suspect that you will end up in 'custom transformer' territory, but
perhaps Saxon has improved and can deal with the transforms you give it. I
suggest that you run some simple tests first, which somewhat resemble what
you need to do later.


Cheers
Niclas
-- 
---------------
If at first you don't succeed, destroy all evidence that you tried.
 -  Steven Wright

+---------//-------------------+
|   http://www.dpml.net        |
|  http://niclas.hedhman.org   |
+------//----------------------+


--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--






