I have an extremely large (over 300 MB) XML file and tens of thousands
of small XML files generated by applying various XSLT stylesheets to
the one big XML file.
I don't know whether Mr Kay has tested Saxon with 100+ MB files or not,
but we did (6.5.?), and could not get a simple transform to complete
within hours (I think we gave up after ~4 hours on an 80-100 MB file),
on a machine with 1 GB of RAM.
I've only gone up to about 50 MB myself, but I know of users who've gone up
to 200 MB.
For one Saxonica client I managed to get the processing time for a 40 MB
transformation down from 90 minutes to 45 seconds. Once you've allocated
enough memory, if it still takes hours then it's because there's a
non-linearity in the stylesheet logic, and this can usually be eliminated by
careful use of keys, sorting, or grouping.
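As an illustration of the key technique mentioned above (a minimal sketch, not taken from the thread): a lookup such as //customer[@id = current()/@cust] inside a for-each may rescan the document once per order, which is quadratic in processors that do not optimise it. Declaring an xsl:key lets the processor build an index once and answer each lookup from it. The element names (data, customer, order), the key name cust-by-id and the class name KeyLookupDemo are hypothetical, and the stylesheet is run here with whatever XSLT 1.0 processor the JDK provides rather than with Saxon.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class KeyLookupDemo {
    // Hypothetical stylesheet: resolves each order's customer through a key
    // instead of a repeated //customer[@id = ...] scan of the whole document.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:key name='cust-by-id' match='customer' use='@id'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='data/order'>"
      + "<xsl:value-of select=\"key('cust-by-id', @cust)/@name\"/>"
      + "<xsl:text>;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the stylesheet to the given XML string and returns the text output. */
    public static String run(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<data>"
                   + "<customer id='c1' name='Ann'/><customer id='c2' name='Bob'/>"
                   + "<order cust='c2'/><order cust='c1'/>"
                   + "</data>";
        System.out.println(run(xml)); // customer names in order sequence
    }
}
```

Whether this removes the bottleneck in a given stylesheet depends on where the non-linearity actually lies; sorting and grouping problems need different rewrites.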
But I do agree with you that there are some problems that are better tackled
with a SAX-based Java application, or sometimes with a SAX filter as a
precursor to an XSLT transformation.
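One way a SAX filter can sit in front of a transformation is sketched below, under stated assumptions: the filter drops every subtree rooted at a named element before the transformer sees it, so the tree the XSLT processor builds is smaller. The class name PruningFilter and the element names in the example are hypothetical. An identity transformer stands in for the real stylesheet to keep the sketch self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

public class PruningFilter extends XMLFilterImpl {
    private final String dropName; // element whose subtrees are discarded
    private int depth;             // > 0 while inside a discarded subtree

    public PruningFilter(XMLReader parent, String dropName) {
        super(parent);
        this.dropName = dropName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (depth > 0 || qName.equals(dropName)) { depth++; return; }
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (depth > 0) { depth--; return; }
        super.endElement(uri, localName, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (depth == 0) super.characters(ch, start, length);
    }

    /** Parses xml through the filter and feeds the surviving events to a transformer. */
    public static String transform(String xml, String dropName) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();
        SAXSource source = new SAXSource(new PruningFilter(reader, dropName),
                                         new InputSource(new StringReader(xml)));
        // Identity transformer as a stand-in for the real stylesheet.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(source, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(
            "<doc><keep>a</keep><blob>big<x/></blob><keep>b</keep></doc>", "blob"));
    }
}
```

In real use the identity transformer would be replaced by one built from the actual stylesheet, for example with newTransformer(new StreamSource(...)) pointing at the stylesheet file.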
Michael Kay
http://www.saxonica.com/
I wrote a custom transformer in Java doing exactly what we needed, using:
* SAX events
* Only keeping one branch/leaf of the XML tree in memory at any time.
* Aggregation of content into small mutable value objects, which were
output and discarded when completed.
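The approach in the list above can be sketched roughly as follows. This is a minimal illustration and not the transformer described in the post: the element names (items, item, name), the Item value object and the ItemStreamer class are all hypothetical. The handler holds only the item currently being read, hands it to a sink when its end tag arrives, and then drops the reference.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ItemStreamer extends DefaultHandler {
    /** Small mutable value object; at most one instance is live at a time. */
    static final class Item {
        String id;
        final StringBuilder name = new StringBuilder();
    }

    private final Consumer<Item> sink; // receives each completed item
    private Item current;              // the only branch kept in memory
    private boolean inName;

    public ItemStreamer(Consumer<Item> sink) { this.sink = sink; }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("item")) {
            current = new Item();
            current.id = atts.getValue("id");
        } else if (qName.equals("name") && current != null) {
            inName = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one text node, so append.
        if (inName) current.name.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("name")) {
            inName = false;
        } else if (qName.equals("item") && current != null) {
            sink.accept(current); // output the completed item...
            current = null;       // ...and discard it
        }
    }

    /** Test helper: collects one summary line per item (a real sink would write them out). */
    public static List<String> summarize(String xml) throws Exception {
        List<String> lines = new ArrayList<>();
        ItemStreamer handler = new ItemStreamer(i -> lines.add(i.id + ": " + i.name));
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return lines;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<items><item id=\"1\"><name>First</name></item>"
                   + "<item id=\"2\"><name>Second</name></item></items>";
        summarize(xml).forEach(System.out::println);
    }
}
```

Because nothing outlives the end tag of its item, memory use depends on the size of the largest item rather than on the size of the file.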
1500 files, ranging from ~10 MB to 360 MB and totalling ~10 GB, could be
processed at a linear rate of ~2 MB per second, or close to the disk
drive speed, on a dual-CPU workstation.
I suspect that you will end up in 'custom transformer' territory, but
perhaps Saxon has improved and can deal with the transforms you give it.
I suggest that you run some simple tests first, which somewhat resemble
what you need to do later.
Cheers
Niclas
--
---------------
If at first you don't succeed, destroy all evidence that you tried.
- Steven Wright
+---------//-------------------+
| http://www.dpml.net |
| http://niclas.hedhman.org |
+------//----------------------+
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail:
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--