Re: optimization for very large, flat documents

Thanks to everyone who responded.  For now I plan to follow Pieter's
idea of chunking the data into manageable pieces (16-64 MB).  Then I'm
going to look into Michael's suggestions about STX (unfortunately, not
yet a W3C recommendation and thus not widely implemented) and XQuery.
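
For anyone who wants to try the chunking approach, here is a minimal
sketch in Python. It is not the script actually used for the numbers
in this post. It assumes a flat document, i.e. a single root element
whose direct children are the independent entries. The function name
split_flat_xml, the "chunk" file prefix, and the output naming scheme
are made up for illustration.

```python
import os
import xml.etree.ElementTree as ET


def split_flat_xml(source, out_dir, entries_per_chunk=16 * 1024, prefix="chunk"):
    """Split a flat XML file into subfiles of at most entries_per_chunk entries.

    Assumes every direct child of the root element is one independent entry.
    Returns the list of subfile paths, in document order.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    root = None
    chunk = None
    depth = 0

    def flush():
        # Write the entries collected so far (if any) as one subfile.
        nonlocal chunk
        if chunk is not None and len(chunk):
            path = os.path.join(out_dir, "%s-%04d.xml" % (prefix, len(paths) + 1))
            ET.ElementTree(chunk).write(path, encoding="utf-8", xml_declaration=True)
            paths.append(path)
        chunk = None

    # iterparse streams the input, so the whole file is never held in memory.
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            if depth == 1:
                root = elem
        else:
            depth -= 1
            if depth == 1:  # a direct child of the root just closed
                if chunk is None:
                    # Each subfile gets a copy of the original root tag and attributes.
                    chunk = ET.Element(root.tag, dict(root.attrib))
                chunk.append(elem)
                root.remove(elem)  # detach from the input tree to free memory
                if len(chunk) == entries_per_chunk:
                    flush()
    flush()  # write the final, possibly shorter, subfile
    return paths
```

Because the split is by entry count rather than by bytes, subfile
sizes will vary with the size of the entries.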

For anyone interested in some numbers, I've split each of my 2 large
files (613 MB and 656 MB) into subfiles of 16 K independent entries
(which vary in size), yielding sets of 25 and 37 subfiles (of approx. 25
MB and 17 MB each, respectively).  I process them by running Saxon 8.2
from the command line (with an -Xmx value of 8 times the file size) on a
Sun UltraSPARC with 2 GB of real memory.  The 37 subfiles of about 17 MB
are processed with a slightly simpler stylesheet and take about 1:15
(minutes:seconds) each; the 25 subfiles of about 25 MB make one
document() call per entry (a round trip to a servlet on a different
host) and take about 8 minutes each.
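
The heap sizing rule above (-Xmx of 8 times the file size) can be
sketched as follows. This is an illustration, not the exact command
line used: the helper names, the saxon8.jar file name, and the
argument order (-o output, then source, then stylesheet) are
assumptions based on the usual Saxon 8.x invocation and should be
checked against the Saxon documentation.

```python
import math

MB = 1024 * 1024


def heap_mb_for(file_size_bytes, factor=8):
    """Return a JVM heap size in MB: factor times the file size, rounded up."""
    size_mb = max(1, math.ceil(file_size_bytes / MB))
    return factor * size_mb


def saxon_command(xml_path, xsl_path, out_path, file_size_bytes,
                  saxon_jar="saxon8.jar"):
    """Build the argument list for one transformation (hypothetical layout)."""
    return [
        "java",
        "-Xmx%dm" % heap_mb_for(file_size_bytes),  # e.g. -Xmx200m for a 25 MB file
        "-jar", saxon_jar,
        "-o", out_path,
        xml_path,
        xsl_path,
    ]
```

With this rule a 25 MB subfile gets a 200 MB heap and a 17 MB subfile
gets 136 MB, both well within 2 GB of real memory.  At the per-file
times reported above, the set of 37 takes about 46 minutes in total
and the set of 25 about 3 hours 20 minutes.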

My next step is to use Saxon's profiling features to find out where I
can improve my stylesheet's performance.

Thanks again to everyone on xsl-list for all your help!
-- 
Kevin Rodgers

