xsl-list

RE: [xsl] Processing large XML Documents [> 50MB]

2010-02-23 02:28:45
We need to process XML documents that could be over 50MB 
in size.

Because the documents are so large, XSLT processing is becoming 
difficult in the environment we are running in.

Actually, 50MB isn't really that big nowadays. Some people are transforming
1GB or more.

Basically, the data processing consists of the following:

a) assemble around 30-40 XML documents [each with a common 
header and its own lines] into one single XML document, with 
the common header and all the lines
b) update the assembled document in specific locations
c) generate multiple XML document fragments from the huge XML 
document based on query criteria. Each XML fragment is created 
by mapping specific fields in the big document. Each fragment 
is created for a specific key element value in the huge document.

Am puzzled how to handle this one efficiently.
Any comments are welcome.


It's not entirely clear why you are creating the one big document: it's
perfectly possible to work directly with the 30-40 small ones. Perhaps the
main advantage of building the big document is that you can then use a key
to search across all the data. But if you use a processor like Saxon-EE that
optimizes searches by means of implicit indexing, this might not be
necessary.
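
To illustrate working directly with the small documents, here is a minimal
XSLT 2.0 sketch. It assumes a hypothetical manifest document (<parts> with
<part href="..."/> children) as the principal input, and a hypothetical
vocabulary in which each part file contains <line key="..."> elements. None
of these names come from the original post, so adjust them to the real data.
The declared key is applied to each loaded document in turn, so no merged
document is needed for the lookup.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical vocabulary: index <line> elements by their key attribute.
       A key is built separately for every document it is used against. -->
  <xsl:key name="lines-by-key" match="line" use="@key"/>

  <!-- Hypothetical parameter: the key value to look up. -->
  <xsl:param name="wanted" select="'A100'"/>

  <xsl:template match="/parts">
    <!-- Load every part file listed in the manifest; relative hrefs are
         resolved against the base URI of the manifest. -->
    <xsl:variable name="docs" select="document(part/@href)"/>

    <result key="{$wanted}">
      <!-- key() searches the document containing the context node, so
           stepping through $docs runs the lookup once per part file. -->
      <xsl:copy-of select="$docs/key('lines-by-key', $wanted)"/>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

With an optimizing processor, a plain predicate such as
$docs//line[@key = $wanted] may perform comparably, but that depends on the
processor and should be measured rather than assumed.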

Is the 50MB the size of the combined document, or the size of the individual
pieces?

From your description, it doesn't look as if streaming approaches are going
to get you very far, because the output documents are "slices" across the
input documents. So the data needs to be in memory. If the total size is
50MB, that seems quite feasible.
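
As a rough sketch of what such "slicing" could look like with everything held
in memory, the XSLT 2.0 stylesheet below groups lines from all part files by
key value and writes one fragment per key. It again assumes a hypothetical
manifest (<parts>/<part href="..."/>) as the principal input, and the element
and attribute names (header, line, @key, @id, amount, fragment, item) are
invented for illustration only.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/parts">
    <!-- All part files are loaded into memory at once. -->
    <xsl:variable name="docs" select="document(part/@href)"/>

    <!-- One group per distinct key value, drawn from every part file. -->
    <xsl:for-each-group select="$docs//line" group-by="@key">
      <!-- Hypothetical output naming: one file per key value. Key values
           that are not safe in file names would need escaping first. -->
      <xsl:result-document href="fragment-{current-grouping-key()}.xml">
        <fragment key="{current-grouping-key()}">
          <!-- The header is assumed to be common to all parts,
               so the first one found is copied. -->
          <xsl:copy-of select="($docs//header)[1]"/>
          <!-- Map selected fields of each line into the fragment. -->
          <xsl:for-each select="current-group()">
            <item id="{@id}" amount="{amount}"/>
          </xsl:for-each>
        </fragment>
      </xsl:result-document>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

Because each group can draw lines from any of the input files, all of them
have to be available before the first fragment can be completed, which is why
streaming does not help much here.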

I would turn this around: how are you currently processing this, and what
performance problems are you seeing?

Regards,

Michael Kay
http://www.saxonica.com/
http://twitter.com/michaelhkay 

