xsl-list

Re: [xsl] Optimizing XSLT iteration

2007-10-07 16:23:09
Sujata Gohad wrote:
I am dealing with documents that are 20-60MB in size and have a number
of sequences as shown in my example xml. As you can imagine, iterating
through those many takes huge amounts of time.

What would be the best way to attack the issue? Is there a good
reference/article on this topic?

I have tried using Saxon with TinyTree, Xalan, MSXML6 and Altova.

On an Intel Core Duo with 2 GB of memory, Saxon and Xalan would run out of heap space.

Altova could only handle files up to 40MB.

MSXML successfully dealt with all file sizes, but the 60MB file took
over 25 minutes to transform.

For the file sizes Saxon could handle, it was at least 1.2-1.3 times
faster than MSXML.

Ah, ignore my previous message. This calls for a different approach. Recall that most XSLT processors load a document into memory as a whole before processing it. This typically consumes about four to five times the file size in memory, and that part is likely more demanding for your processor than the transformation itself.

To circumvent this, I recommend you try Saxon-SA. It has an option to run in streaming mode, which means that the whole document is *not* first loaded into memory. However, there are a couple of things in your code you have to be very strict about to make this work. The Saxon website has a clear explanation of how to do this: http://www.saxonica.com/documentation/sourcedocs/serial.html. Among other things: it's probably best to move to template matching instead of for-each, and you cannot use the xml-stylesheet PI.
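To illustrate the template-matching style (a hypothetical sketch, not taken from your stylesheet — `sequence` and `id` are placeholder element names, since your example XML isn't shown here), the idea is to let the processor push each repeating element at a small match template rather than pulling the whole tree with one big for-each:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Handle each repeating element as it comes past;
       'sequence' stands in for your actual element name. -->
  <xsl:template match="sequence">
    <result>
      <xsl:value-of select="id"/>
    </result>
  </xsl:template>

  <!-- Suppress text nodes that are not explicitly handled. -->
  <xsl:template match="text()"/>

</xsl:stylesheet>
```

Note that this alone does not guarantee streaming — the exact restrictions (what you may select, how you may copy) are the ones spelled out on the Saxon page above.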

Also (sorry, Michael), if you stick to XSLT 1.0, you can process your file with the .NET 2.0 (or 3.0) version of Microsoft's XSLT processor. It outperforms most other processors around.

Last but not least: if speed is really an issue, you should consider using Perl and regular expressions. I've seen 10GB files go through Perl in a matter of minutes, outputting only the strings we needed, which we then processed further. This is a case of "use the right toolset for the job", and XSLT is not always the best choice. Running Perl (or any other interpreter with regular expression support, but Perl happens to be the fastest around) on 20-80MB files should take a couple of seconds, depending on the speed of your hard drive.

Cheers,
-- Abel Braaksma




--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--