xsl-list
[Top] [All Lists]

RE: [xsl] Optimizing XSLT iteration

2007-10-07 21:01:13
Thanks for the further data (sent offlist).

Your chart seems to show quadratic performance.

Your 10mb transformation takes 29secs on my machine; that's 0.3Mb/sec
compared with 1.5Mb/sec for the smaller data file, which is consistent with
the idea of it being quadratic, as far as one can tell from two data points.


Success: I've found the cause by looking at the Java CPU profile. The
culprit is the calls on xsl:number at line 42 and 128. There are some
circumstances in which Saxon optimizes xsl:number, but where it can't do so,
it does essentially what the spec describes: each call on xsl:number has to
count all the previous elements, which immediately gives you O(n^2)
performance.

You're using xsl:number to generate a unique id. I don't know what the
constraints are on this id, but you can probably solve the problem using
generate-id() or position(). If it absolutely must be the sequential number
of the locus element within the entire file, consider using a prepass that
numbers the elements (attaching the number as an extra attribute).

It's probably time I looked again at better optimizations for xsl:number -
this hasn't been touched in a long time. The current optimization
essentially looks at the previous execution of the same xsl:number
instruction, and if that was numbering the previous node, it adds one to the
previous number. This doesn't kick in in your case because you are only
numbering every fifth node.

Michael Kay
http://www.saxonica.com/

-----Original Message-----
From: Michael Kay [mailto:mike(_at_)saxonica(_dot_)com] 
Sent: 08 October 2007 03:03
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: RE: [xsl] Optimizing XSLT iteration

Thanks for sending your sample data (offlist).

I've fixed various errors in your source file to make it well-formed.
Running the transformation 100 times in Saxon I get

*** Average execution time over 100 runs: 26ms

which seems entirely reasonable.

It's dangerous to scale up from something quite so small, but 
38Kb in 26ms scales up to 76Mb in 52s if the performance is 
linear, and I can't really see why it shouldn't be. The 
processing in this stylesheet is a bit more complex than the 
previously posted sample (includes string-to-number 
conversion and arithmetic) but it looks linear.

Michael Kay
http://www.saxonica.com
  

-----Original Message-----
From: Michael Kay [mailto:mike(_at_)saxonica(_dot_)com]
Sent: 08 October 2007 01:49
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: RE: [xsl] Optimizing XSLT iteration

I just tried Saxon parser after increasing the initial/max
Java heap
space to 512M and it worked with even a 74MB file, but took
about 28
minutes to finish.

That does seem quite high, given that Saxon can do an identity 
transform on a file of that size in something close to 28 
seconds. I 
wonder if there is some other problem in your code (are there parts 
you haven't shown us?). Is the performance linear with 
document size?

Michael Kay
http://www.saxonica.com/



--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--



--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--



--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--