xsl-list
[Top] [All Lists]

Re: [xsl] XPath expression to check that there are no intervening elements?

2016-07-20 10:36:34

On 20 Jul 2016, at 09:36, Michael Kay mike(_at_)saxonica(_dot_)com 
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:

I did some timings:


Kay 3 - 32.18ms (ouch!)

doc ! (let $fsm := map{
0: function($x) {if ($x[self::B]) then 1 else 0},
1: function($x) {if ($x[self::B]) then 1 else 2},
2: function($x) {if ($x[self::B]) then 3 else 2},
3: function($x) {3}
} return
(fold-left(*, 0, function($state, $node){$fsm($state)($node)}) ne 3))



I decided to do some analysis of this disappointing result.

It turns out that dynamic function calls are quite expensive, and this is 
almost entirely due to the very dynamic nature of the type checking that is 
done (so-called function coercion).

Running the above in a slightly different test environment I got a baseline 
time of 41ms.

Java profiling showed an inordinate amount of time spent computing hashcodes. 
It turned out that during the dynamic call of the various "function($x)" 
functions, Saxon was checking that the supplied item type element(C) was a 
subtype of the required type item()*, which it does by first checking this pair 
of types in a cache of known type relationships, and the lookup in the cache 
involved an expensive hashcode computation. Avoiding the lookup by applying the 
knowledge that everything is a subtype of item()* reduced the total time to 
29ms.

I achieved a further reduction to 22ms by replacing the dynamic function call 
$fsm($state) with a static call map:get($fsm, $state).

I then got this down to 13ms by avoiding the inline functions for computing the 
state transition, changing the query to

declare namespace map='http://www.w3.org/2005/xpath-functions/map'; 
doc ! (let $fsm := map{
0: (1,0), 
1: (1,2), 
2: (3,2), 
3: (3,3)} 

return (
  fold-left(*, 0, function($state, $node){
             map:get($fsm,$state)[if ($node[self::B]) then 1 else 2]
             }) ne 3))

Even though there is now only one dynamic function call per element in the 
input sequence (and this is unavoidable) the cost still seems to be dominated 
by dynamic function calling.

Declaring the types of the arguments ($state, $node) made this worse, because 
it increases the amount of dynamic type checking that needs to be done. This is 
quite different from the usual scenario where declaring types enables the 
checking to be done statically.

There is clearly a lot of scope for optimization in this area. It's actually 
the first time I've looked seriously at the performance of higher order 
functions in a real query.

Michael Kay
Saxonica
--~----------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
EasyUnsubscribe: http://lists.mulberrytech.com/unsub/xsl-list/1167547
or by email: xsl-list-unsub(_at_)lists(_dot_)mulberrytech(_dot_)com
--~--

<Prev in Thread] Current Thread [Next in Thread>