
Re: how to estimate speed of a transformation

2003-12-10 10:06:29
On Wed, Dec 10, 2003 at 05:35:13PM +0400, David Tolpin wrote:
How can I ensure that differences between processors cause linear changes
in speed? How can I predict space consumption?

  You can't! Well, not in a reasonable way, IMHO:

Processors like Saxon and jd use the fact that their implementation
language is garbage-collected to reuse nodes in things like the node-set()
implementation. xsltproc usually can't do this kind of thing. This
also means you cannot predict the cost of a stylesheet execution,
because you would have to model the cost of the garbage collection,
which may occur asynchronously, during the transform, after the transform,
or any combination of those, depending on the complexity of the stylesheet,
the size of the data, the memory pressure inside the JVM, etc.
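(For readers who have not met the pattern under discussion: the kind of construct that forces a processor to build short-lived temporary trees is a result tree fragment converted back to nodes with the EXSLT node-set() extension. The element and variable names in this sketch are illustrative, not from the thread.)

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    extension-element-prefixes="exsl">

  <xsl:template match="/">
    <!-- $sorted is a result tree fragment: a freshly allocated tree -->
    <xsl:variable name="sorted">
      <xsl:for-each select="//item">
        <xsl:sort select="@price" data-type="number"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </xsl:variable>
    <!-- exsl:node-set() turns the fragment into traversable nodes;
         the whole temporary tree becomes garbage as soon as this
         template returns -->
    <cheapest>
      <xsl:copy-of select="exsl:node-set($sorted)/item[1]"/>
    </cheapest>
  </xsl:template>
</xsl:stylesheet>
```

A garbage-collected runtime can reclaim (or internally reuse) such fragments for free; a manually managed implementation must account for them explicitly.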

Whether an implementation is garbage-collected does not depend on whether
it is written in Java. Node re-use can just as well be implemented
in C, B, Pascal, or any other language. Besides, GC is not a magic property of
Java.

GC has little relation to the optimizations advanced processors perform.
jd.xslt and Saxon display consistent results under all versions of the JDK
available to me (1.1.8 to 1.3.1); however, they have limited support for
tail recursion, that is, only 'simple' cases are optimized correctly. On the
other hand, they perform expression optimizations rooted in the immutability
of data, but I cannot find any source besides the source code that describes
exactly which optimizations are performed.
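As an illustration of the 'simple' case (template and parameter names are mine, not from the thread): a named template whose recursive call is the very last instruction can be turned into a loop, so the stack does not grow with $n.

```xml
<!-- The recursive xsl:call-template is in tail position: nothing is
     evaluated after it returns, so a processor may reuse the frame. -->
<xsl:template name="count-down">
  <xsl:param name="n"/>
  <xsl:if test="$n &gt; 0">
    <xsl:value-of select="$n"/>
    <xsl:call-template name="count-down">
      <xsl:with-param name="n" select="$n - 1"/>
    </xsl:call-template>
  </xsl:if>
</xsl:template>
```

If, instead, the result of the recursive call were used in a further computation (say, wrapped in concat() after the call returns), the call is no longer in tail position, and such processors fall back to true recursion with linear stack growth.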

Memory use is also implemented differently, not in terms of Java GC, but in
terms of the creation and allocation of temporary objects.

The complexity cannot simply be derived from parsing and analysis of the
stylesheet. It depends on the processor implementation, on the implementation
of the JVM if one is used, and on the version of the processor.

For XSLT, as for most programming languages, it is possible
to estimate complexity for a non-optimizing implementation, such as
James Clark's XT, which is very fast but does not do much beyond
straightforward interpretation of the transformation.
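For example (the stylesheet fragment below is mine, for illustration): under straightforward interpretation the cost of a template can be read directly off its expressions. Numbering siblings this way re-walks the preceding siblings for every item, O(n) work per element and O(n^2) overall, whereas an optimizing processor might replace the count with position() bookkeeping.

```xml
<!-- A naive interpreter evaluates count(preceding-sibling::item)
     from scratch for each matched item: quadratic in the number
     of siblings. -->
<xsl:template match="item">
  <numbered n="{count(preceding-sibling::item) + 1}">
    <xsl:apply-templates/>
  </numbered>
</xsl:template>
```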

The fact is that, due to restrictions of the data model of XSLT, there
are several optimizations which are always possible, enabled not by the
sophistication of an underlying layer but by features of the language
itself, that is, XSLT.
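One sketch of such a language-guaranteed optimization (element names are illustrative): because the input tree and variable bindings are immutable, any expression that does not depend on the context item is loop-invariant, and a processor is always entitled to hoist it out of the loop and evaluate it once.

```xml
<!-- //config/rate depends only on the (immutable) input document,
     not on the for-each context item, so a conforming processor may
     compute it once and cache the result for all iterations. -->
<xsl:template match="/">
  <xsl:for-each select="//item">
    <total>
      <xsl:value-of select="@price * //config/rate"/>
    </total>
  </xsl:for-each>
</xsl:template>
```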

My questions are the following:

- What are these optimizations?
- How do these optimizations affect computational complexity for certain
  operations?
- How much of the 'mandatory' optimizations is implemented in each of the
  widely used implementations?

These questions have no relation to either implementation language or
execution platform; they should have answers regardless of the
implementation medium.

Having answers to these questions would help in programming advanced
algorithms adequately, without recourse to testing with each of a dozen
available implementations. In particular, while writing stylesheets with
only a basic implementation in mind helps ensure that a stylesheet behaves
adequately (in terms of memory usage and speed) in the worst case, knowing
which expressions are in fact evaluated repeatedly and which data structures
are re-used can help in constructing programs that benefit from
optimizations when the optimizations are available. Again, optimizations of
this class should not be an 'art'; they can be specified formally and are in
fact features of the language.

David Tolpin
http://davidashen.net/

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list