> On Wed, Dec 10, 2003 at 05:35:13PM +0400, David Tolpin wrote:
> > How can I ensure that differences between processors cause linear
> > changes in speed? How can I predict space consumption?
> You can't! Well, not in a reasonable way, IMHO. Processors like Saxon
> and jd use the fact that their implementation language is
> garbage-collected to reuse nodes in things like the node-set()
> implementation. xsltproc usually can't do this kind of thing. This
> also means you cannot predict the cost of a stylesheet execution,
> because you would have to model the cost of the garbage collection,
> which may occur asynchronously, during the transform, after the
> transform, or any combination of those, depending on the complexity
> of the stylesheet, the size of the data, the memory pressure inside
> the JVM, etc.
Whether an implementation is garbage-collected does not depend on whether
it is written in Java. Node reuse can just as well be implemented in C, B,
Pascal, or any other language. Besides, GC is not a magic property of Java.
GC has little relation to the optimizations advanced processors perform.
jd.xslt and Saxon display consistent results under all versions of the JDK
available to me (1.1.8 to 1.3.1); however, they have limited support for
tail recursion, that is, only 'simple' cases are optimized correctly. On
the other hand, they perform expression optimizations rooted in the
immutability of data, but I cannot find any source besides the source code
that describes exactly which optimizations are performed.
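For illustration, the kind of 'simple' case such processors are able to
turn into a loop is a named template whose recursive call is its last
instruction, guarded only by the base-case test. (This is a hypothetical
sketch; the template and parameter names are invented.)

```xml
<!-- A 'simple' tail call: the xsl:call-template is the final
     instruction on the recursive path, so an optimizing processor
     can replace the recursion with iteration and run in constant
     stack space.  Wrapping the call in further output-producing
     instructions after it would defeat the optimization. -->
<xsl:template name="count-down">
  <xsl:param name="n"/>
  <xsl:if test="$n &gt; 0">
    <xsl:value-of select="$n"/>
    <xsl:call-template name="count-down">
      <xsl:with-param name="n" select="$n - 1"/>
    </xsl:call-template>
  </xsl:if>
</xsl:template>
```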
Memory use is also implemented differently, not in terms of Java GC, but
in terms of creation and allocation of temporary objects.
The complexity cannot be derived from a simple parsing and analysis of the
stylesheet. It depends on the processor implementation, the implementation
of the JVM if one is used, and on the version of the processor.
For XSLT, as for most programming languages, it is possible to estimate
complexity for a non-optimizing implementation, such as James Clark's XT,
which is very fast but does little beyond straightforward interpretation
of the transformation.
The fact is that, due to restrictions of the data model of XSLT, there are
several optimizations which are always possible, caused not by the
sophistication of an underlying layer but by features of the language
itself, that is, XSLT.
My questions are the following:
- what are these optimizations?
- how do these optimizations affect computational complexity for certain
  operations?
- how much of the 'mandatory' optimizations is implemented in each of the
  widely used implementations?
These questions have no relation to either the implementation language or
the execution platform; they should have answers regardless of the
implementation medium.
Having answers to these questions would help in programming advanced
algorithms adequately, without recourse to testing with each of a dozen
available implementations. In particular, while writing stylesheets with
only a basic implementation in mind helps ensure that a stylesheet behaves
adequately (in terms of memory usage and speed) in the worst case, knowing
which expressions are in fact evaluated repeatedly and which data
structures are reused can help in constructing programs that benefit from
optimizations when they are available. Again, optimizations of this class
should not be an 'art'; they can be specified formally and are in fact
features of the language.
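As an illustrative sketch of the point (the element and variable names are
invented): an author who cannot rely on a processor eliminating common
subexpressions can hoist a repeated node-set expression into a variable.
XSLT's immutable data model guarantees the two forms are equivalent; on an
optimizing processor the variable costs nothing, and on a naive one it
avoids re-evaluating the expression on every iteration.

```xml
<!-- Naive processors may re-evaluate //order/item[@status='open']
     once per 'customer' element below.  Binding it to a variable
     forces single evaluation everywhere; immutability of the input
     tree guarantees the value cannot change between uses. -->
<xsl:variable name="open-items"
              select="//order/item[@status='open']"/>

<xsl:template match="customer">
  <total>
    <xsl:value-of select="count($open-items[@customer = current()/@id])"/>
  </total>
</xsl:template>
```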
David Tolpin
http://davidashen.net/
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list