Xalan is capable of "streaming processing".
The interesting challenge is to work out when you can discard parts of
the tree that won't be needed again. I think this could be done quite
easily for a small class of very simple stylesheets, but the general
problem is quite hard.
That's been our conclusion. XSLT's semantics require at least the
appearance of having the whole document in memory at once. Figuring out
how to reduce
The terminology Xalan uses for these issues:
Incremental: We can build the source model "on demand" (e.g., by
"throttling" the incoming SAX stream, or by using Xerces incremental
parsing). If your stylesheet doesn't need to examine the whole source
document, this can reduce the resources required. It does have some
throughput costs. IMPLEMENTED; optional because of the performance
trade-off.
Streaming: In incremental mode, we can begin generating output before the
entire source document has been read. This reduces latency, which can be a
major advantage when the next stage of processing (e.g., a browser) can
itself operate in a streaming mode and begin displaying data immediately;
the user sees the system as more responsive despite the throughput costs.
IMPLEMENTED.
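
As a rough illustration of how the optional incremental mode might be
requested from Java: Xalan-J 2 documents a TransformerFactory attribute
named "http://xml.apache.org/xalan/features/incremental". The sketch below
assumes that attribute; the class and method names are made up for the
example, and a factory that does not recognize the attribute is simply
reported as not supporting it.

```java
import javax.xml.transform.TransformerFactory;

public class IncrementalSketch {
    // Attribute URI as documented for Xalan-J 2 (assumption: the factory
    // in use is Xalan; other processors are not expected to recognize it).
    static final String FEATURE_INCREMENTAL =
        "http://xml.apache.org/xalan/features/incremental";

    /** Returns true if the factory accepted the attribute, false if it rejected it. */
    static boolean requestIncremental(TransformerFactory factory) {
        try {
            factory.setAttribute(FEATURE_INCREMENTAL, Boolean.TRUE);
            return true;
        } catch (IllegalArgumentException e) {
            // JAXP factories throw this for attributes they do not recognize.
            return false;
        }
    }

    public static void main(String[] args) {
        TransformerFactory factory = TransformerFactory.newInstance();
        System.out.println("incremental accepted: " + requestIncremental(factory));
    }
}
```

Because the mode trades throughput for latency, it probably only makes
sense to request it when the consumer of the output can itself stream.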
Filtering: An optimization consisting of not building portions of the
source model which stylesheet analysis proves will never be referenced by
the stylesheet. Conceptually straightforward, but runs into the "halting
problem" to some extent; may be hard to apply generally. May require some
rewriting of the stylesheet and/or retaining of "stub" branches of the
tree to avoid breaking XPaths. NOT IMPLEMENTED at this time.
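
To make the filtering idea concrete, here is a minimal sketch done by hand
outside the processor: a SAX filter that drops named subtrees before the
tree builder ever sees them. This is not something Xalan does for you. The
set of element names to skip stands in for the result of the stylesheet
analysis, which is the hard part and is simply assumed here; FilteringSketch,
SubtreeFilter and buildFiltered are hypothetical names, and a DOM built with
the standard JAXP classes stands in for the source model.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

public class FilteringSketch {
    /** Drops every subtree rooted at an element whose name is in 'skip'. */
    static class SubtreeFilter extends XMLFilterImpl {
        private final Set<String> skip;
        private int depth = 0; // greater than 0 while inside a skipped subtree

        SubtreeFilter(XMLReader parent, Set<String> skip) {
            super(parent);
            this.skip = skip;
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            String name = local.isEmpty() ? qName : local;
            if (depth > 0 || skip.contains(name)) { depth++; return; }
            super.startElement(uri, local, qName, atts);
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            if (depth > 0) { depth--; return; }
            super.endElement(uri, local, qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (depth == 0) super.characters(ch, start, length);
        }
        // Comments and other lexical events are not filtered in this sketch.
    }

    /** Parses 'xml' and builds a DOM that never contains the skipped subtrees. */
    static Document buildFiltered(String xml, Set<String> skip) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader reader = spf.newSAXParser().getXMLReader();
            SAXSource source = new SAXSource(new SubtreeFilter(reader, skip),
                                             new InputSource(new StringReader(xml)));
            DOMResult result = new DOMResult();
            // Identity transform: used here only as a SAX-to-DOM tree builder.
            TransformerFactory.newInstance().newTransformer().transform(source, result);
            return (Document) result.getNode();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Note that dropping a subtree outright changes sibling positions, so a
stylesheet using position-dependent XPaths could break; that is the reason
for the "stub" branches mentioned above.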
Pruning: An optimization consisting of discarding portions of the source
model which stylesheet analysis proves will never again be referenced by
the stylesheet. Similar issues to filtering. Some of the optimizations in
our internal data structures (DTM) fight with this approach; as currently
implemented, DTM can be "tail pruned" fairly easily (remove the most
recently added subtree) but general pruning is challenging. We currently
do some tail-pruning to manage RTF (Result Tree Fragment)/temporary tree
storage, but pruning of the source document is NOT IMPLEMENTED at this
time.
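
A toy model may help show why tail pruning is the easy case. The sketch
below is not Xalan's DTM code; it only assumes a table in which nodes are
appended in document order and identified by their index, which is the
general flavor of such structures. TailPruneSketch and its methods are
hypothetical names. Removing the most recently added subtree is then a
truncation, while removing anything else would leave a hole and force
renumbering.

```java
import java.util.ArrayList;
import java.util.List;

public class TailPruneSketch {
    // Nodes are appended in document order; a node's index is its identity.
    private final List<String> names = new ArrayList<>();
    private final List<Integer> parents = new ArrayList<>();

    /** Appends a node and returns its index; use -1 as the parent of the root. */
    int addNode(String name, int parent) {
        names.add(name);
        parents.add(parent);
        return names.size() - 1;
    }

    int size() {
        return names.size();
    }

    /** True if every node added after 'root' lies inside root's subtree. */
    boolean isTailSubtree(int root) {
        for (int i = root + 1; i < names.size(); i++) {
            // In document order, a later node whose parent precedes 'root'
            // cannot be a descendant of 'root'.
            if (parents.get(i) < root) return false;
        }
        return true;
    }

    /** Removes 'root' and its descendants, but only if they form the tail of the table. */
    void tailPrune(int root) {
        if (!isTailSubtree(root)) {
            throw new IllegalStateException("not the most recently added subtree");
        }
        names.subList(root, names.size()).clear();
        parents.subList(root, parents.size()).clear();
    }

    public static void main(String[] args) {
        TailPruneSketch t = new TailPruneSketch();
        int doc = t.addNode("doc", -1);
        int a = t.addNode("a", doc);
        t.addNode("a1", a);
        int b = t.addNode("b", doc);
        t.addNode("b1", b);
        t.tailPrune(b); // fine: b's subtree is the tail
        System.out.println("nodes left: " + t.size());
    }
}
```

Pruning "a" in the example instead would be rejected, since "b" was added
after it; that is the general-pruning case described above.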
______________________________________
Joe Kesselman / IBM Research
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list