Michael:
Thanks for the response. BTW, I use your XSLT book as my primary
reference...nice work!
> You might find it better to ask such questions on the xsl-list at
> mulberrytech.com, or if you're really interested only in Xalan, on a
> Xalan-specific forum.
Like many, I suffer from YAL (Yet Another List) syndrome and am hesitant to
subscribe to any more lists, given how much stuff I already receive. I knew
some XSLT heavyweights (like yourself) hang out here, hence the decision to
post to the xml-dev group. However, I've now also cross-posted to the xsl
group.
I also think that as XML adoption continues to accelerate, transformations of
extremely large documents using XSLT will be more and more a general concern
to the community.
> In general, every mainstream XSLT processor today builds a tree
> representation of the input document in memory. I believe Xalan does parsing
> and transformation in parallel, but it still builds the tree. The fact that
> the parser and the transformer communicate using SAX is irrelevant - it just
> means that the transformer and not the parser is building the tree. (This
> isn't totally irrelevant, because the transformer can build a much more
> efficient tree knowing it is read-only. But it's still an in-memory tree.)
I might have to redesign how we handle our XML in that case, to keep each
mail merge recipient entry in a separate document, rather than have the whole
thing as one monolithic document.
Do you happen to know if anyone has tried to build an XSLT engine that does
incremental transformations on incoming SAX events, without requiring the
building of a tree? That kind of approach, where the transform is suited to
it, would be much more efficient in memory usage and would, I should think,
allow transforms of documents of virtually unlimited size. Something to
investigate...
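
To make the incremental idea concrete, here is a minimal sketch using only the standard JAXP SAX API. It is not an XSLT engine; it only illustrates that when each output piece depends on purely local context, the output can be written as events arrive without building a tree. The element names `recipient` and `name`, the class name `StreamingMerge`, and the letter text are all assumptions made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: writes one greeting per <recipient> as SAX events
// arrive. Only the text of the current <name> element is held in memory.
class StreamingMerge extends DefaultHandler {
    private final Writer out;
    private final StringBuilder name = new StringBuilder();
    private boolean inName;

    StreamingMerge(Writer out) { this.out = out; }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("recipient".equals(qName)) {
            name.setLength(0);            // forget the previous recipient
        } else if ("name".equals(qName)) {
            inName = true;
            name.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inName) name.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if ("name".equals(qName)) {
            inName = false;
        } else if ("recipient".equals(qName)) {
            try {
                out.write("Dear " + name + ",\n");   // emit immediately, keep nothing
            } catch (IOException e) {
                throw new SAXException(e);
            }
        }
    }

    // Convenience wrapper for small inputs; a real run would stream to a file.
    static String run(String xml) {
        StringWriter sink = new StringWriter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new StreamingMerge(sink));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return sink.toString();
    }
}
```

The obvious limitation is that anything needing to look backwards or forwards in the document (sorting, cross-references, most of what XPath axes allow) does not fit this model, which is presumably why general-purpose XSLT processors build the tree.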
> I can't speak for Xalan, but Saxon users are running transformations up to
> 200Mb or so without too much trouble, and at speeds up to 10Mb/sec. It
> requires a little care in configuring the memory allocation, and in writing
> the stylesheet to avoid non-linear constructs, but it's certainly doable.
> Beyond that, it probably gets difficult.
I'm using Xalan (inside Cocoon), and for this task have not yet figured out a
way to use Saxon, due to some extensions I'm using. More specifically, I need
to get/put stuff into the session, and am using something like this (in
Xalan):
  <xalan:component prefix="javaSession">
    <xalan:script lang="javaclass"
                  src="xalan://org.apache.cocoon.environment.Session"/>
  </xalan:component>
Then have templates like:
  <xsl:template name="javaCall:setSessionAttribute">
    <xsl:param name="attributeName" select="'unknown'"/>
    <xsl:param name="attributeValue"/>
    <xsl:param name="session"/>
    <xsl:variable name="dummy"
      select="javaSession:setAttribute( $session, $attributeName, $attributeValue )"/>
  </xsl:template>

  <xsl:template name="javaCall:getSessionAttribute">
    <xsl:param name="attributeName" select="'unknown'"/>
    <xsl:param name="session"/>
    <xsl:copy-of select="javaSession:getAttribute( $session, $attributeName )"/>
  </xsl:template>
The session parameter is a reference to the user's session that is passed in
from the calling stylesheet with a bit of magic from a custom Cocoon
transformer class.
This works fine with Xalan: if you save a tree fragment and then retrieve it,
you end up with a node list/tree fragment as desired. With Saxon, however, if
I instead use the Saxon component definition:
  <saxon:script language="java"
                implements-prefix="javaSession"
                src="java:org.apache.cocoon.environment.Session"/>
I can save a result fragment, but when I retrieve it, I don't get a node
list/tree fragment back. I haven't figured out how to correct this with Saxon
yet. If it weren't for this, I could freely switch between the two XSLT
engines with a build parameter.
> You don't actually say what you mean by a "large document". (Personally, I
> am amazed to see people handling a 200Mb database as a single in-memory
> document, but perhaps I'm just old-fashioned).
I'm not sure yet...the client has not given me any indication of how big the
mail merge might be. 1M letters would hit the database limit of 2GB for the
XML document in the table column (CLOB). 100K letters would hit the 200MB
level that you mentioned.
I'd rather implement a solution that has no limitations, so given the lack of
a true "incremental/SAX" based transformer implementation, I'm thinking that
I'll need to move away from the monolithic document approach and store each
recipient's info in a separate small document, to work around the current
XSLT document size limitations.
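
One way the per-recipient approach could look without changing the storage format is sketched below: a SAX handler that treats each recipient subtree as its own small document and runs an ordinary XSLT transformation on it through the standard JAXP `SAXTransformerFactory`/`TransformerHandler` API, so the processor only ever builds a tree for one recipient at a time. The element names `recipient` and `name`, the class name `RecipientSplitter`, and the stylesheet in `LETTER_XSLT` are hypothetical, and the sketch assumes the input uses no namespaces (prefix mappings are not forwarded). I have not tried this inside Cocoon.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: each <recipient> subtree is fed to its own
// TransformerHandler, so only one small tree exists at any moment.
class RecipientSplitter extends DefaultHandler {
    // Illustrative stylesheet; it sees each recipient as a whole document.
    static final String LETTER_XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/recipient'>Dear <xsl:value-of select='name'/>,&#10;</xsl:template>"
      + "</xsl:stylesheet>";

    private final SAXTransformerFactory factory;
    private final Templates templates;
    private final StringWriter out;
    private TransformerHandler current;   // non-null while inside a <recipient>
    private int depth;                    // open elements within that recipient

    RecipientSplitter(SAXTransformerFactory factory, Templates templates, StringWriter out) {
        this.factory = factory;
        this.templates = templates;
        this.out = out;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (current == null && "recipient".equals(localName)) {
            try {
                current = factory.newTransformerHandler(templates);
            } catch (TransformerConfigurationException e) {
                throw new SAXException(e);
            }
            current.setResult(new StreamResult(out));   // must precede startDocument()
            current.startDocument();
            depth = 0;
        }
        if (current != null) {
            current.startElement(uri, localName, qName, atts);
            depth++;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (current != null) current.characters(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (current == null) return;
        current.endElement(uri, localName, qName);
        if (--depth == 0) {
            current.endDocument();   // the transformation of this recipient runs here
            current = null;          // let the small tree be garbage collected
        }
    }

    static String mergeAll(String xml, String xslt) {
        try {
            // The JDK's built-in factory supports the SAX extensions; others may not.
            SAXTransformerFactory tf = (SAXTransformerFactory) TransformerFactory.newInstance();
            Templates compiled = tf.newTemplates(new StreamSource(new StringReader(xslt)));
            StringWriter sink = new StringWriter();
            SAXParserFactory pf = SAXParserFactory.newInstance();
            pf.setNamespaceAware(true);   // so localName is populated
            pf.newSAXParser().parse(new InputSource(new StringReader(xml)),
                                    new RecipientSplitter(tf, compiled, sink));
            return sink.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The stylesheet is compiled once into a `Templates` object and reused for every recipient. The trade-off is that the stylesheet can no longer see anything outside the current recipient, so any shared header data would have to be passed in some other way, for example as stylesheet parameters.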
> If you really need purely serial processing, you might consider STX as an
> alternative. However, the existing STX implementations are far less
> widely-used or mature than the popular XSLT implementations.
That's not an option in our case, since we rely on XSLT so much.
Andrzej Jan Taramina
Chaeron Corporation: Enterprise System Solutions
http://www.chaeron.com