xsl-list

Re: Incremental transformations with Xalan and performance issues?

2004-12-04 23:46:42

--- Andrzej Jan Taramina <andrzej(_at_)chaeron(_dot_)com> wrote:

I'm in a situation where I need to parse some large documents, where the 
first few elements are a preamble with various parameters and the end of the 
document is a large list of entries.

Think of a mail merge, where the letter to be sent is defined first in the 
mail merge xml, followed by numerous recipient entries, something like this:

<mailmerge>
      <letter>
              ...letter def goes here
      </letter>
      <recipients>
              <recipient>
                      ...recipient data
              </recipient>
              <recipient>
                      ...recipient data
              </recipient>
              etc...
      </recipients>
</mailmerge>

What I was wondering was how Xalan handles the processing of such large 
documents (say a million recipient entries) when the parser is using SAX?

More specifically, if I create global variables such as:

      <xsl:variable name="letterTemplate" select="/mailmerge/letter"/>

then later:

      <xsl:template match="recipients/recipient">
              <!-- process the recipient using $letterTemplate -->
      </xsl:template>

Will the processing be incremental in nature, as SAX events are received by 
Xalan?  That is, is Xalan smart enough to create the global as soon as it 
can, followed by processing of each individual recipient as each related SAX 
event is received?  In that case, having the shared global info early in the 
document and the large list at the end would probably have beneficial 
performance implications.

Or will the whole document have to be instantiated as some sort of internal 
tree first?

Hopefully, it's incremental in nature, since otherwise we might blow out 
memory with such large documents.

Any insight into the implications of processing such large documents, using 
globals, xslt stylesheet structure, impact of element ordering in the 
document and the like would be very much appreciated.

Thanks!


First of all, my experience says that if you are concerned about
performance, you should stay away from Xalan. I must admit that I have not
looked at XSLT speed since the summer of 2002, when school made me work on
an XSLT compiler. (I was focused on speed there, but not on incremental
parsing :-D , because I couldn't really find a good application for it.)
Testing different processors at that time, I got the following results:
          AXXEL/1  AXXEL/3  XSLTC   XALAN   MSXML4  MSXML3  SAXON
mo.xsl    1352     3155     2564    61950   2379    10451   3985
sh.xsl    250      1713     ***     6205    655     1787    681
n-s.xsl   1041     1321     1201    4897    1065*   2243    2825

* = wrong output
*** = couldn't compile

Processors:
AXXEL/1 - my project: XSLT compiled to Java source code, output fully
          suppressed (JVM)
AXXEL/3 - my project: XSLT compiled to Java source code, with output (JVM)
XSLTC - XSLT to Java bytecode, found in Xalan (JVM)
SAXON - SAXON 6.5.2 (JVM)
XALAN - XALAN 2.3.1 (JVM)
MSXML3 - Microsoft MSXML 3.0
MSXML4 - Microsoft MSXML 4.0


Tests:
mo.xsl - an XML2HTML presentation stylesheet, fairly complex (a lot of
templates and a lot of modes). Run artificially 100 times: the main
template runs the stylesheet 100 times, without re-parsing the input XML.
sh.xsl - an XML2HTML presentation stylesheet, quite simple. Run internally
100 times, except on MSXML3 and MSXML4 (I don't remember why, but it didn't
work), for which the time of a single execution was multiplied by 100.
n-s.xsl (number-string.xsl) - an artificial stylesheet that tests how fast
the processor computes the string value of a node (i.e. how fast it
computes string(/) ) and the speed of normalize-space.

For the Java processors, JDK 1.4.0 was used (HotSpot client). Times were
measured after the HotSpot compiler had done its job, to simulate a
server-side environment.
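
A minimal sketch of that warm-up-then-measure procedure, using the standard
JAXP API (javax.xml.transform). This is not the harness used for the numbers
above: the class WarmTiming, its method names, and the tiny inline stylesheet
and input are hypothetical stand-ins, and unlike the mo.xsl test it re-parses
the (small) input on every run.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class WarmTiming {
    // Hypothetical stand-ins for a real stylesheet and input document.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:value-of select='normalize-space(string(/))'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";
    static final String XML = "<doc>  a   <b> b </b>  c </doc>";

    // Compile the stylesheet once so compilation is not part of the timing.
    static Templates compile(String xsl) throws TransformerException {
        return TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xsl)));
    }

    // Run a single transformation and return the serialized result.
    static String runOnce(Templates templates, String xml)
            throws TransformerException {
        Transformer transformer = templates.newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    // Discard `warmup` runs so the JIT can compile the hot paths,
    // then time `measured` runs and return the total in milliseconds.
    static long timeMillis(Templates templates, String xml,
                           int warmup, int measured)
            throws TransformerException {
        for (int i = 0; i < warmup; i++) {
            runOnce(templates, xml);
        }
        long start = System.nanoTime();
        for (int i = 0; i < measured; i++) {
            runOnce(templates, xml);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws TransformerException {
        Templates templates = compile(XSL);
        long ms = timeMillis(templates, XML, 1000, 100);
        System.out.println("100 measured runs took " + ms + " ms");
    }
}
```

Which processor gets measured depends on which TransformerFactory
implementation JAXP finds first, so the processor under test has to be
selected explicitly (for example through the javax.xml.transform.TransformerFactory
system property) when comparing several of them.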

I must admit that the tests were performed with mid-2002 software, but as
you can see, Xalan is far worse than anything else tested, MSXML 4.0 works
great (it is written in C++) and SAXON is very close behind (although it is
written in Java). Xalan was roughly 10 to 15 times slower than SAXON on the
real stylesheets (mo.xsl and sh.xsl).
What I also found out is that Java is not great at I/O in XSLT
transformations: file manipulation and string manipulation are quite slow.

Maybe things have changed in 2.5 years, but I doubt that the people from the
Apache foundation have learned how to write fast software. The latest
release of Xalan is 2.6 and the latest release of Saxon is 8.1.1. The latest
release of MSXML is still 4.0. I also bet that they didn't change much in
XSLTC.




About the big XMLs issue: I recommend that you not expect any magic from an
XSLT processor (like efficient incremental parsing) and that you keep all
your XMLs small by dividing the information across several XML files, which
you can later access using the "document" function. For example, you may
move the mail content into a separate XML file if you don't access this info
too often. In my experience, any XML over 3 or 5 MB is a bad XML.
Moreover, don't expect that once you have used an external XML (through the
"document" function) and no longer hold any reference to it, the XSLT
processor will free the tree built for that external XML.
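
One possible way to do the splitting, sketched with the StAX API
(javax.xml.stream) that ships with later JDKs. It reads the big document as a
stream of events and cuts the recipient list into chunks of a fixed size, so
the full tree is never built. The class RecipientSplitter and its split
method are hypothetical names, the element names come from the mailmerge
example in the question, and the chunks are returned as strings only to keep
the sketch short; a real splitter would write each chunk to its own file.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class RecipientSplitter {
    // Split the <recipient> elements of `xml` into documents that each hold
    // at most `chunkSize` recipients under a <recipients> root element.
    static List<String> split(String xml, int chunkSize)
            throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
        XMLEventFactory events = XMLEventFactory.newInstance();

        List<String> chunks = new ArrayList<>();
        StringWriter buffer = null;
        XMLEventWriter writer = null;
        int inChunk = 0; // recipients copied into the current chunk
        int depth = 0;   // nesting depth inside a <recipient>, 0 = outside

        while (reader.hasNext()) {
            XMLEvent e = reader.nextEvent();
            boolean startsRecipient = depth == 0 && e.isStartElement()
                    && "recipient".equals(
                           e.asStartElement().getName().getLocalPart());

            // Open a new chunk lazily, when its first recipient shows up.
            if (startsRecipient && writer == null) {
                buffer = new StringWriter();
                writer = outFactory.createXMLEventWriter(buffer);
                writer.add(events.createStartElement("", "", "recipients"));
            }

            // Copy every event that belongs to a recipient; skip the rest
            // (the letter, the wrapper elements, whitespace in between).
            if (startsRecipient || depth > 0) {
                writer.add(e);
                if (e.isStartElement()) depth++;
                if (e.isEndElement()) depth--;
                if (depth == 0 && ++inChunk == chunkSize) {
                    chunks.add(finish(writer, buffer, events));
                    writer = null;
                    inChunk = 0;
                }
            }
        }
        if (writer != null) { // last, partially filled chunk
            chunks.add(finish(writer, buffer, events));
        }
        reader.close();
        return chunks;
    }

    // Close the <recipients> root and return the serialized chunk.
    private static String finish(XMLEventWriter writer, StringWriter buffer,
                                 XMLEventFactory events)
            throws XMLStreamException {
        writer.add(events.createEndElement("", "", "recipients"));
        writer.flush();
        writer.close();
        return buffer.toString();
    }
}
```

The letter definition is deliberately left out of the chunks: following the
advice above, it would live in its own small file and the stylesheet would
fetch it with the "document" function, for example
document('letter.xml')/letter (the file name is only an illustration).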


=====
Marian
http://www.utdallas.edu/~mgo031000/


                

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--


