
Re: [xsl] Running the same transformation on many input files, optimisation possible?

2019-12-15 16:12:26
Note that there's a double overhead here: firstly you're bringing up a new Java 
VM for each transformation, and secondly you're recompiling the stylesheet for 
each transformation.

You can avoid the Java loading overhead by using Ant or XProc, but I'm not sure 
either of them will avoid the overhead of recompiling the stylesheet; though if 
you use a recent Saxon version, you could achieve that by reloading the 
stylesheet from a pre-compiled SEF (stylesheet export file).

You could write your own Java application to control the process, invoking 
Saxon via the JAXP or s9api APIs - both allow you to compile a stylesheet once 
and execute it repeatedly.
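
As a rough sketch of the JAXP route (not a definitive implementation): the stylesheet is compiled once into a Templates object, and a new Transformer is created from it for each input file. The class name BatchTransform, the method transformAll, and the f.xml / f.new.xml naming are illustrative assumptions modelled on the script in the question; the doc and file parameters mirror the ones passed on the command line there. Only standard javax.xml.transform calls are used, so whichever XSLT processor JAXP finds at run time will be the one that executes.

```java
import java.io.File;
import java.util.List;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: names and file layout are assumptions for illustration.
public class BatchTransform {

    /**
     * Compiles the stylesheet once, then applies it to dir/NAME.xml for each
     * NAME in fileNames, writing dir/NAME.new.xml.
     */
    public static void transformAll(File stylesheet, File dir, String docName,
                                    List<String> fileNames) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();

        // The expensive step: parse and compile the stylesheet a single time.
        Templates compiled = factory.newTemplates(new StreamSource(stylesheet));

        for (String name : fileNames) {
            // Cheap step: a fresh Transformer per input, sharing the compiled code.
            Transformer transformer = compiled.newTransformer();
            transformer.setParameter("doc", docName);
            transformer.setParameter("file", name);

            File input = new File(dir, name + ".xml");
            File output = new File(dir, name + ".new.xml");
            transformer.transform(new StreamSource(input), new StreamResult(output));
        }
    }

    // Usage: java BatchTransform transform.xsl documentDir docName file1 file2 ...
    public static void main(String[] args) throws Exception {
        if (args.length < 4) {
            System.err.println("usage: BatchTransform STYLESHEET DIR DOC FILE...");
            System.exit(1);
        }
        List<String> names = List.of(args).subList(3, args.length);
        transformAll(new File(args[0]), new File(args[1]), args[2], names);
    }
}
```

Creating a new Transformer for each file keeps parameter state from leaking between runs, while the Templates object holds the compiled stylesheet that all of them share.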

You might be able to write the control loop in XSLT, for example by using the 
collection() function, or functions in the EXPath file module. However, this 
could require stylesheet changes if your XSLT code binds global variables to 
values derived from the source document.

In very simple cases you can take advantage of the fact that the -s option for 
the Saxon command line can be a directory, in which case all the input files 
are transformed to corresponding files in the -o directory.

Michael Kay
Saxonica

On 15 Dec 2019, at 09:03, Trevor Nicholls 
trevor(_at_)castingthevoid(_dot_)com 
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:

Hi
 
An application I am working on contains a large number of source documents 
which are all run through the same series of transformations. While initially 
the build process didn't take long, the cost of repeatedly initialising the 
XSL processor soon adds up, so I am looking at ways to streamline it.
 
Our processor of choice is Saxon (currently we are using 8.7.3) so I can 
shift this question to the Saxon list if there are extensions there that are 
relevant.
 
So the question: given a script that essentially includes the following:
 
cd documents
for d in `cat dlist`; do
  cd $d
  for f in `cat flist`; do
    java -jar $SAXONDIR/saxon8.jar  -o  $f.new.xml  $f.xml  \
      $SCRIPTDIR/transform.xsl  doc=$d  file=$f
  done
done
 
is there a mechanism which would allow a single Java process to perform the 
equivalent?
 
Thanks
T
 
XSL-List info and archive <http://www.mulberrytech.com/xsl/xsl-list>
EasyUnsubscribe <http://lists.mulberrytech.com/unsub/xsl-list/293509>