Re: [xsl] Running the same transformation on many input files, optimisation possible?

2019-12-16 08:43:58
The way I do this is with Ant indeed:

Ant does a single XSLT compilation, then applies it to all input files
where the output file is older than the input file or doesn't exist
(which may provide another optimisation).

I use a build.xml like this to run `ant transform-files`.

<project>
  <target name="transform-files">
    <xslt
      basedir="/workspace/input/"
      includes="*.xml"
      destdir="/workspace/tmp"
      extension=".new.xml"
      style="transform.xslt"
     />
  </target>
</project>

Instead of the basedir and includes attributes, you should be able to
create "filelist" or "fileset" collections of files to be processed
inside the <xslt> tags. There are ways to combine these, to end up with
a single list of input files and benefit from a single XSLT
compilation.

https://ant.apache.org/manual/Types/filelist.html
https://ant.apache.org/manual/Types/fileset.html

~~Rolf.

On Sun, 2019-12-15 at 22:12 +0000, Michael Kay mike(_at_)saxonica(_dot_)com wrote:
Note that there's a double overhead here: firstly you're bringing up
a new Java VM for each transformation, and secondly you're
recompiling the stylesheet for each transformation.
You can avoid the Java loading overhead by using ant or XProc, but
I'm not sure either of them will avoid the overhead of recompiling
the stylesheet; though if you use a recent Saxon version, you could
achieve that by reloading the stylesheet from a pre-compiled SEF
(stylesheet export file).

You could write your own Java application to control the process,
invoking Saxon via the JAXP or s9api APIs - both allow you to compile
a stylesheet once and execute it repeatedly.
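
As a rough illustration of the compile-once idea, here is a minimal sketch that uses only the standard JAXP interfaces (TransformerFactory, Templates, Transformer). The class name BatchTransform, the method transformAll and the directory layout are hypothetical, chosen for this example. The doc and file parameters mirror the ones passed in the original shell script. Which XSLT processor the factory returns depends on the classpath; with Saxon present it would normally be selected, but that is an assumption about the setup and is not checked here.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class BatchTransform {

    /**
     * Transforms every *.xml file in inputDir into outputDir as NAME.new.xml.
     * Returns the number of files transformed. Use a separate outputDir so
     * that a rerun does not pick up earlier *.new.xml results as inputs.
     */
    public static int transformAll(Path stylesheet, Path inputDir, Path outputDir, String doc)
            throws IOException, TransformerException {
        // Processor chosen by JAXP lookup; Saxon on the classpath is an assumption.
        TransformerFactory factory = TransformerFactory.newInstance();

        // The stylesheet is compiled exactly once, here.
        Templates compiled = factory.newTemplates(new StreamSource(stylesheet.toFile()));

        Files.createDirectories(outputDir);
        int count = 0;
        try (DirectoryStream<Path> inputs = Files.newDirectoryStream(inputDir, "*.xml")) {
            for (Path input : inputs) {
                String base = input.getFileName().toString().replaceFirst("\\.xml$", "");
                Path output = outputDir.resolve(base + ".new.xml");

                // A Transformer is cheap to create from the compiled Templates.
                Transformer transformer = compiled.newTransformer();
                transformer.setParameter("doc", doc);
                transformer.setParameter("file", base);

                try (OutputStream out = Files.newOutputStream(output)) {
                    transformer.transform(new StreamSource(input.toFile()), new StreamResult(out));
                }
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 4) {
            System.err.println("usage: BatchTransform STYLESHEET INPUT_DIR OUTPUT_DIR DOC");
            System.exit(2);
        }
        int n = transformAll(Paths.get(args[0]), Paths.get(args[1]), Paths.get(args[2]), args[3]);
        System.out.println("Transformed " + n + " file(s)");
    }
}
```

The outer loop over the directories in dlist could be added around transformAll in the same process, reusing one Templates object throughout, since a Templates object is designed to be reused.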

You might be able to write the control loop in XSLT, for example by
using the collection() function, or functions in the EXPath file
module. However, this could require stylesheet changes if your XSLT
code binds global variables to values derived from the source
document.

In very simple cases you can take advantage of the fact that the -s
option for the Saxon command line can be a directory, in which case
all the input files are transformed to corresponding files in the -o
directory.

Michael Kay
Saxonica

On 15 Dec 2019, at 09:03, Trevor Nicholls trevor(_at_)castingthevoid(_dot_)com <xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:
Hi
 
An application I am working on contains a large number of source
documents which are all run through the same series of
transformations. While the build process didn't take long initially,
the cost of repeatedly initialising the XSL processor soon adds up,
so I am looking at ways to streamline it.
 
Our processor of choice is Saxon (currently we are using 8.7.3) so
I can shift this question to the Saxon list if there are extensions
there that are relevant.
 
So the question: given a script that essentially includes the
following:
 
cd documents
for d in `cat dlist`; do
  cd $d
  for f in `cat flist`; do
    java -jar $SAXONDIR/saxon8.jar  -o  $f.new.xml  $f.xml \
      $SCRIPTDIR/transform.xsl  doc=$d  file=$f
  done
done
 
is there a mechanism which would allow a single Java process to
perform the equivalent?
 
Thanks
T
 
--~----------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
EasyUnsubscribe: http://lists.mulberrytech.com/unsub/xsl-list/1167547
or by email: xsl-list-unsub(_at_)lists(_dot_)mulberrytech(_dot_)com
--~--