Re: [xsl] An observation on the performance of fn:transform

2020-07-03 03:52:41
On 03.07.2020 at 10:45, Norman Tovey-Walsh ndw(_at_)nwalsh(_dot_)com wrote:
Hello world,

This isn’t a complaint, or explicitly a request for advice (though I’m
always happy for helpful suggestions), just an observation. The workflow
for processing DocBook documents is roughly this pipeline:

1. Fix up the logical structure of the document (expand entities and
    replace entityref attributes with the corresponding fileref
    attributes).
2. Perform XInclude
3. Convert DocBook 4.x markup to 5.x markup if the source document
    appears to be DocBook 4.x (i.e., if its root element is in no
    namespace)
4. Perform transclusion[1]
5. Profile
6. Resolve annotations
7. Resolve XLinks (including external link bases)

These are all relatively small stylesheets and they’re currently run
with fn:transform. (This will, as I’ve said before, all be driven by
XProc in the medium term, but I have short term requirements.)
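
A driver that chains small stylesheets with fn:transform might look
roughly like the sketch below. The stage file names are hypothetical
placeholders, not the actual DocBook stylesheet modules, and they are
assumed to be locatable relative to the driver stylesheet. Each stage's
principal result (the 'output' entry of the result map) becomes the
source node of the next stage.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical stage stylesheets, listed in pipeline order. -->
  <xsl:variable name="stages" as="xs:string*" select="
      'fixup.xsl', 'xinclude.xsl', 'db4to5.xsl', 'transclude.xsl',
      'profile.xsl', 'annotations.xsl', 'xlinks.xsl'"/>

  <xsl:template match="/">
    <!-- fold-left threads the document through the stages:
         $doc is the result so far, $stage the next stylesheet. -->
    <xsl:sequence select="
        fold-left($stages, ., function($doc, $stage) {
          transform(map {
            'stylesheet-location': $stage,
            'source-node': $doc
          })?output
        })"/>
  </xsl:template>

</xsl:stylesheet>
```

Keeping the stage list in a variable means a stage can be added,
removed, or tested on its own without touching the others.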

The last two or three steps are: transform the result of step 7 from
DocBook to HTML and then do a little cleanup on that output and, if
“chunking” has been requested, break it into chunks.

Doing a little post-conversion cleanup improves the output and greatly
simplifies the chunking tasks.

Because I’m old school, and because I initially had an “I can’t do this
as a pipeline because I don’t have XProc” mindset, I wrote up the
conversion to HTML, the cleanup, and the chunking as modes in the same
stylesheet.

Then this morning I thought, hang on, I could use fn:transform for those
steps too and get all the benefits of pipelines there (easier to
maintain, separately testable, etc.)

So I coded that up. I now have an *eight* stage pipeline where the last
stage does the transformation to HTML, cleanup of that HTML, and
possible chunking. It’s all still in one stylesheet with modes because I
haven’t teased it apart yet; it’s just being run with fn:transform
instead of with a mode in the same stylesheet.

The performance difference is interesting.

Running 1,426 tests through the 8 stage pipeline: 4m19s.
Running 1,542 tests through the original 7 stages: 50s.

There are fewer tests in the former case because some of my XSpec tests
just can’t work against the new driver; I’ll have to run two sets of
tests which is kind of a drag, but I should be running separate tests
for all the stages anyway so I guess that’s just the way it is.

The performance difference is presumably because it takes ~0.15s to
compile the main stylesheet each time. Which is, you know, pretty damned
fast, but adds up if you’re going to do it thousands of times in a row.
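
As a rough sanity check on that estimate, the per-test cost of each run
can be compared directly. The sketch below uses only the figures quoted
above (4m19s = 259s for 1,426 tests, 50s for 1,542 tests) and assumes
the two test sets are similar enough for the comparison to be
meaningful. The difference works out to about 0.149s per test, in line
with the ~0.15s compile time.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template name="xsl:initial-template">
    <!-- Seconds per test in the 8-stage run (259s / 1426 tests)
         minus seconds per test in the 7-stage run (50s / 1542 tests),
         rounded to three decimal places. -->
    <xsl:value-of select="round(259 div 1426 - 50 div 1542, 3)"/>
  </xsl:template>

</xsl:stylesheet>
```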

The options map you pass to `fn:transform` accepts an option
  'cache' : true()
which serves as a hint to the XSLT processor to cache the compiled
stylesheet. Perhaps setting that explicitly improves performance.
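
A minimal sketch of such a call follows. The stylesheet name
'docbook.xsl' and the parameter name are made up for illustration.
Whether the hint changes anything depends on the processor, and if I
read the F&O 3.1 description correctly the default for 'cache' is
already true(), so this is something to measure rather than a
guaranteed fix.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical location of the main DocBook-to-HTML stylesheet. -->
  <xsl:param name="main-stylesheet" as="xs:string" select="'docbook.xsl'"/>

  <xsl:template match="/">
    <!-- 'cache' asks the processor to keep the compiled stylesheet
         so that repeated calls can reuse it instead of recompiling. -->
    <xsl:sequence select="
        transform(map {
          'stylesheet-location': $main-stylesheet,
          'source-node': .,
          'cache': true()
        })?output"/>
  </xsl:template>

</xsl:stylesheet>
```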
