xsl-list

Re: [xsl] Performance of predicate-based patterns

2015-01-23 08:19:35
On Fri, Jan 23, 2015 at 11:28:31AM -0000, Michael Kay mike(_at_)saxonica(_dot_)com scripsit:
> We've started doing some performance work in Saxon on the DITA
> stylesheets, which use large numbers of match patterns in the form
>
> <xsl:template match="*[contains(@class, ' token ')]">

If anybody ever starts using XSLT 2.0 for DITA processing, there are
going to be things like

<xsl:template match="*[(tokenize(@class,'\p{Zs}+')[normalize-space()])[2] eq 'topic/li']">

showing up.  ("some $x in tokenize(@class,...."  seems pretty likely,
too.)

> Currently these require a very inefficient sequential search to find
> the matching rule for each element.
>
> Does anyone know of any other commonly-used stylesheets (or even,
> uncommonly used ones) which show similar characteristics, that is,
> large numbers of match patterns using predicate matching only, with no
> explicit element names? We'd like any optimizations we implement to be
> as general-purpose as possible.
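
To make the shape of the problem concrete, here is a minimal sketch of
the kind of rule set being described. The class tokens and the output
element names are assumptions chosen for illustration, not taken from
the actual DITA stylesheets. Every pattern has the same node test (*),
so the element name offers nothing to narrow the candidates, and
presumably each predicate has to be tried in turn.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Illustrative rules only: all share the node test "*" and differ
       solely in the token tested inside the predicate. -->
  <xsl:template match="*[contains(@class, ' topic/p ')]">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <xsl:template match="*[contains(@class, ' topic/ul ')]">
    <ul><xsl:apply-templates/></ul>
  </xsl:template>

  <xsl:template match="*[contains(@class, ' topic/li ')]">
    <li><xsl:apply-templates/></li>
  </xsl:template>

</xsl:stylesheet>
```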

I've done some conversion work on legal documents where the goal was to
get everything back on a single schema after a couple decades of
evolution in the element names of various DTDs.  Matches of the form

<xsl:template match="*[name() = ('P','NP','PARA')]">

showed up a fair bit to match on the abstract "that's a paragraph"
across the range of evolved element names.
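
A minimal sketch of how such a template might look in a complete
stylesheet, assuming XSLT 2.0. The target element name "para" and the
identity fallback are hypothetical, added only for illustration; they
are not from the actual conversion.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity fallback: copy anything no more specific rule handles. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Any of the historical paragraph names becomes the hypothetical
       <para>. The general comparison "=" is true if name() equals any
       item in the sequence. -->
  <xsl:template match="*[name() = ('P','NP','PARA')]">
    <para>
      <xsl:apply-templates select="@*|node()"/>
    </para>
  </xsl:template>

</xsl:stylesheet>
```

The pattern with a predicate gets a higher default priority than the
identity rule, so no explicit priority attribute is needed here.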

There was also a fair bit of 

<xsl:template match="*[not(name() = ('PARA','LIST','TABLE'))]">

used as general "we don't think there's anything but those in the data
but let's not make rash assumptions" surprise handler templates.
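
A minimal sketch of what such a surprise handler might do, assuming
XSLT 2.0. The message text and the choice to copy the unexpected
element through are assumptions for illustration. In a real stylesheet
the handler would presumably be applied only where those three
elements are expected, since as written it also reports the document
element.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Attributes are copied unchanged. -->
  <xsl:template match="@*">
    <xsl:copy/>
  </xsl:template>

  <!-- Expected elements: processed normally (here, simply copied). -->
  <xsl:template match="*[name() = ('PARA','LIST','TABLE')]">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Surprise handler: report anything else, then pass it through so
       no content is silently lost. -->
  <xsl:template match="*[not(name() = ('PARA','LIST','TABLE'))]">
    <xsl:message>Unexpected element: <xsl:value-of select="name()"/></xsl:message>
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```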

-- Graydon