xsl-list
[Top] [All Lists]

Re: [xsl] Matching on keys

2017-01-06 15:29:24
Hi Mike Kay, Graydon, and XSL-List,

All your remarks are very interesting and helpful.

In the main case I am looking at, I can hard-wire the order of the
rules by setting @priority ... which I suppose will help that problem.
(Except: then I use next-match to set up a cascade, ha! ... so there
goes that I suppose.)

However, I also have the advantage that the documents are not large,
typically 100-200KB of XHTML at a time. (Which is part of the reason I
was afraid I wasn't really seeing a performance hit.)

Thanks --
Wendell




On Fri, Jan 6, 2017 at 4:10 PM, Michael Kay mike(_at_)saxonica(_dot_)com
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:


Question: I can define a key, as in

<xsl:key name="elements-by-class" match="*[matches(@class,'\S')]"
use="tokenize(@class,'\s+')"/>

then

<xsl:template match="key('elements-by-class','foo')">
...
</xsl:template>


A further thought on this.

Often, as a rule of thumb, with document-oriented XML, a transformation 
applies-templates to each node in the source document approximately once.

With Saxon, if you apply-templates to an element named E, the maximum number 
of rules matched will be the number of rules that can only match elements 
named E, plus the number of rules that can match any element regardless of 
its name. The expected number of matching attempts will be half that, unless 
you've been very smart about ordering the rules.

So if you've got a stylesheet with 50 rules of the form match="*[some 
predicate]", and you're matching 10K element nodes, then you're going to be 
performing about 250K predicate evaluations.

Without John Lumley's indexation help (which, as I say, is not really 
productized at present), the only effective ways of reducing the number of 
matching attempts are:

* split the rules into multiple modes

* think about ordering the rules

* make as many rules as possible match a specific element name

* have fewer rules

In pathological cases you could consider a manually constructed decision tree:

<xsl:template match="*[starts-with(@class, 'a')]">
  <xsl:apply-templates select="." mode="class-a"/>
</xsl:template>

<xsl:template match="*[@class='aardvark']" mode="class-a">
  ...
</xsl:template>

<xsl:template match="*[@class='abingdon']" mode="class-a">
  ...
</xsl:template>

etc...

<xsl:template match="*[starts-with(@class, 'b')]">
  <xsl:apply-templates select="." mode="class-b"/>
</xsl:template>

<xsl:template match="*[@class='banana']" mode="class-b">
  ...
</xsl:template>

The other approach is to not worry about the number of rule matches, but to 
concentrate on making the matches faster. This is what using keys can 
achieve. But the keys need to be carefully chosen. If you do

<xsl:template match="key('k', 'foo')"/>

and there are 10K elements with the key value 'foo', then the test for 
whether an element matches this rule consists of seeing whether the element 
is present in a set of 10K elements, and in Saxon that's a serial search of 
10K entries (it could be done better, but it isn't - indexes are not 
optimized for this use case). So the cost of the match probably is going to 
take longer than doing

<xsl:template match="*[@class = 'foo']"/>

Michael Kay
Saxonica




-- 
Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
_____oo_________o_o___ooooo____ooooooo_^
--~----------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
EasyUnsubscribe: http://lists.mulberrytech.com/unsub/xsl-list/1167547
or by email: xsl-list-unsub(_at_)lists(_dot_)mulberrytech(_dot_)com
--~--

<Prev in Thread] Current Thread [Next in Thread>