In the specific DITA case you're searching for strings reliably bounded by
blanks, so contains() is correct in this case.
Your statement "Intrinsically, tokenizing is more complex than just
searching for a substring." is, I think, what I was looking for. It
suggests that, as a general policy, preferring contains() over tokenize()
and sequence comparison will be the better choice if performance is the
only concern (and assuming that it actually produces a meaningful
performance difference, which it very well may not).
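
As an illustration of why contains() is safe when the search strings are
bounded by blanks, here is a minimal XPath 2.0 sketch. It assumes that each
entry in $link-classes carries its own leading and trailing blank (for
example ' topic/xref ') and that @class follows the DITA convention of
ending with a blank; the sample values are hypothetical.

```xpath
(: Assumption: $link-classes is something like (' topic/xref ', ' topic/link ')
   and $node/@class is something like '- topic/ph topic/xref ' :)
some $c in $link-classes satisfies contains($node/@class, $c)

(: If the entries are bare tokens instead, pad both sides so that only
   whole tokens can match :)
some $c in $link-classes satisfies
  contains(concat(' ', normalize-space($node/@class), ' '),
           concat(' ', $c, ' '))
```

The second form gives contains() the same whole-token behavior as the
tokenize() comparison, at the cost of two concat() calls per test.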
On 9/2/16, 2:02 PM, "Michael Kay mike(_at_)saxonica(_dot_)com" wrote:
Well, the tokenize() seems more correct: presumably if $link-classes is
"green" and $node/@class is "pale-green" you want the answer to be false,
which it will be with the tokenize() approach but not with the contains()
approach.
But shouldn't the regex for the tokenize case be '\s+' rather than ' '?
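
The difference between the two tests can be sketched in XPath 2.0 as
follows. The literal values simply stand in for $link-classes and
$node/@class from the example above, and the normalize-space() call in the
last expression is an addition for illustration, not part of the original
code.

```xpath
(: Substring test: true, because 'green' occurs inside 'pale-green' :)
some $c in ('green') satisfies contains('pale-green', $c)

(: Token test: false, because the only token is 'pale-green' :)
('green') = tokenize('pale-green', '\s+')

(: '\s+' treats any run of whitespace as a single separator;
   normalize-space() first avoids the empty tokens that leading or
   trailing whitespace would otherwise produce :)
$link-classes = tokenize(normalize-space($node/@class), '\s+')
```

With ' ' as the pattern, two consecutive blanks or a tab or newline in
@class would yield empty or merged tokens, which is the case '\s+' guards
against.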
Performance of course is product dependent and you just have to measure
it. Intrinsically, tokenizing is more complex than just searching for a
substring.
On 2 Sep 2016, at 17:04, Eliot Kimber ekimber(_at_)contrext(_dot_)com wrote:
In the DITA processing code, where we are using XSLT 2 and checking for
string matches in attribute values, I have the requirement to see if any
of a number of strings might match.
The current code is:
some $c in $link-classes satisfies contains($node/@class, $c)
Where $link-classes is a sequence of strings and @class is a
blank-delimited sequence of strings.
Another way to do this check would be:
$link-classes = tokenize($node/@class, ' ')
This is a check that will be made a lot, so performance may be important
(or it may not be). The tokenize version seems simpler and clearer to me,
but the satisfies approach has a certain elegance that I also like.
My question: is there any reason to prefer one or the other of these? I
realize that XSLT 3 provides a new way to do token matching in strings,
but for now we're stuck with XSLT 2.
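
For reference, the newer facility is presumably the contains-token()
function defined in XPath 3.1; that identification is an assumption, since
the message does not name it. A minimal sketch of the same check, for a
processor that supports XPath 3.1:

```xpath
(: Not available in XSLT 2.0; requires XPath 3.1 support.
   contains-token() splits its first argument on whitespace and compares
   whole tokens, so 'green' would not match 'pale-green' :)
some $c in $link-classes satisfies contains-token($node/@class, $c)
```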