Geert Bormans wrote:
If I change it to this
(removing \d{2} in favour of \d\d)
[...]
it works
Am I overlooking something?
The regex attribute of analyze-string is an AVT (attribute value
template). Accolades (curly braces) have a special meaning in both an
AVT and a regular expression, and the way to use an accolade in any AVT
without it being interpreted as the start/end of an expression is to
double it. Because accolades are used often in regexes and because
their content is usually a number, the result is not an illegal AVT:
\d{2}
is interpreted as the regular expression:
\d2
which matches a digit followed by a literal 2. That will quite likely
match sometimes and sometimes not, but not when you want it to. The
resulting behavior has all the features of a buggy regular expression
parser, when in fact it is the expression itself that is buggy... ;)
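For comparison, here is a minimal sketch of the doubled form as a
complete stylesheet. The parameter name "input", its sample value and
the template name "main" are made up for illustration and are not from
the original question:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- hypothetical sample input, not from the original question -->
  <xsl:param name="input" select="'ab12cd'"/>
  <xsl:template name="main">
    <!-- {{2}} in the AVT reaches the regex engine as {2} -->
    <xsl:analyze-string select="$input" regex="\d{{2}}">
      <xsl:matching-substring>
        <!-- the context item here is the matched substring -->
        <xsl:value-of select="."/>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```

Started at the named template "main", this should output only the two
digits from the sample string, since there is no
xsl:non-matching-substring to copy the rest.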
Because I used to make this mistake often (and because escaped quotes
and doubled accolades look ugly), I started to put the regular
expression into a variable in all but the most trivial cases. The added
benefit of this is that you can now use comments in a regular expression:
<xsl:variable name="regex" as="xs:string">
\d <!-- a digit -->
{2} <!-- must occur twice and only twice -->
</xsl:variable>
<xsl:analyze-string regex="{$regex}" flags="x">
...
</xsl:analyze-string>
Note the use of the 'x' flag, which is necessary here: it makes the
processor strip the whitespace from the regex before matching.
Regular expressions have the tendency to be the most unreadable of
existing mini-languages, so comments and whitespace are often very
welcome. The as="xs:string" is there because we don't need a document
node (which is what the variable would hold without it) but a string.
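A slightly larger sketch of the same idea, written out as a complete
stylesheet so that the xs namespace declaration needed for
as="xs:string" is visible. The year-month pattern, the parameter
"input" and the template name "main" are assumptions chosen for
illustration only:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">
  <xsl:output method="text"/>
  <!-- hypothetical sample input -->
  <xsl:param name="input" select="'2007-03'"/>
  <!-- element content is not an AVT, so single accolades are fine here -->
  <xsl:variable name="regex" as="xs:string">
    (\d{4}) <!-- group 1: a four-digit year -->
    -       <!-- a literal hyphen -->
    (\d{2}) <!-- group 2: a two-digit month -->
  </xsl:variable>
  <xsl:template name="main">
    <xsl:analyze-string select="$input" regex="{$regex}" flags="x">
      <xsl:matching-substring>
        <!-- regex-group(2) returns what the second group captured -->
        <xsl:value-of select="regex-group(2)"/>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```

The stylesheet comments never reach the regex, because the processor
removes them when it reads the stylesheet, and the 'x' flag takes care
of the remaining whitespace.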
For the fun of it and to complete this little story, note that in the
world of obfuscation a lot is possible, if you set your mind to it. If
you want it and you like fun code, you *can* put comments inside a
regular expression (but only inside an AVT) using the following, imo
rather silly construction:
<xsl:analyze-string flags="x" regex="
\d {()(: a digit :)}
{{2}} {()(: must occur twice and only twice :)}">
The () is there because an XPath expression cannot be empty. The (: and
:) are, of course, the comment delimiters for an XPath 2.0 expression. I
don't know about others' opinions on this, but from my point of view,
this doesn't add much to readability, so I still prefer the "best
practice" of putting the regex in a variable (what helps that decision
is that some XSLT 2.0 processors do not allow the smiley comments).
Cheers,
-- Abel Braaksma