xsl-list

Re: [xsl] regular expressions in XSLT 2.0

2011-08-28 16:32:50
This is a case of operator precedence (in a sense, anyway).  The
specification of regular expression syntax ([1], with modifications in
[2]) says:

regExp ::= branch ( '|' branch )*
branch ::= piece*
piece ::= atom quantifier?
atom ::= Char | charClass | ( '(' regExp ')' )
charClass ::= charClassEsc | charClassExpr | WildcardEsc | "^" | "$"

Thus, "^" and "$" are each, in turn, a charClass, an atom and a piece,
and each forms a branch together with its adjacent pieces.  Whole
branches are what the alternation operator ("|") separates.  So, your
original expression matches either 1) a string that starts with the
part before the bar or 2) a string that ends with the part after.
Since "40e" starts with the digits "40", the first branch matches and
the test is true.

By putting the parentheses in, you've put the alternation expression
in sequence with the "^" and "$", so they both must match, along with
one of the alternatives inside the parens.
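
To make the grouping concrete, here is a small sketch in Python rather
than XSLT.  Python's re module is a different regex dialect from the
one XPath 2.0 uses, so treat this as an illustration of the precedence
rule only, under the assumption that re.search (match anywhere in the
string) is a fair stand-in for matches().  The helper names are made up
for this example.

```python
import re

# The two patterns from the question, unchanged.
UNGROUPED = r'^\d{1,3}|[ivxl]{1,7}$'
GROUPED = r'^(\d{1,3}|[ivxl]{1,7})$'


def is_match(pattern, text):
    """True if the pattern matches somewhere in text (unanchored search)."""
    return re.search(pattern, text) is not None


def matched_text(pattern, text):
    """Return the substring that matched, or None if nothing matched."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


if __name__ == "__main__":
    for value in ("40e", "abcxiv", "40", "xiv"):
        print(value,
              is_match(UNGROUPED, value), matched_text(UNGROUPED, value),
              is_match(GROUPED, value))
```

For "40e" the ungrouped pattern matches only the leading "40" (first
branch), and for "abcxiv" it matches only the trailing "xiv" (second
branch).  The grouped pattern rejects both, because "^" and "$" then
apply to the whole alternation.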

I'd say the tool you tried that gave a false for the first test was
either implementing a version of regular expressions with different
defined semantics or it was wrong.

-Brandon :)

[1] http://www.w3.org/TR/xmlschema-2/#regexs
[2] http://www.w3.org/TR/xpath-functions/#regex-syntax


On Sun, Aug 28, 2011 at 5:12 PM, Wolfhart Totschnig
<wolfhart(_at_)totschnig(_dot_)org> wrote:
Hello,

I have a question about regular expressions in XSLT 2.0. I noticed that

test="matches('40e','^\d{1,3}|[ivxl]{1,7}$')"

will be evaluated as true, which puzzles me, since I thought it should be
evaluated as false. (A regular expressions test page I found on the internet
(http://www.fileformat.info/tool/regex.htm) indeed evaluates the test as
false.)

When I add parentheses in the regular expression, i.e.,

test="matches('40e','^(\d{1,3}|[ivxl]{1,7})$')"

the test comes out false, however.

So my question is this: Why does the test without the parentheses come out
true? That is, how is the regular expression interpreted by the xslt engine
such that "40e" is considered a match? And why do the parentheses make a
difference? (I thought the parentheses would be redundant in this case.) Or
is this maybe an issue specific to the xslt engine I use (Saxon9he)?

Thanks in advance for your help!
Wolfhart

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--



