xsl-list

RE: [xsl] lookaheads in XSLT2 regexes

2010-03-03 15:10:05
On Tue, 2010-03-02 at 09:21 +0000, Michael Kay wrote:

> I would imagine there would also be raised eyebrows about including "_" in
> the set of "word" characters. That's something that only happens in geekdom.
> But in the past the principle has been "if Perl defines it well, do what
> Perl does, otherwise leave it out completely." In my view we've already
> copied too many of Perl's mistakes, like the strange rules on recognizing
> whether \12 is a back-reference to group 12 or a back-reference to group 1
> followed by a digit 2.
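
The \12 ambiguity can be made concrete with a small resolver. This is a sketch under an assumed rule: the first digit always starts the back-reference, and each further digit is absorbed only while the resulting number does not exceed the count of capturing groups opened before it. `resolve_backref` is a hypothetical helper, not code from any regex engine, and Perl's real rules (which also involve octal escapes) are not modelled.

```python
from typing import Tuple

ASCII_DIGITS = "0123456789"


def resolve_backref(digits: str, groups_before: int) -> Tuple[int, str]:
    """Split the digits after a backslash into (group number, literal tail).

    Hypothetical helper: the first digit always belongs to the back-reference;
    each further digit is absorbed only while the resulting number does not
    exceed the count of capturing groups opened before the back-reference.
    """
    if not digits or digits[0] not in ASCII_DIGITS:
        raise ValueError("a back-reference needs at least one digit")
    end = 1
    while (end < len(digits) and digits[end] in ASCII_DIGITS
           and int(digits[:end + 1]) <= groups_before):
        end += 1
    return int(digits[:end]), digits[end:]


# \12 with twelve groups opened before it: a reference to group 12.
print(resolve_backref("12", 12))  # (12, '')
# \12 with only three groups opened: group 1 followed by a literal "2".
print(resolve_backref("12", 3))   # (1, '2')
```

Under a rule like this, the meaning of \12 changes when a group is added earlier in the pattern, which is the kind of surprise the quoted message objects to.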

I don't remember what first introduced back-references beyond 9;
it might have been sed.

More recently, Perl has provided named capture buffers, so you don't
have to use numbers, and also \g for back-references:

\g{12}          # buffer 12; the braces remove the ambiguity
\g{-1}          # the most recent buffer before this point
and with (?<sock> ....pattern.... ) ..... \g{sock}
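
For comparison, Python's `re` module (used here only as an illustration; it is neither Perl nor XPath) spells a named buffer `(?P<name>...)` and a back-reference to it `(?P=name)`, while `\g<name>` is used in replacement strings. The sketch finds a doubled word, a classic use of back-references; the name `DOUBLED_WORD` is made up for the example.

```python
import re

# Perl:      \b(?<sock>\w+)\s+\g{sock}\b
# Python re: named group (?P<sock>...), back-reference (?P=sock)
DOUBLED_WORD = re.compile(r"\b(?P<sock>\w+)\s+(?P=sock)\b")

m = DOUBLED_WORD.search("this is is a test")
print(m.group("sock"), m.start())  # is 5

# In replacement strings a group is written \g<sock> (or \g<1>);
# \g<1>0 means "group 1, then a literal 0", with no \10 ambiguity.
print(DOUBLED_WORD.sub(r"\g<sock>", "this is is a test"))  # this is a test
print(re.sub(r"(a)", r"\g<1>0", "a"))  # a0
```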

.NET and Perl regexps are incompatible in what happens if you mix
unnamed (...) groups and \1 with named buffers: Perl numbers named and
unnamed buffers together, left to right, while .NET numbers the unnamed
ones first and gives the named ones numbers after them.
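
The numbering difference is easy to check in an engine that follows the Perl convention. Python's `re` is used below as an assumed stand-in for Perl: it numbers every capturing group by the position of its opening parenthesis, so the named group in `(a)(?P<n>b)(c)` is also group 2. According to the .NET documentation, .NET would instead make `(c)` group 2 and the named group 3.

```python
import re

# One unnamed group, one named group, one unnamed group.
m = re.fullmatch(r"(a)(?P<n>b)(c)", "abc")

# Python, like Perl, numbers all three left to right by opening parenthesis,
# so the named group is also group 2.
print(m.group(2), m.group(3), m.re.groupindex["n"])  # b c 2

# Consequently \2 in the pattern refers back to the named group.
print(re.fullmatch(r"(a)(?P<n>b)\2", "abb") is not None)  # True
```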

On the subject of \b, I'll note that we do have \W and \w; Perl, at
least, defines \b as a boundary between \W and \w (in either order).
It _is_ crazy that \b in a character class represents backspace. Perl
also has \B to match at a non-word boundary: between \w and \w, or
between \W and \W.
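
A quick illustration of \b, \B, and the backspace oddity, again using Python's `re` as an assumed stand-in for Perl-style behaviour. Note that `_` counts as a word character there too, so the "cat" at the start of "cat_x" is not followed by a boundary.

```python
import re

text = "cat concat cat_x cat."

# \b: a boundary between \w and \W (or the edge of the string).
print(re.findall(r"\bcat\b", text))                          # ['cat', 'cat']
print([m.start() for m in re.finditer(r"\bcat\b", text)])    # [0, 17]

# \B: a non-boundary, here between "n" and "c" inside "concat".
print([m.start() for m in re.finditer(r"\Bcat", text)])      # [7]

# Inside a character class, \b means backspace (U+0008) instead.
print(re.fullmatch(r"[\b]", "\x08") is not None)             # True
```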

Historically, the Unix vi editor used (uses) \< for matching \W\w (i.e.
the start of a "word") and \> for the end, \w\W, which always seemed a
little clearer to me, but for use with XML we need to stay away from
assigning meaning to < and > I think :-)
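
In an engine with lookarounds (which XSLT 2.0 regexes lack, the subject of this thread), vi's \< and \> can be emulated without assigning any meaning to < and > in the target syntax. The sketch below is a hypothetical translator, `translate_vi_anchors`, into Python `re` syntax. It assumes "word character" means Python's \w rather than whatever the editor treats as a word character, and it does not treat character classes specially.

```python
import re

# Lookaround equivalents of vi's word anchors (assumption: "word" means \w).
WORD_START = r"(?<!\w)(?=\w)"   # like vi \<  : non-word (or start), then word
WORD_END = r"(?<=\w)(?!\w)"     # like vi \>  : word, then non-word (or end)


def translate_vi_anchors(pattern: str) -> str:
    r"""Rewrite vi-style \< and \> into Python lookarounds (hypothetical).

    Other backslash escapes are copied through unchanged, so an escaped
    backslash followed by '<' is not mistaken for an anchor.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt == "<":
                out.append(WORD_START)
            elif nxt == ">":
                out.append(WORD_END)
            else:
                out.append(ch + nxt)  # keep any other escape as-is
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


pattern = translate_vi_anchors(r"\<the\>")
print(pattern)                                   # (?<!\w)(?=\w)the(?<=\w)(?!\w)
print(re.findall(pattern, "the other bathe the."))  # ['the', 'the']
```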

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org www.advogato.org

