Hi Michael,
Thanks for your advice on this. I understand now the reasons behind not
allowing newline-ended comments. After reading Colin Adams' comments, and
Frans Englich's, I came up with the following as a best-practice for
ourselves:
<xsl:variable name="re-extract-filename">
    ^.*?        <!-- non-greedy: grab everything -->
    ([^/\\]+)   <!-- filename in $1 -->
    \.          <!-- extension-dot -->
    [^\.]*$     <!-- extension (not-a-dot*) -->
</xsl:variable>
<xsl:value-of select="replace(., $re-extract-filename, '$1.xml', 'x')" />
I see several benefits to this approach:
1. It allows for dissecting the regex into smaller parts
2. The comments are understood by most xml/xslt editors (smileys are not)
3. You have all freedom when it comes to whitespace
4. Using a good variable name works like a function: it tells what the
regex does
5. One may create a library of time-tested regexes.
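To make points 4 and 5 a bit more concrete, here is a minimal sketch of
what such a library could look like as a separate stylesheet module. The
file name regex-lib.xsl, the re: namespace URI and the function name
re:to-xml-filename are all made up for illustration; only the regex
itself is the one shown above.

```xslt
<!-- regex-lib.xsl: hypothetical library module, all names illustrative -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:re="http://example.com/ns/regex-lib"
    exclude-result-prefixes="xs re">

  <!-- The commented regex, kept in one place.
       Whitespace is ignored because of the 'x' flag used below. -->
  <xsl:variable name="re-extract-filename">
    ^.*?        <!-- non-greedy: grab everything -->
    ([^/\\]+)   <!-- filename in $1 -->
    \.          <!-- extension-dot -->
    [^\.]*$     <!-- extension (not-a-dot*) -->
  </xsl:variable>

  <!-- Thin wrapper so callers never see the regex itself -->
  <xsl:function name="re:to-xml-filename" as="xs:string">
    <xsl:param name="path" as="xs:string"/>
    <xsl:sequence
        select="replace($path, $re-extract-filename, '$1.xml', 'x')"/>
  </xsl:function>

</xsl:stylesheet>
```

A stylesheet would then pull this in with xsl:include (or xsl:import),
declare the same re: namespace, and call re:to-xml-filename(.) wherever
the new file name is needed.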
Michael, on your comment about readability, I'd like to add that I agree
that regexes are hard to read. Even harder to learn and perhaps hardest
to really master. I always recommend Jeffrey Friedl's excellent book to
my programmers.
Unfortunately, it is often impossible to just cut a regular expression
into several steps. Nor do I agree that adding more steps always adds
to readability (sometimes it does, when the steps are clear, I guess,
but what if there are no logical steps?). Finally, I think that even
the simplest regular expression can be hard to read when not commented,
and the simplest are often quite long already.
Using the above (non-foolproof) "simple" regex, it could be dissected
into steps like: removeprotocol, removesite, removeport, removepath,
removeextension. But that would add a lot of verbosity to the xslt /
xpath. My men are not very experienced when it comes to regexes. They
find it very hard to understand the flaws of the above regex. Using
comments, they grasp the idea a lot better.
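For comparison, a cut-down stepwise version might look like the sketch
below. It is only an illustration: it uses two replace steps plus a
concat instead of the five steps named above, the match pattern "file"
and the variable names are invented, and it is not equivalent to the
single regex (for input without a dot it still appends ".xml", whereas
the single regex would leave such input untouched).

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- "file" is a hypothetical element holding a path or URL -->
  <xsl:template match="file">
    <!-- step 1: drop everything up to and including the last
         (back)slash; this covers protocol, site, port and path -->
    <xsl:variable name="no-path"
        select="replace(., '^.*[/\\]', '')"/>
    <!-- step 2: drop the last dot and whatever follows it -->
    <xsl:variable name="no-extension"
        select="replace($no-path, '\.[^.]*$', '')"/>
    <!-- step 3: append the new extension -->
    <xsl:value-of select="concat($no-extension, '.xml')"/>
  </xsl:template>

</xsl:stylesheet>
```

Each step is easy to read on its own, but the intent ("turn this path
into an .xml file name") is now spread over three instructions.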
In terms of performance, I think (but am not sure) that a well-crafted
regex is often a lot quicker and less resource-intensive than a chain
of simpler ones. Yet, a good understanding of how regex parsers work is
a necessity.
Cheers,
Abel
Michael Kay wrote:
As I see it, XPath 2.0 has that flag too. See XQuery 1.0 and
XPath 2.0 Functions and Operators, section 7.6.1.1 Flags:
Yes, but in XPath the "x" flag does not enable comments. This is because the
Perl comment syntax uses newline to mark the end of a comment. In XSLT,
regular expressions will often appear inside XML attributes, where newlines
get normalized to spaces by an XML parser, so we've always adopted the view
that the grammar should never treat newlines differently from spaces.
My own advice is to avoid using regular expressions that are so complex that
they need comments to explain them. If you need to explain them, then it's
also going to be very hard to debug them, and if you hit performance
problems it will be very difficult to analyze the problems. If you can,
break up the task into separate stages, each defined by simpler regular
expressions.
Another approach to commenting, however, is like this:
<xsl:variable name="x"
select="replace(.,
'^.*?([^/\\]+)\.[^\.]*$', '$1.xml')"/>
<!-- ^non-greedy: grab everything
^the last part of the path: does not contain (back)slashes. Grab
it to $1
^the dot separating the extension from the filename
^not-a-dot until end of string, this is the extension
-->
or if you prefer:
<xsl:variable name="x"
select="replace(., '^.*?([^/\\]+)\.[^\.]*$', '$1.xml')"/>
<!--
.*? non-greedy: grab everything
([^/\\]+) the last part of the path: does not contain (back)slashes.
Grab it to $1
\. the dot separating the extension from the filename
[^\.]* not-a-dot until end of string, this is the extension
-->
(Sorry if the mailer mangles this!)
Michael Kay
http://www.saxonica.com/
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--