Re: [xsl] How to split an RegEx into several lines for readability?

Dimitre Novatchev wrote:

As I am an absolute RegEx beginner, please excuse me if this is a
trivial question.

A good thing to know about regexes is that, besides being powerful, they can be very dangerous too, especially to the unaware, when backtracking causes the regex to run in exponential time on non-matching strings. An example of such a regex is in this post: http://www.nabble.com/Certain-non-zero-length-non-matching-regexes-run-forever-on-Saxon-tf3065127.html#a8524868

If you are going to use regexes in a production environment, make sure to test them thoroughly for this behavior, or your processor may occasionally hang.
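
To make the backtracking risk concrete, here is a minimal sketch. The input string, the two patterns and the template name "main" are my own illustrative assumptions, not taken from the linked post. Both calls return false, but the second pattern nests one quantifier inside another, so a backtracking engine may try every way of dividing the run of 'a's before it gives up. Whether it actually hangs depends on your processor and version.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <!-- Hypothetical test input: a run of 'a' ending in a character
       that neither pattern can match. -->
  <xsl:variable name="input" as="xs:string"
                select="'aaaaaaaaaaaaaaaaaaaaaaaaaaaa!'"/>

  <xsl:template name="main">
    <!-- Single quantifier: there is only one way to consume the 'a's,
         so the failure is found quickly. -->
    <xsl:value-of select="matches($input, '^a+$')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Nested quantifiers: the 'a's can be split between the inner and
         outer '+' in exponentially many ways, all of which fail. -->
    <xsl:value-of select="matches($input, '^(a+)+$')"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

A cheap test is to feed each regex a long string that almost matches and see whether the running time grows sharply as you lengthen the input.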




Is there any way I can split this RegEx on separate lines and/or add
whitespace so that it would be more readable?

You have already heard of the 'x' modifier, but there are a few things that you should know before splitting your regex into a more readable format:

* If you use Saxon, several bugs concerning whitespace handling have been fixed in the 8.8 and 8.9 releases, some of which you may consider significant, like this one, which is now fixed: http://www.nabble.com/Bug%3A-whitespace-at-beginning-of-regex-fails-the-regex-when-in-%27x%27-%28ignore-whitespace%29-mode-tf2870226.html#a8022584

* The "ignore whitespace" is meant very literally. I.e., in XSLT regexes, this: fn:matches("hello world", "hello\ sworld", "x") returns true. The "\ s" part in the regex is, with whitespace removed, "\s" and matches a space. By contrast, most regex engines (Perl for one) treat an escaped space as a literal space.

* The only place where you must be aware of whitespace with 'x' on is inside character classes, where it is not ignored: [abc ] matches 'a', 'b', 'c' or ' '.

* You probably don't want to do this, but this is allowed with the 'x' modifier: "\p{ I s B a s i c L a t i n }+" and is the same as "\p{IsBasicLatin}+".
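
The whitespace rules in the list above can be checked with a small stylesheet. This is only a sketch: the template name "main" and the test strings are my own choices, and the expected values in the comments follow from the rules as described, so a processor with one of the older whitespace bugs may disagree.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template name="main">
    <!-- "\ s" collapses to "\s" once whitespace is removed: expect true. -->
    <xsl:value-of select="matches('hello world', 'hello\ sworld', 'x')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- The space in the regex is removed, leaving "helloworld",
         which does not match the input: expect false. -->
    <xsl:value-of select="matches('hello world', 'hello world', 'x')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Whitespace inside a character class is kept: expect true. -->
    <xsl:value-of select="matches('hello world', 'hello[ ]world', 'x')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Spaces inside \p{...} are removed as well: expect true. -->
    <xsl:value-of select="matches('abc', '^\p{ I s B a s i c L a t i n }+$', 'x')"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

If you need a literal space outside a character class while 'x' is on, "[ ]" or "\s" are the forms to reach for.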

And a tip for making your regexes more readable: introduce comments inside your regexes. In other regex languages you can do that inside the regex language itself, but not with a regex in XSLT. You can easily work around this by putting your regexes inside a variable and always calling them with the 'x' modifier. The XML comments are not part of the variable's value, and 'x' takes care of the whitespace that remains:


<xsl:variable name="myregex" as="xs:string">
   (          <!-- grab everything -->
   "          <!-- start of a q. string -->
   [^"]*      <!-- zero or more non-quotes -->
   "          <!-- end of a q. string -->
   )          <!-- closing 'grab all' -->
</xsl:variable>
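
A minimal sketch of how such a variable might be used follows. The sample input string and the template name "main" are hypothetical. The regex attribute of xsl:analyze-string is an attribute value template, so the variable is passed in as {$myregex}, and flags="x" strips the layout whitespace.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <!-- The commented regex from above, unchanged. -->
  <xsl:variable name="myregex" as="xs:string">
     (          <!-- grab everything -->
     "          <!-- start of a q. string -->
     [^"]*      <!-- zero or more non-quotes -->
     "          <!-- end of a q. string -->
     )          <!-- closing 'grab all' -->
  </xsl:variable>

  <xsl:template name="main">
    <!-- Hypothetical input containing two quoted strings. -->
    <xsl:analyze-string select="'say &quot;hi&quot; and &quot;bye&quot;'"
                        regex="{$myregex}" flags="x">
      <xsl:matching-substring>
        <!-- Group 1 is the whole quoted string, quotes included. -->
        <xsl:value-of select="regex-group(1)"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

With this input the stylesheet should emit "hi" and "bye", quotes included, each on its own line. The same variable works with matches(), replace() and tokenize(), as long as you pass 'x' in the flags argument every time.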

I use this method to some extent in a format that allows recursive and repetitive regexes on input by just supplying a 'parser' written in XSLT with a set of regexes placed in XML that are then applied to the input. If you have many regexes, you will find that it is easier to maintain them by collecting them in a library and reusing them.


Cheers,
-- Abel Braaksma

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--