Re: [xsl] why matches($title,'.*?(\.|,)\s*$') can perform so much worse than matches($title,'(\.|,)\s*$')
2011-07-13 12:46:36
That is interesting. I was aware that there are some very smart regex
engines out there, but wasn't aware that they had made it to any
XQuery/XSLT processors yet.
Another interesting article is this one describing some of the
optimizations performed by the regex engine in Google Chrome:
http://blog.chromium.org/2009/02/irregexp-google-chromes-new-regexp.html
This mentions another trick used by some regex implementations. In
their example "Sun|Mon", the engine recognises that any match for this
expression has "n" as its third character. So rather than testing for
a match at every index in the string (which was the problem with the
example given), it first scans the string for "n" characters and only
tries to apply the regex starting two characters before each one. I
would not be at all surprised if they also recognised that a regex
beginning with .* need only be tried starting at the first character.
Oliver
XQSharp
On 13/07/2011 15:13, Michael Kay wrote:
It would be perfectly valid (and sensible) for a query processor to
realise that the two expressions you gave were equivalent and so not
perform n^2 tests, but I am unaware of a processor that makes these
kinds of optimizations to regular expressions.
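To make the equivalence concrete, here is a small Python sketch that uses Python's re module as a rough stand-in for XPath matches() (the two regex dialects differ in details such as line terminators). The rewrite function drop_leading_lazy_dotstar is hypothetical, a toy string check rather than anything a real XSLT or XQuery processor is known to do. Because a leading .*? can always match the empty string at the position where the rest of the pattern matches, removing it cannot change whether an unanchored search succeeds. It can, however, save a backtracking engine from rescanning the rest of the string at every start position, which is where the n^2 behaviour comes from.

```python
import re

# Patterns from the thread subject (XPath syntax, also valid in Python's re).
SLOW = r".*?(\.|,)\s*$"
FAST = r"(\.|,)\s*$"


def drop_leading_lazy_dotstar(pattern: str) -> str:
    """Hypothetical optimizer rewrite for a yes/no, unanchored search.

    A leading '.*?' can always match the empty string, so removing it does
    not change whether a match exists. Toy string check, not a regex parser.
    """
    prefix = ".*?"
    if pattern.startswith(prefix):
        return pattern[len(prefix):]
    return pattern


def matches(text: str, pattern: str) -> bool:
    """Rough stand-in for XPath matches($input, $pattern) with no flags:
    true if the pattern matches anywhere in text. Python's dialect differs
    in details, e.g. '$' also matches just before a final newline."""
    return re.search(pattern, text) is not None


if __name__ == "__main__":
    rewritten = drop_leading_lazy_dotstar(SLOW)
    for title in ["A title.", "A title,  ", "No punctuation", "Mid, sentence"]:
        # The original and rewritten patterns must agree on every input.
        assert matches(title, SLOW) == matches(title, rewritten)
        print(repr(title), matches(title, rewritten))
```

Whether a given processor performs such a rewrite is exactly what is in question here; the sketch only shows that it would be safe for a boolean test like matches().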
Actually I've heard it said that there's a wide variation between
different regex engines in how well they handle this kind of thing.
See for example here:
http://swtch.com/~rsc/regexp/regexp1.html
The article at
http://eyalsch.wordpress.com/2009/05/21/regex/
is also useful.
Michael Kay
Saxonica
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--