Re: [xsl] why matches($title,'.*?(\.|,)\s*$')) can perform so much worse than matches($title,'(\.|,)\s*$'))
2011-07-13 07:55:27
Thanks Oliver, that makes a lot of sense.
Regards
On Wed, Jul 13, 2011 at 12:33 PM, Oliver Hallam <oliver(_at_)xqsharp(_dot_)com>
wrote:
The way that fn:matches usually works is that it attempts to apply the
regular expression starting from each character in the string.
So, looking at the regular expression ".*?Expr" where Expr is another
regular expression:
.*? is a reluctant match and so we try to match the shortest expression
possible first.
Starting from the first character:
Try match the empty string for .*? and then try to mach Expr starting from
the first character.
Try to match a single character for .*? and then try to match Expr starting
from the second character.
Try to match two characters for .*? and then try to match Expr starting
from the third character.
...
Thus in trying to find a match for ".*?Expr" starting at a particular
character we try to match Expr starting at every other caharcter in the
string, until a match is found.
fn:matches attempts to find a match for the regular expression starting from
each character in the string. If a match isnt found starting at the first
character then it will try again starting at the second and so on. So if
there are no matches then Expr is tested n^2 times where n is the number of
characters in the string, wheras matches applied to Expr would just test it
n times.
Now, if your query was (note the '^'):
matches($title,'^.*?(\.|,)\s*$')
then I would expect perfomance to be very similar to:
matches($title,'(\.|,)\s*$')
It would be perfectly valid (and sensible) for a query processor to realise
that the two expressions you gave were equivalent and so not perform n^2
tests, but I am unaware of a processor that makes these kinds of
optimizations to regular expressions.
Oliver Hallam
XQSharp
On 12/07/2011 17:26, Alex Muir wrote:
Hi,
I'm wondering why matches($title,'.*?(\.|,)\s*$')) can perform so much
worse than matches($title,'(\.|,)\s*$'))
I found at least in one file out of thousands that I process the first
one can take a good 30 minutes to complete and the second is quick.
No doubt the .*? causes some problem but what exactly is the problem
that is causes?
Regards
--
Alex Muir
Instructor | Program Organizer - University Technology Student Work
Experience Building
University of the Gambia
http://sites.utg.edu.gm/alex/
Low budget software development benefiting development in the Gambia,
West Africa
Experience of a lifetime, come to Gambia and Join UTSWEB -
http://sites.utg.edu.gm/utsweb/
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or
e-mail:<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail:
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--
--
Alex Muir
Instructor | Program Organizer - University Technology Student Work
Experience Building
University of the Gambia
http://sites.utg.edu.gm/alex/
Low budget software development benefiting development in the Gambia,
West Africa
Experience of a lifetime, come to Gambia and Join UTSWEB -
http://sites.utg.edu.gm/utsweb/
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--
<Prev in Thread] |
Current Thread |
[Next in Thread>
|
- [xsl] why matches($title,'.*?(\.|,)\s*$')) can perform so much worse than matches($title,'(\.|,)\s*$')), Alex Muir
- Re: [xsl] why matches($title,'.*?(\.|,)\s*$')) can perform so much worse than matches($title,'(\.|,)\s*$')), Oliver Hallam
- Re: [xsl] why matches($title,'.*?(\.|,)\s*$')) can perform so much worse than matches($title,'(\.|,)\s*$')),
Alex Muir <=
- Re: [xsl] why matches($title,'.*?(\.|,)\s*$')) can perform so much worse than matches($title,'(\.|,)\s*$')), Michael Kay
- Re: [xsl] why matches($title,'.*?(\.|,)\s*$')) can perform so much worse than matches($title,'(\.|,)\s*$')), Oliver Hallam
- Re: [xsl] why matches($title,'.*?(\.|,)\s*$')) can perform so much worse than matches($title,'(\.|,)\s*$')), Michael Kay
- Re: [xsl] why matches($title,'.*?(\.|,)\s*$')) can perform so much worse than matches($title,'(\.|,)\s*$')), Oliver Hallam
- Re: [xsl] why matches($title,'.*?(\.|,)\s*$')) can perform so much worse than matches($title,'(\.|,)\s*$')), Liam R E Quin
|
Previous by Date: |
Re: [xsl] why matches($title,'.*?(\.|,)\s*$')) can perform so much worse than matches($title,'(\.|,)\s*$')), Oliver Hallam |
Next by Date: |
Re: [xsl] why matches($title,'.*?(\.|,)\s*$')) can perform so much worse than matches($title,'(\.|,)\s*$')), Michael Kay |
Previous by Thread: |
Re: [xsl] why matches($title,'.*?(\.|,)\s*$')) can perform so much worse than matches($title,'(\.|,)\s*$')), Oliver Hallam |
Next by Thread: |
Re: [xsl] why matches($title,'.*?(\.|,)\s*$')) can perform so much worse than matches($title,'(\.|,)\s*$')), Michael Kay |
Indexes: |
[Date]
[Thread]
[Top]
[All Lists] |
|
|