[Top] [All Lists]

Re: Complexity of :matches ?

2005-07-12 02:44:25

The algorithm I used was just essentially this.
- ? can be handled trivially.
- if the pattern doesn't begin with *
   - handle that leader
- find the next string between stars
   - search for string between the stars
   - search returns a location of that substring
   - no match -> return false
   - we have a match; from now on, search only
     from the location of that match forward
- if the pattern doesn't end with *
   - handle that trailer

Just forget my previous mail.  Here we go, an algorithm without

Now that I see it, the fundamental difference to regexps is obvious:
Regexps can contain looping patterns, and wildcard patterns can not,
thus allowing to replace backtracking by string search.  Pretty cool,
Tim, and thanks for opening my eyes!

I think that this implements right-to-left greedy matching, that is,
completely ungreedy matching.  If you wanted greedy matching, you might
be able to get it by doing the search backwards.  I'm real sure that I
can't prove this.

It does sound reasonable, though.  A backward search will match the
shortest possible pattern right to a string, starting at the last one,
leaving the longest possible one left to it.  Since that applies to all
strings, indeed you get greedy matching that way. (Unless I write crap
again. ;-)

I am unsure how to proceed on documenting something in the security
considerations.  Tim showed that there is no risk really, if you do
things right.  I showed how tempting a naive approach is.  Ned was sure
that his code was not to be breaken easily, but didn't want to exclude
that there might be a pathological case for sure.

It feels silly to document that done wrong, there is a security risk.
Then again, it is not easy to see that it is possible to avoid exponential
complexity even for the worst case, and how.  Future implementations
might work unsafe, if we don't document something.


<Prev in Thread] Current Thread [Next in Thread>