
Re: Complexity of :matches ?

2005-07-12 03:52:12

Michael Haardt <michael(_at_)freenet-ag(_dot_)de> writes:
So you suspect that there might be an algorithm that does not perform
backtracking? I cannot imagine O(N*M) as a worst case unless we can kick
backtracking out entirely, although it might well be the average case.
But that's just a feeling, and it may be wrong.

Does your :contains implementation (substr()?) perform backtracking?
What's its worst-case big-oh?
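For reference, a typical substr()-style scan simply tries every start
position in the text. This is only an illustrative sketch of that naive
approach (not Michael's actual implementation); its worst case is O(N*M),
and the only "backtracking" is restarting at the next start position:

```python
def contains(text, key):
    """Naive :contains check: try every start position in text.

    Worst case O(N*M), e.g. key "aab" against text "aaaa...a":
    each failed attempt restarts one position to the right.
    """
    n, m = len(key), len(text)
    if n == 0:
        return True
    for start in range(m - n + 1):
        if text[start:start + n] == key:
            return True
    return False
```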

If we cannot come up with a proof of the worst-case complexity of
non-greedy algorithms,  <...>

Given that you can obviously locate a match for a string of 'n' non-star
characters in O(n*M), isn't the algorithm that Tim described a
worst case of O(N*M), where N is the number of characters _other_ than
star in the pattern?  (Stars are the cheap part, as they can never force
a backtrack!)
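Tim's algorithm isn't reproduced in this message, but the standard
backtrack-to-last-star wildcard matcher has exactly that shape: only a
non-star pattern character can fail and force a retry, and each retry just
lets the most recent '*' absorb one more text character. A sketch, assuming
':matches'-style '*' (any run) and '?' (any single character):

```python
def glob_match(text, pat):
    """Match pat (with '*' and '?') against the whole of text.

    On a mismatch, back up to the position after the most recent '*'
    and let that star consume one more text character.  Only non-star
    pattern characters can force such a retry, giving a worst case of
    O(N*M) where N counts the non-star characters in pat.
    """
    t = p = 0
    star = -1          # pattern index just after the last '*' seen
    star_t = 0         # text position where that star's match resumes
    while t < len(text):
        if p < len(pat) and (pat[p] == '?' or pat[p] == text[t]):
            t += 1
            p += 1
        elif p < len(pat) and pat[p] == '*':
            star = p + 1
            star_t = t
            p = star
        elif star != -1:
            star_t += 1        # the last '*' absorbs one more char
            t = star_t
            p = star
        else:
            return False
    while p < len(pat) and pat[p] == '*':
        p += 1                 # trailing stars match the empty string
    return p == len(pat)
```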

<Ned wrote>
In any case, the computational complexity of glob-style matches is
clearly less than that of regexps, which is hardly surprising given
that regexps require an NFA. This is why we disable regexp support by
default in our implementation.

That's what I wonder about.  The NFA can be converted to a DFA
(backtracking being the alternative), but the same appears to be true
of wildcard patterns, if multiple stars are allowed.

<pause to pull out the dragon book>

(To reiterate, N is the size of the pattern, M is the size of the text
being matched against.)

For an NFA, the setup is O(N) in time and space and matching is O(N*M).
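For a wildcard pattern the NFA simulation is particularly simple: the
machine state is just the set of live pattern positions, and a '*' gets an
epsilon edge past itself (it may match empty) plus a self-loop (it may
consume a character). A sketch of that simulation (my construction, not
code from the thread); each text character updates the set in O(N), so
matching is O(N*M) with O(N) setup:

```python
def glob_match_nfa(text, pat):
    """NFA simulation for a '*'/'?' wildcard pattern: 'states' is the
    set of pattern indices we might currently be at."""
    def close(states):
        # Epsilon closure: a '*' can match the empty string,
        # so from a star position we can also skip past it.
        out, stack = set(), list(states)
        while stack:
            s = stack.pop()
            if s in out:
                continue
            out.add(s)
            if s < len(pat) and pat[s] == '*':
                stack.append(s + 1)
        return out

    states = close({0})
    for ch in text:
        nxt = set()
        for s in states:
            if s == len(pat):
                continue       # already past the end; dead on input
            if pat[s] == '*':
                nxt.add(s)     # star consumes ch and stays put
            elif pat[s] == '?' or pat[s] == ch:
                nxt.add(s + 1)
        states = close(nxt)
    return len(pat) in states
```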

For a non-lazy DFA, the matching is O(M), but the worst-case setup is
O(2^N) in time and space, since the subset construction can produce
exponentially many states.

A lazy DFA has setup O(N), but the dragon book glosses over its matching
complexity: they claim it's "almost as fast" as a non-lazy DFA, but I
think it degenerates into a strange NFA simulation if no states are ever
reused, making it O(N*M).  A precise statement of the cost of computing
a new state would be needed to fully characterize it.
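One way to make that concrete: build DFA states (sets of pattern
positions) only when first reached, and memoize them in a transition
table. If every input character hits a cached transition the scan is
O(M); if no state is ever reused, each character pays the O(N) subset
step, i.e. the NFA bound of O(N*M). A self-contained sketch for wildcard
patterns, again my own construction rather than anything from the thread:

```python
def glob_match_lazy_dfa(text, pat):
    """Lazy DFA for a '*'/'?' wildcard pattern: each DFA state is a
    frozen set of pattern positions, computed on first use and cached.
    Setup is O(N); matching is O(M) with full state reuse, degrading
    toward O(N*M) when every character needs a fresh state."""
    def close(states):
        out, stack = set(), list(states)
        while stack:
            s = stack.pop()
            if s not in out:
                out.add(s)
                if s < len(pat) and pat[s] == '*':
                    stack.append(s + 1)   # '*' may match empty
        return frozenset(out)

    trans = {}                            # (state, char) -> state cache
    state = close({0})
    for ch in text:
        key = (state, ch)
        if key not in trans:              # first visit: O(N) subset step
            nxt = set()
            for s in state:
                if s == len(pat):
                    continue
                if pat[s] == '*':
                    nxt.add(s)            # star consumes ch in place
                elif pat[s] == '?' or pat[s] == ch:
                    nxt.add(s + 1)
            trans[key] = close(nxt)
        state = trans[key]                # cached transition: O(1)
    return len(pat) in state
```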

Philip Guenther
