However, I do not know if simply using an optimized non-greedy approach
actually reduces the computational complexity to something less than O(M*N**K)
for extremely pathological cases.
I had a implementation (wasn't Sieve, but had * and ? and worked the
same way) that was only O(M^N), that is, the number of stars didn't
matter. Barring pathological cases that reduce the match to O(N^2), I
believe this algorithm is Θ(N). It's possible to be actually O(M+N) if
you use Boyer-Moore.
The algorithm I used was just essentially this.
- ? can be handled trivially.
- if the pattern doesn't begin with *
- handle that leader
- find the next string between stars
- search for string between the stars
- search returns a location of that substring
- no match -> return false
- we have a match; from now on, search only
from the location of that match forward
- if the pattern doesn't end with *
- handle that trailer
That is, for with a pattern like *a*b*c* is three strings, search for
each of which can be found in best-case O(N) time. If we search for 'a'
first and find the first match at position 10, we start searching for
'b' at position 11. For a pattern like a*b*c*d*e, we do the same thing
after making sure the string starts with a and ends with e.
The pathological case for this, like string comparison, is only, uh,
O((M+N)^2) or so, and then only if you're using naive string comparisons
(which my implementation did).
Since I'm not sure that I've gotten my point across, let me make a more
straightforward statement. If you're doing globbing, and you don't care
if it's greedy or not, and you're doing recursion, then you're doing
something wrong. Finding one match is something that can be done in
roughly linear time. Recursion is only necessary for finding the "best"
match which is not something that's in the base Sieve spec.
I think that this implements right-to-left greedy matching, that is,
completely ungreedy matching. If you wanted greedy matching, you might
be able to get it by doing the search backwards. I'm real sure that I
can't prove this.
Tim