ietf-mta-filters
[Top] [All Lists]

Re: Complexity of :matches ?

2005-07-11 12:51:41

However, I do not know if simply using an optimized non-greedy approach
actually reduces the computational complexity to something less than O(M*N**K)
for extremely pathological cases.

I had a implementation (wasn't Sieve, but had * and ? and worked the same way) that was only O(M^N), that is, the number of stars didn't matter. Barring pathological cases that reduce the match to O(N^2), I believe this algorithm is Θ(N). It's possible to be actually O(M+N) if you use Boyer-Moore.

The algorithm I used was just essentially this.
- ? can be handled trivially.
- if the pattern doesn't begin with *
  - handle that leader
- find the next string between stars
  - search for string between the stars
  - search returns a location of that substring
  - no match -> return false
  - we have a match; from now on, search only
    from the location of that match forward
- if the pattern doesn't end with *
  - handle that trailer

That is, for with a pattern like *a*b*c* is three strings, search for each of which can be found in best-case O(N) time. If we search for 'a' first and find the first match at position 10, we start searching for 'b' at position 11. For a pattern like a*b*c*d*e, we do the same thing after making sure the string starts with a and ends with e.

The pathological case for this, like string comparison, is only, uh, O((M+N)^2) or so, and then only if you're using naive string comparisons (which my implementation did).

Since I'm not sure that I've gotten my point across, let me make a more straightforward statement. If you're doing globbing, and you don't care if it's greedy or not, and you're doing recursion, then you're doing something wrong. Finding one match is something that can be done in roughly linear time. Recursion is only necessary for finding the "best" match which is not something that's in the base Sieve spec.

I think that this implements right-to-left greedy matching, that is, completely ungreedy matching. If you wanted greedy matching, you might be able to get it by doing the search backwards. I'm real sure that I can't prove this.

Tim


<Prev in Thread] Current Thread [Next in Thread>