procmail

Re: bug: overoptimizing RE

1998-01-21 15:30:29
"Eli's Procmail Stuff" <procmail(_at_)qz(_dot_)to> writes:
> This was shown to me on procmail 3.11pre3, and I have duplicated it
> on 3.11pre7. Apparently in some cases REs are not "greedy" on the
> right side of a \/ capture.
>
> ------ rc file ------
> VERBOSE=y
> :0:
> * ^Subject:.*Keywords.*\/[0-9]*
> /tmp/$MATCH
> ------ rc file ------
>
> ------ test message ------
> From just(_at_)test
> Subject: Keywords 9999
> To: just
>
> baodyu
> ------ test message ------
...
> What should have happened is that the 9999 should have been captured
> and the mail saved into /tmp/9999. Instead MATCH is unset and the
> normal save-to-a-directory action occurs.  As a workaround, changing
> the RE to be "...\/[0-9]+" does work, but this is a bug and may cause
> other subtler problems.

Bzzt.  It *did* match greedily on the right hand side.  The problem is
that you let it match too early.  It matched as follows:

                         1              0      0
RE:     ^     Subject:   .*  Keywords  .* \/ [0-9]*  (beyond what was matched)
Data:   "\n" "Subject:" " " "Keywords"               " 9999" "\n"

The ".*" right before the \/ was matched minimally, zero times, and
then the right-hand side matched maximally, which was *zero* because
there wasn't a digit there.  Procmail didn't back up and try a longer
match on the left-hand side because it didn't need to: the regexp
had already matched.
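
One way to check this is to log MATCH instead of using it as a mailbox
name.  The rc file below is only a sketch: the nesting block and the
bracketed LOG text are added here for illustration and were not part of
the original report.  If the explanation above holds, the log should
show nothing between the brackets.  That would also account for the
delivery that was reported: with an empty MATCH, /tmp/$MATCH expands to
plain /tmp/, which is a directory.

------ rc file ------
VERBOSE=y
# Same condition as the original recipe, but log the capture
# instead of using it as a mailbox name.
:0
* ^Subject:.*Keywords.*\/[0-9]*
{
  # The brackets make an empty MATCH visible in the log.
  LOG="MATCH is [$MATCH]
"
}
# No delivering recipe here, so the message then goes to $DEFAULT.
------ rc file ------

Changing the '*' to a '+' in the condition should make the same log
line show the digits instead.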

What this says is that you should almost always force the right-hand
side of the \/ to match at least one character.  For instance,
replacing the '*' with a '+' on the right-hand side solves the problem
in this case, as procmail will have to keep matching more and more on
the left-hand side until it gets a digit for matching on the right-hand
side.

        :0:
        * ^Subject:.*Keywords.*\/[0-9]+
        /tmp/$MATCH


Philip Guenther
