Eli wrote,
| This was shown to me on procmail 3.11pre3, and I have duplicated it
| on 3.11pre7. Apparently in some cases REs are not "greedy" on the
| right side of a \/ capture.
That's not the case at all. In Eli's example the right side was as greedy as
it could be; the problem is that he seemed to expect greed on the left as
well. Let's examine it:
| ------ rc file ------
| VERBOSE=y
| :0:
| * ^Subject:.*Keywords.*\/[0-9]*
| /tmp/$MATCH
| ------ rc file ------
|
| ------ test message ------
| From just@test
| Subject: Keywords 9999
| To: just
|
| baodyu
| ------ test message ------
... and MATCH is set to null, contrary to Eli's expectation.
It is not a bug but rather a frequently misunderstood effect of the way
extraction is advertised to operate. This has come up before, and Philip
Guenther has posted a long illustration.
Remember that only the right side is greedy; the left side is stingy, and
left-side stinginess takes precedence over right-side greed.
Extraction is implemented this way: the entire expression, left and right, is
pinned to the shortest possible match; then the division mark is placed and
the right side is repinned to the longest possible match starting at the
division. The tricky part is to remember that the division is marked during
the stingy stage.
If the expression is
^Subject:.*Keywords.*\/[0-9]*
and the text is
<newline>Subject:<space>Keywords<space>9999<newline>
then the shortest possible match to the entirety is
<newline>Subject:<space>Keywords
because ".*" and "[0-9]*" both match to null. Then the division mark is
placed on the space after "Keywords" and procmail looks for the longest
possible match to [0-9]* starting with that space. A space is not a digit,
so that, again, is null, and MATCH is set to null.
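
For anyone who wants to poke at the two stages by hand, here is a rough
sketch in Python. It is only a model of the behaviour described above, not
procmail's own code: the stingy stage is imitated by making every quantifier
lazy, and the names (text, stingy, division, match) are made up for the
illustration. One difference to keep in mind is that Python's ^ is
zero-width, so the leading newline is not part of the matched text here.

```python
import re

# The header roughly as procmail sees it, with a newline on either side.
text = "\nSubject: Keywords 9999\n"

# Stage 1 (stingy): all quantifiers lazy, so the whole expression settles
# on a short match. The group covers the left side of the \/ mark.
stingy = re.compile(r"(?m)(^Subject:.*?Keywords.*?)[0-9]*?")
m = stingy.search(text)
division = m.end(1)  # index where the division mark lands

# Stage 2 (greedy): rematch only the right side, anchored at the division.
match = re.compile(r"[0-9]*").match(text, division).group(0)

print(repr(text[m.start():division]))  # 'Subject: Keywords'
print(repr(text[division]))            # ' '  (the mark sits on the space)
print(repr(match))                     # ''   (what MATCH gets in this model)
```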
Eli noted that it works as he expected if the regexp is changed to this:
^Subject:.*Keywords.*\/[0-9]+
That is a whole other ball of wax. Now the shortest match to the entirety is
<newline>Subject:<space>Keywords<space>9
and the division mark is placed at the 9. Then procmail refigures the
longest match to the right side starting at the division mark and sets
MATCH=9999.
Note that this would have given a non-null value to MATCH:
^Subject:.*Keywords\/.*[0-9]*
With the ".*" after "Keywords" moved to the right, it becomes greedy and
reaches all the way to the digits, and MATCH="<space>9999".
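
The three variants can be compared side by side with a small Python model.
This is a sketch under assumptions, not procmail's implementation: extract()
is a hypothetical helper, the lazy forms of each side are written out by
hand to imitate the stingy stage, and Python's backtracking lazy match
happens to give the shortest match for these particular patterns.

```python
import re

TEXT = "\nSubject: Keywords 9999\n"

def extract(left_lazy, right_lazy, right_greedy, text=TEXT):
    # Stage 1: stingy match of the whole expression; group 1 is the left side.
    m = re.search("(?m)(" + left_lazy + ")" + right_lazy, text)
    if m is None:
        return None
    # Stage 2: greedy rematch of the right side, anchored at the division.
    return re.compile(right_greedy).match(text, m.end(1)).group(0)

# ^Subject:.*Keywords.*\/[0-9]*
star = extract(r"^Subject:.*?Keywords.*?", r"[0-9]*?", r"[0-9]*")
# ^Subject:.*Keywords.*\/[0-9]+
plus = extract(r"^Subject:.*?Keywords.*?", r"[0-9]+?", r"[0-9]+")
# ^Subject:.*Keywords\/.*[0-9]*
moved = extract(r"^Subject:.*?Keywords", r".*?[0-9]*?", r".*[0-9]*")

print(repr(star), repr(plus), repr(moved))  # '' '9999' ' 9999'
```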
Call it counterintuitive, but it's not a bug. General advice: always make
sure that the right side cannot match null or that the last element of the
left side cannot match null.
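
That advice can be turned into a quick sanity check. The Python sketch below
is only an approximation: can_match_null() and is_safe() are hypothetical
helpers, they test simple fragments with Python's regexp engine rather than
procmail's, and you have to pick out the last element of the left side
yourself.

```python
import re

def can_match_null(fragment):
    # True if the fragment is able to match the empty string.
    return re.fullmatch(fragment, "") is not None

def is_safe(last_left_element, right_side):
    # Safe if at least one of the two pieces adjoining \/ cannot match null.
    return not can_match_null(right_side) or not can_match_null(last_left_element)

print(is_safe(r".*", r"[0-9]*"))          # False: ^Subject:.*Keywords.*\/[0-9]*
print(is_safe(r".*", r"[0-9]+"))          # True:  ^Subject:.*Keywords.*\/[0-9]+
print(is_safe(r"Keywords", r".*[0-9]*"))  # True:  ^Subject:.*Keywords\/.*[0-9]*
```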