procmail

Re: syntax rant (was Re: bug: overoptimizing RE)

1998-01-23 13:07:39
Eli wrote, understandably,

| It is very frustrating to have a world of programs
| which all have regular expressions that at least share a common subset
| of features and then come to procmail and find that those commonalities
| are missing (and that this is poorly documented is just icing on the cake).

Procmail's regexp engine was designed for its special needs.  Few users get
bitten by the differences, but Eli's tests are far more complex than the
average user's, and he's run into problems because of those differences again
and again.  I can understand his holding such an opinion.

That said, the original implementation of extraction was much, much worse.
It was like this: match the left side stingily and then require the right
side to begin immediately afterward.  Let's go back to Eli's second example,
which works as expected NOW:

  * ^Subject: Keywords.*\/[0-9]+

In the original implementation, if there were two matches to the left side
in the search area, and the first one didn't match the entire expression,

Subject: Keywords are delicious.
....
Subject: Keywords9999

procmail would find the first occurrence of "<newline>Subject: Keywords.*",
see that the following text (" are delicious.") did not match the right side,
declare the entire condition failed, leave MATCH unchanged, and skip out
on the rest of the recipe ... even though, had there been no extraction
marker, the condition would have matched because of the later string.
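To make the failure concrete, here is a small Python model of the rule as
described above. It is a hypothetical sketch, not procmail's code. The
function names (old_extract, matches_without_marker, whole_extract) are
invented for illustration, and Python's lazy quantifier `.*?` stands in for
procmail's stingy matching of the left side. whole_extract, which treats
the condition as one expression and captures the right side, is only an
approximation of the "works as expected" behavior.

```python
import re

# Hypothetical model (not procmail source) of the ORIGINAL extraction rule:
# take the first match of the left side, matched stingily, then require
# the right side to match starting exactly where that left match ended.
# Python's lazy quantifier (.*?) stands in for procmail's stingy matching.
def old_extract(left, right, text):
    m = re.search(left, text, re.MULTILINE)
    if m is None:
        return None                      # condition fails
    r = re.compile(right).match(text, m.end())
    if r is None:
        return None                      # condition fails; MATCH unchanged
    return r.group(0)                    # value assigned to MATCH

# The same condition with the \/ marker removed: does anything match at all?
def matches_without_marker(left, right, text):
    return re.search(left + right, text, re.MULTILINE) is not None

# Treat the condition as one expression and capture the right side
# (an approximation, assumed here, of the current behavior).
def whole_extract(left, right, text):
    m = re.search(left + "(" + right + ")", text, re.MULTILINE)
    return None if m is None else m.group(1)

text = "Subject: Keywords are delicious.\n....\nSubject: Keywords9999\n"
left, right = r"^Subject: Keywords.*?", r"[0-9]+"

print(old_extract(left, right, text))             # None
print(matches_without_marker(left, right, text))  # True
print(whole_extract(left, right, text))           # 9999
```

The old rule gives up at the first "Subject: Keywords" line. Dropping the
marker, or matching the whole expression at once, finds the later line.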

On the other hand, suppose the first (or only) match in the search area
to the left side was this:

Subject: Keywords 9999

procmail would likewise see a match to the left side, find that the
subsequent text (" 9999") did not match the right side, declare the entire
condition failed, leave MATCH unchanged, and skip out on the rest of the
recipe, even though, had there been no extraction marker, the condition
would have matched that very string in the text.  There was no way at all
to allow for a variable number of spaces in the actual text and still extract
only the digits in a single step (at least none I ever found).

In either of those examples, however, if the condition had been

  * ^Subject: Keywords *\/[0-9]*

procmail would have declared the condition successful but set MATCH="".
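The same kind of hypothetical Python model shows why the variable-spacing
case could not be handled in one step under the original rule. The
function name old_extract is invented for illustration, and lazy
quantifiers (`.*?`, ` *?`) are an assumed stand-in for procmail's stingy
left side; this is a sketch of the behavior described in this message, not
procmail's implementation.

```python
import re

# Hypothetical model (not procmail source) of the original extraction rule:
# stingy first match of the left side, then the right side must match
# immediately after it. Lazy quantifiers approximate "stingy".
def old_extract(left, right, text):
    m = re.search(left, text, re.MULTILINE)
    if m is None:
        return None                      # condition fails
    r = re.compile(right).match(text, m.end())
    return None if r is None else r.group(0)

spaced = "Subject: Keywords 9999\n"
first = "Subject: Keywords are delicious.\n....\nSubject: Keywords9999\n"

# Second example: the stingy ".*" matches nothing, so the right side
# must match at " 9999" and fails.
print(old_extract(r"^Subject: Keywords.*?", r"[0-9]+", spaced))      # None

# Third example: " *" also matches nothing, and "[0-9]*" matches the
# empty string right there, so the condition succeeds with MATCH="".
print(repr(old_extract(r"^Subject: Keywords *?", r"[0-9]*", spaced)))  # ''
print(repr(old_extract(r"^Subject: Keywords *?", r"[0-9]*", first)))   # ''
```

Because the right side `[0-9]*` can match nothing, a "successful" condition
here says nothing about whether any digits were found.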
