procmail

Re: Trouble with multi-line MATCH

2000-02-16 12:42:10
Ralph SOBEK <sobek(_at_)irit(_dot_)fr> writes:
"SR" == Stan Ryckman <stanr(_at_)sunspot(_dot_)tiac(_dot_)net> writes:

SR> Ralph Sobek wrote:

Also, if I modify the second condition to include
the *next* line, as follows:

* 9876543210^0 B ?? $ ()\/^.*$?.*\<(${words})s?\>.*$?.*

the match fails entirely.

SR> Doesn't $? expand to the exit code of the previous command here?
SR> I'm guessing you'll want ($)? in the two spots that occurs.

I figured that one out after I sent my request.  In fact, I just
backslashed those two occurrences of $ followed by ?.  Philip, is
there any difference in efficiency between '\$?' and '($)?'?  Do they
mean the same thing?

Given that the condition has shell expansion done on it, they mean the
same thing.  Furthermore, they are equally efficient.  What would be
more efficient would be to make the entire "slurp newline and following"
optional:

        * 9876543210^0 B ?? $ ()\/^.*($.*)?\<(${words})s?\>.*($.*)?

Otherwise, any non-newline character could be part of either the first
".*" or the second ".*".  The maximal matching will (when possible)
force it to be in the first, but procmail still has to consider the
other possibility.  The above grouping keeps that from happening.
Note that the other way to do the grouping:

        * 9876543210^0 B ?? $ ()\/^(.*$)?.*\<(${words})s?\>(.*$)?.*

is less efficient -- the rule of thumb is to place the 'forcing character'
('$' in this case) at the _beginning_ of the optional (or repeated)
expression.

Hmm, do you really want MATCH to start with a newline?  If not, you could
move the '^' before the '\/' and eliminate the parens:

        * 9876543210^0 B ?? $ ^\/.*($.*)?\<(${words})s?\>.*($.*)?
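
For illustration, here is a minimal sketch of a complete recipe built
around that last condition.  The value of 'words' is a made-up
placeholder (its real definition is not shown in this thread), the
other condition of the original recipe is left out, and the action
only writes MATCH to the log so you can check where it starts:

        # Hypothetical word list, for illustration only
        words="confirm|cancel"

        :0
        * 9876543210^0 B ?? $ ^\/.*($.*)?\<(${words})s?\>.*($.*)?
        {
          # Log what was captured; brackets make a leading newline visible
          LOG="MATCH=[$MATCH]
        "
        }

With the '^' before the '\/', the logged text should begin with the
first character of a line rather than with a newline.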


A subtle bug just occurred to me: the above will slurp the preceding
_two_ lines if the word appears at the very beginning of a line, and
the following two lines if the word appears at the very end of a line,
because the '\<' or '\>' would then have to match the newline.  Fixing
that requires expanding the '\<' and '\>' yourself, which makes the
regexp messier, so you'll have to decide whether it's a real problem in
this application.


Philip Guenther