procmail

Re: strlen() in procmail

2003-01-15 11:38:28
On Wed, 15 Jan 2003, David W. Tamkin wrote:

All right, Bart:

| Now I want to add something that will extract a $MATCH without
| changing $= any further.

Will you settle for this?

 *  1^0 ^In-Reply-To:[^<]+<\/[^>]+
 * -1^0 ^In-Reply-To:[^<]+<[^>]+

I was going to answer "yes", but read on to the end.  This one is not
perfect if the score could be hovering near the supremum.  Your next
suggestion mostly addresses that:

 *  -.000001^0 ^In-Reply-To:[^<]+<\/[^>]+

I'd probably end up using that one in any recipe that already had scoring,
because (I think) it's more efficient than my solution below.
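
For anyone following along, here is a rough sketch of how the zero-sum pair from above might sit inside a recipe that already uses scoring.  The Subject condition, its weight, and the PARENT variable name are invented for illustration and are not from this thread:

```
:0
* 2^0 ^Subject:.*urgent
*  1^0 ^In-Reply-To:[^<]+<\/[^>]+
* -1^0 ^In-Reply-To:[^<]+<[^>]+
{
  # The +1 and -1 cancel, so only the hypothetical Subject
  # condition decides whether the score ends up positive.
  # If In-Reply-To matched, MATCH holds the text between < and >.
  PARENT=$MATCH
}
```

As far as I know, MATCH keeps whatever value it had before if the In-Reply-To conditions do not match, so a recipe like this may want to clear or check it first.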

| Perhaps something along the lines of
|
| * (|^In-Reply-To:[^<]+<\/[^>]+)

That you'll have to test.

I did, and ...

I'd be more comfortable putting the empty
alternative second, or writing it this way:

 * (^In-Reply-To:[^<]+<\/[^>]+)?

... both of those too, and none of the three has the desired effect.

My concern, not having yet tested it, is not that parentheses and
extraction won't mix, but that in the absence of a weight where x!=0,
procmail will match on the null string between the opening putative
newline and the first character of the search area and extract nothing.

Yes, that's part of what I was worried about, better phrased.  Procmail
normally matches minimally to the left of \/ and maximally to the right
of it, but when it appears inside parens it's not clear what counts as
"left" and "right".  Empirically, the entire parenthesized expression is
matched minimally before considering the \/ subexpression.

To illustrate the rest of my concerns, however ... I also tried this:

* ^^(.|$)*(^In-Reply-To:[^<]+<\/[^>]+)?(.|$)*^^

That is, force procmail to examine the entire header.  The results were
as if I had written this instead:

* ^^(.|$)*(^In-Reply-To:[^<]+<)?\/([^>]+)?(.|$)*^^

That is, it found the beginning of the MATCH where I wanted it to, but
it didn't end at the end of the parenthesized expression.

However, that gave me the clue I needed to come up with this:

* (^In-Reply-To:[^<]+<\/[^>]+|^^(.|$)*^^)

The "extraneous" anchoring in ^^(.|$)*^^ forces a scan of the full header,
during which the subexpression I actually care about matches before
reaching the end, and MATCH gets set properly.
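
Put into a recipe, that might look something like the sketch below.  The Subject condition, its weight, and the PARENT variable name are made up for the example; only the unweighted condition is the one I tested:

```
:0
* 2^0 ^Subject:.*urgent
* (^In-Reply-To:[^<]+<\/[^>]+|^^(.|$)*^^)
{
  # The unweighted condition always succeeds: either the
  # In-Reply-To branch matches and sets MATCH, or the
  # ^^(.|$)*^^ branch swallows the whole header.  Being
  # unweighted, it leaves the score alone.
  PARENT=$MATCH
}
```

I have not checked what MATCH contains when only the second branch matches, so treat the assignment as meaningful only for messages that have an In-Reply-To: header.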

Whew.


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
