procmail

Re: Matching "^" (Re: Adding Lines: header)

2003-06-19 20:38:26
Bart Schaefer wrote:

At the start/end of the buffer, ^ and $ match a zero-width boundary,
whereas for most embedded newlines they match the actual newline
character.

Er, no.  Here's how Stephen explained it to me years ago.

Nothing in procmail matches zero-width transitions. ^ and $ match literal newlines. So how do they match the start and the end of the search area? Well, procmail always sees the search area as surrounded by an additional newline on each side, which we've dubbed the putative newlines.

That's why I can get a match on

 * LOGNAME ?? ^dattier$

^^ is a special regexp in procmail that matches *only* a putative newline, never a real one. Additionally, extracting into $MATCH with \/ always pulls only real characters, never putative ones; if I do this,

 :0
 * LOGNAME ?? ()\/^dattier$
 { LOG=foo${MATCH}bar"
" }

I get "foodattierbar<one newline>" in the log, and the newline is the one from the LOG= assignment.
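
To make the ^^ point concrete, here is a small sketch (the word Hello and the log texts are just made-up placeholders, and I haven't run this exact pair). The first recipe should fire only when the body itself begins with Hello, because ^^ can match nothing but the opening putative newline; the second should fire when any body line begins with Hello, because ^ is satisfied by a real newline as well.

 :0
 * B ?? ^^Hello
 { LOG="body begins with Hello
" }

 :0
 * B ?? ^Hello
 { LOG="some body line begins with Hello
" }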

Now, the next thing concerns a scored, non-negated regexp condition that starts and ends with a newline and has a non-zero x-value (the x in w^x), such that you're actually counting occurrences. If a matching string ends in a newline, procmail backs up one character and starts with that newline again when it looks for the next occurrence. That way, if the condition is something like

 * 1^1 ^somepattern$

the newline that just matched $ at the end of somepattern can be reused to match ^ if somepattern recurs on the next line.

So what do we have?

 * 1^1 ^

will match on the putative newline at the beginning of the search area, back up one character, match again on the opening putative newline, do it again, do it again, and do it again until the score hits supremum. It won't match just once per line. There might not be any real newlines in the search area at all, and you'll still score supremum from it. But

 * 1^1 ^.*$

will match once on every whole line, together with the newlines (real or putative) surrounding it, reusing the closing newline of each line as the opening newline of the next.

The problem is what happens at the end of the search area, which, if it is H or B or HB, ends with a real newline (two, in fact, because the last line is blank). After ^.*$ has matched on the last real line of the search area, procmail backs up and gets one more, spurious, match on

 <closing real newline><closing putative newline>

so that's why we have to subtract 1.
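
Put together, a line counter along those lines might look like the sketch below. BODYLINES is a name I made up for illustration, and the action block only runs if the final score comes out positive, so treat it as a starting point rather than a tested recipe. The first condition scores 1 per match of ^.*$ in the body, the second knocks off the spurious match at the end, and $= picks up the resulting score inside the block.

 :0
 * 1^1 B ?? ^.*$
 * -1^0
 { BODYLINES=$= }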


