Re: Matching "^" (Re: Adding Lines: header)

Bart Schaefer wrote:

At the start/end of the buffer, ^ and $ match a zero-width boundary,
whereas for most embedded newlines they match the actual newline
character.


Er, no.  Here's how Stephen explained it to me years ago.

Nothing in procmail matches zero-width transitions. ^ and $ match literal newlines. So how do they match the start and the end of the search area? Well, procmail always sees the search area as surrounded by an additional newline on each side, which we've dubbed the putative newlines.


That's why I can get a match on

 * LOGNAME ?? ^dattier$
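
If it helps to see that mechanically, here is a toy Python model of the idea. It is only my own sketch of the explanation above, not procmail's code, and the function name is made up. It wraps the search area in one extra newline on each side and treats ^ and $ as plain newlines.

```python
import re

def procmail_like_search(pattern, area):
    """Toy model only: ^ and $ are literal newlines, and the search
    area gets one putative newline added on each side."""
    translated = pattern.replace("^", "\n").replace("$", "\n")
    return re.search(translated, "\n" + area + "\n")

# LOGNAME holds no real newline, yet both anchors find one to match.
print(procmail_like_search("^dattier$", "dattier") is not None)   # True
# An extra leading character leaves no newline right before "dattier".
print(procmail_like_search("^dattier$", "mdattier") is not None)  # False
```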

^^ is a special regexp in procmail that matches *only* to a putative newline and not to a real one. Additionally, extracting into $MATCH with \/ always pulls only real characters, never putative ones; if I do this,


 :0
 * LOGNAME ?? ()\/^dattier$
 { LOG=foo${MATCH}bar"
" }

I get "foodattierbar<one newline>" in the log, and the newline is the one from the LOG= assignment.

Now, the next thing is that when a scored non-negated regexp condition starts and ends with a newline and has a non-zero x-value, such that you're actually counting occurrences, then if a matching string ends in a newline, procmail backs up one character and starts with the newline again when it looks for the next appearance. That way, if the condition is something like


 * 1^1 ^somepattern$

the newline that just matched $ at the end of somepattern can be reused to match ^ if somepattern recurs on the next line.
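
To show why the backing up matters, here is a rough Python sketch that counts a fixed string with and without it. This is my own illustration of the behaviour described above, not procmail's implementation; count_occurrences and its back_up switch are invented for the example.

```python
def count_occurrences(area, needle, back_up):
    """Count needle in the area wrapped in putative newlines.
    With back_up, a needle ending in a newline gives that newline
    back so the next occurrence can start on it."""
    if len(needle) < 2:
        raise ValueError("needle must be longer than one character")
    text = "\n" + area + "\n"          # putative newlines on each side
    count = 0
    pos = text.find(needle)
    while pos != -1:
        count += 1
        next_pos = pos + len(needle)
        if back_up and needle.endswith("\n"):
            next_pos -= 1              # reuse the closing newline
        pos = text.find(needle, next_pos)
    return count

area = "somepattern\nsomepattern"      # the pattern on two adjacent lines
needle = "\nsomepattern\n"             # ^somepattern$ with literal newlines
print(count_occurrences(area, needle, back_up=True))    # 2
print(count_occurrences(area, needle, back_up=False))   # 1
```

Without the back-up, the newline shared by the two lines is used up by the first match and the second line is never counted.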


So what do we have?

 * 1^1 ^

will match on the putative newline at the beginning of the search area, back up one character, match again on the opening putative newline, do it again, do it again, and do it again until the score hits supremum. It won't match just once per line. There might not be any real newlines in the search area at all, and you'll still score supremum from it. But


 * 1^1 ^.*$

will match once on every whole line with the newlines (real or putative) surrounding it, reusing the closing newline of each line as the opening newline of the next line. The problem is what happens at the end of the search area, which, if it is H or B or HB, ends with a real newline (two, in fact, because the last line is blank). After ^.*$ has matched on the last real line of the search area, there is a spurious match when procmail backs up and matches again on


 <closing real newline><closing putative newline>

so that's why we have to subtract 1.
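
Both results can be reproduced with a small Python model. It is my own sketch built from the description above, not procmail's code: count_matches is a made-up name, and the cap of 1000 is an arbitrary stand-in for the score hitting supremum.

```python
import re

def count_matches(pattern, area, cap=1000):
    """Toy model: ^ and $ are literal newlines, the area is wrapped in
    putative newlines, and a match ending in a newline backs up one
    character before the next search. Counting stops at cap."""
    regex = re.compile(pattern.replace("^", "\n").replace("$", "\n"))
    text = "\n" + area + "\n"
    pos = 0
    count = 0
    while count < cap:
        m = regex.search(text, pos)
        if m is None:
            break
        count += 1
        pos = m.end()
        if m.group().endswith("\n"):
            pos -= 1                   # reuse the closing newline
    return count

# A bare ^ keeps matching the opening putative newline until the cap.
print(count_matches("^", "no newline here"))   # 1000
# Two real lines, ending in a real newline: one match too many.
print(count_matches("^.*$", "one\ntwo\n"))     # 3
```

The third match in the second case is the closing real newline followed by the closing putative one, which is the extra 1 that has to be subtracted.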


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail