Bart Schaefer wrote:
> At the start/end of the buffer, ^ and $ match a zero-width boundary,
> whereas for most embedded newlines they match the actual newline
> character.
Er, no.  Here's how Stephen explained it to me years ago.
Nothing in procmail matches zero-width transitions.  ^ and $ match 
literal newlines.  So how do they match the start and the end of the 
search area?  Well, procmail always sees the search area as surrounded 
by an additional newline on each side, which we've dubbed the putative 
newlines.
That's why I can get a match on
 * LOGNAME ?? ^dattier$
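The putative-newline idea can be mimicked outside procmail. What follows is only a minimal sketch in Python, assuming exactly what is described above: ^ and $ both stand for a literal newline, and the search area is wrapped in one extra newline on each side. The helper names to_newline_regex and matches are made up for illustration, and this is a toy model, not procmail's actual matcher (it ignores case-insensitivity, escapes, and ^^).

```python
import re

def to_newline_regex(procmail_pattern: str) -> str:
    """Toy translation: treat every ^ and $ as a literal newline.

    Only handles the simple patterns used in this thread (no escapes,
    no ^^, no character classes containing ^ or $).
    """
    return procmail_pattern.replace("^", "\n").replace("$", "\n")

def matches(procmail_pattern: str, search_area: str) -> bool:
    # Surround the search area with one putative newline on each side.
    padded = "\n" + search_area + "\n"
    return re.search(to_newline_regex(procmail_pattern), padded) is not None

# A value with no real newlines at all still satisfies ^dattier$ ...
print(matches("^dattier$", "dattier"))   # True
# ... but only when it spans the whole line.
print(matches("^dattier$", "xdattier"))  # False
```

No zero-width assertion is used anywhere in the sketch; the padding alone is enough to make both anchors work at the edges.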
^^ is a special regexp in procmail that matches *only* a putative 
newline, never a real one.  Additionally, extracting into $MATCH 
with \/ always pulls only real characters, never putative ones; if I do 
this,
 :0
 * LOGNAME ?? ()\/^dattier$
 { LOG=foo${MATCH}bar"
" }
I get "foodattierbar<one newline>" in the log, and the newline is the 
one from the LOG= assignment.
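The extraction rule can be sketched the same way. This is a rough Python model under the assumptions stated above (^ and $ are literal newlines, the search area gets one putative newline on each side, and the two padding characters are never copied out). The names _nl and extract_match are hypothetical, and the sketch does not handle ^^ or escaped characters.

```python
import re

def _nl(procmail_pattern: str) -> str:
    # Toy translation: every ^ and $ becomes a literal newline.
    return procmail_pattern.replace("^", "\n").replace("$", "\n")

def extract_match(left: str, right: str, search_area: str):
    r"""Toy model of `left\/right`: what would land in $MATCH, or None."""
    padded = "\n" + search_area + "\n"   # putative newline on each side
    m = re.search("(?:%s)(%s)" % (_nl(left), _nl(right)), padded)
    if m is None:
        return None
    # Keep only real characters: index 0 and the last index of `padded`
    # are the putative newlines, so clip the captured span to exclude them.
    start = max(m.start(1), 1)
    end = min(m.end(1), len(padded) - 1)
    return padded[start:end]

# Mirrors `* LOGNAME ?? ()\/^dattier$` with LOGNAME=dattier.
print(repr(extract_match("", "^dattier$", "dattier")))  # 'dattier'
```

In this model both anchors were satisfied by padding, so nothing but the seven real characters comes back, which is consistent with the single newline seen in the log.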
Now, the next thing is that when the regexp in a scored, non-negated 
condition starts and ends with a newline (^ or $) and the condition has 
a non-zero x-value (the x in w^x), such that you're actually counting 
occurrences, then if a matching string ends in a newline, procmail backs 
up one character and starts with that newline again when it looks for 
the next match.  That way, if the condition is something like
 * 1^1 ^somepattern$
the newline that just matched $ at the end of somepattern can be reused 
to match ^ if somepattern recurs on the next line.
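To see why the back-up matters, here is a small Python sketch of the counting loop as described above. It assumes ^ and $ are literal newlines and that the search area is padded with one putative newline on each side. The function name count_occurrences and its reuse_newline switch are inventions for this illustration; procmail has no such switch, it is there only to show what would be missed without the back-up.

```python
import re

def count_occurrences(procmail_pattern: str, search_area: str,
                      reuse_newline: bool = True) -> int:
    """Toy count of matches for a pattern of the form ^...$.

    With reuse_newline=True the next search starts ON the newline that
    closed the previous match, so it can double as the next ^.
    """
    regex = re.compile(procmail_pattern.replace("^", "\n").replace("$", "\n"))
    padded = "\n" + search_area + "\n"   # putative newline on each side
    count, pos = 0, 0
    while True:
        m = regex.search(padded, pos)
        if m is None:
            return count
        count += 1
        backs_up = reuse_newline and m.group().endswith("\n")
        next_pos = m.end() - 1 if backs_up else m.end()
        if next_pos <= m.start():
            # A lone-newline pattern would loop forever; not covered here.
            raise ValueError("pattern makes no progress; outside this sketch")
        pos = next_pos

area = "somepattern\nsomepattern\n"
print(count_occurrences("^somepattern$", area))                       # 2
print(count_occurrences("^somepattern$", area, reuse_newline=False))  # 1
```

Without the back-up, the newline between the two lines is consumed as the first match's $ and is no longer available as the second match's ^, so the second line goes uncounted.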
So what do we have?
 * 1^1 ^
will match on the putative newline at the beginning of the search area, 
back up one character, match again on the opening putative newline, do 
it again, do it again, and do it again until the score hits supremum. 
It won't match just once per line.  There might not be any real newlines 
in the search area at all, and you'll still score supremum from it.  But
 * 1^1 ^.*$
will match once on every whole line with the newlines (real or putative) 
surrounding it, reusing the closing newline of each line as the opening 
newline of the next line.  The problem is what happens at the end of the 
search area, which, if it is H or B or HB, ends with a real newline 
(two, in fact, because the last line is blank).  After ^.*$ has matched 
on the last real line of the search area, there is a spurious match when 
procmail backs up and matches again on
 <closing real newline><closing putative newline>
so that's why we have to subtract 1.
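Both cases can be traced with a toy scorer. This Python sketch assumes the model described in this message (^ and $ as literal newlines, one putative newline on each side of the search area, back up one character after a match that ends in a newline). The name score_1_1 is hypothetical, and math.inf merely stands in for "supremum"; it is not how procmail represents the score.

```python
import math
import re

def score_1_1(procmail_pattern: str, search_area: str) -> float:
    """Toy score for `* 1^1 pattern`: one point per match, with back-up."""
    regex = re.compile(procmail_pattern.replace("^", "\n").replace("$", "\n"))
    padded = "\n" + search_area + "\n"   # putative newline on each side
    score, pos = 0, 0
    while True:
        m = regex.search(padded, pos)
        if m is None:
            return score
        score += 1
        # Back up onto the closing newline so it can be reused as the next ^.
        next_pos = m.end() - 1 if m.group().endswith("\n") else m.end()
        if next_pos <= m.start():
            # The same match would repeat forever; stand in for "supremum".
            return math.inf
        pos = next_pos

print(score_1_1("^", "no real newlines here"))  # inf
print(score_1_1("^.*$", "a\nb\n"))              # 3: two lines plus one spurious match
print(score_1_1("^.*$", "a\nb\n") - 1)          # 2
```

In this model the extra point only appears when the search area ends in a real newline: score_1_1("^.*$", "a\nb") comes out as 2 with nothing to subtract, which fits the observation that the correction is needed for H, B, or HB.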
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail