Bart Schaefer wrote:
> At the start/end of the buffer, ^ and $ match a zero-width boundary,
> whereas for most embedded newlines they match the actual newline
> character.
Er, no. Here's how Stephen explained it to me years ago.
Nothing in procmail matches zero-width transitions. ^ and $ match
literal newlines. So how do they match the start and the end of the
search area? Well, procmail always sees the search area as surrounded
by an additional newline on each side, which we've dubbed the putative
newlines.
That's why I can get a match on the following, even though the value of
LOGNAME contains no real newline at all:
* LOGNAME ?? ^dattier$
^^ is a special regexp in procmail that matches *only* a putative
newline, never a real one. Additionally, extracting into $MATCH
with \/ always pulls only real characters, never putative ones; if I do
this,
:0
* LOGNAME ?? ()\/^dattier$
{ LOG=foo${MATCH}bar"
" }
I get "foodattierbar<one newline>" in the log, and the newline is the
one from the LOG= assignment.
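One way to picture the putative newlines is a toy model in Python. This
is only a sketch of the behavior described above, not procmail's actual
matcher: the helper names are made up, LOGNAME is assumed to hold
"dattier", and ^ and $ are written as literal \n.

```python
import re

def padded(area: str) -> str:
    """Model assumption: the search area wrapped in one putative newline per side."""
    return "\n" + area + "\n"

def match_real_chars(area: str, pattern: str):
    """Return only the real characters of the first match, or None if no match.

    `pattern` uses a literal \\n wherever the procmail regexp has ^ or $.
    """
    text = padded(area)
    m = re.search(pattern, text)
    if m is None:
        return None
    # Real characters sit at indices 1 .. len(text) - 2; clip the match to them,
    # the way \/ extraction is described as dropping putative newlines.
    start = max(m.start(), 1)
    end = min(m.end(), len(text) - 1)
    return text[start:end]

logname = "dattier"  # hypothetical value of LOGNAME; it contains no newline

print(repr(match_real_chars(logname, r"\ndattier\n")))  # 'dattier'
print(re.search(r"\ndattier\n", logname))               # None without the padding
```

The second print shows that the same pattern finds nothing in the bare
value; in this model the match only exists because of the padding.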
Now, the next thing is that when a scored non-negated regexp condition
starts and ends with a newline and has a non-zero x-value (the x in w^x), such that
you're actually counting occurrences, then if a matching string ends in
a newline, procmail backs up one character and starts with the newline
again when it looks for the next appearance. That way, if the condition
is something like
* 1^1 ^somepattern$
the newline that just matched $ at the end of somepattern can be reused
to match ^ if somepattern recurs on the next line.
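The backing-up rule can be sketched in Python. This is a toy model under
stated assumptions, not procmail's code: count_with_backup is a
hypothetical helper, the sample text is invented, ^ and $ are written as
literal \n, and the putative newlines are added by hand.

```python
import re

def count_with_backup(text: str, pattern: str) -> int:
    """Count matches, resuming one character early when a match ends in a newline.

    Assumes every match is at least two characters long; a match that is a
    single newline would make this loop resume at the same spot forever.
    """
    rx = re.compile(pattern)
    count, pos = 0, 0
    while (m := rx.search(text, pos)) is not None:
        count += 1
        # Model assumption: back up onto the closing newline so that it can
        # serve as the opening newline of the next match.
        pos = m.end() - 1 if m.group().endswith("\n") else m.end()
    return count

# Hypothetical search area, wrapped in its two putative newlines.
text = "\n" + "somepattern\nsomepattern\nother" + "\n"
pattern = r"\nsomepattern\n"  # stands in for ^somepattern$

print(len(re.findall(pattern, text)))    # 1: the shared newline is consumed
print(count_with_backup(text, pattern))  # 2: the shared newline is reused
```

Without the back-up step, the second occurrence is missed because the
newline between the two lines has already been used up by the first match.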
So what do we have?
* 1^1 ^
will match on the putative newline at the beginning of the search area,
back up one character, match again on the opening putative newline, do
it again, do it again, and do it again until the score hits supremum.
It won't match just once per line. There might not be any real newlines
in the search area at all, and you'll still score supremum from it. But
* 1^1 ^.*$
will match once on every whole line with the newlines (real or putative)
surrounding it, reusing the closing newline of each line as the opening
newline of the next line. The problem is what happens at the end of the
search area, which, if it is H or B or HB, ends with a real newline
(two, in fact, because the last line is blank). After ^.*$ has matched
on the last real line of the search area, there is a spurious match when
procmail backs up and matches again on
<closing real newline><closing putative newline>
so that's why we have to subtract 1.
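Both cases can be reproduced in a toy Python model. It is a sketch of the
behavior described above, not procmail's real scoring code: the function
name score, the SUPREMUM stand-in, the iteration limit, and the sample
body are all assumptions, and ^ and $ are written as literal \n.

```python
import re

SUPREMUM = float("inf")  # stand-in for procmail's maximum score, not its real value

def score(area: str, pattern: str, limit: int = 10_000):
    """Score a `1^1` condition under the putative-newline model."""
    text = "\n" + area + "\n"  # one putative newline on each side
    rx = re.compile(pattern)
    count, pos = 0, 0
    while (m := rx.search(text, pos)) is not None:
        count += 1
        if count >= limit:
            # Model shortcut: treat a runaway count as hitting the ceiling.
            return SUPREMUM
        # Back up onto a closing newline so the next match can reuse it.
        pos = m.end() - 1 if m.group().endswith("\n") else m.end()
    return count

body = "one\ntwo\nthree\n"  # hypothetical three-line search area

print(score(body, r"\n.*\n"))           # 4: three lines plus the spurious match
print(score(body, r"\n.*\n") - 1)       # 3: the real line count
print(score("no newline here", r"\n"))  # inf: the bare ^ never advances
```

The fourth match in the first call is the pair <closing real
newline><closing putative newline>, which is the one being subtracted.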
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail