Nobody ever answered this:
On Wed, 4 Jun 2003, Holger Wahlen wrote:
> By the way, I was wondering whether it wouldn't do to use just
> * 1^1 ^
> instead, but this always yields the maximum score (2147483647). Any
> ideas why?
I believe this happens because ^ matches either the start of the entire
buffer, or a newline between lines. (Just as $ matches either the end of
the entire buffer, or a newline between lines.) At the start/end of the
buffer, ^ and $ match a zero-width boundary, whereas for most embedded
newlines they match the actual newline character. I say "most" because
when NOT scoring, a pattern with ^ at the beginning and $ at the end
matches two newlines, but when scoring it matches only one.
If this were not the case, "* 1^1 ^.*$" would count the first, third,
fifth, etc. lines, breaking the text up like so:
(^first$)
second(^
third$)
fourth(^
fifth$)
etc.
Also, when scoring, procmail has to ignore the part that already matched
before it begins matching again. So the unexpected behavior is that when
"^" matches the zero-width start of the buffer, the scan starts over at
the beginning of the buffer, because there is no "already matched part" to
ignore, and thus it matches again. This repeats until the score hits the
maximum.
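As a rough illustration of that loop, here is a toy model in Python. It is
NOT procmail's matcher: the function name toy_score and its cap parameter
are made up for this sketch, and Python's ^ and $ are always zero-width,
so it only shows the "empty match makes no progress" effect, not the
newline-consuming behavior described above.

```python
import re

MAX_SCORE = 2147483647  # the maximum score quoted above


def toy_score(pattern, text, cap=MAX_SCORE):
    """Count successive matches of pattern in text, resuming after each one.

    Toy model only: an empty match leaves the scan position unchanged,
    so the same empty match is found again until the cap is reached.
    """
    regex = re.compile(pattern, re.MULTILINE)
    score = 0
    pos = 0
    while score < cap:
        m = regex.search(text, pos)
        if m is None:
            break
        score += 1
        pos = m.end()  # skip what already matched (nothing, if it was empty)
    return score


# A small cap keeps the demonstration fast.
print(toy_score(r".", "ab\ncd", cap=1000))  # 4
print(toy_score(r"^", "ab\ncd", cap=1000))  # 1000
```

With "." every match consumes a character, so the scan runs off the end
of the text and stops. With "^" the match at the start of the buffer is
empty, the position never moves, and the count runs up to the cap.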
In fact, anything that can match an empty span will score the maximum,
including:
* 1^1 ()
* 1^1 .*
* 1^1 $
And the truly baffling:
* 1^1 $^
* 1^1 (.|$)
The latter is the reason that, to compute the size of a message, you
must either use the special case:
* 1^1 > 1
or separately count non-newline characters and newlines:
* 1^1 .
* 1^1 ^.*$
(which actually counts one too many).
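The arithmetic behind the two-condition approach can be sketched in
Python. This is only an illustration of "size = non-newline characters
+ newlines"; the function name size_by_parts and the sample message are
invented here, and it does not try to reproduce procmail's off-by-one.

```python
import re


def size_by_parts(text):
    """Return (non-newline characters, newlines, their sum) for text."""
    non_newline = len(re.findall(r".", text))  # "." does not match "\n" here
    newlines = text.count("\n")
    return non_newline, newlines, non_newline + newlines


message = "Subject: hi\n\nbody\n"
print(size_by_parts(message))  # (15, 3, 18)
print(len(message))            # 18
```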
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail