procmail

Re: Comments?

1997-06-18 17:37:00
Where W. Wesley Groleau had,

|  > * -95^1  ^.*$

Era Eriksson suggested,

| I'm merely speculating that searching for just ()$ might be more
| efficient. (You need the parens because of Procmail's funny treatment
| of leading dollar signs.)

Too efficient, Era, and thus too inefficient.  ^ or ($) will be found an
infinite number of times in any text, making a scoring recipe run up to
maximum (whether that's supremum or infimum depends on the sign).  For some
values of w and x the score might eventually converge, and I can't guess if
procmail will ever give up.  [Procmailsc is smart enough to stop after one
match if x=0, so it might be smart enough to stop when the additional score
of another match is infinitesimal to the point of being beneath the
resolution of the math library.]
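
To make the arithmetic concrete: as I read procmailsc(5), a condition
weighted w^x scores w for the first match, w*x for the second, w*x^2 for
the third, and so on.  The Python below is only a sketch of that sum, not
procmail's own code; total_score is a made-up name for illustration.

```python
def total_score(w, x, matches):
    """Sum w + w*x + w*x**2 + ... over the given number of matches."""
    return sum(w * x**k for k in range(matches))

# x = 1: every match adds the full weight, so the total never settles.
print(total_score(-95, 1, 3))       # -285

# x = 0: only the first match counts, however many follow.
print(total_score(5, 0, 10))        # 5

# 0 < x < 1: the total creeps toward w / (1 - x), here 10 / 0.5 = 20.
print(total_score(10, 0.5, 60))
```

With x = 1 and an unlimited supply of matches the sum just keeps growing
in the direction of w's sign, which is the runaway case described above.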

The reason is this: in order to be able to count the newline character
between two lines of text as both the $ of the line that ends there and as the
^ of the line that begins there, procmailsc's "overlapping occurrences don't
match" rule has an exception.  If a match to the expression ends with a
newline (and the condition is weighted, and x != 0), the search for the next
match will begin WITH, not after, that newline.  Thus a search for ()$ or ($)
or .*$ or ^ or ^.* will find a newline and count it again and again and again
and again.

That's why we use  ^.*$  to count lines of text.  By looking for two newlines
at a time, we advance through the text by one newline at a time.
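
A toy model of that restart rule, for anyone who wants to watch it happen.
This is Python's re module standing in for procmail's matcher, so it is an
illustration under assumptions rather than procmail's real engine: ^ and $
are both modelled as a literal newline, a leading newline stands in for the
start of the text, and count_matches and its cap are invented here so the
runaway case terminates.

```python
import re

def count_matches(pattern, text, cap=1000):
    """Count matches under the toy restart rule, giving up at cap."""
    regex = re.compile(pattern)
    pos = 0
    count = 0
    while count < cap:
        m = regex.search(text, pos)
        if m is None:
            break
        count += 1
        if m.group().endswith("\n"):
            # The exception: resume WITH the trailing newline, not after it.
            pos = m.end() - 1
        else:
            pos = m.end()
    return count

body = "\n" + "one\ntwo\nthree\n"   # leading "\n" models the start of text

LINE = r"\n[^\n]*\n"   # models ^.*$ : a newline, a line, a newline
EOL = r"\n"            # models ()$ or ^ : a lone newline

print(count_matches(LINE, body))   # 3
print(count_matches(EOL, body))    # 1000 (stopped only by the cap)
```

The two-newline pattern moves forward one line per match and stops at the
end of the text; the lone newline is found at the same spot every time.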
