procmail

Re: Relegating overly-quoted e-mail to a low priority mailbox

2000-03-24 12:33:26
Ralph Sobek has a recipe with these conditions ($MATCH begins with a caret),
trying to make sure that $MATCH appears in no more than 75% of the lines of
the search area:

|       *$ 1^1 $MATCH
|       *$ -3^1  ! $MATCH

and was expecting the second one to score -3 for every line in the search
area that doesn't match $MATCH.  But he found:

| The score for "*$ -3^1  ! $MATCH" is always ZERO.

Right.  Either the search area contains the regexp or it doesn't, so absence
of the regexp occurs either once (if it is not in the search area) or not at
all (if the regexp does appear in the search area).  Thus a condition of the
form

  * w^x ! regexp

will score w if the regexp is absent and 0 if it is present, regardless of
x; and if it is present at all, the score will be 0 whether it appears once
or a billion times.  In Ralph's example, an earlier condition in the recipe
was an unweighted test that determines the contents of $MATCH, so $MATCH will
always be present and the score for not finding $MATCH will always be zero.
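
To make that arithmetic concrete, here is a rough Python sketch of how a
weighted condition adds up (first match adds w, the next w*x, then w*x*x,
and so on) and why negating it collapses the count to 0 or 1.  The function
names are mine, purely for illustration; this models only the scoring
formula, not procmail's regexp matching.

```python
def weighted_score(w: float, x: float, n: int) -> float:
    """Score added by a condition '* w^x regexp' found n times."""
    # The first match adds w, the second w*x, the third w*x*x, and so on.
    return sum(w * x**k for k in range(n))


def negated_score(w: float, x: float, present: bool) -> float:
    """Score added by a condition '* w^x ! regexp'."""
    # Absence can occur at most once, so n is 0 or 1 and x never matters.
    n = 0 if present else 1
    return weighted_score(w, x, n)


print(negated_score(-3, 1, present=True))   # 0
print(negated_score(-3, 1, present=False))  # -3
```

However many times the regexp appears, negated_score() gives 0 as soon as
it is present at all, which is the zero Ralph kept seeing.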

Counting the number of mismatches to a regexp, as Ralph seems to have
expected, doesn't make sense.  Between the opening putative newline and the
first character of the search area there are an infinite number of null
strings, each of which is a mismatch for any non-null regexp.  If the regexp
is null, then every character in the search area mismatches it.  So you can't
really count the number of mismatches inside the search area, only the number
of matches.

Since we're counting lines, the solution is fairly easy:

 * -3^1 ^.*$
 * 3^0
 * 4^1 $ $MATCH

(The second condition adds 3 once, to adjust for a quirk in counting
 occurrences of ^.*$, which comes out one too high.)

For the score to come out positive and the action to be executed, the number
of appearances of $MATCH must be greater than 3/4 of the total number of lines.
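
For anyone who wants to check the arithmetic, here is a small Python sketch
of the three-condition score.  It assumes that ^.*$ is counted once per line
plus the one extra mentioned above, and that $MATCH (which begins with a
caret) is counted at most once per line; the function names are hypothetical
and nothing here runs procmail itself.

```python
def recipe_score(total_lines: int, matching_lines: int) -> int:
    """Total score of the three conditions for a given search area."""
    counted = total_lines + 1        # ^.*$ comes out one too high (the quirk)
    score = -3 * counted             # * -3^1 ^.*$
    score += 3                       # * 3^0      (cancels the extra -3)
    score += 4 * matching_lines      # * 4^1 $ $MATCH
    return score


def action_runs(total_lines: int, matching_lines: int) -> bool:
    """The recipe's action is executed only if the final score is positive."""
    return recipe_score(total_lines, matching_lines) > 0


print(recipe_score(100, 75), action_runs(100, 75))  # 0 False
print(recipe_score(100, 76), action_runs(100, 76))  # 4 True
```

The score reduces to 4*matches - 3*lines, so exactly 75% scores zero and the
action runs only when the proportion is strictly above that.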

