procmail

Re: Matching all but one header

1997-08-03 11:13:00
When Antti-Juhani Kaijanaho suggested,

K-> How would this sound:
K-> * ^([^S][^u][^b][^j][^e][^c][^t]|[^:]?[^:]?[^:]?[^:]?[^:]?[^:]?\
K-> |[^:][^:][^:][^:][^:][^:][^:][^:]*):.*\<phrase(\>|$)

I said, among other things, that

T> it would fail to catch any header with a seven-character name in which
T> even one letter matched the corresponding position in "Subject" [such
T> as "Summary" or "Project"].

and Antti-Juhani responded,

K> Yes.  I knew there has to be something I had overlooked.  Thanks.

You're welcome.  Your logic can still be made to work, though.  The approach
was to permit names shorter than "Subject", seven-letter names that are not
"Subject", and names longer than "Subject", so let's try this:

 * ^(..?.?.?.?.?[       :]|[^s]|.[^u]|..[^b]|...[^j]|....[^e]|.....[^c]|\
     ......[^t]|Subject[^       :])(.*\<)?phrase\>

[The gap inside each bracket expression is a space and a tab.  Both \< and \>
can match a newline, so it isn't necessary to alternate "(\>|$)".]

Since the first character in a header line is never going to be a colon,
the first alternation lets through any header where a colon, a space, or a
tab is the second, third, fourth, fifth, sixth, or seventh character; that
sanctions all names that are shorter than "Subject".  (Whitespace between the
name and the colon is permitted, though I've never seen it used, so I'm
allowing for it.)  The next seven permit those that don't start with S
(assuming no D flag, and thus insensitivity to case), those where the second
character isn't u, etc.  Since none of them is closed with a colon, they also
allow any name longer than seven characters as long as the first seven are
not "Subject".

Finally, if the first seven characters *are* "Subject" but that's not the end
of the name, the last alternation permits it.

Come to think of it, any match to the first alternation is a match to at
least one of the next seven, so the first is unnecessary.  (For example,
a five-letter field name will have a space, a tab, or a colon in the sixth
position, so it will match ".....[^c]".)  Let's get rid of it:

 * ^([^s]|.[^u]|..[^b]|...[^j]|....[^e]|.....[^c]|......[^t]|Subject[^  :])\
    (.*\<)?phrase\>
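To see which header lines that condition lets through, here is a rough Python
translation.  It is only a sketch under stated assumptions: Python's \b stands
in for procmail's \< and \> (which behave a little differently), re.IGNORECASE
stands in for the absence of the D flag, each header line is tested on its own,
and the sample header names are made up for illustration.

```python
import re

# Approximate translation of the procmail condition above.
# Assumptions: \b replaces \< and \>, and IGNORECASE replaces "no D flag".
NOT_SUBJECT_WITH_PHRASE = re.compile(
    r"^([^s]|.[^u]|..[^b]|...[^j]|....[^e]|.....[^c]|......[^t]|Subject[^ \t:])"
    r"(.*\b)?phrase\b",
    re.IGNORECASE,
)


def matches(line):
    """Return True if one header line would satisfy the condition."""
    return NOT_SUBJECT_WITH_PHRASE.match(line) is not None


if __name__ == "__main__":
    samples = [
        "Subject: a phrase here",   # the one header we want to skip
        "Summary: a phrase here",   # seven letters, shares S and u
        "Project: phrase",          # seven letters, shares j, e, c, t
        "To: phrase",               # shorter than "Subject"
        "Subject-Line: phrase",     # hypothetical longer name
    ]
    for line in samples:
        print(line, "->", matches(line))
```

Only the real Subject line is rejected; "Summary" and "Project", which tripped
up the earlier attempt, get through.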

But all told, I'd rather stick with the scoring method; since we're using
word delimiters now, it needs a little tweaking.  We can go by lines
containing the phrase:

    :0h
    * 1^1 ^(.*\<)?phrase\>
    * -1^1 ^Subject:(.*\<)?phrase\>
    /dev/null
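The arithmetic of that recipe can be sketched in Python.  This is a model, not
procmail itself: it assumes \b is a fair stand-in for \< and \>, that each line
containing the phrase adds 1, that a Subject line containing it subtracts 1,
and that the action fires only on a positive total.  The function names are
made up for the illustration.

```python
import re

FLAGS = re.IGNORECASE | re.MULTILINE

# One match per line at most, because both patterns are anchored with ^.
ANY_LINE = re.compile(r"^(.*\b)?phrase\b", FLAGS)
SUBJECT_LINE = re.compile(r"^Subject:(.*\b)?phrase\b", FLAGS)


def line_score(header):
    """+1 for every line holding the phrase, -1 if that line is Subject."""
    plus = sum(1 for _ in ANY_LINE.finditer(header))
    minus = sum(1 for _ in SUBJECT_LINE.finditer(header))
    return plus - minus


def would_discard(header):
    """The recipe's action is taken only when the score is positive."""
    return line_score(header) > 0


if __name__ == "__main__":
    only_subject = "Subject: phrase here\nTo: bob"
    elsewhere = "Subject: phrase here\nX-Note: the phrase again"
    print(line_score(only_subject), would_discard(only_subject))
    print(line_score(elsewhere), would_discard(elsewhere))
```

A message with the phrase only in Subject nets zero and is left alone; one with
the phrase in any other header line nets at least 1.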

or by appearances of the phrase:

   :0
   * \<phrase\>
   * ^Subject:\/.*
   {
    SUBJECT_CONTENTS=$MATCH

    :0h
    * 1^1 \<phrase\>
    * -1^1 SUBJECT_CONTENTS ?? \<phrase\>
    /dev/null
   }

The second method gets an inexact count when we have \<phrase.phrase\> in the
head: \> and \< each match a character of their own, so two occurrences
separated by a single character cannot both be counted.  (In the particular
case of the original question that is not going to happen anyway, but in the
general case it might.)  The net score will still be of the correct sign,
though, so the action will be taken under the correct circumstances.
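The undercount and the unchanged sign can be illustrated with a small Python
model.  It rests on assumptions: \< and \> are modelled as consuming one
non-word character (or sitting at a line edge), matches are counted without
overlap, and the function names are invented for the sketch.

```python
import re

FLAGS = re.IGNORECASE | re.MULTILINE

# Assumption: each delimiter eats one non-word character or sits at a line
# edge, so it is spelled out here instead of using Python's zero-width \b.
PHRASE = re.compile(r"(?:^|\W)phrase(?:\W|$)", FLAGS)
SUBJECT = re.compile(r"^Subject:(.*)", FLAGS)


def count_appearances(text):
    """Count non-overlapping delimited appearances of the phrase."""
    return len(PHRASE.findall(text))


def net_score(header):
    """Appearances in the whole head minus appearances in Subject."""
    found = SUBJECT.search(header)
    if found is None:
        return 0  # the outer recipe needs a Subject: line to get this far
    return count_appearances(header) - count_appearances(found.group(1))


if __name__ == "__main__":
    print(count_appearances("a phrase phrase b"))   # shared delimiter
    print(count_appearances("a phrase, phrase b"))  # separate delimiters
    print(net_score("Subject: phrase phrase\nTo: x"))
    print(net_score("Subject: phrase phrase\nX-Note: phrase"))
```

In this model the doubled phrase in Subject is counted once on both sides of
the subtraction, so the net score is still zero when the phrase appears only
in Subject and positive when it appears anywhere else as well.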

Oh, for the record, back to an earlier question in this thread, I did some
experimenting, and in

 * w^x ! regexp

or

 * w^x ! variable ?? regexp

x is indeed ignored.  Regardless of the value of x, the condition scores 0
if the search area contains one or more matches to the regexp and w if it
contains none.
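Put as a few lines of Python, the observed behaviour looks like this.  The
function is purely illustrative (its name and signature are made up) and it
encodes only the experimental result reported above, not procmail's internals.

```python
import re


def negated_condition_score(w, x, regexp, search_area):
    """Model of `* w^x ! regexp`: x is accepted but never used."""
    return 0 if re.search(regexp, search_area) else w


if __name__ == "__main__":
    for x in (0, 1, 3):
        print(x,
              negated_condition_score(5, x, r"phrase", "no such word"),
              negated_condition_score(5, x, r"phrase", "a phrase here"))
```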
