procmail

Re: filtering question

1996-05-19 13:55:49
Dick Moores <rdm(_at_)netcom(_dot_)com> writes:
...
However, here's a search problem that has me stumped: I'd like to search
the Subject: line of the headers and bodies of all mail for "test" (case
insensitive), and attach the found messages to the folder "Test".  Easy
enough so far.  However, I don't want to find messages that have "test
case" in the body but no other instance of "test" on another line.  I.e.,
I want to find these messages:

(1) All messages that have "test" (no quotes) in the subject line,
without regard to the body.  And,

(2) All messages that have "test" (no quotes) in the body, where "test"
is not immediately followed by a space and "case" (no quotes).  ("test",
and also "test case" are not found on the same line in the mail I'm
looking for.)


That's a *beautiful* description of what you want to do.  I wish everyone
could write descriptions that clear.  Now let's do some translation.  Since
you have it written as two separate cases, let's treat it as such, and use
two recipes.  Having both of them write to the same folder is fine as long
as they use the same lockfile, which a bare trailing ":" gives you: procmail
then derives the lockfile name from the folder (Test.lock here).

The first one is easy:

# (1) All messages that have "test" (no quotes) in the subject line,
# without regard to the body.
:0 :
* ^Subject:.*test
Test
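
If you want to sanity-check that condition outside procmail, a small Python
model will do.  It is only an approximation: it assumes Python's `re` behaves
like procmail's egrep-style matching for this pattern, with `re.IGNORECASE`
standing in for procmail's default case-insensitive matching (no `D` flag).
The name `subject_matches` is made up for illustration.

```python
import re

# Rough stand-in for the procmail condition "^Subject:.*test".
# procmail matches case-insensitively unless the D flag is used, hence
# re.IGNORECASE; re.MULTILINE lets "^" anchor at the start of any header line.
SUBJECT_TEST = re.compile(r"^Subject:.*test", re.IGNORECASE | re.MULTILINE)


def subject_matches(headers: str) -> bool:
    """Hypothetical helper: True if a Subject: line contains "test" in any case."""
    return SUBJECT_TEST.search(headers) is not None


if __name__ == "__main__":
    print(subject_matches("From: a@example.com\nSubject: Re: TEST run\n"))      # True
    print(subject_matches("From: a@example.com\nSubject: hello\nX-Test: 1\n"))  # False
```

Since `.` does not cross a newline, the "test" has to be on the Subject: line
itself; an X-Test: header doesn't count.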


As for the second, let's take a look at a couple of different recipes which
*don't* work.  First:

:0 B:
*   test
* ! test case
Test


This looks close, but it excludes too many messages, as _any_ occurrence
of "test case" anywhere in the body will make it fail, even when "test"
also appears on its own on another line.
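
Here's a rough Python model of that recipe, assuming the two B conditions
boil down to "the body contains `test`" and "the body does not contain
`test case`", each checked case-insensitively over the whole body.  The
function name `naive_recipe` is hypothetical.

```python
import re


def naive_recipe(body: str) -> bool:
    """Hypothetical model of:
         *   test
         * ! test case
    Both conditions scan the whole body, case-insensitively."""
    has_test = re.search(r"test", body, re.IGNORECASE) is not None
    has_test_case = re.search(r"test case", body, re.IGNORECASE) is not None
    return has_test and not has_test_case


if __name__ == "__main__":
    # Line 2 has a lone "test", but the "test case" on line 1 vetoes the message.
    print(naive_recipe("see the test case below\nthis is a test\n"))  # False
    print(naive_recipe("this is a test\n"))                           # True
```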

Then there's

# Find the word "test", and save what's after it on the line.  Bail out
# if that starts with " case"
:0 B:
* test\/.*
* ! MATCH ?? ^ case
Test


This would make the choice _solely_ based on whether the first
occurrence of the word "test" was followed by " case", regardless of whether
a later one was not so suffixed: MATCH only ever holds what followed that
first match.
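
The same failure is easy to reproduce in a Python model, assuming that
`re.search` finding the leftmost "test" corresponds to how procmail fills
MATCH from `test\/.*` (only the first match is used, and `.` stops at the end
of the line).  `match_recipe` is a made-up name.

```python
import re


def match_recipe(body: str) -> bool:
    """Hypothetical model of:
         * test\\/.*
         * ! MATCH ?? ^ case
    Only the leftmost "test" is examined; group 1 plays the role of $MATCH."""
    m = re.search(r"test(.*)", body, re.IGNORECASE)
    if m is None:
        return False  # first condition fails: no "test" at all
    match_var = m.group(1)  # rest of that line, as procmail would store in MATCH
    return re.match(r" case", match_var, re.IGNORECASE) is None


if __name__ == "__main__":
    # The first "test" is followed by " case", so the later lone "test" is never seen.
    print(match_recipe("first a test case\nthen a plain test\n"))  # False
    # Swap the lines and the very same content is accepted.
    print(match_recipe("then a plain test\nfirst a test case\n"))  # True
```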


Skipping those obvious (and wrong) "solutions", how about some that work?
There's the really-ugly-regular-expression solution:

:0 B:
* test($|[^ ]| ([^c]|$)| c([^a]|$)| ca([^s]|$)| cas([^e]|$))
Test


That should work, but it has to go to such lengths that the point is lost
on the reader, and any changes are difficult to make correctly.
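
To convince yourself the expression is right, you can feed it to Python's
`re`, assuming `re.IGNORECASE | re.MULTILINE` is a close enough stand-in for
procmail's default case-insensitive matching and for its `$` matching at a
newline.  The helper name `ugly_regex_recipe` is hypothetical.

```python
import re

# The expression from the recipe, unchanged: "test" followed by end of line,
# a non-space, or a space that does not go on to spell out "case".
UGLY = re.compile(
    r"test($|[^ ]| ([^c]|$)| c([^a]|$)| ca([^s]|$)| cas([^e]|$))",
    re.IGNORECASE | re.MULTILINE,
)


def ugly_regex_recipe(body: str) -> bool:
    """Hypothetical helper: True if some "test" is not followed by " case"."""
    return UGLY.search(body) is not None


if __name__ == "__main__":
    samples = [
        "this is a test\n",                          # True: "test" at end of line
        "only a test case\n",                        # False: the only "test" is "test case"
        "see the test case below\nthis is a test\n",  # True: line 2 qualifies
        "a test cast\n",                             # True: " cas" followed by "t"
    ]
    for s in samples:
        print(repr(s), ugly_regex_recipe(s))
```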

I'd prefer to use:


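# Count the number of times the word "test" appears, and subtract the
# number of times "test case" appears.  Every "test case" was also counted
# as a "test", so if the result is positive there was a "test" without
# a " case", and it's a match.  (With an exponent of 1, each match adds
# the same weight, so 1^1 simply counts.)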
:0 B:
*  1^1 test
* -1^1 test case
Test
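
The arithmetic can be checked with a rough Python model, assuming procmail's
scoring rules that a w^x condition with x = 1 adds the weight w once per
match and that the action runs only when the total is positive.  The names
`score` and `scoring_recipe` are hypothetical.

```python
import re


def score(body: str) -> int:
    """Hypothetical model of the two weighted conditions:
         *  1^1 test       -> +1 per occurrence of "test"
         * -1^1 test case  -> -1 per occurrence of "test case"
    With an exponent of 1 every match adds the same weight, so the
    total is just a difference of counts."""
    plus = len(re.findall(r"test", body, re.IGNORECASE))
    minus = len(re.findall(r"test case", body, re.IGNORECASE))
    return plus - minus


def scoring_recipe(body: str) -> bool:
    # Delivery to Test happens only when the total score is positive.
    return score(body) > 0


if __name__ == "__main__":
    body = "see the test case below\nthis is a test\n"
    print(score(body), scoring_recipe(body))  # 1 True  (2 "test" minus 1 "test case")
    body = "only a test case\n"
    print(score(body), scoring_recipe(body))  # 0 False (1 minus 1)
```

Each "test case" contributes +1 and -1, so only the bare occurrences of
"test" survive in the total, which is exactly condition (2).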


Does that all make sense?

Philip Guenther
