On Thu, Apr 28, 2005 at 12:07:34AM +1000, Peter Jones wrote:
> First my procmail 'blacklist.rc' file:
> * ^Subject:[ ]+\/.+
> * ? /usr/local/bin/testsubject $BLACKLIST "$MATCH"
> Where the "Subject:[ ]+" is actually "Subject:[<SPACE><TAB>]+"
> (I'm not sure how necessary that construct is, but I picked it up
> from somewhere on the 'net...)
Since whatever matches to the right of the `\/' match token is
rightward-greedy, your `+' there is essentially meaningless.
Suppose we have a Subject where, after the colon, there are
two spaces. Your bracketed character class requires a minimum
of one space or tab. Procmail will try not to do more work on
the left side than is minimally required of it. It will match
one space to the left, then save the second space in the
matched text to the right of the token. So your $MATCH value
will start with a leading space, followed by whatever came
after that (second) space in the original Subject line.
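The leading space can be made visible outside procmail with a rough
imitation in plain shell. This is only a sketch: sed has no `\/'
operator, so the made-up helper match_after_one_blank hard-codes what
the minimal match of the bracketed class amounts to (exactly one space
or tab) and prints the remainder, standing in for $MATCH. Note also
that sed is case-sensitive here, whereas procmail conditions normally
are not.

```shell
#!/bin/sh
# Imitation of:  * ^Subject:[<SPACE><TAB>]+\/.+
# Assumption: the left side consumes exactly one blank (the minimal
# match for "+"); everything after it is what $MATCH would hold.
# match_after_one_blank is a hypothetical name, not part of procmail.
match_after_one_blank() {
    printf '%s\n' "$1" | sed -n -E 's/^Subject:[[:blank:]](.+)$/\1/p'
}

# Two spaces after the colon; the brackets expose the leading space.
printf '[%s]\n' "$(match_after_one_blank 'Subject:  hello')"
# prints "[ hello]"
```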
So one might think, then, that a good way to write the
regex to capture the Subject would be:
* ^Subject:[ ]+\/[^ ].+
That, too, would be imperfect. While it would work in almost
all cases where there is a non-whitespace-only Subject, it
will fail to match on a small minority of Subject lines that
have NO whitespace after the colon. Per the RFCs, that is
not to be ruled out.
So let's just grab any Subject starting with the first
non-whitespace character, shall we?
* ^Subject:.*\/[^ ].+
(Everywhere in the brackets above is both a space and a tab;
and, where appropriate, a leading caret char is present to
demarcate "not" in charclass-speak.)
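The difference between the last two conditions shows up when a few
sample header lines are run through ordinary grep and sed. Again this
is a sketch under stated assumptions: [[:blank:]] stands in for the
space-plus-tab brackets, sed's greedy [[:blank:]]* is used in place of
procmail's stingy `.*' (both stop at the first non-blank character for
these samples), and the function names are invented for the demo.

```shell
#!/bin/sh
# Does the line satisfy  ^Subject:[<SPACE><TAB>]+\/[^<SPACE><TAB>].+  ?
# Whether a line matches at all does not depend on where \/ sits,
# so a plain grep -E with the token removed answers the question.
needs_blank() {
    printf '%s\n' "$1" |
        grep -E -q '^Subject:[[:blank:]]+[^[:blank:]].+'
}

# Imitation of  ^Subject:.*\/[^<SPACE><TAB>].+  : print what $MATCH
# would hold, i.e. the text from the first non-blank character on.
subject_text() {
    printf '%s\n' "$1" |
        sed -n -E 's/^Subject:[[:blank:]]*([^[:blank:]].+)$/\1/p'
}

for line in 'Subject:  hello' 'Subject: hello' 'Subject:hello'; do
    if needs_blank "$line"; then ok=yes; else ok=no; fi
    printf '%s -> blank-required: %s, extracted: [%s]\n' \
        "$line" "$ok" "$(subject_text "$line")"
done
```

Only the third sample, with no whitespace after the colon, is rejected
by the blank-requiring condition, while the extraction yields "hello"
for all three.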
> Second, my /usr/local/bin/testsubject which is a perl script:
It's not at all clear to me why you would prefer a home-grown
perl call to one of the egrep variants.
See the list archives and search on "blacklist" and "grep".
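For what it is worth, the grep route might look roughly like the
sketch below. It assumes the blacklist is a plain file with one literal
phrase per line and that a case-insensitive substring hit should count.
What the testsubject script really does is not shown here, so this is
not a drop-in replacement. The temporary file, its sample phrases and
the function name are invented for the demonstration.

```shell
#!/bin/sh
# Hypothetical blacklist: one literal phrase per line, no blank lines
# (an empty pattern line would match every subject).
BLACKLIST=$(mktemp)
trap 'rm -f "$BLACKLIST"' EXIT
printf '%s\n' 'cheap meds' 'lottery winner' > "$BLACKLIST"

# Exit status 0 when the subject contains any listed phrase.
#   -F  treat the patterns as fixed strings, not regexes
#   -i  ignore case
#   -q  print nothing, report through the exit status only
#   -f  read the patterns from the named file
subject_is_blacklisted() {
    printf '%s\n' "$1" | grep -F -i -q -f "$BLACKLIST"
}

if subject_is_blacklisted 'Get CHEAP MEDS now'; then
    echo "blacklisted"
fi
```

Inside a recipe the same pipeline would stand where the perl call was,
along the lines of
* ? echo "$MATCH" | grep -F -i -q -f "$BLACKLIST"
and, because that line contains a pipe, procmail should hand it to the
shell rather than run it directly.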