At 12:20 2002-02-15 -0500, Ruben I Safir did say:
> The procmail regex are bizarre.
I've had a run-in now and again with them, but it has nearly always been
because I was looking at things with blinders on and not remembering that
different tools are used for different things.
> Why not just use the perl ones?
I'll take a slash at it: because procmail isn't perl? It isn't written in
perl, it doesn't emulate perl...
sed regexes are slightly different from perl's, and grep's are too. Please
write to GNU and tell them to change their tools to fully emulate perl.
You can always pipe all your messages to a perl script if that strikes your
fancy though.
> Anyone know why this fails?
There are several reasons, and probably even a few that I'm not noticing.
> :0 iw
> * ^Subject.*[A-Z]+$
> |/home/ruben/complain.pl
>
> I'm trying to capture all messages consisting of only upper case letters in
> the subject and pipe them to complain.pl which complains to the sender.
> It seems to capture EVERYTHING, or nearly so.
'man procmailrc' shows one of the flags you should be using is 'D' for
case-sensitivity. Also, even if you converted that basic expression into
Perl, what you're asking for is one or more uppercase letters at the END of
the subject line - not that ALL letters in the subject are uppercase. You
also omitted the colon which terminates the PROPER subject header.
Thus, the following header could trip your expression, even if it were
perlesque:
Subject: Here's that file - AnnualReport.DOC
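To make the point concrete, here is a rough sketch in Python. Python's re module is only a stand-in for procmail's own matcher (the two are not the same engine); re.IGNORECASE is used to mimic procmail's default case-insensitive matching, and the helper name and sample subjects are made up for illustration.

```python
import re

# The condition from the original recipe, minus the leading "* ".
PATTERN = r"^Subject.*[A-Z]+$"

def matches(header: str, case_sensitive: bool) -> bool:
    """Return True if the header line trips the original condition."""
    # Without the D flag procmail ignores case, so [A-Z] also matches a-z.
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.search(PATTERN, header, flags) is not None

samples = [
    "Subject: lunch on friday",
    "Subject: Here's that file - AnnualReport.DOC",
    "Subject: BUY NOW",
]

for header in samples:
    # Columns: header, result without D (case-insensitive), result with D.
    print(header, matches(header, False), matches(header, True))
```

Case-insensitively, any subject that merely ends in a letter matches, which is why the recipe seemed to catch everything. Even case-sensitively, the mixed-case "AnnualReport.DOC" subject still matches, because the expression only looks at the end of the line.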
As it happens, regular readers of this list will recall that this very
topic (allcaps subjects) was covered just about a month ago here (and many
times in the past as well). The recipe was:
:0 # whitespace in brackets comprises a space and a tab
* ^Subject:[ 	]\/.*
{
   :0 D
   * MATCH ?? [a-z]
   { }

   :0 E:
   ALLCAPS
}
(you can find that in the list archives)
Of course, this doesn't REALLY capture uppercase-only - it actually catches
messages NOT containing lowercase letters. An all-numeric subject, for
example, would trip it, as would a whitespace-only subject line.
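A small Python sketch of that caveat, again using Python's re purely as a stand-in for the procmail condition `MATCH ?? [a-z]` under the D flag. The function name and the sample subjects are hypothetical.

```python
import re

def has_no_lowercase(subject: str) -> bool:
    """Mirror the recipe's logic: flag any subject containing no a-z."""
    return re.search(r"[a-z]", subject) is None

# Hypothetical subjects: only the first one is really "all caps".
for subject in ["URGENT REPLY NEEDED", "12345", "   ", "Meeting notes"]:
    print(repr(subject), has_no_lowercase(subject))
```

The first three subjects are all flagged, even though only the first contains any uppercase letters at all.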
For your purposes, you'd modify that last part of the rule (ALLCAPS) to be:
:0EDiw
* ^S[Uu][Bb][Jj][Ee][Cc][Tt]:.*[A-Z]
|/home/ruben/complain.pl
You'd be sure to be catching a subject with at least ONE honest-to-goodness
uppercase letter in it. Double up the character class at the end there if
you want to ensure that there are at least two adjacent letters (this is
ONLY executed if there are NO lowercase letters).
Another approach (from the same discussion which netted the above) is:
# if no subject header at all or an empty one, it's likely spam; otherwise,
# extract, but don't include Re: if it's there, because a lower-case "e"
# in "Re:" is no excusal
:0:
* ! ^Subject: *Re:\/.+
* ! ^Subject:\/.+
spam
# extracted text has at least one capital letter and no lower-case letters
:0ED:
* MATCH ?? [A-Z]
* ! MATCH ?? [a-z]
spam
You can modify these to your whim (note that the second recipe is directly
reliant on the first, since the first defines the MATCH variable).
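For anyone who wants to poke at the logic of that pair outside of procmail, here is a hedged Python sketch. It is not how procmail evaluates the recipes; Python's re stands in for the matcher, the function names and return labels are invented for illustration, and the extraction is done case-insensitively to mimic procmail's default.

```python
import re
from typing import Optional

def extract_subject(headers: str) -> Optional[str]:
    """Return the subject text, skipping a leading Re: if present."""
    opts = re.IGNORECASE | re.MULTILINE
    # Try the "Re:" form first so its lowercase "e" is not counted.
    m = re.search(r"^Subject: *Re:(.+)", headers, opts)
    if m is None:
        m = re.search(r"^Subject:(.+)", headers, opts)
    return m.group(1) if m else None

def classify(headers: str) -> str:
    """Label a header block as 'missing', 'allcaps', or 'ok'."""
    text = extract_subject(headers)
    if text is None:
        return "missing"  # no Subject: header, or an empty one
    has_upper = re.search(r"[A-Z]", text) is not None
    has_lower = re.search(r"[a-z]", text) is not None
    # At least one capital letter and no lower-case letters.
    return "allcaps" if has_upper and not has_lower else "ok"

print(classify("From: someone@example.com"))    # missing
print(classify("Subject: Re: HELLO THERE"))     # allcaps
print(classify("Subject: Re: hello there"))     # ok
print(classify("Subject: 12345"))               # ok
```

Note that, unlike the earlier no-lowercase test, an all-numeric subject comes out as "ok" here because the second condition insists on at least one capital letter.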
I pray that you don't plan on running this perl script on list messages,
because you'll find yourself kicked from a lot of lists in a hurry.
---
Sean B. Straw / Professional Software Engineering
Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
Please DO NOT carbon me on list replies. I'll get my copy from the list.
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail