procmail

Re: procmail regex

2002-02-15 11:27:05
At 12:20 2002-02-15 -0500, Ruben I Safir did say:

The procmail regexes are bizarre.

I've had a run-in now and again with them, but it has nearly always been because I was looking at things with blinders on and not remembering that different tools are used for different things.

Why not just use the perl ones.

I'll take a slash at it: because procmail isn't perl? It isn't written in perl, it doesn't emulate perl...

sed regexes are slightly different from perl's, and grep's are too. Please write to GNU and tell them to change their tools to fully emulate perl.

You can always pipe all your messages to a perl script if that strikes your fancy though.

Anyone know why this fails?

There are several reasons, and probably even a few that I'm not noticing.

:0 iw
* ^Subject.*[A-Z]+$
|/home/ruben/complain.pl


I'm trying to capture all messages consisting of only upper case letters in the subject and pipe
them to complain.pl, which complains to the sender.

It seems to capture EVERYTHING, or nearly so.

'man procmailrc' shows that one of the flags you should be using is 'D', for case-sensitivity. Without it, procmail matches case-insensitively, so [A-Z] matches lowercase letters too, which is why nearly everything gets caught. Also, even if you converted that basic expression into Perl, what you're asking for is one or more uppercase letters at the END of the subject line, not that ALL letters in the subject are uppercase. You also omitted the colon which terminates the PROPER subject header.

Thus, the following header could trip your expression, even if it were perlesque:

        Subject: Here's that file - AnnualReport.DOC


As it happens, regular readers of this list will recall that this very topic (allcaps subjects) was covered just about a month ago here (and many times in the past as well). The recipe was:

:0 # whitespace in brackets comprises a space and a tab
* ^Subject:[    ]\/.*
{
 :0 D
 * MATCH ?? [a-z]
 { }

 :0 E:
 ALLCAPS
}

(you can find that in the list archives)

Of course, this doesn't REALLY capture uppercase-only - it actually catches messages NOT containing lowercase letters. All numerics for example would trip it. As would a whitespace-only subject line.

For your purposes, you'd modify that last part of the rule (ALLCAPS) to be:

 :0EDiw
 * ^S[Uu][Bb][Jj][Ee][Cc][Tt]:.*[A-Z]
 |/home/ruben/complain.pl

That way you'd be sure to be catching a subject with at least ONE honest-to-goodness uppercase letter in it. Double up the character class at the end there if you want to ensure that there are at least two adjacent uppercase letters (this recipe ONLY being executed if there are NO lowercase letters).
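
Putting those pieces together, the whole thing might look something like the sketch below. It is untested, it assumes the same /home/ruben/complain.pl path from your original recipe, and the brackets on the first condition are meant to hold a space and a tab, as before.

:0 # whitespace in brackets is a space and a tab
* ^Subject:[ 	]\/.*
{
 # lowercase found in the extracted subject text: do nothing
 :0 D
 * MATCH ?? [a-z]
 { }

 # otherwise complain, but only if there's at least one uppercase letter
 # (use [A-Z][A-Z] at the end to require two adjacent ones)
 :0 EDiw
 * ^S[Uu][Bb][Jj][Ee][Cc][Tt]:.*[A-Z]
 | /home/ruben/complain.pl
}

With this arrangement, an all-numeric or whitespace-only subject matches neither inner recipe, so the message simply falls out of the block and carries on down the rcfile.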

Another regexp, from the same discussion which netted the above, is:

# if no subject header at all or an empty one, it's likely spam; otherwise,
# extract it, but don't include Re: if it's there, because a lower-case "e"
# in "Re:" is no excuse
:0:
* ! ^Subject: *Re:\/.+
* ! ^Subject:\/.+
spam

# extracted text has at least one capital letter and no lower-case letters
:0ED:
* MATCH ?? [A-Z]
* ! MATCH ?? [a-z]
spam


You can modify these to your whim (note that the second recipe is directly reliant on the first, since the first defines the MATCH variable).


I pray that you don't plan on running this perl script on list messages, because you'll find yourself kicked from a lot of lists in a hurry.

---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.

_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
