procmail

Re: backreferences for repeated characters?

2001-10-10 00:55:54

Hi Philip.  Thanks for your response.  :)

On Tue, Oct 09, 2001 at 11:42:57PM -0500, Philip Guenther wrote:

    * ^Subject: .*([^!])\1\1\1\1\1[^\1]+$

Back references are not part of the POSIX egrep specification or
'traditional' egrep behavior.  However, GNU grep (which is used by
FreeBSD) supports them not just with grep but also egrep.

I've managed to get this at least running with the following:

        :0 H
        *       ^Subject: .\ \ \ \ \ [^ ]$
        spamtrap

        :0 HcW
        *       ! ^Subject: .*!!!!
        | /usr/bin/grep '^Subject' | /usr/bin/grep -vqE '^Subject: .*(.)\1\1\1\1\1[^\1]'

        :0 e:
        spamtrap

The first recipe is there so that the bulk of the matches will be
handled internally to procmail.  The second strips the bangs and uses
backreferences from the external grep.  I have to use a "grep -v" in a
second pipe because of the last rule, whose ":0 e" needs to see a
*failed* action in the recipe before it.

This is obviously cumbersome and wasteful.
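
For comparison, the external check could be collapsed into a single grep call. The sketch below is only an illustration under a few assumptions: the function name has_repeat_run is hypothetical, it expects the message header on stdin, it relies on a grep that accepts backreferences with -E (as GNU grep does), and it excludes '!' inside the group instead of in a separate condition. It does not handle folded Subject lines.

```shell
#!/bin/sh
# has_repeat_run (hypothetical name): read a message header on stdin and
# exit 0 if the Subject line contains six identical characters in a row,
# not counting '!'.
# LC_ALL=C makes grep match byte by byte, so a high-bit byte such as 0xA0
# is treated as an ordinary single character.
has_repeat_run() {
    LC_ALL=C grep -qE '^Subject:.*([^!])\1\1\1\1\1'
}
```

Because the function reports its result through the exit status, it would not need the inverted "grep -v". procmailrc(5) documents a "?" condition that uses the exit code of a program, which might be a way to call such a check from a single recipe; that wiring is not tested here.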

Is backreferencing supported by procmail's internal egrep?  Is there
some other way I should be doing this?

Procmail does not support back references.  Whether there's another way
to do what you want depends on how closely the above regexp expressed
what you want.  Did you really want it to apply to any character but
'!', or would it be fine to only apply it to a handful of punctuation?

Yes indeed.  The original idea was basically the first recipe listed
above in order to catch spam with subject lines with unique identifiers
after a length of spaces, but I started noticing spam getting through
because it contained some other "invisible" character (at least to my
MUA and terminal) like 0xA0 (a space with the high bit set).  I had no
luck trying to include that character in a procmail condition, and I
figured there might be other usable characters as well.  But I haven't
figured out how to type high-bit characters into my .procmailrc.  :)
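
One way around typing the byte is to let the shell generate it. The following is a minimal sketch, assuming a POSIX printf (which accepts octal escapes such as \240 for 0xA0) and a grep run in the C locale; the names nbsp and has_blank_run are hypothetical. It shows the byte being used in a grep bracket expression only, and makes no claim about how procmail itself treats such a byte in a condition.

```shell
#!/bin/sh
# Produce the single byte 0xA0 (octal 240) without typing it.
nbsp=$(printf '\240')

# has_blank_run (hypothetical name): read a message header on stdin and
# exit 0 if the Subject line ends with a run of five blanks (plain space
# or 0xA0) followed by a trailing non-blank token.
has_blank_run() {
    LC_ALL=C grep -qE "^Subject: .*[ $nbsp]{5}[^ $nbsp]+\$"
}
```

Listing further "invisible" bytes in the bracket expression would extend the check, though it still only catches the characters that are listed.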

Once in a very long while, I get a message from one of my customers that
tells of an "URGENT!!!!!!" problem, so I decided to leave exclamation
points out of the recipe.  The idea is to match subject lines with five
or more of the same character, because a review of my procmail.log seems
to indicate that ALL of the mail matched by this is actually spam.

Another alternative would be to have multiple rules, one to check for
spaces and one to check for 0xA0, then add another any time I see another
character being used.  But I hate to implement a solution that I *know*
is flawed.  ;-)

What do you think?  Should I quit making noise and just delete the
bloody spam?

-- 
  Paul Chvostek                                      <paul(_at_)it(_dot_)ca>
  Operations / Development / Abuse / Whatever       vox: +1 416 598-0000
  IT Canada                                            http://www.it.ca/

_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
