procmail

Re: A few rule questions

2003-12-14 07:34:42
At 08:21 2003-12-14 -0500, JoeHill wrote:

> One that would *really* help is a rule that I am trying to adapt from my
> Mailfilter rules. It is supposed to catch any mail which is addressed to more
> than 2 .sympatico.ca addresses:
>
> ^(To|Cc):(.*sympatico\.ca){3}

The {n,m} regexp extension is not supported by procmail. Roll the regexp out manually:

* ^(To|Cc):.*sympatico\.ca.*sympatico\.ca.*sympatico\.ca

A problem you'll have, though, is that this won't count the _total_ number of recipients across the two headers combined. It requires three or more matches in EITHER header alone (2 in one and 1 in the other won't match, not even with the {n,m} egrep form).
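For reference, here is a sketch of the rolled-out condition in a complete recipe. The folder name `sympatico-multi` is a hypothetical placeholder, and the trailing colon on `:0:` requests a local lockfile, which is the usual practice when delivering to a folder.

```procmail
# procmail has no {n,m}, so the sub-pattern is simply written out three times.
# Per the caveat above, all three hits must fall within the SAME header.
# NOTE: "sympatico-multi" is a hypothetical folder name; use your own.
:0:
* ^(To|Cc):.*sympatico\.ca.*sympatico\.ca.*sympatico\.ca
sympatico-multi
```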

Or, extract the recipients into variables:

:0
* ^To:\/.*
{
        RECEIPS=$MATCH
}

:0
* ^Cc:\/.*
{
        RECEIPS=$RECEIPS$MATCH
}

Now you have one variable with the cleartext recipients in it, *AND*, because it's a variable, it will be matched somewhat differently than a regexp anchored to a specific header:

:0
* -2^0
* 1^1 ^(To|Cc):.*sympatico\.ca

doesn't eval the same as:

:0
* -2^0
* 1^1 RECEIPS ?? sympatico\.ca

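I'd look at using the latter. In the header form, the pattern is anchored with ^ and its .* is greedy, so it scores at most once per To: or Cc: header, no matter how many addresses that header holds; the variable form scores once for every sympatico.ca it finds. That odd-looking numeric form is documented in 'man procmailsc'.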
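Putting the pieces together, a sketch of the variable-based approach might look like the following. It assumes a hypothetical destination folder `sympatico-multi`, separates the two headers' contents with a space (a choice made here for readability, not something the count needs), and quotes `$MATCH` in the assignments for safety:

```procmail
# Start from an empty variable so an inherited value can't skew the count.
RECEIPS=

# Copy the contents of the To: and Cc: headers into RECEIPS.
:0
* ^To:\/.*
{
        RECEIPS="$MATCH"
}

:0
* ^Cc:\/.*
{
        RECEIPS="$RECEIPS $MATCH"
}

# Scoring (see procmailsc): start at -2, then add 1 for every occurrence
# of "sympatico.ca" in RECEIPS. The recipe matches only when the total is
# above 0, so 3 hits give -2 + 3 = 1 (match), 2 hits give -2 + 2 = 0 (no match).
# NOTE: "sympatico-multi" is a hypothetical folder name.
:0:
* -2^0
* 1^1 RECEIPS ?? sympatico\.ca
sympatico-multi
```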

I'm sure you're aware there are issues with users who have their address in their nametext (the display-name part of the header), since such a recipient gets counted twice.

> Another is this one:
>
> ^From: <?[^[:digit:] \"]+[[:digit:]]+[^[[:digit:] \"]*@
>
> which is intended to catch mail addresses with a bunch of numbers in them.

From my own rcfiles:

# If the From contains an 8-digit numeric-only address, ditch it as spam
# (this seems to be a new popular spammage technique - an 8-digit random
# number).
:0
* ^From:.*[ 	<]*[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]+@
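If you want the rule to live up to its "numeric-only" comment, note that the bracket expression before the digits is followed by `*`, so it may match zero characters; `.*` can then swallow letters in front of the digits, and an address such as `abc12345678@...` matches as well. Below is a stricter sketch, with the "ditch it" action spelled out as `/dev/null` (an assumption about what you want done with such mail; delivering to /dev/null needs no lockfile):

```procmail
# Stricter variant: the run of 8+ digits must come directly after "From:",
# or after a space, TAB, or "<", so letters before the digits no longer match.
# The bracket holds a space, a TAB, and "<".
:0
* ^From:(.*[ 	<])?[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]+@
/dev/null
```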


> Finally, I have been getting a lot of this lately (body):
>
> Cash in On Atm Traffic
> Make $Money$ 24-7-365
>
> where the subject is just a single random letter.

You'll likely find that checking for _OTHER_ characteristics common in spam, such as bogus dates or HTML-only messages, will prove more reliable.

> I would imagine it would be simplest to filter on the subject, snagging all
> mail whose subject is only a single letter,

If you wanted to:

:0
* ^Subject:[ 	]*[^ 	][ 	]*$

That matches any number of leading or trailing spaces/tabs around exactly ONE non-space/tab character (each bracket holds a space and a tab).

> but I would also be interested in how to catch it by body as well... I
> believe this would be something like:

That assumes the body isn't encoded in BASE64, random quoted-printable, ordinalized HTML (characters written as numeric entities), or HTML with random sequences of comments or bogus HTML-like tag constructs. IOW, when checking for spam, searching the body for text isn't generally a good approach; checking it for oddities such as gobs of HTML comments is a different matter.

> :0 B
> *atm\.traffic
>
> but I tried that and it did not work.

Well, the text was "atm traffic", with a space between the words, but you're trying to match "atm.traffic" literally, since you've escaped the dot (an unescaped . would have matched the space).
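A sketch of a body rule that allows for a space (or tab) between the words. `atm-spam` is a hypothetical folder name. Per the caveats above, this only helps with plain-text bodies, and it won't match if the two words are split across a line break:

```procmail
# B: search the body only (the default is the header).
# The bracket holds a space and a TAB; "+" allows one or more of them.
# procmail ignores case unless the D flag is given, so "Atm Traffic" matches.
# The trailing colon requests a local lockfile for the folder delivery.
# NOTE: "atm-spam" is a hypothetical folder name.
:0 B:
* atm[ 	]+traffic
atm-spam
```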

---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
