Re: A few rule questions

At 08:21 2003-12-14 -0500, JoeHill wrote:

One that would *really* help is a rule that I am trying to adapt from my
Mailfilter rules. It is supposed to catch any mail which is addressed to more
than 2 .sympatico.ca addresses:

^(To|Cc):(.*sympatico\.ca){3}

The {n,m} regexp extension is not supported by procmail. Roll the regexp out manually:


* ^(To|Cc):.*sympatico\.ca.*sympatico\.ca.*sympatico\.ca

A problem you'll have, though, is that this won't match the _total_ number of recipients across the two headers combined; it expects a match of three or more in EITHER header alone (2 in one and 1 in the other won't work, not even with the {n,m} egrep form).


Or, extract the recipients into variables:

:0
* ^To:\/.*
{
        RECEIPS=$MATCH
}

:0
* ^Cc:\/.*
{
        RECEIPS=$RECEIPS$MATCH
}

Now you have one variable with the cleartext recipients in it, *AND* because it's in a variable, it'll regexp somewhat differently than one anchored to a specific header:


:0
* -2^0
* 1^1 ^(To|Cc):.*sympatico\.ca

doesn't eval the same as:

:0
* -2^0
* 1^1 RECEIPS ?? sympatico\.ca

I'd look at using the latter. That odd-looking numeric form is documented in 'man procmailsc'.
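
For what it's worth, here is a rough, untested sketch of how those pieces might fit together into one complete recipe. The folder name "sympatico-multi" is just a placeholder, and clearing RECEIPS first is only a precaution so that nothing stale gets counted:

```procmail
# Sketch only: "sympatico-multi" is a hypothetical folder name.
# Start from an empty variable so an earlier value can't be counted.
RECEIPS=

# Grab everything after "To:" into RECEIPS.
:0
* ^To:\/.*
{ RECEIPS="$MATCH" }

# Append everything after "Cc:", if there is a Cc: header.
:0
* ^Cc:\/.*
{ RECEIPS="$RECEIPS $MATCH" }

# Score: begin at -2, add 1 for each sympatico.ca found in RECEIPS.
# The recipe only fires when the final score is positive (3+ hits).
:0
* -2^0
*  1^1 RECEIPS ?? sympatico\.ca
sympatico-multi
```

The score starts at -2 and gains 1 for every sympatico.ca found in RECEIPS, so it only goes positive on the third hit, whichever header the addresses came from. As far as I can tell, the header-anchored version can score at most once per header line, because each match has to start over at ^To: or ^Cc:.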

I'm sure you're aware there are issues with users who have their address in their nametext.

Another is this one:

^From: <?[^[:digit:] \"]+[[:digit:]]+[^[[:digit:] \"]*@

which is intended to catch mail addresses with a bunch of numbers in them.


From my own rcfiles:

# If the From contains an 8-digit numeric-only address, ditch it as spam
# (this seems to be a new popular spammage technique - an 8-digit random
# number).
:0
* ^From:.*[     <]*[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]+@

Finally, I have been getting a lot of this lately (body):

Cash in On Atm Traffic
Make $Money$ 24-7-365

where the subject is just a single random letter.

You'll likely find that checking for _OTHER_ characteristics common in spams will prove more reliable. From bogus dates, HTML-only,

I would imagine it would be simplest to filter on the subject, snagging all mail which has as its subject only a single letter,


If you wanted to.

:0
* ^Subject:[    ]*[^    ][      ]*$

That matches whatever number of leading or trailing spaces/tabs and just ONE non space/tab character.

 but I would also be
interested in how to catch it by body as well...I believe this would be
something like:

Assuming that it isn't encoded in BASE64, random quoted-printable, ordinalized HTML, or HTML with random sequences of comments or bogus HTML-like tag constructs. IOW, when checking for spam, searching the body for text isn't generally a good approach - checking it for oddities such as gobs of HTML comments is a different matter.

:0 B
*atm\.traffic

but I tried that and it did not work.

Well, the text was "atm traffic" (with a space), but you're trying to match on "atm.traffic" with a literal period, since you've escaped the dot.
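
Something along these lines should get closer (a sketch, untested; "cash-spam" is a made-up folder name). Procmail matches case-insensitively unless you use the D flag, so the capitals in "Atm Traffic" aren't a concern:

```procmail
# Sketch only: "cash-spam" is a hypothetical folder name.
# B flag: test the condition against the message body.
:0 B
* atm traffic
cash-spam
```

An unescaped dot (atm.traffic) would also match, since a bare dot stands for any single character, the space included.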


---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.

