At 08:21 2003-12-14 -0500, JoeHill wrote:
One that would *really* help is a rule that I am trying to adapt from my
Mailfilter rules. It is supposed to catch any mail which is addressed to more
than 2 .sympatico.ca addresses:
^(To|Cc):(.*sympatico\.ca){3}
The {n,m} regexp extension is not supported by procmail. Roll the regexp
out manually:
* ^(To|Cc):.*sympatico\.ca.*sympatico\.ca.*sympatico\.ca
One problem you'll have, though, is that this won't match the _total_ number
of recipients across the two headers combined; it expects three or more in
EITHER header alone (2 in one and 1 in the other won't match, not even with
the {n,m} egrep form).
Or, extract the recipients into variables:
:0
* ^To:\/.*
{
RECEIPS=$MATCH
}
:0
* ^Cc:\/.*
{
RECEIPS=$RECEIPS$MATCH
}
Now, you have one variable with the cleartext recipients in it, *AND*
because it's in a variable, it'll regexp somewhat differently than one
anchored to a specific header:
:0
* -2^0
* 1^1 ^(To|Cc):.*sympatico\.ca
doesn't eval the same as:
:0
* -2^0
* 1^1 RECEIPS ?? sympatico\.ca
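I'd look at using the latter. That odd-looking numeric form is weighted
scoring, documented in 'man procmailsc': the score starts at -2, each match
adds 1, and the recipe only fires if the total ends up positive, i.e. on
three or more matches.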
I'm sure you're aware there are issues with users who have their address in
their nametext.
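Putting those pieces together, a minimal sketch of the whole thing might look
like the following. The folder name "sympatico-multi" is made up for
illustration, and I've put a space between the two header values when joining
them. It's untested against your mail, so try it on a sandbox copy first.

```
# Start with an empty variable so nothing stale gets counted.
RECEIPS=""

# Grab whatever follows "To:" ...
:0
* ^To:\/.*
{ RECEIPS="$MATCH" }

# ... and append whatever follows "Cc:".
:0
* ^Cc:\/.*
{ RECEIPS="$RECEIPS $MATCH" }

# Start at -2, add 1 per sympatico.ca found in the combined text.
# The recipe only fires when the total is positive (3 or more hits).
:0:
* -2^0
*  1^1 RECEIPS ?? sympatico\.ca
sympatico-multi
```

Because the count runs over the combined variable, 2 in To: plus 1 in Cc: is
enough to trigger it.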
Another is this one:
^From: <?[^[:digit:] \"]+[[:digit:]]+[^[[:digit:] \"]*@
which is intended to catch mail addresses with a bunch of numbers in them.
From my own rcfiles:
# If the From contains an 8-digit numeric-only address, ditch it as spam
# (this seems to be a new popular spammage technique - an 8-digit random
# number).
:0
* ^From:.*[ <]*[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]+@
Finally, I have been getting a lot of this lately (body):
Cash in On Atm Traffic
Make $Money$ 24-7-365
where the subject is just a single random letter.
You'll likely find that checking for _OTHER_ characteristics common in
spams will prove more reliable: bogus dates, HTML-only bodies, and the like.
I would imagine it would be simplest to filter on the subject, snagging
all mail which has as its subject only a single letter,
If you wanted to.
:0
* ^Subject:[ ]*[^ ][ ]*$
That matches any number of leading or trailing spaces/tabs around exactly
ONE non-space/tab character (each bracket pair should contain a literal
space and a tab).
but I would also be
interested in how to catch it by body as well...I believe this would be
something like:
Assuming that it isn't encoded in BASE64, random quoted-printable,
ordinalized HTML, or HTML with random sequences of comments or bogus
HTML-like tag constructs. IOW, when checking for spam, searching the body
for text isn't generally a good approach - checking it for oddities such as
gobs of HTML comments is a different matter.
:0 B
*atm\.traffic
but I tried that and it did not work.
Well, the text was "atm traffic", but you're trying to match on atm.traffic
(literally, since you've escaped the dot).
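If you do still want a body test, a sketch along these lines should at least
match the sample you quoted, since procmail conditions are case-insensitive
unless you use the D flag. The folder name "atm-spam" is hypothetical, and the
unescaped dot stands for any single character, so it covers a space or a tab
between the two words:

```
# B = test the body instead of the header; trailing colon = use a lockfile.
:0 B:
* atm.traffic
atm-spam
```

The caveats above about encoded or HTML-mangled bodies still apply; this only
catches the text when it appears in the clear.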
---
Sean B. Straw / Professional Software Engineering
Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
Please DO NOT carbon me on list replies. I'll get my copy from the list.