
Re: A few rule questions

2003-12-14 14:10:58
On Sun, 14 Dec 2003 06:25:49 -0800
PSE-L(_at_)mail(_dot_)professional(_dot_)org (Professional Software 
Engineering) wrote:

One that would *really* help is a rule that I am trying to adapt from my
Mailfilter rules. It is supposed to catch any mail which is addressed to more
than 2 .sympatico.ca addresses:

^(To|Cc):(.*sympatico\.ca){3}

The {n,m} regexp extension is not supported by procmail.  Roll the regexp 
out manually:

* ^(To|Cc):.*sympatico\.ca.*sympatico\.ca.*sympatico\.ca

A problem you'll have though is that this won't match the _total_ number of 
recipients between the two headers combined, but will expect a match of 
three or more in EITHER header alone (2 in one, and 1 in the other won't 
work - not even with the {n,m} egrep form).
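
To make the per-header limitation concrete, here is a small sketch in Python rather than procmail. Python's re module only stands in for procmail's egrep-style matching, and the addresses and the helper name matches_rolled_out are made up for illustration:

```python
import re

# The rolled-out pattern from above. Procmail matches case-insensitively
# by default, so IGNORECASE is used; MULTILINE lets ^ match at each header line.
ROLLED_OUT = re.compile(
    r"^(To|Cc):.*sympatico\.ca.*sympatico\.ca.*sympatico\.ca",
    re.IGNORECASE | re.MULTILINE,
)

def matches_rolled_out(headers: str) -> bool:
    """True if a single To: or Cc: line holds three or more sympatico.ca hits."""
    return ROLLED_OUT.search(headers) is not None

# Hypothetical headers: three addresses in To: alone.
three_in_to = "To: a@sympatico.ca, b@sympatico.ca, c@sympatico.ca\nCc: x@example.org\n"
# Two in To: plus one in Cc:, so three in total but split across headers.
split_2_1 = "To: a@sympatico.ca, b@sympatico.ca\nCc: c@sympatico.ca\n"

print(matches_rolled_out(three_in_to))  # True
print(matches_rolled_out(split_2_1))    # False
```

Since "." does not cross a newline, the 2+1 split never satisfies the pattern, which is the case described above.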

But I *could* make them into separate recipes, one for To, one for Cc, though
this would not be as elegant as below, of course, no?

Or, extract the recipients into variables:

:0
* ^To:\/.*
{
         RECEIPS=$MATCH
}

:0
* ^Cc:\/.*
{
         RECEIPS=$RECEIPS$MATCH
}

Now, you have one variable with the cleartext recipients in it, *AND* 
because it's in a variable, it'll regexp somewhat differently than one 
anchored to a specific header:

:0
* -2^0
* 1^1 ^(To|Cc):.*sympatico\.ca

doesn't eval the same as:

:0
* -2^0
* 1^1 RECEIPS ?? sympatico\.ca

I'd look at using the latter.  That odd-looking numeric form is documented 
in 'man procmailsc'.
I'm sure you're aware there are issues with users who have their address in 
their nametext.
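
A rough emulation of the two scoring recipes, in Python rather than procmail, may help show why they differ. It assumes (as I read procmailsc) that a "1^1" condition adds 1 for every match found, on top of the -2 from "-2^0", and that the recipe fires when the total is positive. The headers, the contents of RECEIPS, and the helper name score are all hypothetical:

```python
import re

def score(text, pattern, flags=re.IGNORECASE, base=-2):
    """Stand-in for '* -2^0' plus '* 1^1 pattern': base plus one per match."""
    return base + len(re.findall(pattern, text, flags))

# Hypothetical message headers.
headers = "To: a@sympatico.ca, b@sympatico.ca\nCc: c@sympatico.ca\n"
# Hypothetical contents of RECEIPS after the two extraction recipes.
receips = " a@sympatico.ca, b@sympatico.ca c@sympatico.ca"

# Anchored form: ^ ties each match to the start of a header line, and the
# greedy .* swallows the rest of the line, so each line counts at most once.
anchored = score(headers, r"^(?:To|Cc):.*sympatico\.ca",
                 flags=re.IGNORECASE | re.MULTILINE)

# Variable form: every occurrence in the variable is counted.
variable = score(receips, r"sympatico\.ca")

print(anchored)  # 0
print(variable)  # 1
```

With three Sympatico recipients split 2+1, the anchored form ends at 0 (no delivery) while the variable form ends at 1 (positive, so the recipe would match).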

That's a beaut, thanks. I *never* get legit mail which is addressed/cc'd to more
than 2 people in the Sympatico domain. In fact, I can't remember the last time I
got a legit mail which was addressed/cc'd to *only* 2 people in the Sympatico
domain. 99% of my mail is from lists or people who run their own mailservers
(i.e. not newbs like me).

Another is this one:

^From: <?[^[:digit:] \"]+[[:digit:]]+[^[[:digit:] \"]*@

which is intended to catch mail addresses with a bunch of numbers in them.

From my own rcfiles:

# If the From contains an 8-digit numeric-only address, ditch it as spam
# (this seems to be a new popular spammage technique - an 8-digit random
# number).
:0
* ^From:.*[     <]*[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]+@
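
One thing worth checking with a pattern like this is what it catches beyond purely numeric addresses. The sketch below uses Python's re as a stand-in for procmail's matcher, with {8,} replacing the rolled-out digits (Python supports it; procmail does not). NUMERIC_ONLY is a hypothetical stricter variant, not something from the rcfile above, and the sample addresses are invented:

```python
import re

# The pattern as written: the [ \t<]* class is optional, so the leading .*
# can absorb letters before the digits.
AS_WRITTEN = re.compile(r"^From:.*[ \t<]*[0-9]{8,}@", re.IGNORECASE | re.MULTILINE)

# Hypothetical stricter variant: the digits must begin right after "From:",
# a space, a tab, or "<".
NUMERIC_ONLY = re.compile(r"^From:(?:.*[ \t<])?[0-9]{8,}@", re.IGNORECASE | re.MULTILINE)

samples = [
    "From: <12345678@example.com>",   # purely numeric local part
    "From: bob12345678@example.com",  # letters followed by eight digits
    "From: Joe <joe@example.com>",    # no digits at all
]

for line in samples:
    print(bool(AS_WRITTEN.search(line)), bool(NUMERIC_ONLY.search(line)))
# True True
# True False
# False False
```

So the original form also fires on addresses that merely end in eight or more digits, which may or may not be what you want.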


Finally, I have been getting a lot of this lately (body):

Cash in On Atm Traffic
Make $Money$ 24-7-365

where the subject is just a single random letter.

You'll likely find that checking for _OTHER_ characteristics common in 
spams will prove more reliable.  From bogus dates, HTML-only,

Bogus dates! Brilliant! If a piece of mail takes 3 days to get to me, it
probably ain't worth reading anyway, right? Love it.

I would imagine it would be simplest to filter on the subject, snagging 
all mail which has as its subject only a single letter,

If you wanted to.

:0
* ^Subject:[    ]*[^    ][      ]*$

That matches whatever number of leading or trailing spaces/tabs and just 
ONE non space/tab character.
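
A quick way to sanity-check that condition outside procmail is to try the same expression on a few sample Subject lines. This sketch uses Python's re as a stand-in, writes the space/tab classes as [ \t], and excludes newline from the negated class explicitly so the Python version cannot match across lines; the helper name and the sample subjects are made up:

```python
import re

# Optional blanks, exactly one non-blank character, optional blanks, end of line.
SINGLE_CHAR_SUBJECT = re.compile(
    r"^Subject:[ \t]*[^ \t\n][ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)

def is_single_char_subject(header_line: str) -> bool:
    """True if the Subject header carries exactly one non-blank character."""
    return SINGLE_CHAR_SUBJECT.search(header_line) is not None

print(is_single_char_subject("Subject: x"))          # True
print(is_single_char_subject("Subject:   Q  "))      # True
print(is_single_char_subject("Subject: Re: hello"))  # False
print(is_single_char_subject("Subject:"))            # False
```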

...and of course as soon as I implement this rule, this particular piece of spam
will die out...heh. So far I'm getting one a day though.

but I would also be
interested in how to catch it by body as well...I believe this would be
something like:

Assuming that it isn't encoded in BASE64, random quoted-printable, 
ordinalized HTML, or HTML with random sequences of comments or bogus 
HTML-like tag constructs.  IOW, when checking for spam, searching the body 
for text isn't generally a good approach - checking it for oddities such as 
gobs of HTML comments is a different matter.

Good point. That explains why even though I have some rules that check for
"viagra" in the body (a lot simpler, you would think), they still come through.

:0 B
*atm\.traffic

but I tried that and it did not work.

Well, the text was "atm traffic", but you're trying to match on atm.traffic 
(literally, since you've escaped the dot).
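
The difference is easy to see with the sample text from the spam itself. This is Python's re standing in for procmail's matcher (case-insensitive, as procmail is by default); the helper name body_matches is made up:

```python
import re

body = "Cash in On Atm Traffic\nMake $Money$ 24-7-365\n"

def body_matches(pattern: str, text: str) -> bool:
    """Case-insensitive search, roughly what a ':0 B' condition would do."""
    return re.search(pattern, text, re.IGNORECASE) is not None

print(body_matches(r"atm\.traffic", body))      # False: needs a literal dot
print(body_matches(r"atm.traffic", body))       # True: unescaped dot matches the space
print(body_matches(r"atm[ \t]+traffic", body))  # True: one or more spaces or tabs
```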

I think, based on your advice, I'll leave the body checks out :-)

Thanks very much for the tips! After reading the disclaimer linked in your sig,
I really appreciate the help. This regular expressions thing could occupy a
lifetime of learning, at least for me. Main point is, if I can just keep my use
of the delete key down to once or twice a day, I'll consider it a victory!

Happy Holidays!

BTW, I had a good chuckle over the Red Hat comments in the disclaimer page,
though I'm hesitant to ask what you think of Mandrake...;-)

-- 
JoeHill ++ ICQ # 280779813
Registered Linux user #282046
Homepage: www.orderinchaos.org
+++++++++++++++++++++++++++
"In this possibly terminal phase of human existence, democracy and freedom are
more than just ideals to be valued - they may be essential to survival...."
--Noam Chomsky

_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
