procmail

Re: regex syntax question

2004-03-02 07:48:50
On Tue, 2 Mar 2004, Ruud H.G. van Tol wrote:

There is, but often it is best to nest the body checks in a
discriminating header check, because body checks are the most
expensive.

There are so many domain names in these messages it's not funny.  You can
have all the crazy code you want, but it still comes down to
href="somedomain.tld" somewhere in the code; it has to, or the link won't work.
Most of the "click here to remove" links are legitimate, and links to
external images and pages are usually valid links too.  The latest trick
I've seen is to obfuscate the link completely by encoding it as
"http:&#47&#47...", which translates to "http://...".  I'm still working
on the search for that.  It looks like I only need to escape the pound (#)
signs as \#, which I've done, and I haven't seen any email like that since
yesterday.
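
A minimal sketch of such a search, assuming the encoded slashes may appear in decimal (&#47) or hex (&#x2f) form, with or without the closing semicolon; `junkmail` is the sandbox mailbox used elsewhere in this thread. Writing the pound sign as the one-character class `[#]` is an alternative to escaping it as `\#`; either way it is matched literally.

```procmail
# Hypothetical recipe: "http:" followed by two entity-encoded slashes.
# Each slash may be written as &#47 or &#x2f (leading zeros allowed),
# with or without the trailing ";". Procmail matching is
# case-insensitive by default, so &#X2F is covered as well.
:0 B:
* http:(&[#]0*47;?|&[#]x0*2f;?)(&[#]0*47;?|&[#]x0*2f;?)
junkmail
```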

I certainly need to write a couple of scripts for adding rules to the 
file.

Header check first:

  :0
  * ^some-header-test
  * B ?? some-body-test
  some-action

There are too many rules for this to be efficient.  I need to be less
discriminatory.
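
One way to keep a large rule set cheap is to put a single discriminating header check in front of a whole group of body rules with a nesting block, so the body is only scanned for messages that pass the gate. A sketch, assuming the link spam arrives as HTML (so its top-level Content-Type is text/html or multipart); plain-text spam with bare URLs would skip the whole group, which is the tradeoff of any gate.

```procmail
# Hypothetical gate: only HTML or multipart messages enter the block.
:0
* ^Content-Type:.*(text/html|multipart/)
{
  # Body rules go here; each one runs only for gated messages.
  :0 B:
  * ffr[0-9]ws\.(com|net|biz|org|info|name)
  junkmail
}
```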

But if you want to find a word throughout the whole message:

  :0
  * HB ?? \<some-word\>
  some-action

or (not necessarily less expensive)

  :0
  * 9876543210^0  ^Subject:.*\<some-word\>
  * 9876543210^0  B ?? \<some-word\>
  some-action


Hell, if only there were a way for me to put a list of domain names in a
file and have procmail read the list from it.

There are many ways to do that. Jan Ehrhardt runs a nice project:
http://www.xs4all.nl/~monitor/rblhost.rc.txt
http://www.xs4all.nl/~monitor/rblqp.rc.txt
with results on
http://cgi.monitor.nl/rblhosts.html
http://cgi.monitor.nl/rblhosts.php3  (warning: big!)
http://cgi.monitor.nl/popstats.html
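
Another way is to keep the domain names in a plain file and let grep do the matching from a program condition. A sketch, assuming a hypothetical file `$HOME/.spam-domains` with one domain per line and a grep that supports `-F`, `-i`, `-q` and `-f` (GNU grep does). With the `B` flag, procmail should feed the message body to the program, and an exit status of 0 (a match) makes the condition true. Adding a domain then means editing a data file rather than the procmailrc itself, so a typo in an entry cannot break the recipe syntax.

```procmail
# Hypothetical list file: $HOME/.spam-domains, one entry per line, e.g.
#   ffr3ws.com
#   majesticdrugs.biz
# WARNING: an empty line in the list matches every message, so keep
# the file free of blank lines. If the file is missing, grep exits
# with status 2, the condition is false, and mail is delivered normally.
:0 B:
* ? grep -F -i -q -f $HOME/.spam-domains
junkmail
```

Entries are matched as plain substrings, so `ffr3ws.com` also matches `www.ffr3ws.com` (usually what you want) and any longer name that merely contains it.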


I'll look these up.  I've had to turn off RBL checking in most things, as
one of the RBLs started reporting everything as bad.



I have hundreds of domain names that SpamAssassin is just not catching.

There are domain names anywhere in a message: in the Received headers,
in email addresses, in URLs, etc. Which ones do you mean?
These domain names are often forged or deeply encoded, so you
will have to jump through a couple of hoops before you can match
them against any rules.

I'm getting messages loaded with random garbage and external links.  The
links don't vary much from message to message.  I've been feeding these
messages to sa-learn, but I'm not having much luck getting them stopped.
I fed close to 100 messages like these to sa-learn yesterday.

From one message:
Check out our Canadian Generic Pharmacy below:
http://ssole2.com/gp/default.asp?id=gm03
(No prior prescription required)

From another:

Ce.|3brex, Fi0ri'c3t, T'ram@do|, U|tr@`m, L3v|'tra, Pr0p3.cia,
A:cyc|0vir,
Pr0z:@c, P@x:il, Bu:sp@r

Most trusted name brands.

Because you can add more to your life. Shop Now.  
http://www.majesticdrugs.biz.

From yet another:

<P><FONT SIZE=2>Shipped worldwide.<BR><BR>Your easy-to-use solution is 
here: <A
HREF="http://www.medz4cheap.com/cia/?nights">http://www.medz4cheap.com/cia/?nights</A></FONT>

And the killer:

style="font-size: 1;">x</font>op 3 at the lo<font style="font-size:
1;">v</font>west pr<font style="font-size: 1;">n</font>ices any<font
style="font-size: 1;">w</font>where.<BR>
 <A href="http://ffr3ws.com/pc/">Low man<font style="font-size:
1;">r</font>ufac<font style="font-size: 1;">y</font>turer direct p<font
style="font-size: 1;">i</font>ric<font style="font-size:

In this last one, the link is intact.  Essentially, all I really need to do
is look for intact domain names in a message and send those messages to the sandbox.
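
To collect candidates for such a list, procmail's `\/` operator can capture text into `$MATCH`. A sketch that logs the host name of the first link in each message, assuming a `LOGFILE` is set so the `LOG` output lands somewhere readable; only the first matching link is captured, not every link in the message.

```procmail
# NL holds a literal newline so each LOG entry ends its own line.
NL = "
"

# Capture the host part of the first http:// or https:// link in the body.
# Everything right of \/ is matched as long as possible and stored in MATCH.
:0 B
* href="?https?://\/[-a-z0-9.]+
{
  LOG = "first linked host: $MATCH$NL"
}
```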

:0 B:
* ffr[0-9]ws\.(com|net|biz|org|info|name)
junkmail
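
The same recipe can cover several spam domains at once with alternation, which keeps the rule count down and likely costs one pass over the body instead of one per recipe. The names below are simply the ones from the sample messages quoted above; the list is illustrative, not exhaustive.

```procmail
# One body condition listing several domains as alternatives.
# Procmail regexes are case-insensitive by default, so HREF= and
# upper-case host names match too.
:0 B:
* (ssole2|majesticdrugs|medz4cheap|ffr[0-9]ws)\.(com|net|biz|org|info|name)
junkmail
```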

More often than not, I make mistakes adding entries to the procmailrc file
and a lot of people won't get mail.  I'm having a hard time working out
how to do that from the documentation.

Sandbox!

junkmail = sandbox.  On a systemwide basis it's /tmp/junkmail.
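
Since rule mistakes can cost people their mail, a safety net helps while rules are being tried out. The sketch below is adapted from the backup example in procmailex(5): placed near the top of the rcfile, before any spam rules, it copies every incoming message into a `backup` directory (assumed to exist under `$MAILDIR`) and prunes that directory to the 32 newest messages.

```procmail
# Copy (c flag) every message into the backup directory; procmail
# writes one msg.* file per message when the target is a directory.
:0 c
backup

# Keep only the 32 newest backups. The i flag ignores write errors,
# since rm does not read the message piped to it; "dummy" gives rm
# an operand even when there is nothing to delete.
:0 ic
| cd backup && rm -f dummy `ls -t msg.* | sed -e 1,32d`
```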

Curtis




-- 
--
Curtis Maurand
mailto:curtis(_at_)maurand(_dot_)com
http://www.maurand.com



_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
