procmail

Re: Filtering URL In Message Body

2000-12-20 22:52:52
Eric Hilding <eric(_at_)hilding(_dot_)com> writes:
At 03:14 AM 12/17/00 -0600, Philip Guenther wrote:
...
2.  How would I code it to also filter on specific URL's which contain ANY
number(s) ???

_Any_ numbers?  What about "http://www.3com.com/"?  Perhaps you mean _all_
numbers:

Gee...you are right...I didn't think about this.  Most of the problems involve
2 (or more) digits together *somewhere* between the "http://" & the end
(whether it be a .com or a .cn or a .com.cn URL).  At this point, I'd just like
to thrash ANYTHING like http://www.163.com  http://www.21.com  or even
http://something2786.com  !!!  Can the below be tweaked to do this simply?

First you need to figure out an algorithmic way to differentiate okay
addresses from unwanted addresses.  For example, excluding addresses
that contain at least one domain component of just digits (e.g.,
www.163.com) isn't too hard:

        http://([-a-z0-9]+\.)*[0-9]+[^-a-z.]

Again, that'll exclude pure IP numbers (http://138.236.128.18/), so
perhaps that's too tight.
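
One way to see what a pattern like this catches is to run it over the
URLs mentioned in this thread.  The following is only a rough Python
sketch: Python's `re` module is not procmail's regex engine, and the
name `is_flagged` and the sample list are made up for illustration.

```python
import re

# The pattern quoted above.  procmail matches case-insensitively by
# default, so the same behaviour is requested here.
PATTERN = re.compile(r"http://([-a-z0-9]+\.)*[0-9]+[^-a-z.]", re.IGNORECASE)


def is_flagged(text):
    """Return True if the pattern matches anywhere in text."""
    return PATTERN.search(text) is not None


if __name__ == "__main__":
    # Sample URLs taken from the messages in this thread.
    samples = [
        "http://www.163.com",
        "http://www.21.com",
        "http://something2786.com",
        "http://www.3com.com/",
        "http://138.236.128.18/",
    ]
    for url in samples:
        print(url, is_flagged(url))
```

Under Python's `re`, this flags the www.163.com, www.21.com, and
138.236.128.18 examples and leaves www.3com.com and something2786.com
alone, so trying candidate patterns against known good and bad URLs
before deploying them seems worthwhile.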

The key point is that you have to come up with an expressible
criterion:  "at least one domain component that is just digits or that
ends with at least two digits".  Once that is done, it's 'just' a
matter of teaching procmail the criterion.
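
As an illustration of pinning the criterion down before translating it
into a procmail condition, here is a minimal Python sketch of "at least
one domain component that is just digits or that ends with at least two
digits".  The function names, the host-extraction regex, and the
`allow_ip` switch (which exempts dotted-quad IP addresses) are all
assumptions for the sake of the example, not anything procmail provides.

```python
import re

# Hypothetical helper patterns: grab the host part after "http://",
# and recognise a plain dotted-quad IP address.
URL_HOST = re.compile(r"http://([-a-z0-9.]+)", re.IGNORECASE)
DOTTED_QUAD = re.compile(r"[0-9]{1,3}(\.[0-9]{1,3}){3}")
# A component is unwanted if it is all digits or ends in two digits.
BAD_LABEL = re.compile(r"[0-9]+|.*[0-9]{2}")


def suspicious_host(host, allow_ip=True):
    """Apply the criterion to one host name."""
    host = host.strip(".")  # drop a trailing period from prose
    if allow_ip and DOTTED_QUAD.fullmatch(host):
        return False  # assumption: pure IP addresses are acceptable
    return any(BAD_LABEL.fullmatch(label) for label in host.split("."))


def body_has_suspicious_url(body, allow_ip=True):
    """Return True if any http:// URL in body has a suspicious host."""
    return any(
        suspicious_host(match.group(1), allow_ip)
        for match in URL_HOST.finditer(body)
    )
```

With the criterion written out like this, borderline cases such as
http://www.3com.com/ (kept) and http://something2786.com (rejected)
can be checked explicitly, and the decision about IP addresses becomes
a single switch rather than a side effect of the pattern.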


Philip Guenther
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
