procmail

Re: Limiting extent

2002-10-19 21:10:04
On Sat, 19 Oct 2002 dman(_at_)nomotek(_dot_)com wrote:

From: fleet(_at_)teachout(_dot_)org

I'm trying to identify a percent sign within a URL (i.e., a percent sign
between // and the next / only).  I've been trying

//.*[%[^/]].*/ (in various permutations) with no luck.  I think I'm
close; but I'm obviously missing something.

Yes.  Since `.*' matches essentially anything at all, spaces included,
`//.*something' need not stay within one phrasal "word."  That is,

      //someurl.htm and so on and so forth something

will match on "//.*something".  You need to rule out whitespace.
Something like this is likely to work, with a caret, space, and tab
within the brackets:

      //[^    ]*%
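
The difference can be tried outside procmail.  What follows is only a
rough sketch: it assumes Python's re module is a fair stand-in for
procmail's egrep-style matcher for these simple constructs, and the
sample strings and the names LOOSE and TIGHT are made up for
illustration.

```python
import re

# Python's re stands in for procmail's matcher here (an assumption);
# the sample strings below are invented for illustration.
LOOSE = re.compile(r"//.*%")        # `.*` may run across spaces
TIGHT = re.compile(r"//[^ \t]*%")   # stops at the first space or tab

prose = "see http://example.com/page.htm for a 50% discount"
encoded = "http://www.example.com/Annual%20reports/x.pdf"

print(LOOSE.search(prose) is not None)    # True: the match spans several words
print(TIGHT.search(prose) is not None)    # False: whitespace ends the "word"
print(TIGHT.search(encoded) is not None)  # True: the % is inside the URL itself
```

Note that the tight pattern still accepts a percent sign anywhere in the
URL, path included; it only stops the match from wandering into the
following words.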

I came up with http//.*[^/]%.*/ within 5 seconds of hitting the send
button.  [Is there some sort of corollary to Murphy's Law that says the
solution to any request for assistance mailed to a discussion list
becomes blatantly obvious as soon as the send key is struck?]


I don't see a need for the trailing /, but if you insist:

      //[^    /]*%[^  ]*/
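
To see what the trailing / changes, here is a hedged sketch, again
using Python's re as an assumed stand-in for procmail's matcher.  The
names HOST_PCT and HOST_PCT_SLASH and the example.com strings are
hypothetical.

```python
import re

# Percent sign before the first slash that follows "//".
HOST_PCT = re.compile(r"//[^ \t/]*%")
# Same, but a slash must also appear later in the same "word".
HOST_PCT_SLASH = re.compile(r"//[^ \t/]*%[^ \t]*/")

in_path = "http://www.example.com/Annual%20reports/"    # % only in the path
in_host = "http://www%2Eexample%2Ecom/index.html"       # % in the host part
no_slash = "http://www%2Eexample%2Ecom"                 # % in host, no path

for url in (in_path, in_host, no_slash):
    hits = (HOST_PCT.search(url) is not None,
            HOST_PCT_SLASH.search(url) is not None)
    print(url, hits)
```

Under these assumptions both patterns skip the URL whose only percent
sign is in the path, and the version with the trailing / additionally
misses a host-only URL that has no path at all.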

Because this (in the message body) is perfectly acceptable:

http://www.asu.edu/educ/epsl/CERU/Annual%20reports/EPSL-0209-103-CERU.pd

while this:

http%3A%2F%2Fwww%2Ecoolandquiet%2Ecom%2F

appears to be a spammer trying to hide his/her URL.  (And I need to drop
the :// from my recipe apparently.)
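
One possible way to drop the :// is sketched below.  It is not the
actual recipe: the pattern, the name ENCODED_SCHEME, and the helper
looks_obfuscated are hypothetical, and Python's re is assumed to behave
like procmail's matcher for these constructs.  The two test strings are
the ones quoted above, copied as-is.

```python
import re

# Hypothetical pattern: "http" followed by a percent sign before any
# slash, space, or tab, so no literal "://" is required.
ENCODED_SCHEME = re.compile(r"http[^ \t/]*%")

legit = "http://www.asu.edu/educ/epsl/CERU/Annual%20reports/EPSL-0209-103-CERU.pd"
hidden = "http%3A%2F%2Fwww%2Ecoolandquiet%2Ecom%2F"

def looks_obfuscated(text):
    """Return True when the scheme separator itself appears percent-encoded."""
    return ENCODED_SCHEME.search(text) is not None

print(looks_obfuscated(legit))   # False: the first % comes after a slash
print(looks_obfuscated(hidden))  # True: % appears before any slash
```

This catches only the fully encoded form; a URL with a literal :// and
an encoded host name would still need a pattern anchored on //.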


                                - fleet -


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
