procmail

match-on-match

2003-10-22 04:06:56
This is a recipe I just wrote for somebody
who wants to isolate a webhost from a
long URL that was QP-folded over several lines:

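  :0B
  * http://s?rd\.yahoo\.com(.*=$)*.*\?\/(.*=$)*.*
  { WebHost = "$MATCH"
    # WH_end: the rest of the line after the first QP soft break (= at end of line)
    :0
    * WebHost ?? =$\/.*
    { WH_end = "$MATCH" }
    # only if a soft break was found: rejoin the part before it with WH_end
    :0A
    * WebHost ?? ^^\/[^=]*
    { WebHost = "$MATCH$WH_end" }
    # keep only the host: everything after http:// up to the first delimiter
    :0
    * WebHost ?? ^^http://\/[^/'"> ?]+
    { WebHost = "$MATCH" }
  }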
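
For readers who want to try the steps outside procmail, here is a rough
Python sketch of what the recipe above does. It is an approximation, not
a translation procmail would accept: the function name extract_webhost
is made up for illustration, procmail's `$` is written as `\n`, the part
after `\/` becomes a capturing group, and the lazy quantifiers to the left
of the group are my assumption for getting a comparable match.

```python
import re
from typing import Optional


def extract_webhost(body: str) -> Optional[str]:
    """Approximate the recipe: return the host behind a Yahoo redirect URL."""
    # Step 1: find the redirect URL and capture everything after the "?",
    # including any lines that end in a QP soft break ("=" + newline).
    found = re.search(
        r"http://s?rd\.yahoo\.com(?:.*?=\n)*?.*?\?((?:.*=\n)*.*)", body
    )
    if found is None:
        return None
    webhost = found.group(1)

    # Step 2: the rest of the line that follows the first soft break.
    tail = re.search(r"=\n(.*)", webhost)
    if tail is not None:
        # Step 3 (the :0A recipe): rejoin the text before the first "="
        # with the text that followed the soft break.
        head = re.match(r"[^=]*", webhost)
        webhost = head.group(0) + tail.group(1)

    # Step 4: keep only the host part after "http://".
    host = re.match(r"""http://([^/'"> ?]+)""", webhost)
    if host is not None:
        webhost = host.group(1)
    return webhost


sample = 'click http://rd.yahoo.com/x/?http://www.exa=\nmple.com">test</a>\n'
print(extract_webhost(sample))
```

Like the recipe, the sketch only repairs the first soft break inside the
captured URL, which is enough when the fold falls inside the hostname.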

This person is focussed on finding the locations
of webhosts mentioned in messages:
http://cgi.monitor.nl/rblhosts.html
http://cgi.monitor.nl/rblhosts.php3  (warning: big!)

Using the DNS entries of rbl.cluecentral.net, you can find out in which
part of the world a host is located. Many .com hosts turn out to be in .cn.


Something I should send to the development-list:

Wouldn't it be great if you could match on earlier matches:
MATCH would still hold the value of the last \/, but
implicit variables called MATCH1, MATCH2, etc.
would be numbered by condition.
Implicit concatenation of the variables to the left
of the ?? (a bit like HB) would also be mighty handy.

  :0B
  * http://s?rd\.yahoo\.com(.*=$)*.*\?\/(.*=$)*.*   (1: http://www.exa=<NL>mple.com">test</a>)
  * MATCH1 ?? =$\/.*                                (2: mple.com">test</a>)
  * MATCH1 ?? ^^\/[^=]*                             (3: http://www.exa)
  * MATCH3 MATCH2 ?? ^^http://\/[^/'"> ?]+          (4: www.example.com)
  { WebHost = "$MATCH" }
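
To make the proposed semantics concrete, here is a small Python sketch
that emulates them. Nothing in it exists in procmail: run_conditions, the
"B" key for the body, and the list-of-tuples layout are all hypothetical
names chosen for illustration. Each condition lists the variables to
concatenate (the left side of `??`) and a regex whose single group stands
in for the part after `\/`.

```python
import re
from typing import Dict, List, Optional, Tuple

# (variable names to concatenate, regex with exactly one capturing group)
Condition = Tuple[List[str], str]


def run_conditions(body: str, conditions: List[Condition]) -> Optional[Dict[str, str]]:
    """Evaluate conditions in order, recording MATCH1, MATCH2, ... and MATCH."""
    env = {"B": body}
    for number, (sources, pattern) in enumerate(conditions, start=1):
        # Implicit concatenation of the variables named left of the "??".
        text = "".join(env[name] for name in sources)
        found = re.search(pattern, text)
        if found is None:
            return None  # one failed condition means the recipe does not fire
        env[f"MATCH{number}"] = found.group(1)
        env["MATCH"] = found.group(1)  # MATCH is always the latest capture
    return env


SAMPLE = 'click http://rd.yahoo.com/x/?http://www.exa=\nmple.com">test</a>\n'
CONDITIONS: List[Condition] = [
    (["B"], r"http://s?rd\.yahoo\.com(?:.*?=\n)*?.*?\?((?:.*=\n)*.*)"),
    (["MATCH1"], r"=\n(.*)"),
    (["MATCH1"], r"\A([^=]*)"),
    (["MATCH3", "MATCH2"], r"""\Ahttp://([^/'"> ?]+)"""),
]

env = run_conditions(SAMPLE, CONDITIONS)
if env is not None:
    print(env["MATCH"])
```

One design question the sketch leaves open is what the numbered variables
should contain when a condition has no `\/` at all; here every condition
is assumed to capture something.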

-- 
Ruud


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail

  • match-on-match, Ruud H.G. van Tol <=