Rejo Zenger <subs(_at_)sisterray(_dot_)xs4all(_dot_)nl> writes:
++ 27/11/99 21:06 -0600 - Philip Guenther:
Uh, which is the 'last' paren?  Note that answering involves keeping track
of how many open and close parens have been encountered so far.
No, I was under the impression one doesn't need to know (which probably
makes the check no longer completely correct). I was thinking of a
regexp (simplified; possibly some characters need to be excluded) like
this:
  (.*)
as the opening and the closing parens already say it's a comment.
However, I see another problem here, as it will catch the first closing
parenthesis:
  (foo (bar (baz) era))
  ^             ^
instead of
  (foo (bar (baz) era))
  ^                   ^
I think that the .* regexp is a "greedy" match, which will match
as many characters as possible. So (.*) will match the whole
of "(foo (bar (baz) era))". But it will also match the whole
of "(comment) not comment (nother comment)".
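To see the greedy behaviour concretely, here is a small sketch in Python
rather than procmail (the re module is an assumption on my part; procmail's
own regexp engine is different, but .* is greedy in both). The helper name
greedy_span is made up for illustration:

```python
import re

# Literal "(", then as many characters as possible, then literal ")".
GREEDY = re.compile(r"\(.*\)")

def greedy_span(text):
    """Return the text matched by the greedy pattern, or None if no match."""
    m = GREEDY.search(text)
    return m.group(0) if m else None

# Both calls return the whole input string, including the
# "not comment" part in the second case.
print(greedy_span("(foo (bar (baz) era))"))
print(greedy_span("(comment) not comment (nother comment)"))
```

So the nested case happens to come out right, but two separate comments on
one line get merged together with the text between them.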
This requires a pushdown automaton and not just a simple finite
automaton. Regexps just aren't powerful enough to express this type of
matching. The best you can do is to pick a number N and say "I'll match
parens up to a depth N of nesting".
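A rough sketch of the "pick a number N" approach, again in Python and not
procmail syntax: the pattern for depth 1 allows no inner parens, and each
further level allows complete comments of the previous level inside it.
The function name nested_paren_pattern is hypothetical:

```python
import re

def nested_paren_pattern(depth):
    """Build a regexp matching a parenthesised comment nested up to `depth` levels."""
    # Depth 1: an opening paren, no parens inside, a closing paren.
    pattern = r"\([^()]*\)"
    for _ in range(depth - 1):
        # One more level: plain characters or whole comments of the previous depth.
        pattern = r"\((?:[^()]|" + pattern + r")*\)"
    return pattern

text = "(foo (bar (baz) era))"
# Depth 3 covers the whole string; depth 2 is not enough for it.
print(re.fullmatch(nested_paren_pattern(3), text) is not None)
print(re.fullmatch(nested_paren_pattern(2), text) is not None)
```

The pattern grows with every extra level, so this only stays practical for
small N.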
That's a good idea. Too bad one cannot use the INCLUDERC to do this
recursively (is that what they call it? - English is not my native
language).
If you just want to delete comments, then you can do it recursively
with INCLUDERC. Use ([^()]*) to match a non-nested comment and delete it
(in a real regexp the outer parens must be escaped so that they match
literally). If there was a match, use INCLUDERC to delete the next comment.
Processing (foo (bar (baz) era)) should go:
(foo (bar (baz) era)) -> (foo (bar era)) -> (foo ) -> empty
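The same idea sketched in Python, since I can't vouch for the procmail
syntax: delete one innermost comment per pass and repeat until nothing
matches. The loop stands in for the INCLUDERC recursion, and the name
strip_comments is made up for this example:

```python
import re

# A comment with no parens inside it, i.e. an innermost comment.
INNERMOST = re.compile(r"\([^()]*\)")

def strip_comments(text):
    """Repeatedly delete one innermost comment until none remain."""
    while True:
        stripped, count = INNERMOST.subn("", text, count=1)
        if count == 0:
            # Nothing left to delete; unbalanced parens are left untouched.
            return text
        text = stripped

print(repr(strip_comments("(foo (bar (baz) era))")))
print(repr(strip_comments("(comment) not comment (nother comment)")))
```

Unlike the greedy (.*) version, the text between two separate comments
survives, because only paren-free comments are ever deleted.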
Details left as an exercise for someone more familiar with procmail :-)
                        Martin
Martin(_dot_)Ward(_at_)durham(_dot_)ac(_dot_)uk http://www.dur.ac.uk/~dcs0mpw/ 
Erdos number: 4
Maintainer of the G.K.Chesterton web site: http://www.dur.ac.uk/~dcs0mpw/gkc/
Shortcuts: http://i.am/mw and http://i.am/gkc -- try them!
Vote against spam: http://www.politik-digital.de/spam/en/