Rejo Zenger <subs(_at_)sisterray(_dot_)xs4all(_dot_)nl> writes:
> era wrote:
> > ...
> > match causes a transition from one state to another. The automaton
> > doesn't have a way to remember whether it's been in the same state
> > before, so it can't know how many opening parens there have been.
>
> But I guess that doesn't matter much for what I want to achieve: it
> doesn't matter how many opening parens there have been, as long as the
> first one is closed properly. I have to look for
> (foo bar)
> and it doesn't matter if it is written as
> (foo (bar (baz) era))
> since it is a comment anyway, because of the first and last parentheses.
> Correct?
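A quick way to see why "the first and last parentheses" is ambiguous is to try the two obvious single regexps on a line that holds two separate comments. This is only an illustrative Python sketch: Python's re module stands in for whatever regexp engine the recipe actually runs under, and the header string is made up.

```python
import re

# Hypothetical header text: one nested comment, then a second comment.
header = "(foo (bar) baz) and (qux)"

# Greedy: runs from the first "(" to the LAST ")" on the line,
# swallowing the text between two separate comments.
greedy = re.search(r"\(.*\)", header).group()

# Non-greedy: stops at the FIRST ")", which closes the inner comment.
lazy = re.search(r"\(.*?\)", header).group()

print(greedy)  # (foo (bar) baz) and (qux)
print(lazy)    # (foo (bar)
```

Neither result is the actual comment, (foo (bar) baz); choosing the right ")" requires knowing the current nesting depth.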
Uh, which is the 'last' paren? Note that answering involves keeping track
of how many open and close parens have been encountered so far. This
requires a pushdown automaton, not just a simple finite automaton.
Regexps just aren't powerful enough to express this type of matching.
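For comparison, here is a minimal Python sketch of the counting a regexp cannot do. Because the only symbol ever pushed is "(", the pushdown stack collapses to a single integer depth counter. The helper name comment_end is hypothetical, and treating a backslash as quoting the next character (as a quoted-pair does inside an RFC 822 comment) is an assumption on my part.

```python
from typing import Optional


def comment_end(text: str, start: int) -> Optional[int]:
    """Return the index just past the ")" that closes the "(" at text[start].

    `depth` plays the role of the pushdown stack: it counts unclosed "(".
    A backslash quotes the next character, so "\\)" does not close anything.
    Returns None if the comment is never closed.
    """
    if start >= len(text) or text[start] != "(":
        raise ValueError("text[start] must be '('")
    depth = 0
    i = start
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it quotes
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i + 1  # matched the opening paren at `start`
        i += 1
    return None  # ran out of input while still inside the comment


text = "(foo (bar (baz) era)) x (qux)"
print(text[:comment_end(text, 0)])  # (foo (bar (baz) era))
```

The counter never needs a limit, which is exactly what a fixed set of automaton states cannot provide.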
The best you can do is to pick a number N and say "I'll match parens
up to a nesting depth of N". It's easy to build the regexp for any
given N; you just have to choose it:
ctext = '[^()\
\x80-\xFF]+'
comment0 = "\(($ctext|\\.)*\)"
comment1 = "\(($ctext|\\.|$comment0)*\)"
comment2 = "\(($ctext|\\.|$comment1)*\)"
comment3 = "\(($ctext|\\.|$comment2)*\)"
comment4 = "\(($ctext|\\.|$comment3)*\)"
...
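The same construction can be generated mechanically for any N. This is a Python sketch, not the original recipe: comment_regex is a hypothetical helper, Python's re syntax replaces the $ctext variable interpolation, and ctext matches one character at a time instead of using "+". That last change is deliberate, since a nested quantifier such as (x+)* can backtrack exponentially in a backtracking engine like Python's.

```python
import re

# One character allowed unescaped inside a comment: anything except
# parens, backslash, newline and 8-bit bytes (mirrors ctext above).
CTEXT = r"[^()\\\n\x80-\xff]"


def comment_regex(n: int) -> str:
    """Build the pattern that corresponds to comment<n> above.

    comment_regex(0) allows no nested parens; each extra level wraps the
    previous pattern once more, so comment_regex(n) allows n+1 levels.
    """
    pattern = rf"\((?:{CTEXT}|\\.)*\)"  # comment0: ctext or quoted pair
    for _ in range(n):
        # commentK: ctext, quoted pair, or a whole comment(K-1)
        pattern = rf"\((?:{CTEXT}|\\.|{pattern})*\)"
    return pattern


nested = "(foo (bar (baz) era))"  # three levels deep
print(re.fullmatch(comment_regex(2), nested) is not None)  # True
print(re.fullmatch(comment_regex(1), nested) is not None)  # False
```

Whatever N you choose, input nested one level deeper than the pattern allows is simply rejected.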
Philip Guenther