variables: greedy or non-greedy matching

2003-04-24 05:08:49

I cannot find any precedent for greedy or non-greedy matching in the
RFCs (I only checked NNTP; I don't know where else to look).  It
seems to me that previously we have only been interested in
TRUE/FALSE, so _how_ the match was performed could not be observed.
The variables extension introduces ${<n>}, which lets us observe
matching behaviour, so the choice between greedy and non-greedy
matching must be part of the variables specification.

Jutta points out in private e-mail that greedy matching does not
follow the principle of least surprise for ordinary users.  For example, in

  string :matches "<alice@foo.com>, <bob@foo.com>"

most users will probably expect ${1} to contain "alice", not
"alice@foo.com>, <bob".  Change it into:

  string :matches "<alice@foo.com>, <bob@foo.com>"
and ${2} is now "bob"...  not intuitive, right?
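The match patterns of the two examples did not survive in the archived message.  Purely as an illustration, assume they were the hypothetical patterns "<*@*" and "*<*@*".  The sketch below hand-translates each "*" into one capturing group, greedy `(.*)` or non-greedy `(.*?)`, so that group n stands in for ${n}.

```python
import re

# The address list from the examples above.
SUBJECT = "<alice@foo.com>, <bob@foo.com>"

# Hand-made regex translations of two HYPOTHETICAL :matches patterns.
# Each "*" becomes one capturing group, so group n plays the role of ${n}.
GREEDY = {
    "<*@*": r"<(.*)@(.*)",
    "*<*@*": r"(.*)<(.*)@(.*)",
}
NON_GREEDY = {
    "<*@*": r"<(.*?)@(.*?)",
    "*<*@*": r"(.*?)<(.*?)@(.*?)",
}


def captures(regex: str, subject: str) -> tuple:
    # :matches must match the whole string, hence fullmatch rather than search.
    m = re.fullmatch(regex, subject, flags=re.DOTALL)
    return m.groups() if m else ()


for pattern in ("<*@*", "*<*@*"):
    print(pattern)
    print("  greedy:    ", captures(GREEDY[pattern], SUBJECT))
    print("  non-greedy:", captures(NON_GREEDY[pattern], SUBJECT))
```

With these assumed patterns, greedy matching yields ${1} = "alice@foo.com>, <bob" for the first pattern and ${2} = "bob" for the second, while non-greedy matching yields "alice" in both positions.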

This issue was raised here earlier, and by analogy with regular
expressions I claimed the wildcard "*" in :matches should match
greedily.  This allows an implementation to convert a match string to
a regular expression very easily, but is potentially confusing for

Non-greedy matching requires implementers to write their own matching
code, or to use a regexp library that supports non-greedy matching.
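A minimal sketch of the "convert to a regular expression" approach, assuming Python's `re` module, which happens to support lazy quantifiers.  The function names `matches_to_regex` and `match_vars` are hypothetical.  Each ":matches" wildcard ("*" for any run of characters, "?" for exactly one, backslash quoting the next character) becomes one capturing group so that group n corresponds to ${n}.  Comparators are ignored here (comparison is octet-exact), whereas Sieve's default comparator is "i;ascii-casemap".

```python
import re
from typing import Optional, Tuple


def matches_to_regex(pattern: str, greedy: bool) -> str:
    """Translate a :matches pattern into a Python regex in which every
    wildcard is one capturing group (group n corresponds to ${n})."""
    star = "(.*)" if greedy else "(.*?)"  # the only difference between modes
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\" and i + 1 < len(pattern):
            # A backslash quotes the next character: match it literally.
            out.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if c == "*":
            out.append(star)
        elif c == "?":
            out.append("(.)")
        else:
            out.append(re.escape(c))
        i += 1
    return "".join(out)


def match_vars(pattern: str, subject: str, greedy: bool) -> Optional[Tuple[str, ...]]:
    """Return the captured wildcard values, or None if the test is false."""
    regex = matches_to_regex(pattern, greedy)
    m = re.fullmatch(regex, subject, flags=re.DOTALL)
    return m.groups() if m else None
```

For example, `match_vars("<*@*", "<alice@foo.com>, <bob@foo.com>", greedy=False)` returns `("alice", "foo.com>, <bob@foo.com>")`.  Greedy mode needs nothing beyond plain `(.*)`, which is why the conversion is easy; non-greedy mode here simply inherits the backtracking semantics of Python's lazy `*?`, where each wildcard takes the shortest span that still lets the rest of the pattern match, earlier wildcards first.  That is one possible definition of non-greedy, not necessarily the one a specification would adopt.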

What do you think?

If we go for non-greedy, Jutta suggested using "**" as a special case
to get greedy matching.  (Actually, I think the rule should be that
"*" followed by another wildcard is greedy.)  I'm not sure it is worth
it; after all, a site supporting variables will most likely also offer
Kjetil T.
