I cannot find any precedent for greedy/non-greedy matching in the
RFCs (I only checked NNTP; I don't know where else to look). It
seems to me that previously we have only been interested in
TRUE/FALSE: _how_ the match was performed could not be observed. The
variables extension introduces ${<n>}, which allows us to observe
matching behaviour, and therefore whether to do greedy or non-greedy
matching must be part of the variables specification.

Jutta points out in private e-mail that greedy matching does not
follow the principle of least surprise for ordinary users. E.g., in

    string :matches "<alice(_at_)foo(_dot_)com>, <bob(_at_)foo(_dot_)com>"
                    "<*(_at_)foo(_dot_)com>*"

most users will probably expect ${1} to contain "alice", not
"alice(_at_)foo(_dot_)com>, <bob". Change it into:

    string :matches "<alice(_at_)foo(_dot_)com>, <bob(_at_)foo(_dot_)com>"
                    "*<*(_at_)foo(_dot_)com>*"
                     ^

and ${2} is now "bob"... not intuitive, right?
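
The two results above can be checked with a short Python sketch. It hand-translates each pattern into a regular expression in which every "*" becomes a greedy group. That translation is the assumption under discussion, not something the draft specifies, and the names `header`, `first` and `second` are only illustrative.

```python
import re

# The header value from the examples above, copied verbatim.
header = "<alice(_at_)foo(_dot_)com>, <bob(_at_)foo(_dot_)com>"

# "<*(_at_)foo(_dot_)com>*" with each "*" written as a greedy group.
first = re.fullmatch(r"<(.*)\(_at_\)foo\(_dot_\)com>(.*)", header)

# "*<*(_at_)foo(_dot_)com>*": the same pattern with a leading "*" added.
second = re.fullmatch(r"(.*)<(.*)\(_at_\)foo\(_dot_\)com>(.*)", header)

print(first.group(1))   # what ${1} would hold for the first pattern
print(second.group(2))  # what ${2} would hold for the second pattern
```

Under greedy matching the first pattern captures "alice(_at_)foo(_dot_)com>, <bob" in its first group, and the second pattern captures "bob" in its second group, which are the values quoted above.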

This issue was raised here earlier, and by analogy with regular
expressions I claimed that the wildcard "*" in :matches should match
greedily. This allows an implementation to convert a match string to
a regular expression very easily, but is potentially confusing for
users.

Non-greedy matching requires implementers to write their own matching
code, or to use a regexp library which supports non-greedy matching.
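
As a rough illustration of the second option, here is a minimal sketch of the conversion using Python's `re` module, which supports lazy quantifiers. The helper names `matches_to_regex` and `captures` are hypothetical, and the sketch treats only "*" as a wildcard ("?" and backslash quoting are left out), so it is not a full :matches implementation.

```python
import re

def matches_to_regex(pattern, greedy):
    """Translate a :matches-style pattern into a regex string.

    Only "*" is treated as a wildcard in this sketch; every other
    character is matched literally.
    """
    group = "(.*)" if greedy else "(.*?)"
    # Escape the literal pieces and join them with one capture group per "*".
    return group.join(re.escape(piece) for piece in pattern.split("*"))

def captures(pattern, text, greedy):
    """Return the strings captured by each "*", or None if there is no match."""
    m = re.fullmatch(matches_to_regex(pattern, greedy), text, re.DOTALL)
    return None if m is None else list(m.groups())

header = "<alice(_at_)foo(_dot_)com>, <bob(_at_)foo(_dot_)com>"
pattern = "<*(_at_)foo(_dot_)com>*"

greedy_result = captures(pattern, header, greedy=True)
lazy_result = captures(pattern, header, greedy=False)

print(greedy_result[0])  # candidate ${1} under greedy matching
print(lazy_result[0])    # candidate ${1} under non-greedy matching
```

The only difference between the two modes is the quantifier used for each "*". With the non-greedy form, the first group is "alice", which is what most users would expect.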

What do you think?

If we go for non-greedy, Jutta suggested using "**" as a special case
to get greedy matching. (Actually, I think the rule should be that
"*" followed by another wildcard is greedy.) I'm not sure it is worth
it; after all, a site supporting variables will most likely also offer
regex.
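
One possible reading of the "followed by another wildcard" rule, sketched in Python: a "*" is greedy when the next pattern character is "*" or "?", and non-greedy otherwise. The function names are hypothetical. Giving every wildcard its own capture group (so "**" yields two groups) is an assumption made for this sketch, and backslash quoting is ignored.

```python
import re

def to_regex(pattern):
    """Translate a pattern where "*" is greedy only if a wildcard follows it."""
    out = []
    for i, ch in enumerate(pattern):
        nxt = pattern[i + 1] if i + 1 < len(pattern) else ""
        if ch == "*":
            # Greedy when directly followed by another wildcard, else lazy.
            out.append("(.*)" if nxt in ("*", "?") else "(.*?)")
        elif ch == "?":
            out.append("(.)")  # single-character wildcard
        else:
            out.append(re.escape(ch))
    return "".join(out)

def captures(pattern, text):
    """Return the strings captured by each wildcard, or None on no match."""
    m = re.fullmatch(to_regex(pattern), text, re.DOTALL)
    return None if m is None else list(m.groups())

header = "<alice(_at_)foo(_dot_)com>, <bob(_at_)foo(_dot_)com>"

single = captures("<*(_at_)foo(_dot_)com>*", header)
double = captures("<**(_at_)foo(_dot_)com>*", header)

print(single[0])  # first group with a plain "*"
print(double[0])  # first group with "**"
```

With a plain "*" the first group is "alice"; with "**" the first group reverts to the greedy result "alice(_at_)foo(_dot_)com>, <bob" and the second "*" captures the empty string.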
--
Kjetil T.