ietf-mta-filters

Re: variables: greedy or non-greedy matching

2003-04-24 09:39:29


I cannot find any precedent for greedy/non-greedy matching in the
RFCs (I only checked NNTP, I don't know where else to look).  it
seems to me that previously, we have only been interested in
TRUE/FALSE; _how_ the match was performed could not be observed.  the
variables extension introduces ${<n>}, which allows us to observe
matching behaviour, and therefore the choice between greedy and
non-greedy matching must be part of the variables specification.

Jutta points out in private e-mail that greedy matching does not
follow the principle of least surprise for ordinary users.  e.g., in

  string :matches "<alice@foo.com>, <bob@foo.com>" "<*@foo.com>*"

most users will probably expect ${1} to contain "alice", not
"alice@foo.com>, <bob".  change it into:

  string :matches "<alice@foo.com>, <bob@foo.com>" "*<*@foo.com>*"
                                                    ^
and ${2} is now "bob"...  not intuitive, right?

Um, no. For the very small subset of users sophisticated enough to
write this sort of code themselves, I believe their expectation will
be set by past experience with other matching systems, and these
systems tend to default to greedy matching.

this issue was raised here earlier, and by analogy with regular
expressions, I claimed the wildcard "*" in :matches should match
greedily.  this allows an implementation to convert a match string to
a regular expression very easily, but is potentially confusing for
users.

non-greedy matching requires implementers to write their own matching
code, or to use a regexp library which supports non-greedy matching.
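
To make the trade-off concrete, here is a minimal sketch of the
conversion described above, written in Python and using its "re"
module.  The function names (matches_to_regex, match_variables) are
made up for illustration and do not come from any draft or
implementation.  The sketch assumes that both "*" and "?" set a
numbered variable, and it ignores backslash escapes and comparators.
The only difference between the two behaviours is whether "*" becomes
"(.*)" or "(.*?)":

```python
import re


def matches_to_regex(pattern, greedy=True):
    """Translate a :matches-style pattern into a compiled regex.

    Illustrative only: "*" becomes a capture group for zero or more
    characters, "?" a capture group for exactly one character, and
    everything else is matched literally.  Escapes such as "\\*" are
    not handled.
    """
    star = "(.*)" if greedy else "(.*?)"
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(star)
        elif ch == "?":
            parts.append("(.)")
        else:
            parts.append(re.escape(ch))
    # DOTALL so that a wildcard can also span line breaks.
    return re.compile("".join(parts), re.DOTALL)


def match_variables(pattern, text, greedy=True):
    """Return the strings for ${1}, ${2}, ... or None if no match."""
    m = matches_to_regex(pattern, greedy).fullmatch(text)
    return None if m is None else list(m.groups())


text = "<alice@foo.com>, <bob@foo.com>"

print(match_variables("<*@foo.com>*", text, greedy=True))
# ['alice@foo.com>, <bob', '']
print(match_variables("<*@foo.com>*", text, greedy=False))
# ['alice', ', <bob@foo.com>']
print(match_variables("*<*@foo.com>*", text, greedy=True))
# ['<alice@foo.com>, ', 'bob', '']
```

Note that the pattern must still match the whole string, so in the
non-greedy case the trailing "*" ends up holding the remainder.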

what do you think?

I prefer greedy matching.

if we go for non-greedy, Jutta suggested using "**" as a special case
to get greedy matching.  (actually, I think the rule should be that
"*" followed by another wildcard is greedy.)  I'm not sure it is worth
it; after all, a site supporting variables will most likely also offer
regex.

I don't think it is worth it either.

                                Ned