On Sun, 2005-10-23 at 23:29 +0100, Dave Cridland wrote:
On Sun Oct 23 22:02:38 2005, Ned Freed wrote:
Assuming:
(1) An octet-based comparator.
(2) A single ? used in isolation with no adjacent *s or ?s.
(3) Well formed UTF-8 as input.
The somewhat surprising result is that ? can only match an ASCII
character. Of course something like ???? can get really interesting
and match anything that encodes down to four octets.
I think you intended to say that "?" can only match a character if it
is within ASCII - or more generally, if it happens to encode to a
single octet in UTF-8. But it'll match any octet, of course, whatever
character it might happen to be part of the encoding for.
yes, but since the argument to :matches is implicitly anchored, another
wildcard needs to follow the ? for it to match part of a character.
e.g., with "foo?", only a US-ASCII character can be matched, since every
non-ASCII character encodes to a multi-octet sequence in UTF-8.
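
To make the octet-level behaviour concrete, here is a minimal Python
sketch of an anchored wildcard match over octets. The function name
octet_match is hypothetical, and the sketch assumes that ? consumes
exactly one octet, that * consumes zero or more, and that backslash
escapes are ignored. It is not taken from any Sieve implementation.

```python
def octet_match(pattern: bytes, data: bytes) -> bool:
    """Anchored glob match over octets (hypothetical helper).

    Assumptions: '?' consumes exactly one octet, '*' consumes zero or
    more octets, and backslash escapes are not handled.
    """
    p = d = 0
    star_p, star_d = -1, 0  # position of the last '*' and where it started
    while d < len(data):
        if p < len(pattern) and pattern[p:p + 1] == b"*":
            # Remember the star so we can backtrack and let it eat more.
            star_p, star_d = p, d
            p += 1
        elif p < len(pattern) and (pattern[p:p + 1] == b"?" or pattern[p] == data[d]):
            p += 1
            d += 1
        elif star_p != -1:
            # Mismatch: let the last '*' consume one more octet and retry.
            star_d += 1
            p, d = star_p + 1, star_d
        else:
            return False
    # Only trailing stars may remain in the pattern.
    while p < len(pattern) and pattern[p:p + 1] == b"*":
        p += 1
    return p == len(pattern)


if __name__ == "__main__":
    e_acute = "foo\u00e9".encode("utf-8")      # 'é' is two octets in UTF-8
    print(octet_match(b"foo?", b"fooa"))       # one ASCII character
    print(octet_match(b"foo?", e_acute))       # one octet cannot cover 'é'
    print(octet_match(b"foo??", e_acute))      # two octets can
    print(octet_match(b"foo?*", e_acute))      # '?' takes the first octet, '*' the rest
```

Under these assumptions "foo?" only matches when the last character is
a single octet, while "foo?*" lets the ? land on the first octet of a
multi-octet sequence.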
A construct like:
require "variables";
if header :matches "subject" "*" { set "subject" "${1}"; }
else { set "subject" ""; }
ends up storing the subject in all caps, which likely isn't what
was intended.
I think that's a matter of interpretation.
Variables says, in section 3.2, that the list variables expand to
what the wildcard matched.
I see nothing saying that this must be in the internal transformation
of the string by a comparator (if such a thing exists), nor that it
should be those matching portions of the original string, but my gut
feeling is that a comparator should be essentially a black box - that
is, the internal transformations of the comparator shouldn't be
visible to the script.
yes, the behaviour of wildcard matching is under-specified (you might
say "unspecified"), and trying to extrapolate it from the matching
algorithm is probably wrong, especially since no one will want that
behaviour.
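
The two readings discussed above can be compared with a small Python
sketch. Everything here is an assumption made for illustration: the
helper names, the expose_internal flag, an i;ascii-casemap style
comparator that folds a-z to upper case internally, and non-greedy
wildcards. Because ASCII case folding preserves length, the offsets of
a match on the folded string can be applied directly to the original.

```python
import re


def ascii_upper(s: str) -> str:
    """Fold only a-z to upper case, leaving all other characters alone."""
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in s)


def glob_to_regex(pattern: str):
    """Translate '*' and '?' into capturing regex groups (no escapes handled)."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append("(.*?)")   # assumed non-greedy
        elif ch == "?":
            parts.append("(.)")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def matches(pattern: str, value: str, expose_internal: bool):
    """Return the wildcard captures, or None if there is no match.

    expose_internal=True returns what the comparator saw internally
    (the folded string); False treats the comparator as a black box and
    returns the corresponding portions of the original string.
    """
    folded = ascii_upper(value)
    m = glob_to_regex(ascii_upper(pattern)).fullmatch(folded)
    if m is None:
        return None
    source = folded if expose_internal else value
    return [source[m.start(i):m.end(i)] for i in range(1, m.re.groups + 1)]


if __name__ == "__main__":
    print(matches("*", "Hello World", expose_internal=True))
    print(matches("*", "Hello World", expose_internal=False))
    print(matches("re: *", "Re: Lunch", expose_internal=False))
```

With expose_internal=True the capture comes back in all caps, which is
the surprising result; with False the script sees the subject as it was
written.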
--
Kjetil T.