Re: document status: 3028bis, body, editheader

On Mon, 2006-03-20 at 22:59 -0800, Philip Guenther wrote:

I've made one tweak for the next rev to the wording in section 2.7.1
regarding ':matches' and the definition of 'character'; I had overlooked
an off-list comment at the turn of the year.  The paragraph now
reads:
   The ":matches" match type specifies a wildcard match using the
   characters "*" and "?"; the entire value must be matched.  "*"
   matches zero or more characters in the value and "?" matches a single
   character in the value, where the comparator that is used (see 2.7.3)
   defines what a character is.  For example, the comparators "i;octet"
   and "en;ascii-casemap" define a character to be a single octet so "?"
   will always match exactly one octet when one of those comparators is
   in use.  In contrast, the comparator "i;basic;uca=3.1.1;uv=3.2"
   defines a character to be any UTF-8 octet sequence encoding one
   Unicode character and thus "?" may match more than one octet.  "?"
   and "*" may be escaped as "\\?" and "\\*" in strings to match against
   themselves.  The first backslash escapes the second backslash;
   together, they escape the "*".  This is awkward, but it is
   commonplace in several programming languages that use globs and
   regular expressions.
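
To make the octet-versus-character distinction in the quoted paragraph concrete, here is a minimal Python sketch of a ":matches"-style wildcard matcher. It is only an illustration, not text from the draft or any Sieve implementation: the names matches, tokenize, as_octets and as_code_points are hypothetical, and treating one Unicode code point as one "character" is an assumption about how a Unicode-aware comparator would behave. The pattern is taken as the matcher sees it, that is, after string-level unescaping, so a single backslash escapes "*" or "?".

```python
from functools import lru_cache

# Sentinels for the two wildcard tokens (hypothetical names).
STAR, ONE = object(), object()


def as_octets(text):
    """Octet view: every UTF-8 octet is one 'character' (assumed i;octet-like)."""
    return list(text.encode("utf-8"))


def as_code_points(text):
    """Code-point view: every Unicode code point is one 'character' (assumption)."""
    return list(text)


def tokenize(pattern, to_units):
    """Split a pattern into wildcard tokens and literal units."""
    tokens = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # A backslash makes the next character literal.
            i += 1
            tokens.extend(to_units(pattern[i]))
        elif ch == "*":
            tokens.append(STAR)
        elif ch == "?":
            tokens.append(ONE)
        else:
            tokens.extend(to_units(ch))
        i += 1
    return tokens


def matches(pattern, value, to_units):
    """Return True if the entire value matches the pattern."""
    tokens = tokenize(pattern, to_units)
    units = to_units(value)

    @lru_cache(maxsize=None)
    def go(t, u):
        if t == len(tokens):
            return u == len(units)
        tok = tokens[t]
        if tok is STAR:
            # "*" matches zero units, or consumes one unit and tries again.
            return go(t + 1, u) or (u < len(units) and go(t, u + 1))
        if u == len(units):
            return False
        if tok is ONE:
            # "?" consumes exactly one unit, whatever a unit is.
            return go(t + 1, u + 1)
        return tok == units[u] and go(t + 1, u + 1)

    return go(0, 0)
```

With these definitions, matches("caf?", "café", as_code_points) succeeds because "é" is a single code point, while the same call with as_octets fails because "é" occupies two octets in UTF-8 and "?" consumes only one of them.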

I find the description of the awkwardness of "\\" highly amusing
juxtaposed with the requirement of all tests in Sieve[1] to have the
argument :comparator "i;basic;uca=3.1.1;uv=3.2" (and the matching
"require" statement).


One man's meat is another man's poison... There are plenty of scripts that
depend on comparators continuing to work the way they always have.

to be honest, I find it absurd that such verbiage is forced upon
users.  we need to find a better way.

[1] outside of the US of A, anyway.


This has nothing to do with geography and everything to do with backwards
compatibility. Some of the scripts I'm referring to were written and are used
outside the US.

And good luck using i;basic;uca=3.1.1;uv=3.2 to trap specific sequences of
illegal 8-bit octets in headers. Such data is rarely if ever in UTF-8, in my
experience at least.

                                Ned