Re: document status: 3028bis, body, editheader

On Wed, 2006-03-22 at 16:12 -0800, Ned Freed wrote:

I find the description of the awkwardness of "\\" highly amusing
juxtaposed with the requirement of all tests in Sieve[1] to have the
argument :comparator "i;basic;uca=3.1.1;uv=3.2" (and the matching
"require" statement).


One man's meat is another man's poison... There are plenty of scripts that
depend on comparators continuing to work the way they always have.

that doesn't mean we can't find a better way than the above.


Even assuming we can "find a better way" (we haven't done so thus far) for
future scripts to use, the installed base issue remains.

I thought Dave Cridland's suggestion to specify matching behaviour in
the comparator itself was intriguing:

http://permalink.gmane.org/gmane.ietf.mta-filters/2689

unfortunately, [draft-newman-i18n-comparator-08] says «the equality test
MUST be reflexive, symmetric and transitive», so "EQUAL" can't be used.
I must admit I don't quite understand how :matches and :regex work with
comparators, though.


I think of it this way: A comparator has as one of its components a
normalization operation. Pull that operation out, apply it, and then perform
the glob or regex operation on the result. Note that the output of the
normalization is best seen as a series of nonnegative integers or something
similar, not octets or characters.
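
A rough Python sketch of that idea, for illustration only. It is not taken
from any draft: the function names are made up, the normalization shown is
just the a-z to A-Z mapping of i;ascii-casemap, and Sieve's backslash quoting
of "*" and "?" is ignored to keep it short. The point is that the glob step
never sees characters, only the integers the normalization produced.

```python
# Sentinels for wildcards; negative so they can never collide with the
# nonnegative integers that normalization emits.
STAR = -1
QMARK = -2


def normalize_ascii_casemap(text: str) -> list:
    """Hypothetical normalization step: map ASCII a-z to A-Z, emit integers."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x61 <= cp <= 0x7A:  # 'a'..'z'
            cp -= 0x20
        out.append(cp)
    return out


def compile_glob(pattern: str) -> list:
    """Normalize the literal parts of a pattern; keep wildcards as sentinels."""
    out = []
    for ch in pattern:
        if ch == "*":
            out.append(STAR)
        elif ch == "?":
            out.append(QMARK)
        else:
            out.extend(normalize_ascii_casemap(ch))
    return out


def glob_match(pat: list, seq: list) -> bool:
    """Glob matching over integer sequences, backtracking to the last '*'."""
    p = s = 0
    star_p = star_s = -1
    while s < len(seq):
        if p < len(pat) and (pat[p] == QMARK or pat[p] == seq[s]):
            p += 1
            s += 1
        elif p < len(pat) and pat[p] == STAR:
            star_p, star_s = p, s  # remember where the '*' was
            p += 1
        elif star_p != -1:
            star_s += 1            # let the last '*' swallow one more element
            s = star_s
            p = star_p + 1
        else:
            return False
    while p < len(pat) and pat[p] == STAR:
        p += 1
    return p == len(pat)


def matches(pattern: str, value: str) -> bool:
    """Normalize both sides first, then glob on the normalized result."""
    return glob_match(compile_glob(pattern), normalize_ascii_casemap(value))
```

With this, matches("*@EXAMPLE.com", "ned@example.COM") succeeds because both
sides are reduced to the same integers before the wildcard logic runs.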

another possibility is to have a capability which adds an action which
changes the default comparator to reduce the verbosity.


It makes scripts a bit shorter, but at the expense of having something with
far-reaching impact specified at the top and not where the impact is felt. I'm
far from convinced this is a good tradeoff.

another possibility is to allow the wildcard comparator, so
that :comparator "*" «[selects] the collation with the broadest scope
(preferably international scope), the most recent table versions and the
greatest number of supported operations.»  (the comparators the server
chooses from would have to be "require"d in advance, I think, although
«require "comparator-*"» is a possibility)


I really don't like this - now you have scripts working in subtly different
ways in different places. This is a big step backwards, I think.

to be honest, I find it absurd that such verbiage
is forced upon users.  we need to find a better way.

[1] outside of the US of A, anyway.


This has nothing to do with geography and everything to do with backwards
compatibility. Some of the scripts I'm referring to were written and are
used outside the US.

And good luck using i;basic;uca=3.1.1;uv=3.2 to trap specific sequences of
illegal 8bit in headers. Such stuff is rarely if ever in UTF-8, in my
experience at least.

this is impossible today, isn't it?


It is done all the time and the base specification allows it, more or less.

how do you specify the string to compare with?


You just do it. Nothing in the current sieve specification says that it is an
error to specify material that isn't valid utf-8. (This is now prohibited by
the ABNF in the revised base specification, and IMO this restriction needs to
be removed for string constants. It is at a minimum in direct conflict with the
vacation specification.)
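
To make the case concrete, here is a small Python sketch of an octet-wise
test against a raw header. It only illustrates the idea and is not how any
particular Sieve implementation works; the header value and helper names are
invented for the example.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the octets decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def octet_contains(haystack: bytes, needle: bytes) -> bool:
    """Octet-by-octet substring test, with no charset interpretation."""
    return needle in haystack


# Hypothetical header carrying unencoded Latin-1 octets.
raw_subject = b"Subject: caf\xe9 sp\xe9cial"

# The string to compare with is itself not valid UTF-8.
needle = b"\xe9"
```

Here is_valid_utf8(needle) is false, yet octet_contains(raw_subject, needle)
is a perfectly well-defined test.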

in any case, if you want to trap raw non-UTF8 in headers,
you should use i;octet.


Agreed, but in practice scripts often don't specify this.

 but then again:

5.7.  Octet Collation

   The i;octet (Section 9.5) collation is only usable with protocols
   based on octet-strings.  Clients and servers MUST NOT use i;octet
   with other protocols.

which would disqualify the use of i;octet with Sieve, since 3028bis says:

   The language is represented in UTF-8, as specified in [UTF-8].


"Represented" doesn't mean things are restricted to UTF-8. If it did, it would
be in direct conflict with later language in the same document, which quite
specifically allows material in other charsets in strings.

The wording is a bit peculiar (and it needs to be clarified in the revision),
but the intent here was for UTF-8 to always be assumed when it was necessary to
assume a charset. So, in something like a string argument to fileinto, the
argument is to be interpreted in utf-8, never anything else. But in something
like, say, a whole MIME object, you have to be able to specify stuff in other
charsets.

Now, the base specification doesn't come out and say you can use this ability
in a header test. So I suppose it can be argued that it is more or less open as
to how an implementation should behave when this is done, which is why I said
"more or less" previously.

the collation-draft exempts RFC 3028 from this "MUST NOT", but it's not
clear to me that a 3028bis can get the same exemption.


It had better, or backwards compatibility goes down the toilet, which is not
acceptable as far as I'm concerned.

notice that
"represented in UTF-8" only means constant strings can't contain raw
octets which are illegal UTF-8 sequences.


I disagree 100% that that is what it means. And since the specification later
talks about putting non-UTF-8 material in constant strings in section 2.4.2.4,
it seems it agrees with me.

which brings us back to the
discussion on character escapes from a year ago.

http://comments.gmane.org/gmane.ietf.mta-filters/2030

I'd like to suggest we implement (2), but with the extension defined in
the base spec.


I have no problem with this and would like to see it happen, but it may or may
not be relevant to the matter at hand. Regardless of whether you write some
sequence that's illegal in UTF-8 directly as a series of octets in a string
constant or indirectly using some sort of encoding, you're still presenting the
sieve interpreter with text that isn't in utf-8 at some level. Any attempt to
enforce some sort of rule that nothing but utf-8 can be present is still going
to fail. And any +Unnnn sort of scheme is inherently incapable of representing
UTF-8 anyway.

The ability to write octets using some sort of encoding is certainly helpful
when the script is presented in an editor or otherwise displayed. And the
intent of the resulting script is much much much clearer. That's why I support
the extension, not because it adds a capability that isn't there now.
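
For illustration only, the Python sketch below contrasts two hypothetical
escape styles. Neither syntax nor either function name comes from a draft,
and the sketch assumes a code-point escape expands to the UTF-8 encoding of
that code point. Under that assumption a code-point escape can only ever
yield valid UTF-8, while an octet escape hands the interpreter whatever
octets were written.

```python
def expand_codepoint_escape(hex_digits: str) -> bytes:
    """Hypothetical code-point escape: expands to the UTF-8 encoding of the
    named code point. (Surrogate code points would raise UnicodeEncodeError.)"""
    return chr(int(hex_digits, 16)).encode("utf-8")


def expand_octet_escape(hex_digits: str) -> bytes:
    """Hypothetical octet escape: expands to exactly the octets written."""
    return bytes.fromhex(hex_digits)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the octets decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

So expand_codepoint_escape("E9") gives the two octets C3 A9, whereas
expand_octet_escape("E9") gives the single octet E9, which is not valid
UTF-8 on its own.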

note that if the base spec adds the restriction on
extensions that they can't modify other "require" statements, the
analysis of string escapes can be performed statically by a byte
compiler.


Agreed.

                                Ned