Re: status of 3028bis


On Tue, Oct 18, 2005 at 02:06:00PM -0700, Philip Guenther wrote:

> In particular, there's some uncertainty
> over whether comparators take octets or characters as input and how
> i;octet is defined and used in the definition of other comparators.


We had that discussion before and you summarised it very well.  I
agree with it, but want to add reasons:

Summary:

In Sieve, comparators act on characters, not octets.  Looking only at the
base specification, "i;octet" is badly named and means: "Compare Unicode
character by Unicode character".  At least I think that's what we agreed
on, and why the text said UTF-8 before and says Unicode now.  The change
from UTF-8 to Unicode was made because, although Sieve scripts are written
in UTF-8, implementations should not be forced to work in UTF-8 internally,
but should be free to use whatever Unicode encoding they like.
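
To illustrate why the internal encoding should not matter, here is a
minimal sketch in Python (not taken from any Sieve implementation; the
function names and the sample text are assumptions made for illustration).
It compares two strings code point by code point after decoding, so the
result is the same whether the data is held as UTF-8 or UTF-16, while a
plain octet comparison of the two buffers differs.

```python
from typing import List


def code_points(data: bytes, encoding: str) -> List[int]:
    """Decode a buffer and return its Unicode code points."""
    return [ord(ch) for ch in data.decode(encoding)]


def char_equal(a: bytes, a_enc: str, b: bytes, b_enc: str) -> bool:
    """Compare two buffers character by character, ignoring how each is encoded."""
    return code_points(a, a_enc) == code_points(b, b_enc)


# Hypothetical sample text: "Gruesse" written with u-umlaut and sharp s.
text = "Gr\u00fc\u00dfe"
as_utf8 = text.encode("utf-8")
as_utf16 = text.encode("utf-16-le")

same_chars = char_equal(as_utf8, "utf-8", as_utf16, "utf-16-le")
same_octets = as_utf8 == as_utf16
```

Under these assumptions the character comparison succeeds and the octet
comparison fails, which is the sense in which the comparator is defined on
characters rather than on a particular encoding.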

Reason:

Indeed, we lack the other two levels: There is no way to work on raw
headers, but at least we agreed that if we want to do that, we would
invent a new test like "rawheader".  And there is no way to work on
the MIME decoded data without character set conversion to unicode.
We have no access to the MIME character set specification, either.

If "i;octet" were to match arbitrary binary data, thus omitting the
character set translation, it would be mostly useless, as there are
ISO-8859-15 strings that are invalid UTF-8 strings and Sieve scripts
are expressed in UTF-8.  Sieve offers no binary data literals.  It is
bad enough that the decoded header data may contain NUL characters and
Sieve cannot express those (\0 does not have a special meaning), but at
least we can match any other Unicode character.  Since headers contain
only characters of a specific character set, and rarely just random
binary data, it makes sense that "i;octet" does indeed work on decoded
and translated characters, despite its name.
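
The point about ISO-8859-15 can be sketched in Python as follows.  This is
only an illustration under stated assumptions: the header value, the
variable names and the helper is_valid_utf8 are made up here, and no real
Sieve implementation is being quoted.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the octets form a well-formed UTF-8 string."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical header value "Mueller" (with u-umlaut) plus a euro sign,
# as it would look after MIME decoding but before charset translation.
key = "M\u00fcller \u20ac"            # the string as written in a UTF-8 script
raw = key.encode("iso-8859-15")       # octets: 4d fc 6c 6c 65 72 20 a4

# Comparing raw octets against the UTF-8 octets of the script string fails.
raw_match = raw == key.encode("utf-8")

# Comparing after translating the header to Unicode characters succeeds.
char_match = raw.decode("iso-8859-15") == key
```

Here the ISO-8859-15 octets are not valid UTF-8 at all, so a script written
in UTF-8 has no way to spell them; only the comparison on translated
characters can match.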

If it were up to me, I would introduce a new comparator instead,
e.g. "i;character", and change "i;octet" to match against a string
representation of binary data, e.g. hex characters.  And introduce
\0 to mean NUL at the same time. :-)

But Sieve is widespread by now and that would break many scripts.
Not even I dare to suggest going that way.  That's why I suggest:

Let "i;octet" work on characters and document that the name is a misnomer.

If someone really feels the need to specify and implement them: Introduce
Sieve extensions for an "i;binary" comparator that compares against a
string representation of binary data.  Introduce new tests "rawheader"
and "decodedheader".
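
What such a comparator might do can be sketched in Python.  Everything
here is hypothetical: "i;binary" is not specified anywhere, the hex
notation is just one possible string representation, and binary_match is
a name invented for this example.

```python
def binary_match(raw: bytes, hex_key: str) -> bool:
    """Compare raw octets against a key written as hex digits.

    bytes.fromhex() ignores whitespace between bytes and raises
    ValueError if the key is not valid hex.
    """
    return raw == bytes.fromhex(hex_key)


# Hypothetical raw header octets in ISO-8859-15 ("Mueller" with u-umlaut).
raw_header = b"M\xfcller"

matches = binary_match(raw_header, "4d fc 6c 6c 65 72")
differs = binary_match(raw_header, "4d c3 bc 6c 6c 65 72")  # the UTF-8 spelling
```

With a key in this form, a script in UTF-8 can name any octet sequence,
including ones that are not valid UTF-8.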

I do not know how "i;octet" is used elsewhere, so the above may
only fit Sieve well.

Michael
