Re: non-UTF-8 sequences in Sieve scripts


Kjetil Torgrim Homme wrote:

okay, fine, we don't have to make 3028bis more explicit than 3028 about
disallowing arbitrary octets.  but we should definitely not add explicit
text which allows it either!  let's stick to the original wording, but
keep some of the clarifications from the draft:

[3028bis-09: 2.1. Form of the Language (§2)]:
|   With the exceptions of strings and comments, the language is limited
|   to US-ASCII characters.  Strings and comments may contain octets
|   outside the US-ASCII range.  Specifically, they will normally be in
|   UTF-8, as specified in [UTF-8].  NUL (US-ASCII 0) is never permitted
|   in scripts, while CR and LF can only appear as the CRLF line ending.

[my suggestion]:
|   With the exceptions of strings and comments, the language is limited
|   to US-ASCII characters.  Strings and comments are encoded in
|   UTF-8, as specified in [UTF-8].  NUL (US-ASCII 0) is never permitted
|   in scripts, while CR and LF can only appear as the CRLF line ending.
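
For illustration only, the octet-level rules in the two versions above could be checked roughly as follows. This is a minimal sketch, not text from either draft: the function name check_script_octets and the require_utf8 flag are hypothetical, and it decodes the whole script rather than only strings and comments, which is a simplification (everything outside them is US-ASCII and therefore valid UTF-8 anyway).

```python
from typing import List


def check_script_octets(script: bytes, require_utf8: bool) -> List[str]:
    """Return a list of octet-level problems found in a Sieve script.

    require_utf8=False approximates the -09 wording (arbitrary octets
    tolerated); require_utf8=True approximates the suggested wording.
    Both names are assumptions made for this sketch.
    """
    problems = []

    # NUL (US-ASCII 0) is never permitted in scripts.
    if b"\x00" in script:
        problems.append("NUL octet present")

    # CR and LF may only appear together as CRLF. Remove every CRLF
    # pair; any CR or LF that is left over must be a bare one.
    remainder = script.replace(b"\r\n", b"")
    if b"\r" in remainder or b"\n" in remainder:
        problems.append("bare CR or LF")

    # Simplification: validate the whole script as UTF-8 instead of
    # lexing out strings and comments first.
    if require_utf8:
        try:
            script.decode("utf-8")
        except UnicodeDecodeError:
            problems.append("not valid UTF-8")

    return problems
```

With require_utf8 set to False, a string containing the octet 0xFF passes; with it set to True, the same script is reported as not valid UTF-8. That difference is the whole of the disagreement in this thread.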

[3028bis-09: 2.4.2. Strings (§6)]:
|   As message header data is converted to [UTF-8] for comparison (see
|   section 2.7.2), most strings will use the UTF-8 encoding.  However,
|   implementations MUST accept all strings that match the grammar in
|   section 8.  The ability to use non-UTF-8 encoded strings matches
|   existing practice and has proven to be useful both in tests for
|   invalid data and in arguments containing raw MIME parts for extension
|   actions that generate outgoing messages.

[my suggestion]:
|   [strike paragraph 6 in its entirety]

the text from 8.1 is unchanged in 3028bis-09, so this is all we need to
maintain status quo.  we'll also keep the accurate UTF-8 definitions of
characters out of the ABNF, but may decide to change that later, in the
Standard revision of the document.

I hope this proposal is agreeable to all concerned.

I think the rough consensus on this issue around the Montreal IETF was not to break existing implementations (which accept arbitrary octets).

I believe the existing -09 text represents this rough consensus.

If you disagree that this is the case, I would let you (as a chair) have one more round of discussion of this issue on the mailing list. The discussion must end by the Sieve meeting at the San Diego IETF (November 6th).

Personally, I think I would agree with your proposal 6 months after 3028bis is published, when we try to move the 3028bis RFC to Draft status, and if your octet encoding extension is deployed.