Re: non-UTF-8 sequences in Sieve scripts

[3028bis-09: 2.1. Form of the Language (§2)]:
|   With the exceptions of strings and comments, the language is limited
|   to US-ASCII characters.  Strings and comments may contain octets
|   outside the US-ASCII range.  Specifically, they will normally be in
|   UTF-8, as specified in [UTF-8].  NUL (US-ASCII 0) is never permitted
|   in scripts, while CR and LF can only appear as the CRLF line ending.

[my suggestion]:
|   With the exceptions of strings and comments, the language is limited
|   to US-ASCII characters.  Strings and comments are encoded in
|   UTF-8, as specified in [UTF-8].  NUL (US-ASCII 0) is never permitted
|   in scripts, while CR and LF can only appear as the CRLF line ending.


That asks for no more than RFC 3028 does; it is just more detailed.  IMHO,
whoever thinks RFC 3028 allows non-UTF-8 should still think the same.
Sounds good to me.
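
To make the octet-level rules above concrete, here is a rough Python
sketch of a pre-parse check.  The function name check_script_octets is
made up for illustration and comes from no existing implementation.
It only tests necessary conditions: no NUL, CR and LF only as CRLF,
and the script as a whole being valid UTF-8 (US-ASCII is a subset of
UTF-8).  Confining non-ASCII octets to strings and comments would
still need the real grammar.

```python
from typing import List


def check_script_octets(script: bytes) -> List[str]:
    """Return a list of octet-level problems found in a Sieve script.

    Hypothetical helper for illustration only; it does not parse Sieve.
    """
    problems = []

    # NUL (US-ASCII 0) is never permitted in scripts.
    if b"\x00" in script:
        problems.append("NUL octet present")

    # Remove every CRLF pair; any CR or LF left over is a bare one.
    leftover = script.replace(b"\r\n", b"")
    if b"\r" in leftover or b"\n" in leftover:
        problems.append("bare CR or LF")

    # Strict decoding rejects overlong forms, surrogates and stray octets.
    try:
        script.decode("utf-8")
    except UnicodeDecodeError as exc:
        problems.append("invalid UTF-8 at octet %d" % exc.start)

    return problems
```

A script with an ISO-8859-1 octet in a comment, such as
b'# caf\xe9\r\nkeep;\r\n', is reported as invalid UTF-8 by this check,
while the same comment encoded as b'# caf\xc3\xa9' passes.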

[3028bis-09: 2.4.2. Strings (§6)]:
|   As message header data is converted to [UTF-8] for comparison (see
|   section 2.7.2), most strings will use the UTF-8 encoding.  However,
|   implementations MUST accept all strings that match the grammar in
|   section 8.  The ability to use non-UTF-8 encoded strings matches
|   existing practice and has proven to be useful both in tests for
|   invalid data and in arguments containing raw MIME parts for extension
|   actions that generate outgoing messages.

[my suggestion]:
|   [strike paragraph 6 in its entirety]


I think it is important to keep "Message header data is converted to
[UTF-8] for comparison".  Of course implementations must obey the grammar,
whatever it looks like, so that part seems unneeded.
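
As an illustration of what "converted to [UTF-8] for comparison" means
in practice, a small Python sketch follows.  It leans on the standard
library's RFC 2047 decoder and is not the procedure from section
2.7.2; header_to_utf8 is a made-up name, and error handling for
unknown charsets is left out.

```python
from email.header import decode_header, make_header


def header_to_utf8(raw_value: str) -> bytes:
    """Decode RFC 2047 encoded-words and return the value as UTF-8 octets.

    Hypothetical helper for illustration; unknown or broken charsets
    are not handled here.
    """
    # decode_header splits the value into (data, charset) chunks,
    # make_header joins them back into one Unicode header object.
    unicode_value = str(make_header(decode_header(raw_value)))
    return unicode_value.encode("utf-8")
```

With this, the encoded-word "=?ISO-8859-1?Q?caf=E9?=" ends up as the
UTF-8 octets for "café", so a script string written in UTF-8 can be
compared against it octet by octet.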

I thought -09 said that about headers, but I was wrong.  There is something
else, though, now that I am reading 2.4.2.2 (just nitpicking):

s/synactically/syntactically/

   Headers are a subset of strings.  In the Internet Message
   Specification [IMAIL], each header line is allowed to have whitespace
   nearly anywhere in the line, including after the field name and
   before the subsequent colon.  Extra spaces between the header name
   and the ":" in a header field are ignored.

If you think it's really just nitpicking, just forget about my
suggestion:

   Header field names are a subset of strings.  The obsolete header
   field syntax from [IMAIL], section 4.5, must be implemented,
   matching a header with white space between the field name and the
   subsequent colon.
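
To show what matching the obsolete syntax amounts to, here is a
minimal Python sketch.  The names split_header_line and header_matches
are invented for this example.  It assumes field names consist of
printable US-ASCII except the colon, and that names are compared
case-insensitively.  White space on the script side is deliberately
not stripped.

```python
import re
from typing import Optional, Tuple

# Field name: printable US-ASCII except ":" (octets 33-57 and 59-126),
# followed by optional spaces or tabs before the colon (obsolete syntax).
_HEADER_LINE = re.compile(rb"^([\x21-\x39\x3b-\x7e]+)[ \t]*:(.*)$", re.DOTALL)


def split_header_line(line: bytes) -> Optional[Tuple[bytes, bytes]]:
    """Split a header line into (field name, rest), or None if malformed."""
    m = _HEADER_LINE.match(line)
    if m is None:
        return None
    return m.group(1), m.group(2)


def header_matches(line: bytes, wanted: bytes) -> bool:
    """Compare the field name of a message header line with a script name.

    The script-side name is used as given; trailing white space in it
    is NOT ignored here.
    """
    parts = split_header_line(line)
    return parts is not None and parts[0].lower() == wanted.lower()
```

Here header_matches(b"From  : x", b"from") is true, while
header_matches(b"From : x", b"from   ") is false, because the sketch
takes no position on white space in the name given by the script.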

But: can a header name in a Sieve script also contain trailing white
space that is ignored, like "from   "?

Sorry to bring this up so late; I never thought about it before.

I think the rough consensus on this issue around the Montreal IETF was not
to break existing implementations (which accept arbitrary octets).
I believe the existing -09 text represents this rough consensus.


That is correct.  There was agreement not to word things such that those
implementations violate the standard, but also not to word things in
a way that encourages others to repeat their behaviour.

Personally, I think I would agree with your proposal 6 months after
3028bis is published, when we try to move the 3028bis RFC to Draft status,
provided your octet encoding extension is deployed.


Sounds good to me.  Once there is a draft on that extension, I will
certainly implement it in Exim.

Michael