Re: draft-ietf-sieve-3028bis-08.txt


On Tue, 2006-07-25 at 07:46 -0700, Philip Guenther wrote:

1) in section 2.1, replace
----
   The language is represented in UTF-8, as specified in [UTF-8].

   Tokens in the US-ASCII range are considered case-insensitive.
----
   with 
----
   With the exceptions of strings and comments, the language is limited
   to US-ASCII characters.  Strings and comments may contain octets
   outside the US-ASCII range.  Specifically, they will normally be in
   UTF-8, as specified in [UTF-8].  NUL (US-ASCII 0) is never permitted
   in scripts, while CR and LF can only appear as the CRLF line ending.

   Tokens other than strings are considered case-insensitive.
----
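
The octet rules in the proposed text (no NUL anywhere, CR and LF only as
a CRLF pair) can be checked mechanically. A minimal sketch in Python,
assuming the script arrives as raw bytes; the function name
check_script_octets and the message strings are hypothetical, not
anything from the draft:

```python
import re

# Matches a CR not followed by LF, or an LF not preceded by CR.
_BARE_CR_OR_LF = re.compile(rb"\r(?!\n)|(?<!\r)\n")


def check_script_octets(script: bytes) -> list:
    """Return a list of problems found; an empty list means the octets are OK.

    Hypothetical helper: it checks only the two octet-level rules from the
    proposed section 2.1 text and does not parse the Sieve grammar.
    """
    problems = []
    if b"\x00" in script:
        problems.append("NUL octet present")
    if _BARE_CR_OR_LF.search(script):
        problems.append("CR or LF outside a CRLF pair")
    return problems
```

Note that this says nothing about whether strings and comments are
UTF-8; it only covers the part of the text that is unconditional.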


"normally" isn't good if you can't know whether your script is normal or
not.  I suggest you rephrase it as "Strings and comments may sometimes
have a different encoding than UTF-8, so for consistent behaviour across
implementations, it is recommended to avoid non-US-ASCII characters".
(Yes, tongue is firmly in cheek.)

Speaking of CRLF, I'd like a clarification of multi-line strings in
section 2.4.2 (some of its text duplicates the above; I'm not sure
that's good).  Something like:

   Any CRLF before the final period is considered part of the string.

to make it a little clearer that implementations should NOT change
the CRLF into their local line delimiter sequence.
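
To illustrate what "part of the string" means here, a minimal sketch in
Python that extracts the value of a multi-line string with every CRLF
left intact.  The function name multiline_value is hypothetical, the
input is assumed to begin at the "text:" line, and dot-stuffing and a
trailing hash comment on the "text:" line are deliberately ignored to
keep it short:

```python
def multiline_value(script: bytes) -> bytes:
    """Return the value of a 'text:' multi-line string, CRLFs preserved.

    Hypothetical helper, not a Sieve parser: it assumes the input begins
    with the 'text:' line and ignores dot-stuffing and hash comments.
    """
    head, sep, rest = script.partition(b"\r\n")
    if not sep or head.strip().lower() != b"text:":
        raise ValueError("expected a 'text:' line ending in CRLF")
    lines = []
    while rest:
        line, sep, rest = rest.partition(b"\r\n")
        if not sep:
            raise ValueError("line not terminated by CRLF")
        if line == b".":
            # The final period ends the string; everything collected so
            # far, including each line's CRLF, is the value.
            return b"".join(lines)
        lines.append(line + b"\r\n")
    raise ValueError("missing final period")
```

The point is that the value for "line one", "line two" ends in CRLF and
is never rewritten to a bare LF or any other local convention.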

2) in section 2.4.2 ("Strings"), add the following paragraph:
----
   As message header data is converted to [UTF-8] for comparison (see
   section 2.7.2), most strings will use the UTF-8 encoding.  However,
   implementations MUST accept all strings that match the grammar in
   section 8.  The ability to use non-UTF-8 encoded strings matches
   existing practice and has proven to be useful both in tests for
   invalid data and in arguments containing raw MIME parts for extension
   actions that generate outgoing messages.
----
   That appears directly after the paragraph that starts "Non-printing
   characters..."

Any comments on the above or the full text, or do people feel this
is ready to be submitted to the IESG?


I don't like this at all.  Keep it simple: force the scripts to be
encoded in UTF-8.  It saves us a lot of grief and edge cases.  To be
able to express arbitrary octets, add an extension for \x -- I think
someone volunteered to write text?  If not, I'll be happy to.  Note that
the contents of a string during execution are potentially arbitrary
octets (even NUL, as made clear in 2.7.2).
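
For what it's worth, forcing UTF-8 amounts to one strict decode when the
script is accepted.  A minimal sketch in Python; the function name
is_utf8 is hypothetical, and where an implementation would actually hook
this in is an assumption on my part:

```python
def is_utf8(script: bytes) -> bool:
    """Return True if the whole script is well-formed UTF-8.

    Hypothetical helper: Python's strict decoder rejects invalid lead and
    continuation octets, overlong forms and truncated sequences.
    """
    try:
        script.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

A script holding a string in ISO-8859-1 would be rejected here, which is
exactly the edge case the stricter rule removes.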

I'm afraid I didn't save the Jabber log from IETF-66.  Did anyone else?
It would be useful to post it here.
-- 
Kjetil T.