Re: Proposal for escaping on non-UTF-8 sequences in Sieve


On Fri, 2006-10-06 at 15:27 +0100, Alexey Melnikov wrote:

Kjetil Torgrim Homme wrote:

in other words, we can do whatever we like with ${keyword:data}.  I
prefer an extensible syntax over a compact one (Alexey's $%xx
suggestion), so my vote is for ${hex:7e}.  please see suggested patch
below.


I would prefer it if we picked a more distinctive prefix. Something starting with 
'$' but not followed by '{' would be great. However if others feel 
strongly in favor of your variant, that would be fine too. Apart from 
that your proposal is fine with me.


glad to hear that.  yes, the resemblance to variables syntax is a mixed
blessing.

pro: it's easier to recognise magic sequences in the string.
con: it's easier to mix up the two syntaxes.

I want to note that a syntax mix-up can be flagged at upload time, so I
don't think it's a big problem in practice.
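
A rough sketch of such an upload-time check, in Python.  The function
name check_string and the regular expressions are made up for
illustration, and it assumes that a ':' can never appear inside a real
variable reference, so anything of the form ${hex:...} must be meant as
the quoted form:

```python
import re

# Anything that looks like the proposed quoted form "${hex:data}".
# Assumption: ':' is not valid in a variable name, so this cannot be a
# legitimate variable reference.
CANDIDATE = re.compile(r"\$\{hex:([^}]*)\}", re.IGNORECASE)

# hex-pair-seq as in the suggested patch: 1*2HEXDIG separated by one WSP.
HEX_BODY = re.compile(r"[0-9A-Fa-f]{1,2}(?:[ \t][0-9A-Fa-f]{1,2})*")


def check_string(s):
    """Return a list of (offset, message) problems found in one string."""
    problems = []
    for m in CANDIDATE.finditer(s):
        data = m.group(1)
        if not HEX_BODY.fullmatch(data):
            problems.append((m.start(), "malformed hex data " + repr(data)))
    return problems
```

A string such as "x ${hex:subject} y" would be reported at upload time,
while "${hex:7e}" and an ordinary "${subject}" pass untouched.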

-   As message header data is converted to [UTF-8] for comparison (see
-   section 2.7.2), most strings will use the UTF-8 encoding.  However,
-   implementations MUST accept all strings that match the grammar in
-   section 8.  The ability to use non-UTF-8 encoded strings matches
-   existing practice and has proven to be useful both in tests for
-   invalid data and in arguments containing raw MIME parts for extension
-   actions that generate outgoing messages.
+   The extension "quoted-character" may be used to encode arbitrary
+   characters as a sequence of US-ASCII characters (see 2.4.2.4 for
+   details).

   For entering larger amounts of text, such as an email message, a
   multi-line form is allowed.  It starts with the keyword "text:",


I am against this change, as it doesn't agree with the rough consensus 
in the group, which is to try to keep existing implementations compliant.


this was a bit of lazy editing on my part.  I made the argument for
reinstating the status quo in a different thread, so please ignore the
removal here.

+   quoted-arb-octets   = "${hex:" hex-pair-seq "}"
+   hex-pair-seq        = hex-pair *(WSP hex-pair)
+   hex-pair            = 1*2HEXDIG

Did you really want to allow for
${hex: 7 8 9}
?


not sure what you're pointing at here ...

a) yes, it may be a good idea to add leading and trailing *WSP in
quoted-arb-octets to allow arbitrary extra whitespace.

b) yes, it may be more readable to specify hex-pair as 2HEXDIG.  I made
it 1*2HEXDIG since the Unicode form is 1*5HEXDIG, so I suggest we change
the latter to 2*5HEXDIG if the former becomes 2HEXDIG.
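
To make the difference concrete, here is a small Python sketch that
compares the grammar as posted with a variant combining (a) and (b).
The names POSTED, STRICT and decode are invented for this example, and
the regular expressions are only my reading of the ABNF, not anything
taken from the draft:

```python
import re

# Grammar as posted: hex-pair = 1*2HEXDIG, exactly one WSP between
# pairs, and no WSP allowed next to the braces.
POSTED = re.compile(
    r"\$\{hex:([0-9A-F]{1,2}(?:[ \t][0-9A-F]{1,2})*)\}", re.IGNORECASE
)

# Variant from (a) and (b): hex-pair = 2HEXDIG, with optional leading
# and trailing WSP inside the braces.
STRICT = re.compile(
    r"\$\{hex:[ \t]*([0-9A-F]{2}(?:[ \t][0-9A-F]{2})*)[ \t]*\}", re.IGNORECASE
)


def decode(pattern, text):
    """Return the octets encoded by text, or None if it does not match."""
    m = pattern.fullmatch(text)
    if m is None:
        return None
    # Each whitespace-separated group of hex digits is one octet.
    return bytes(int(pair, 16) for pair in m.group(1).split())
```

With the grammar as posted, "${hex:7 8 9}" is accepted and decodes to
the octets 07 08 09, while "${hex: 7 8 9}" is rejected only because of
the leading space.  With the variant, both are rejected for using single
digits, and "${hex: 7e 0a }" is accepted.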

so I'm game for whatever you say :-)
-- 
Kjetil T.