Re: Proposal for escaping on non-UTF-8 sequences in Sieve


Kjetil Torgrim Homme wrote:

On Fri, 2006-10-06 at 15:27 +0100, Alexey Melnikov wrote:
Kjetil Torgrim Homme wrote:
in other words, we can do whatever we like with ${keyword:data}.  I
prefer an extensible syntax over a compact one (Alexey's $%xx
suggestion), so my vote is for ${hex:7e}.  please see suggested patch
below.
I would prefer if we picked a more distinctive prefix. Something starting with '$' but not followed by '{' would be great. However, if others feel strongly in favor of your variant, that would be fine too. Apart from that, your proposal is fine with me.
glad to hear that.  yes, the resemblance to variables syntax is a mixed
blessing.

pro: it's easier to recognise magic sequences in the string.
con: it's easier to mix up the two syntaxes.

I think the cons outweigh the pros in this case. Somebody could forget to use ':' after hex, etc.

I want to note that a syntax mix-up can be flagged at upload time, so I
don't think it's a big problem in practice.
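A minimal sketch of such an upload-time check, assuming the "${hex:" spelling discussed here. The function name is hypothetical and not part of any existing implementation. It only targets the forgotten-':' case: something like "${hex7e}" would likely be read as a reference to a variable named "hex7e", so a script checker could emit a warning rather than silently accept it.

```python
import re

# Hypothetical lint: find "${hex" that is NOT immediately followed by ':'.
# ABNF string literals are case-insensitive, so "${HEX" is checked too.
_MISSING_COLON = re.compile(r"\$\{hex(?!:)", re.IGNORECASE)


def find_missing_colon(script_string: str) -> list[int]:
    """Return the offsets of suspicious '${hex' prefixes in a script string."""
    return [m.start() for m in _MISSING_COLON.finditer(script_string)]


if __name__ == "__main__":
    print(find_missing_colon("x ${hex7e} y"))  # flags offset 2
    print(find_missing_colon("${hex:7e}"))     # well-formed, nothing flagged
```

A warning (rather than a hard error) seems the safer choice here, since "${hexadecimal}" and similar strings may be legitimate variable references.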

-   As message header data is converted to [UTF-8] for comparison (see
-   section 2.7.2), most strings will use the UTF-8 encoding.  However,
-   implementations MUST accept all strings that match the grammar in
-   section 8.  The ability to use non-UTF-8 encoded strings matches
-   existing practice and has proven to be useful both in tests for
-   invalid data and in arguments containing raw MIME parts for extension
-   actions that generate outgoing messages.
+   The extension "quoted-character" may be used to encode arbitrary
+   characters as a sequence of US-ASCII characters (see 2.4.2.4 for
+   details).

  For entering larger amounts of text, such as an email message, a
  multi-line form is allowed.  It starts with the keyword "text:",

I am against this change, as it doesn't agree with the rough consensus in the group, which is to try to keep existing implementations compliant.


this was a bit of lazy editing on my part.  I made the argument for the
reinstating of status quo in a different thread, so please ignore the
removal here.

So I will argue in another thread ;-).

+   quoted-arb-octets   = "${hex:" hex-pair-seq "}"
+   hex-pair-seq        = hex-pair *(WSP hex-pair)
+   hex-pair            = 1*2HEXDIG
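To make the grammar above concrete, here is a rough Python decoder that follows it literally: 1*2HEXDIG per pair, exactly one WSP (SP or HTAB) between pairs, and no whitespace after "${hex:" or before "}". The function name is hypothetical, and encoding the surrounding literal text as UTF-8 is an assumption made only so the result can be returned as bytes.

```python
import re

# Literal transcription of the proposed ABNF:
#   quoted-arb-octets = "${hex:" hex-pair-seq "}"
#   hex-pair-seq      = hex-pair *(WSP hex-pair)
#   hex-pair          = 1*2HEXDIG
_HEX_PAIR = r"[0-9A-Fa-f]{1,2}"
_QUOTED_ARB_OCTETS = re.compile(
    r"\$\{hex:(" + _HEX_PAIR + r"(?:[ \t]" + _HEX_PAIR + r")*)\}",
    re.IGNORECASE,  # ABNF literals such as "${hex:" are case-insensitive
)


def decode_hex_escapes(s: str) -> bytes:
    """Replace each well-formed ${hex:...} with its octets; keep other text."""
    out = bytearray()
    pos = 0
    for m in _QUOTED_ARB_OCTETS.finditer(s):
        out += s[pos:m.start()].encode("utf-8")  # assumption: UTF-8 literal text
        out += bytes(int(pair, 16) for pair in m.group(1).split())
        pos = m.end()
    out += s[pos:].encode("utf-8")
    return bytes(out)


if __name__ == "__main__":
    print(decode_hex_escapes("${hex:7e}"))      # the octet 0x7E, i.e. b'~'
    print(decode_hex_escapes("${hex:7 8 9}"))   # three single-digit pairs
    print(decode_hex_escapes("${hex: 7 8 9}"))  # leading space: left as-is
```

Note that as written the grammar accepts single-digit pairs such as "7" but rejects a space directly after the colon, so "${hex: 7 8 9}" is not treated as an escape at all.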

Did you really want to allow for
${hex: 7 8 9}
?


not sure what you're pointing at here ...

a) yes, it may be a good idea to add leading and trailing *WSP in
quoted-arb-octets to allow arbitrary extra whitespace.

This would be fine with me, but I was pointing at the single-character hex-pair.

b) yes, it may be more readable to specify hex-pair as 2HEXDIG.  I made
it 1*2HEXDIG since the Unicode form is 1*5HEXDIG, so I suggest we change the
latter to 2*5 if the former is 2HEXDIG.

I would rather use 2HEXDIG for octets and 1*5HEXDIG for Unicode, but 2*5HEXDIG is fine as well.

so I'm game whatever you say :-)
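For comparison, a sketch of how a decoder might look if both (a) and (b) were adopted: optional whitespace after "${hex:" and before "}", and exactly two hex digits per octet. The revised ABNF in the comment is hypothetical (it was not agreed text), and the function name is illustrative only.

```python
import re

# Hypothetical revised grammar combining points (a) and (b):
#   quoted-arb-octets = "${hex:" *WSP hex-pair-seq *WSP "}"
#   hex-pair-seq      = hex-pair *(WSP hex-pair)
#   hex-pair          = 2HEXDIG
_STRICT_ARB_OCTETS = re.compile(
    r"\$\{hex:[ \t]*([0-9A-Fa-f]{2}(?:[ \t][0-9A-Fa-f]{2})*)[ \t]*\}",
    re.IGNORECASE,
)


def decode_strict(s: str) -> bytes:
    """Decode escapes under the stricter 2HEXDIG rule; keep other text."""
    out = bytearray()
    pos = 0
    for m in _STRICT_ARB_OCTETS.finditer(s):
        out += s[pos:m.start()].encode("utf-8")  # assumption: UTF-8 literal text
        out += bytes.fromhex(m.group(1))  # pairs are 2 digits, separated by WSP
        pos = m.end()
    out += s[pos:].encode("utf-8")
    return bytes(out)


if __name__ == "__main__":
    print(decode_strict("${hex: 7e }"))     # surrounding spaces now allowed
    print(decode_strict("${hex:07 08 09}")) # zero-padded pairs required
```

With 2HEXDIG, "${hex: 7 8 9}" is rejected because of the single-digit pairs rather than the leading space, which is arguably the less surprising failure for a script author.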