Re: Proposal for escaping on non-UTF-8 sequences in Sieve

2006-10-22 22:55:10

On Sat, 21 Oct 2006, Kjetil Torgrim Homme wrote:
here's my amended, which attempts to fix it.  the wording is a
bit weasely, but I think it will work in practice.  if anyone can fix it
more formally in ABNF, please help out.
+   encoded-seq         = "${" enc-method ":" enc-argument "}"
+   enc-method          = "hex" / "unicode"
+   Values for enc-method or enc-argument which don't match the above
+   syntax SHOULD cause a syntax error.

Hmm, that won't work, as there isn't a defined meaning for something to not match _part_ of a syntax. The only sensible interpretation I can see would be to match *anything* for enc-method and enc-argument, and then compare them to their expected forms. The problem with that is that it would render this string a syntax error:
        "${name}: ${value}"

because it's an attempt to use encoded-seq with an enc-method of "name}" and an enc-argument of " ${value". That's obviously not the desired result.

To obtain the desired result, we have to give the syntax for all the sequences that should be covered by the encoded-character extension, both those that have a defined expansion and those that should be treated as a syntax error. How broad do we want to make that? The broadest would cover any sequence matching this:

encoded-seq     = "${" enc-method ":" enc-argument "}"
enc-method      = *(%x01-39 / %x3b-7c / %x7e-ff)
                  ; zero or more characters other than ':' or '}'
enc-argument    = *(%x01-7c / %x7e-ff)
                  ; zero or more characters other than '}'

I.e., any sequence that starts with '${', followed by zero or more octets other than ':' and '}', followed by a ':', then zero or more octets other than '}', then finally a '}', would be considered a use of the encoded-seq syntax.

To put it another way, if you find a '${', and there's at least one colon between that and the next '}', it's an encoded-seq.
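For anyone who wants to poke at the edge cases, here is a rough sketch of that broadest rule as a Python regular expression. The names ENCODED_SEQ, KNOWN_METHODS, find_encoded_seqs and has_syntax_error are made up for illustration, and the check only looks at enc-method (validating enc-argument for "hex" and "unicode" is left out), so treat it as an approximation of the ABNF above rather than a reference implementation.

```python
import re

# Broadest form from the ABNF above: "${", zero or more octets other than
# ':' and '}', a ':', zero or more octets other than '}', then '}'.
# NUL is excluded because the ABNF ranges start at %x01.
ENCODED_SEQ = re.compile(rb"\$\{([^:}\x00]*):([^}\x00]*)\}")

# The two methods named in the quoted proposal.
KNOWN_METHODS = {b"hex", b"unicode"}


def find_encoded_seqs(data: bytes):
    """Return (enc-method, enc-argument) pairs for each encoded-seq found."""
    return [(m.group(1), m.group(2)) for m in ENCODED_SEQ.finditer(data)]


def has_syntax_error(data: bytes) -> bool:
    """True if any encoded-seq uses a method other than the known ones.

    Assumption: argument validation is omitted in this sketch.
    """
    return any(method not in KNOWN_METHODS
               for method, _ in find_encoded_seqs(data))


if __name__ == "__main__":
    for s in (b"${name}: ${value}", b"${name: sdlkfjs}", b"${hex:40}"):
        print(s, find_encoded_seqs(s), has_syntax_error(s))
```

Note that the first string yields no matches at all: each '${' hits a '}' before any ':', so both pieces stay available as variable references.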

That would leave
        "${name}: ${value}"
with its expected value, but make this:
         "${name: sdlkfjs}"
or this:
a syntax error.  On the other hand, this:
would _not_ be a syntax error, because it doesn't contain a colon between the braces. That's correct, of course, because we _want_ it to be a variable reference instead.

The above is the broadest syntax that makes sense. We could tighten it up and only cover sequences where the enc-method is, say, alphanumeric. If we did that, then this:
would not be a syntax error.  Opinions?
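To make the tightened alternative concrete, a possible sketch follows. It assumes "alphanumeric" means one or more ASCII letters or digits, which is my reading and not something settled here; TIGHT_ENCODED_SEQ and tight_matches are hypothetical names, and the sample strings are my own.

```python
import re

# Hypothetical tightened form: enc-method restricted to one or more
# ASCII letters or digits; enc-argument is still anything but '}' (or NUL).
TIGHT_ENCODED_SEQ = re.compile(rb"\$\{([A-Za-z0-9]+):([^}\x00]*)\}")


def tight_matches(data: bytes):
    """Return (enc-method, enc-argument) pairs under the tightened rule."""
    return [(m.group(1), m.group(2))
            for m in TIGHT_ENCODED_SEQ.finditer(data)]


if __name__ == "__main__":
    # Alphanumeric method: still treated as an encoded-seq.
    print(tight_matches(b"${name: sdlkfjs}"))
    # Method containing a space: no longer covered by the extension.
    print(tight_matches(b"${two words:x}"))
```

Under this variant a sequence whose method part contains anything other than letters and digits simply falls outside the extension, so it would be left alone instead of being flagged.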

Philip Guenther