Re: Proposal for escaping on non-UTF-8 sequences in Sieve

2006-09-18 10:10:45

Michael Haardt wrote:

>On Sun, Sep 17, 2006 at 11:39:53PM +0100, Alexey Melnikov wrote:
>>Here is a strawman syntax for new quoted strings that would only allow
>>for valid UTF-8 sequences, but would also allow for escaped non-UTF-8
>>  new-quoted-string  = "~" DQUOTE new-quoted-text DQUOTE
>You introduce a new lexical token to Sieve, thus changing the Sieve

I have to agree with Michael - changing the basic syntax of sieve is

>I don't think the base spec allows that.
IMHO, if it explicitly prohibits such a change, we need to change the base

One of the key strengths of sieve is that the core syntax is both simple
and immutable. This makes it possible to perform syntax checks even if
you don't understand all the extensions a given script uses. Breaking
this would IMO be hugely damaging.

>Sieve has a very
>small syntax, moving most stuff to the semantical layer for being very
>extensible, pretty much like LISP does.
>The variables extension, for example, introduces the concept of expanding
>strings, but only at the semantic level.  Syntactically, strings still
>look the same.
The problem is that RFC 3028 said that '\ <octet>' for any <octet> other
than <\> or <"> SHOULD be interpreted as <octet>.
This effectively means that we can't fix quoted strings. So I had to
introduce a new quoted string.
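
To make the rule cited above concrete, here is a rough Python sketch of
how a parser might undo backslash escaping under that reading: a
backslash followed by any octet yields that octet. The function name
unescape_quoted is hypothetical, and rejecting a trailing backslash is
an assumption of this sketch, not something stated in this thread.

```python
def unescape_quoted(body: bytes) -> bytes:
    """Undo backslash escaping in the body of a quoted string.

    `body` is the text between the double quotes. Following the rule
    cited above, a backslash followed by any octet yields that octet.
    """
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] == 0x5C:  # backslash
            if i + 1 >= len(body):
                # Assumption: a trailing backslash is treated as an error.
                raise ValueError("dangling backslash")
            i += 1  # skip the backslash, keep the octet after it
        out.append(body[i])
        i += 1
    return bytes(out)
```

Under this rule \x41 already means the four characters x41 without the
backslash, which is why giving \x a new meaning changes how existing
scripts are interpreted.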

First of all, having an extension that changes the interpretation of, say, \x
followed by two hex digits would IMO be far less damaging than an extension
that changes the core syntax.

Do you have any better suggestions?

There are all sorts of ways to do it. Here's an obvious one: If the
octet-value extension is enabled, any occurrences of ${x} where x takes
the form of a space-separated list of decimal values are replaced with
a sequence of octets corresponding to each value. Hex is a bit more
difficult since the sequences could be confused with variable names, so
if you want values in hex the simplest thing would be to invent a different
escape convention, e.g. something like $%x%.
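
A minimal Python sketch of the decimal form described above, only to
show that the substitution can happen after ordinary string parsing.
The name expand_octets is hypothetical, and two details are assumptions
of this sketch rather than part of the suggestion: values are separated
by single spaces, and a list containing a value above 255 is left
untouched.

```python
import re

# ${...} whose body is one or more decimal numbers separated by single spaces.
_OCTET_LIST = re.compile(rb"\$\{([0-9]+(?: [0-9]+)*)\}")


def expand_octets(text: bytes) -> bytes:
    """Replace ${d1 d2 ...} with the octets d1, d2, ... in an already-parsed string."""

    def replace(match):
        values = [int(v) for v in match.group(1).split(b" ")]
        if any(v > 255 for v in values):
            # Assumption: out-of-range lists are left as written.
            return match.group(0)
        return bytes(values)

    return _OCTET_LIST.sub(replace, text)
```

For example, expand_octets(b"a${255 254}b") gives b"a\xff\xfeb", while
${name} is not touched because its body is not a list of decimal
values.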