Re: variables draft (draft-homme-sieve-variables-00.txt)

2003-04-09 15:38:22

[Lawrence Greenfield]:

  Formal operational semantics and careful thought on how to make
  various features orthogonal would go a long way so we don't have
  every specification referencing every other specification and
  subtly altering their meanings.

no disagreement in general.  however, variables are usually a
first-class member of a language, so when we add this after the fact,
it is to be expected that most specifications have their functionality
altered (hopefully enhanced).

in my draft, this is done by changing the semantics of the basic type
string.  the alternative seems to be to make variable references a
construct only available when explicitly stated in the grammar for the
action, which in turn requires new actions and tests to be defined
wherever variables might be useful.  I obviously favour the first
approach.  it may be more painful now, but I believe it will pay off
in the long term.

now, how to do it?  in my first attempt at a draft, I changed the
grammar of quoted-string, multi-line-literal and multi-line-dotstuff.
this does seem more invasive, but it may also seem more honest.

basically, it was done like this:

       quoted-string  =  DQUOTE *(CHAR / variable-ref) DQUOTE
        variable-ref  =  "${" variable-name "}"

which is bogus, of course.  doing it right is actually quite hard (at
least for me).  I don't see how to support the current syntax in the
draft without a _lot_ of grammar rules.  actually, I don't see how to
do it at all :-)

here's an attempt at a slightly simplified syntax:

  quoted-string = DQUOTE
                  *(qcontent / quoted-pair / variable-ref /
                    verbatim-dollar / "{")
                  *"$" DQUOTE

        qcontent  =  %x01-21 / %x23 / %x25-5b / %x5d-7a / %x7c-1fffff
                ; all characters except NUL, double-quote, dollar,
                ; backslash and opening brace
     quoted-pair  =  "\" %x01-1fffff
    variable-ref  =  "${" variable-name "}"
   variable-name  =  num-variable / identifier
    num-variable  =  1*DIGIT
 verbatim-dollar  =  1*"$" (qcontent / quoted-pair / variable-ref)

the verbatim-dollar makes sure that "$${foo}" and "$\"bar" are allowed
and work as expected.  since there has to be a non-dollar character in
the verbatim-dollar expansion, strings ending in one or more dollar
characters are handled by an explicit *"$" just before the closing
DQUOTE.

using this syntax, "${!}" gives a parsing error rather than being left
verbatim.  backslashes are processed differently than in my draft: a
backslash will now actually escape the variable reference, which I
think is an improvement.  "\${foo}" will parse as a quoted-pair ("\$")
followed by a "{" and "foo}".
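
to make the cases above concrete, here is a rough Python sketch of a
tokenizer for the text between the double quotes.  it is not part of
the draft: the function name tokenize, the token labels and the
merging of adjacent literal text are my own choices, and identifier is
assumed to be (ALPHA / "_") *(ALPHA / DIGIT / "_") as in the base
Sieve grammar.

```python
import re

# Assumption: identifier is (ALPHA / "_") *(ALPHA / DIGIT / "_").
VARIABLE_REF = re.compile(r"\$\{([0-9]+|[A-Za-z_][A-Za-z0-9_]*)\}")


def tokenize(body):
    """Split the text between the double quotes into (kind, text) tokens."""
    tokens = []

    def literal(text):
        # Merge adjacent literal text into a single token.
        if tokens and tokens[-1][0] == "literal":
            tokens[-1] = ("literal", tokens[-1][1] + text)
        else:
            tokens.append(("literal", text))

    i, n = 0, len(body)
    while i < n:
        ch = body[i]
        if ch == "\\":
            # quoted-pair: backslash plus any one character except NUL
            if i + 1 == n or body[i + 1] == "\0":
                raise ValueError("incomplete quoted-pair at %d" % i)
            tokens.append(("quoted-pair", body[i:i + 2]))
            i += 2
        elif ch == "$":
            # Find the end of this run of dollar characters.
            j = i
            while j < n and body[j] == "$":
                j += 1
            if j < n and body[j] == "{":
                # The last dollar of the run must open a variable-ref.
                m = VARIABLE_REF.match(body, j - 1)
                if m is None:
                    raise ValueError("bad variable reference at %d" % (j - 1))
                if j - 1 > i:
                    literal(body[i:j - 1])  # the verbatim dollars before it
                tokens.append(("variable-ref", m.group(1)))
                i = m.end()
            else:
                # Verbatim dollars, including a run at the end of the string.
                literal(body[i:j])
                i = j
        elif ch in ('"', "\0"):
            raise ValueError("character not allowed at %d" % i)
        else:
            literal(ch)  # qcontent or a lone "{"
            i += 1
    return tokens
```

with this sketch, tokenize("$${foo}") gives a literal "$" followed by
a reference to foo, the body \${foo} gives the quoted-pair \$ followed
by the literal text {foo}, and the body ${!} raises ValueError.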

the above syntax can be simplified further by requiring all dollars
outside variable references to be escaped using backslash.

  quoted-string = DQUOTE
                  *(qcontent / quoted-pair / variable-ref)
                  DQUOTE
        qcontent  =  %x01-21 / %x23 / %x25-5b / %x5d-1fffff
                ; all characters except NUL, double-quote, dollar and
                ; backslash

other expansions are the same as above.  strings like "$$$" and
"\\.(.*)$" now yield errors.  most users will use Sieve generators
that probably escape every dollar _anyway_ to keep the implementation
simple, so being non-intrusive in general texts doesn't seem
important.  the regexp gotcha may be the worst.
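
the stricter rule is simple enough to check with a single regular
expression.  the following Python sketch is only an illustration: the
names STRICT_BODY and is_valid_strict are made up here, the pattern
covers the text between the double quotes only, and identifier is
again assumed to be (ALPHA / "_") *(ALPHA / DIGIT / "_").

```python
import re

# One alternative per grammar rule:
#   qcontent     any character except NUL, double quote, dollar, backslash
#   quoted-pair  backslash followed by any character except NUL
#   variable-ref "${" (digits or identifier) "}"
STRICT_BODY = re.compile(
    r'(?:[^\x00"$\\]'
    r'|\\[^\x00]'
    r'|\$\{(?:[0-9]+|[A-Za-z_][A-Za-z0-9_]*)\})*'
)


def is_valid_strict(body):
    """Return True if body is acceptable between the double quotes."""
    return STRICT_BODY.fullmatch(body) is not None


print(is_valid_strict("$$$"))               # False: bare dollars
print(is_valid_strict(r"\\.(.*)$"))         # False: unescaped trailing dollar
print(is_valid_strict(r"\\.(.*)\$"))        # True: the dollar is escaped
print(is_valid_strict(r"${foo} costs \$5")) # True
```

the second and third calls show the regexp gotcha: the pattern only
becomes acceptable once its trailing dollar is written as \$.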

it might be useful to change string as well:

          string  =  quoted-string / squoted-string / multi-line
  squoted-string  =  "'" *(sqcontent / quoted-pair) "'"
       sqcontent  =  %x01-26 / %x28-5b / %x5d-1fffff
                ; all characters except NUL, single quote and backslash

that way the body extension can mandate the use of squoted-string for
specifying the match pattern.  yes, dependencies are tricky here.


btw, it seems this needs fixing in a Sieve revision:

   quoted-string  =  DQUOTE *CHAR DQUOTE
            CHAR  =  %x01-7F ; from [ABNF]
    CHAR-NOT-DOT  =  (%x01-09 / %x0b-0c / %x0e-2d / %x2f-ff)
           ;; no dots, no CRLFs
and others don't include the complete Unicode repertoire.

also, CHAR includes the double quote, so "foo" "bar" could be parsed
as one quoted-string with contents ``foo" "bar''.
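
the ambiguity is easy to reproduce by translating the rule directly
into a regular expression.  this Python sketch is an illustration
only: the names LOOSE and TIGHT are mine, and TIGHT is just one
possible repair (exclude the double quote and backslash from the
content and allow backslash escapes), not wording from any
specification.

```python
import re

# quoted-string = DQUOTE *CHAR DQUOTE, with CHAR = %x01-7F
LOOSE = re.compile(r'"([\x01-\x7f]*)"')

# Possible repair: content excludes NUL, double quote and backslash,
# and a backslash escapes the character that follows it.
TIGHT = re.compile(r'"((?:[^\x00"\\]|\\[^\x00])*)"')

text = '"foo" "bar"'

m = LOOSE.fullmatch(text)
print(m.group(1))             # foo" "bar
print(TIGHT.fullmatch(text))  # None
```

the loose rule accepts the whole input as a single quoted-string,
while the tightened one no longer does.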

Kjetil T.
