Notes on draft-homme-sieve-variables-04


Hi,

These are a few comments on draft-homme-sieve-variables-04.  In another
message, I made some comments about variables in general:  specifically,
that I'd like to see a time when variables are integrated into the
SIEVE language in a more fundamental way.  However, I imagine that even
assuming anyone else agreed with me, that goal would take some time to
reach, and that something like this draft would be more immediately
useful.  To leave room for that longer-term goal, I'd want to see:

 - This draft and its capability called something other than "variables"
   (maybe "stringvars")

 - The things that this draft calls "variables" be called something
   else (again, maybe "stringvars").


Other than that, here's some specific stuff about the draft.

   When a string is evaluated, substrings matching variable-ref SHALL be
   replaced by the value of variable-name.  Only one pass through the
   string SHALL be done.  Variable names are case insensitive, so "foo"
   and "FOO" refer to the same variable.  Unknown variables are replaced
   by the empty string.


I'd strongly prefer case sensitivity here.
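
To make the quoted rules concrete, here is a rough Python sketch of
single-pass expansion with the case rule as a switch.  The function name,
the flag and the regular expression are my own inventions for
illustration (the regex only approximates variable-ref); none of this is
taken from the draft.

```python
import re

# Approximation of variable-ref for illustration: "${" name "}", where
# name is made of letters, digits, "_" and "." (an assumption, not the
# draft's exact grammar).
VARIABLE_REF = re.compile(r"\$\{([A-Za-z0-9_.]+)\}")


def expand(text, variables, case_sensitive=False):
    """Expand variable references in one left-to-right pass."""
    if not case_sensitive:
        # Fold stored names so "foo" and "FOO" name the same variable.
        variables = {name.lower(): value for name, value in variables.items()}

    def lookup(match):
        name = match.group(1)
        if not case_sensitive:
            name = name.lower()
        # Unknown variables are replaced by the empty string.
        return variables.get(name, "")

    # re.sub does not rescan replacement text, so a value that itself
    # contains "${...}" is left alone: exactly one pass.
    return VARIABLE_REF.sub(lookup, text)
```

With the draft's rule, expand("${FOO}", {"foo": "x"}) gives "x"; with
case_sensitive=True the same reference is unknown and expands to the
empty string.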



      variable-ref        =  "${" variable-name "}"
      variable-name       =  num-variable / *namespace identifier
      namespace           =  identifier "."


You might want to mention that "identifier" is as defined in RFC 3028
(even though this is an extension, it doesn't hurt to be explicit).

There's not really a lot about namespaces in this draft, other than to
allow for future state variables associated with extensions (or so it
appears).  Maybe point out that this document specifies namespace syntax
only, without addressing anything else about namespaces.  (Also I'd
prefer something other than "." as a namespace suffix.)  The implication
seems to be that namespace-associated variables are read-only; if that's
true, you might want to make that explicit.

      num-variable        =  1*DIGIT


Why limit numbered variables to a single digit?  Some languages have had
this restriction to help them with parsing, but that's not the case
here; it's not that much of a trick to distinguish an all-digits
variable name from one that's not all-digits.

Not that I really see a lot of call for more than 9 match references,
but I hate that sort of limit: an implementation ought to be able to
provide more than 9 if it wants to, and limits should be about
resources rather than syntax.
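
As a rough illustration of how little work the all-digits distinction
is, here is a Python sketch.  The helper names are hypothetical and
mine, not the draft's.

```python
import re

# ASCII digits only, matching the ABNF DIGIT rule.
ALL_DIGITS = re.compile(r"[0-9]+")


def is_numbered(name):
    """Return True if name consists solely of one or more ASCII digits."""
    return ALL_DIGITS.fullmatch(name) is not None


def match_index(name):
    """Return the match index for a numbered name, or None otherwise."""
    return int(name) if is_numbered(name) else None
```

Under this sketch "12" is simply match reference 12, and whether it is
available is a question of what the implementation chose to keep, not
of what the grammar allows.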

   The expanded string MUST use the variable values which are current
   when control reaches the statement the string is part of.


At least one problem with this has already been brought up on the list:
the interaction between match results and sequential evaluation of
multiple tests.  It's much more expressive to be able to take advantage
of the side effects (the match results) within one test statement;
furthermore it's burdensome on an implementation (especially in terms of
efficiency) to have to freeze the current match results upon entry to a
test statement so that those frozen results will be available to each
step inside that test statement.  I suspect that the goal of this
prescription is to address deferred actions such as "fileinto" -- and
that's a good thing.  However, it does introduce these other real
problems.

   Tests or actions in future extensions may need to access the unex-
   panded version of the string argument and, e.g., do the expansion
   after setting variables in its namespace.  The design of the imple-
   mentation should allow this.


I suppose this is as good a place as any for this comment:  I'm not all
that comfortable with all strings being automagically eligible for
interpolation (once this extension's capability is enabled).  It seems
to me that it would be better to have a syntax that specifically
commands that the string be processed, e.g., with some character before
the opening quote:

    fileinto ?"${1}";

or via some other quoting style.  A bonus would be a syntax that
allows one or more rescannings of the resulting string.

   For ":matches", the list will contain one string for each wildcard
   ("?" and "*") in the match pattern.  Each string holds what the cor-
   responding wildcard expands to, possibly the empty string.  The wild-
   cards expand greedily.


I've lost track of the history: what's the reason for "*" expanding
greedily?  Is it to be compatible with what regex does?  My preference
(both in terms of what I'd expect and in terms of what is efficient to
implement) is the opposite.  For example, let's say you have:

    Subject: [filters] how to search for ']'

I would want this to work:

    if header :matches "subject" "[*]*" {
        set "prefix" "${1}";
        set "actual_subject" "${2}";
    }
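
To show what each rule would put in ${1} and ${2} for that subject,
here is a rough Python sketch that translates a ":matches" pattern into
a regular expression.  The function is hypothetical; it ignores
backslash escapes in the pattern and the comparator (so it compares
case-sensitively), and it is only meant to contrast the two expansion
rules.

```python
import re


def wildcard_captures(pattern, value, greedy=True):
    """Return what each "*" and "?" in pattern expands to, or None."""
    star = "(.*)" if greedy else "(.*?)"
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(star)      # zero or more characters
        elif ch == "?":
            parts.append("(.)")     # exactly one character
        else:
            parts.append(re.escape(ch))
    # The whole value must match, as with ":matches".
    m = re.fullmatch("".join(parts), value, flags=re.DOTALL)
    return None if m is None else list(m.groups())


subject = "[filters] how to search for ']'"
print(wildcard_captures("[*]*", subject, greedy=True))
print(wildcard_captures("[*]*", subject, greedy=False))
```

With greedy expansion the first "*" swallows everything up to the last
"]", so ${1} is "filters] how to search for '" and ${2} is just "'".
With the non-greedy rule ${1} is "filters" and ${2} is the rest of the
subject, which is what I'd expect.
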

   Numbered variables ${1} through ${9} MUST be supported.  References
   to higher indices than the implementation supports should be treated
   as a syntax error which MUST be discovered at compile-time.


I don't like the mandate that strings have to be inspected at
compile time; a "MAY" would be preferable.
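
For what it's worth, the check itself is small either way; the question
is only whether an implementation is forced to run it before execution.
A rough Python sketch, with a hypothetical function name and a
max_index parameter of my own invention standing in for whatever the
implementation supports:

```python
import re

# Numbered references only: "${" 1*DIGIT "}".
NUMBERED_REF = re.compile(r"\$\{([0-9]+)\}")


def out_of_range_refs(text, max_index=9):
    """Return the numbered references in text that exceed max_index."""
    return [int(d) for d in NUMBERED_REF.findall(text) if int(d) > max_index]
```

An implementation could call this on every string when the script is
compiled, or only on strings it is about to expand.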

   The introduction of variables makes advanced decision making easier
   to write, but since no looping construct is provided, all Sieve
   scripts will terminate orderly.


"orderly" is not an adverb :-)

Yours,
mm