Re: Notes on draft-homme-sieve-variables-04

These are a few comments on draft-homme-sieve-variables-04.  In another
message, I made some comments about variables in general:  specifically
that I'd like to see a time where variables are integrated into the
SIEVE language in a more fundamental way.  However I imagine that even
assuming anyone else agreed with me, that goal would take some time to
reach, and that something like this draft would be more immediately
useful.  In making allowance for that farther goal, I'd want to see:

 - This draft and its capability called something other than "variables"
   (maybe "stringvars")


I could not care less what the capability name is.

 - The things that this draft calls "variables" be called something
   else (like, "stringvars").


This strikes me as more confusing than useful. If something else comes along it
can call whatever it defines builtin variables or something similar.

Other than that, here's some specific stuff about the draft.

   When a string is evaluated, substrings matching variable-ref SHALL be
   replaced by the value of variable-name.  Only one pass through the
   string SHALL be done.  Variable names are case insensitive, so "foo"
   and "FOO" refer to the same variable.  Unknown variables are replaced
   by the empty string.

I'd strongly prefer case sensitivity here.


I, OTOH, strongly prefer case insensitivity.
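
For concreteness, here is a rough Python sketch of the substitution rules
quoted above: one pass, case-insensitive names, unknown names becoming the
empty string. It assumes references look like "${name}" (as in the "${1}"
examples), and the character class used for names is a simplification, not
the draft's grammar. The names expand and _VAR_REF are made up for
illustration.

```python
import re

# Assumed reference syntax: "${" variable-name "}".
# The name pattern is a simplification, not the draft's ABNF.
_VAR_REF = re.compile(r"\$\{([A-Za-z0-9_.]+)\}")


def expand(text, variables):
    """Replace each ${name} once, left to right, without rescanning results."""
    # Fold names to lower case so "foo" and "FOO" refer to the same variable.
    lowered = {name.lower(): value for name, value in variables.items()}

    def substitute(match):
        # Unknown variables are replaced by the empty string.
        return lowered.get(match.group(1).lower(), "")

    return _VAR_REF.sub(substitute, text)
```

Because the replacement text is never rescanned, a value that itself
contains "${...}" comes through literally.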



      variable-name       =  num-variable / *namespace identifier
      namespace           =  identifier "."

You might want to mention that "identifier" is as defined in RFC 3028
(even though this is an extension- it doesn't hurt to be explicit).


Any time a document inherits ABNF from another document it needs to say so.
So yes, this issue definitely needs to be addressed.

There's not really a lot about namespaces in this draft, other than to
allow for future state variables associated with extensions (or so it
appears).  Maybe point out that this document specifies namespace syntax
only, without addressing anything else about namespaces.  (Also I'd
prefer something other than "." as a namespace suffix.)  The implication
seems to be that namespace-associated variables are read-only; if that's
true, might want to make that explicit.


Pointing out that namespaces are basically a placeholder is fine and a good
idea. I like dot as a separator; there are other characters I could
live with but also a lot I would object to. I don't think stating that
namespace variables are read-only is a good idea at this point.

      num-variable        =  1*DIGIT

Why limit numbered variables to a single digit?


This doesn't impose such a limit. "1*" means "one or more", not "one".
You'd write "1DIGIT" for "one".

Some languages have had
this restriction to help them with parsing, but that's not the case
here; it's not that much of a trick to distinguish an all-digits
variable name from one that's not all-digits.


Right, although perhaps banning leading zeroes in the ABNF would be a good idea.
Something like:

    num-variable = 1DIGIT / (%x31-39 1*DIGIT)
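
As a sanity check on that rule, here is a small Python sketch that
transcribes it into a regular expression. The function name is hypothetical
and this is only an illustration of what the ABNF accepts, not a suggestion
for how an implementation should parse.

```python
import re

# Mirrors: num-variable = 1DIGIT / (%x31-39 1*DIGIT)
# Either a single digit (including "0"), or a nonzero digit followed by
# one or more digits. Explicit ASCII ranges are used instead of \d.
_NUM_VARIABLE = re.compile(r"[0-9]|[1-9][0-9]+")


def is_num_variable(name):
    """Return True if name matches the leading-zero-free num-variable rule."""
    return _NUM_VARIABLE.fullmatch(name) is not None
```

So "0", "9" and "10" are accepted, while "01" and "007" are not.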

Not that I really see a lot of call for more than 9 match references,
but I hate that sort of limit- an implementation ought to be able to
provide more than 9 if it wants to, and limits should be about
resources rather than syntax.


Again, there's no such limit, although implementations are
allowed to impose a limit if they wish. I wonder if 9 is an
appropriate minimum maximum, however.

   The expanded string MUST use the variable values which are current
   when control reaches the statement the string is part of.

At least one problem with this has already been brought up on the list:
the interaction between match results and sequential evaluation of
multiple tests.  It's much more expressive to be able to take advantage
of the side effects (the match results) within one test statement;
furthermore it's burdensome on an implementation (especially in terms of
efficiency) to have to freeze the current match results upon entry to a
test statement so that those frozen results will be available by each
step inside that test statement.  I suspect that the goal of this
prescription is to address deferred actions such as "fileinto" -- and
that's a good thing.  However, it does introduce these other real
problems.


I agree, I can argue this one either way, but I actually prefer the
"numbered variable changes take effect immediately" model.

   Tests or actions in future extensions may need to access the unex-
   panded version of the string argument and, e.g., do the expansion
   after setting variables in its namespace.  The design of the imple-
   mentation should allow this.

I suppose this is as good a place as any for this comment:  I'm not all
that comfortable with all strings being automagically eligible for
interpolation (once this extension's capability is enabled).  It seems
to me that it would be better to have a syntax that specifically
commands that the string be processed. e.g., with some character before
the opening quote:

    fileinto ?"${1}";

or via other different quoting style.  A bonus would be a syntax that
allows one or more rescannings of the resulting string.


I think having this as a global setting is sufficient. You either 
require variables and get the effect or you don't and the effect is absent.

I also strongly object to making a syntax change to the language in order
to get this functionality. If we're gonna do that we might as well
make variables first class objects and add a concatenation operator.

   For ":matches", the list will contain one string for each wildcard
   ("?" and "*") in the match pattern.  Each string holds what the cor-
   responding wildcard expands to, possibly the empty string.  The wild-
   cards expand greedily.

I've lost track of the history: what's the reason for "*" expanding
greedily?  Is it to be compatible with what regex does?  My preference
(both in terms of what I'd expect and in terms of what is efficient to
implement) is the opposite.  For example, let's say you have:


I'm afraid I don't buy your efficiency claims here. I can write efficient
greedy globbing code and I can write inefficient non-greedy globbing code.

I think the main reason for greedy globbing is exactly what you say: It
is what people have come to expect. I could live with non-greedy
globbing, although I'd have to change things a bit to support it.
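
To show what is actually at stake, here is a Python sketch that translates
a ":matches"-style pattern into a regular expression with one capture group
per wildcard, in either greedy or non-greedy mode. This is an illustration
only: the function names and the "*-*" example are made up, backslash
escaping of wildcards is ignored, and a real implementation need not go
through regular expressions at all.

```python
import re


def glob_to_regex(pattern, greedy=True):
    """Build a regex with one capture group per "*" or "?" wildcard.

    Simplification: backslash-escaped wildcards are not handled.
    """
    star = "(.*)" if greedy else "(.*?)"
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(star)
        elif ch == "?":
            parts.append("(.)")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def match_groups(pattern, value, greedy=True):
    """Return the list of wildcard expansions, or None if there is no match."""
    m = glob_to_regex(pattern, greedy).fullmatch(value)
    return list(m.groups()) if m else None
```

Matching "*-*" against "a-b-c" gives ["a-b", "c"] in greedy mode and
["a", "b-c"] in non-greedy mode. Whether a value matches is the same either
way; only the captured strings differ.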

   Numbered variables ${1} through ${9} MUST be supported.  References
   to higher indices than the implementation supports should be treated
   as a syntax error which MUST be discovered at compile-time.

I don't like the mandate that strings have to be inspected at
compile time-- a "MAY" would be preferable.


I missed this. I agree 100% with your assessment.
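
For what it's worth, the check itself is simple whenever an implementation
chooses to do it. A minimal Python sketch, assuming numbered references are
written "${N}" and using a hypothetical function name and a default limit
of 9 taken from the quoted text:

```python
import re

# Assumed syntax for numbered references: "${" 1*DIGIT "}".
_NUM_REF = re.compile(r"\$\{([0-9]+)\}")


def out_of_range_refs(literal, max_index=9):
    """List the numbered references in a string literal that exceed max_index."""
    return [int(d) for d in _NUM_REF.findall(literal) if int(d) > max_index]
```

An implementation could run this over string literals when the script is
parsed, or defer it until the string is actually expanded.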

   The introduction of variables makes advanced decision making easier
   to write, but since no looping construct is provided, all Sieve
   scripts will terminate orderly.

"orderly" is not an adverb :-)


Indeed.

                                Ned