Re: WGLC on draft-ietf-sieve-mime-loop


On Mon, Aug 20, 2007 at 06:25:04PM -0400, Mark E. Mallett wrote:



I had intended to get some comments out before the last day (i.e.,
before today) but haven't managed to organize my thoughts well enough.
I'd like to get them out to the list in the next day or two.



I don't know why I'm having such a hard time with this set of comments,
but I've been fretting on it forever, it seems.  Forgive me that it's still
somewhat muddled, but it's going out now :)

Even though I have a bunch of comments, I do like what this is trying to
accomplish and I'll likely end up implementing it however it ends up.

Some general stuff first:

I think I'd like it a bit better if it didn't seem like this was about
loops with some mime stuff thrown in; rather that it be about mime stuff
with loops being one aspect of it.  I realize the title is "MIME part
tests, iteration, extraction, replacement and enclosure."  However the
introductory text is almost all about loops.  I think that the first
introductory text should bring up a new concept of selecting MIME parts,
and how selecting a MIME part affects various Sieve constructs.  e.g.
when a MIME part is selected, the action of header tests becomes
refocused to deal with the selected mime part's header instead of the
top-level message header.  This is a change: that concept is not really
present in the draft so far, but I think it adds consistency and
clarity.  This would not have to be repeated for every section that
introduces new tagged arguments for header tests, as it is (sort of)
now.  This sort of introduction could lead to clearer text in other
areas too; such text could assume that the reader was familiar with the
notion of a selected MIME part, and go from there.  Each section would
not, as it does now, have to describe the special case of what a sieve
command does when it's inside of a loop vs the special case of what it
does when it's not inside of a loop.  I'd much rather get rid of those
special case descriptions in favor of a description of what the command
does, period.

With the concept of a selected MIME part, I'd also prefer it if a loop
left the last matching MIME part selected, even outside of the loop
after the loop has executed.  This would require the addition of at
least one more SIEVE verb, one that reselects the entire message (goes
back to the top).  It would probably also require a clarification of how
nested loops interact when selecting MIME parts, and possibly the
introduction of a scoping construct that preserves the MIME part
selection outside of its block.

My implementation has its own way selecting and looping through MIME
parts, and I've found that one of the great benefits is being able to
locate a part and then leave the focus on that part.

More specific stuff about the draft:

3.  Sieve Loops

   The base Sieve language has no looping mechanism.  Given that
   messages may contain multiple parts, in order to support filters that
   apply to any and all parts, we introduce a new control command:
   "for_every_part", which is an iterator that walks though every MIME
   part of a message, including nested parts, and applies the commands
   in the specified block to each of them.


Perhaps explicitly say that the walking of the MIME structure is
depth-first.  I can't really imagine any other order, but one never
knows.  And I must say.. one thing I have found useful is being able to
step through siblings without descending into child parts; I think
that's a significant lack here.

   The iterator will start with
   the first MIME part (as its current context) and will execute a
   command block (Sieve commands enclosed by { ...}).


I would prefer "will start with the first child of the currently selected
message part (or MIME part)" perhaps with a reminder that initially
the "currently selected part" is the top-level message.


   The iterator can be terminated prematurely by a new Sieve command,
   "break".


This may be way too disruptive of a comment, but..  since the draft
provides for nested loops, perhaps some consideration should be given
for being able to break out of a particular level.  One failing in some
languages is a "break" that only applies to the immediately enclosing
scope.  One might, for example, have an optional tag on the
'for_every_part' and an optional matching tag on the "break":

    for_every_part "A" {
        for_every_part "B" {
            if something
                break "A";
        }
    }

   "for_every_part" commands can be nested inside other "for_every_part"
   commands.  When this occurs, the nested "for_every_part" iterates
   over the MIME parts contained within the MIME part current being
   targeted by the nearest enclosing "for_every_part" command.


Here one could simply say that it processes the message relative to
the currently selected MIME part.  This remains true regardless of
whether the loop is nested.

   If that
   MIME part is a terminal MIME part (i.e. does not contain other MIME
   parts) then the nested "for_every_loop" is simply ignored.


Again, always true.

   Sieve implementations MAY limit the number of nested loops that occur
   within one another, however they MUST support at least one nested
   loop inside another loop.


The way loops are defined in this draft, I'm trying to figure out what
nesting solves, and where the limit came from.  When the nesting came up
previously, I thought the concensus was to to make the maximum nesting
level implementation-defined, with a mandatory minimum nesting level of
1 (i.e., only one outer loop, no inner loops required).  I'm actually in
favor of nested loops, but I'm not sure how much sense they make when
you can't control the way the loops step through the parts, and when the
loop doesn't leave a particular part selected.  e.g. imagine a code
structure like this, only one that isn't so obviously made-up:

    if header :contenttype "content-type" "multipart/report" {
        for_every_child { # process the immediate children only
            if header :contenttype "message/feedback-report" {
                # stuff here
            }
            elsif header :contenttype "text/plain" {
                # other stuff here
            }
            elsif header :contenttype "message/rfc822" {
                # Now descend into this and find a text/plain part
                for_every_part {
                    if header :contenttype "text/plain" {
                        # ...
                    }
                }
            }
        }
    }

(In my implementation I took the tack of adding a generic loop construct
and MIME part navigation primitives, which is a building-block approach
that I tend to favor.)

4.  Changes to Sieve tests

   This specification extends the base Sieve "header", "address" and
   "exists" tests to support targeting those tests at a specific MIME
   part or at all MIME parts in the enclosing scope.


As I mentioned, I think this would be better as part of introductory
text that talks about the notion of a currently selected message part.

4.1.  Test "header"

   The "header" test is extended with the addition of a new ":mime"
   tagged argument, which takes a number of other arguments.

   Usage:  header [:mime] [:anychild] [MIMEOPTS]
      [COMPARATOR] [MATCH-TYPE]
      <header-names: string-list> <key-list: string-list>

   Usage:  The definition of [MIMEOPTS] is:

      Syntax: ":type" / ":subtype" / ":contenttype" /
      ":param" <param-list: string-list>


First, I think that having ":mime" take another tagged argument as a
modifier is confusing and is inconsistent with other Sieve constructs.
Take the "relational" extension, for example.  It isn't:

     header :value :gt "some-header" "3"
but it's
     header :value "gt" "some-header" "3"

and if there's going to be a ":mime" option, I would think it would
emulate that kind of syntax.  Either way, it's unfortunate that ":mime"
here has completely different meaning that ":mime" as it already exists
elsewhere.  But really, I think this ":mime" and its tagged-option
arguments are superfluous.  I think that ":param" can do it all.  Let's
say that the presence of ":param" is only valid for a header field that
has a general metasyntax matching

     field-name:  some-fixed-values 0*<;name=value>

A ":param" option would take a string (or string list) as argument.  If
the string is non-empty, it selects the corresponding value matching the
name in the string.  If the string is empty, it selects the
"some-fixed-values" portion of the header field.  Thus one could say
this to look at the subtype of a "content-type" header:

    header :param "" :matches "content-type" "*/html"

and this to look at the "charset" paremeter of that same header:

    header :param "charset" :is "ISO-8859-1"

No need for :mime or other selectors, and this becomes more general,
to boot.

4.2.  Test "address"

   The "address" test is extended with the addition of a new ":mime"
   tagged argument, which takes a number of other arguments.

   Usage:  address [:mime] [:anychild] [COMPARATOR]
      [ADDRESS-PART] [MATCH-TYPE]
      <header-list: string-list> <key-list: string-list>


I don't really get how :mime is relevant here.
Ditto with "exists"

5.  Action Replace

   Usage:  replace [:mime] [:subject string] [:from string]
      <replacement: string>

   The "replace" command is defined to allow a MIME part to be replaced
   with the text supplied in the command.

   When used in the context of a "for_every_part" iterator, the MIME
   part to be replaced is the "current" MIME part.


Again, I would prefer that this just say that it addresses the currently
selected message part.  Then there is no need to talk about the distinction
between its use inside of or outside of a loop body.

   If the entire message is being replaced, a ":subject" parameter
   specifies a subject line to attach to the message that is generated.
   UTF-8 characters can be used in the string argument; implementations
   MUST convert the string to [RFC2047] encoded words if and only if
   non-ASCII characters are present.  Implementations MUST preserve the
   previous Subject header as an Original-Subject header.


And again, for consistency's sake, why not just make the :subject
parameter work (to modify the "Subject" header field within the currently
selected message part) whether it's a child part that's selected or
whether it's the top part that's selected.  Ditto ":from" .  And for
that matter, why are these needed at all, when ":mime" will do?

7.  Action extract_text

   Usage:  extract_text [MODIFIER] [":first" number] <varname: string>

   The extract_text action may be used within the context of a
   "for_every_part" loop.  It stores at most :first bytes of the current
   MIME body part in the variable identified by varname.  If the :first
   parameter is not present, the whole content of the current MIME body
   part is stored.  In either case the actually stored data MAY be
   truncated to conform to implementation specific limit on variable
   length and/or on MIME body part length.  QUESTION: What do we do if
   the Content-Transfer-Encoding is anything other than 7bit?

   If extract_text is used outside the context of a "for_every_part"
   loop, the action will set the variable identified by varname to the
   empty string.


Why?

Again, I would prefer that this be defined to operate consistently,
acting on the currently selected MIME part regardless of whether it's
inside of a loop body.

Yours, slowly and tediously,
-mm-