Re: Working Group Last Call: draft-ietf-sieve-body-00.txt


On Mon, Mar 14, 2005 at 03:04:09PM -0500, Cyrus Daboo wrote:

I would like to draw your attention to the following draft:

<http://www.ietf.org/internet-drafts/draft-ietf-sieve-body-00.txt>


Nitpick warning as always.


1. Introduction

   > This document reintroduces the "body" test as an extension,
   > and specifies it syntax and semantics.

"its",  not "it"


2. Conventions used.

   > Conventions for notations are as in [SIEVE] section 1.1, including
   > use of [KEYWORDS] and "Syntax:" label for the definition of action
   > and tagged arguments syntax.

I think labels are plural, so it ought to read "... labels for
definitions of ..."


   > The capability string associated with extension defined in this
   > document is "body".

"the extension"



4.1 Body Transform ":raw"

In the example:

        > # This will match a message containing the words "MAKE MONEY FAST"
        > # in body or MIME headers other than the outermost RFC 822 header,
        > # but will not match a message containing the words in a
        > # content-transfer-encoded body.

the description is inexact.  Where it talks about "containing the words",
I for one would get the idea that the test is whether the body contains
those words, rather than that exact string.  I realize that one should
be reading this in the context of the document, but every little bit of
precision helps.  Also, where it says it will not match in a
content-transfer-encoded body, I beg to differ.  If the body is encoded
quoted-printable, the string "MAKE MONEY FAST" will typically appear plain
as day and should be matched in raw mode.
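
A quick Python sketch of the quoted-printable point (purely illustrative,
not from the draft; the sample text and variable names are my own).  It
encodes the same body text as quoted-printable and as base64, then does
a ":raw"-style substring search on the encoded bytes:

```python
import base64
import quopri

# Hypothetical body text; the "=" forces at least one QP escape.
text = b"Act now and MAKE MONEY FAST, 100% = guaranteed\n"

# Encode the same text with two common content-transfer-encodings.
qp_encoded = quopri.encodestring(text)
b64_encoded = base64.encodebytes(text)

# A raw-mode search looks at the encoded bytes without decoding them.
key = b"MAKE MONEY FAST"
qp_raw_match = key in qp_encoded
b64_raw_match = key in b64_encoded
```

With this input the key survives quoted-printable encoding untouched
but is invisible in the base64 form, so "will not match ... in a
content-transfer-encoded body" is too broad a statement.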

[I see that Bob Johannessen also made the above comments, but this is
already in my notes, so read this as "me too"]



4.2 Body Transform ":content"

   > The search for MIME parts matching the :content specification is
   > recursive and automatically descends into multipart and
   > message/rfc822 MIME parts.  Once a MIME part has been identified
   > as suitable for searching, only its direct contents are searched
   > for the key strings.

If a message contains more than one testable part, I assume that the
"body" result is the OR of the tests of all of them, with a
short-circuit exit.  That is, the first match causes the body test to
end and return a true result, whereas a non-match causes the body test
to continue on to the next candidate MIME part.  This may seem obvious
but it probably needs to be made explicit, no?  Also, is it worth
specifying the recursion order?
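
To make the semantics I am assuming concrete, here is a minimal Python
sketch using the standard "email" package.  The function name, the
sample message, and the choice of depth-first document order are my
assumptions, not anything the draft specifies; lowercasing stands in
for a case-insensitive comparator.

```python
from email import message_from_string
from email.message import Message
from typing import Iterable


def body_content_test(msg: Message, major: str, keys: Iterable[str]) -> bool:
    """OR over all parts of the given major type, stopping at the first hit."""
    wanted = [k.lower() for k in keys]
    # walk() visits the message tree depth-first, descending into
    # multipart and message/rfc822 containers.
    for part in msg.walk():
        if part.get_content_maintype() != major:
            continue
        # Only the part's own decoded content is searched, not its headers.
        raw = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "us-ascii"
        text = raw.decode(charset, errors="replace").lower()
        if any(k in text for k in wanted):
            return True  # short-circuit: later parts are never examined
    return False


# Hypothetical two-part message used only for illustration.
SAMPLE = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XX"

prologue text
--XX
Content-Type: text/plain; charset="us-ascii"

nothing to see here
--XX
Content-Type: text/plain; charset="us-ascii"

launch the missile at dawn
--XX--
"""

sample_msg = message_from_string(SAMPLE)
found = body_content_test(sample_msg, "text", ["missile", "coordinates"])
```

Note that with major type "text" the multipart prologue is never
searched, since it belongs to the multipart container rather than to
either text part.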


   > For example, a document with "multipart" major content type only
   > directly contains the text in its epilogue and prologue section;
   > all the user-visible data inside it is directly contained in
   > documents with MIME types other than multipart.

I question the term "user-visible."  I'm a user, and the prologue and
epilogue stuff is always visible to me in my mail reader.  Maybe just
say "other"?


   > MIME headers of the containing text MUST NOT be included in the
   > data.

This explicitly provides no way to test the headers of a MIME part,
which, it seems to me, would be useful.


        > # Save any message with any text MIME part that contains the
        > # worlds "missile" or "coordinates" in the "secrets" folder. 

"words" not "worlds"



5. Interaction with Other Sieve Extensions

   > Regular and wildcard expressions used with "body" are exempt
   > from the side effects described in [VARIABLES].  That is, they
   > do not set numbered variables ${1}, ${2}... to the input
   > values corresponding to wild card sequences in the matched
   > pattern.

I remember that this came up last fall, expressed this way:

  >  QUESTION: Is it okay to have body :matches and
  >  :regex scans not set variables?

and the (small) consensus was a "yes" answer to that question.  I took
that to mean that people thought it was OK for an implementation not to
set the numbered variables, not that an implementation would be
prohibited from doing so.  This prohibition is unfriendly to
general-purpose match logic.  Also, if it is a prohibition, shouldn't
"MUST NOT" appear there?

Personally I think these match results could be very useful, but I
understand if an implementation doesn't want to provide them.
However, I'd want to allow an implementation to choose to do so.
(But see the comments on pragmatism below.)


8. Acknowledgments

My name jumped out at me; if it's in there, it should be
spelled "Mallett"  :-)


General

Pragmatic/implementation limits?  The body test is easiest to implement
when the entire message, or at least the particular MIME part being
looked at (or its transform result), can be held in memory, but that
can't be guaranteed.  I would favor saying that an implementation may
limit the body test so that it operates against a practical initial
subpart of each MIME part's data, as long as that is no less than some
number of bytes.  This may be controversial: does everyone plan on
implementing searches against non-memory-resident data, e.g. data that
must be buffered to disk?

Similarly, a script writer often just wants to see if there's something in
the first few lines of a message, and not needlessly test beyond that --
which, in fact, could yield a false positive.  I would like to see
option(s) that allow one to specify testing some initial subpart.  I
realize it's a bit late to bring that up, though.
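
For what it's worth, the "initial subpart" idea from the two paragraphs
above can be sketched in a few lines of Python.  This is only an
illustration under my own assumptions: the function name, the "limit"
parameter and the chunk size are hypothetical, and nothing like them
exists in the draft.  It reads the part in chunks, so the data never
has to be fully memory-resident.

```python
import io
from typing import BinaryIO


def prefix_contains(stream: BinaryIO, key: bytes, limit: int,
                    chunk_size: int = 4096) -> bool:
    """Return True if key occurs wholly within the first `limit` bytes."""
    if not key:
        return True
    remaining = limit
    tail = b""  # end of the previous window, kept for matches spanning chunks
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            break  # the part is shorter than the limit
        remaining -= len(chunk)
        window = tail + chunk
        if key in window:
            return True
        # Keep len(key) - 1 bytes so a match split across reads is still seen.
        tail = window[-(len(key) - 1):] if len(key) > 1 else b""
    return False


# Hypothetical data: the key occupies bytes 10..15 of the part.
data = b"a" * 10 + b"NEEDLE" + b"b" * 100
within_limit = prefix_contains(io.BytesIO(data), b"NEEDLE", limit=16, chunk_size=4)
beyond_limit = prefix_contains(io.BytesIO(data), b"NEEDLE", limit=15, chunk_size=4)
```

The same routine would serve both purposes: an implementation-imposed
floor on how much must be searched, and a script-writer option to look
only at the start of a part.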

If match results are not prohibited, another pragmatic limit would be
the size of such a result.  I would favor something that said this
rather than prohibited the match results from being saved.
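
As a rough Python illustration of capping the size of saved match
results (again my own sketch, with hypothetical names): a ":matches"
style pattern is turned into a regular expression, and each wildcard
capture is truncated to "max_len" characters.  It assumes non-greedy
wildcards and ignores backslash escapes in the pattern, both of which
are simplifications on my part.

```python
import re
from typing import List, Optional


def matches_captures(pattern: str, text: str, max_len: int) -> Optional[List[str]]:
    """Match a '*'/'?' wildcard pattern; return size-capped captures or None."""
    regex_parts = []
    for ch in pattern:
        if ch == "*":
            regex_parts.append("(.*?)")  # zero or more characters, non-greedy
        elif ch == "?":
            regex_parts.append("(.)")    # exactly one character
        else:
            regex_parts.append(re.escape(ch))
    m = re.fullmatch("".join(regex_parts), text, re.DOTALL)
    if m is None:
        return None
    # Each capture would become a numbered variable, cut to the size limit.
    return [group[:max_len] for group in m.groups()]


captures = matches_captures("*MONEY*", "MAKE MONEY FAST NOW", max_len=4)
```

Here the second wildcard matched " FAST NOW" but only its first four
characters would be saved.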

Yours,
mm