ietf-mta-filters

Re: Collected comments for draft-ietf-sieve-body-00.txt

2005-05-01 18:27:17


My apologies to the working group for the long delay in responding
to the comments from the WGLC on the body and editheader drafts.
My thanks to the chairs for compiling the comments; here are my
responses for the body draft.


Ken Murchison wrote:
Section 4.1, paragraph 2, last phrase says "... and MUST NOT interpret 
or skip MIME headers of enclosed body parts."

I don't think that the intent was to have "MUST NOT" refer to "skip", 
but it reads that way.  I'd recommend that either "or skip" be removed 
(since not interpreting them seems to encompass ignoring them), or 
reword the entire phrase to something like:  "... and either MUST NOT 
interpret MIME headers of enclosed body parts or MUST ignore them 
entirely."

The intent is that under the :raw transformation, the message is
just a series of octets with no special interpretation.  MIME headers
in enclosed body parts are therefore just more pieces of text to
match against.  To make that clearer, I've changed the last clause of
that sentence to read:

   <...> and MUST treat multipart boundaries
   or the MIME headers of enclosed body parts as part of the text
   being matched against instead of as MIME structures to interpret.
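
As an illustration only (not proposed draft text), the :raw behavior
described above could be sketched in Python roughly as follows.  The
function name raw_body_contains is hypothetical, and the comparison
only approximates Sieve's default "i;ascii-casemap" comparator by
lowercasing both strings.

```python
def raw_body_contains(message: str, key: str) -> bool:
    """Hypothetical sketch of a 'body :raw :contains' match.

    Everything after the outermost header is treated as plain text:
    multipart boundaries and nested MIME headers are not interpreted,
    and no content-transfer-encoding is undone.
    """
    # The outermost header ends at the first empty line.
    normalized = message.replace("\r\n", "\n")
    _header, _sep, body = normalized.partition("\n\n")
    # Assumption: case-insensitive match, approximated with str.lower().
    return key.lower() in body.lower()
```

Under this sketch, a key that appears only in the outermost Subject
field does not match, while the same text inside a nested part's MIME
header does.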


Bob Johannessen wrote:

4.1 Body Transform ":raw"

     # This will match a message containing the words "MAKE MONEY FAST"
     # in body or MIME headers other than the outermost RFC 822 header,
     # but will not match a message containing the words in a
     # content-transfer-encoded body.

Wouldn't it be more correct to say that it matches the string, or
even the character sequence "MAKE MONEY FAST"? Also I'm not sure I
understand what is meant by "a content-transfer-encoded body". It
*could* still match the character sequence in a quoted-printable
encoded body, couldn't it?

Correct.  I've changed the comment to read:

        # This will match a message containing the literal text
        # "MAKE MONEY FAST" in body parts (ignoring any 
        # content-transfer-encodings) or MIME headers other than    
        # the outermost RFC 2822 header.


4.2 Body Transform ":content"

  If an implementation does not support conversion of a given
  charset to  UTF-8, it MAY compare against the US-ASCII subset
  of the transfer-decoded character data instead.

Does the above rely on all current and future charsets having
a one-to-one mapping to US-ASCII for all characters with code
points 0-127? Is this a safe assumption? Is it even true of all
existing charsets? Maybe it would be better to explicitly
exclude all parts that can't be converted to UTF-8?

On reflection, I think this was intended to be similar to the
requirements of section 2.7.2 ("Comparisons Across Character Sets")
of the base-spec, but slightly stricter in that implementations
would be required to support UTF-8.


Speaking of which, perhaps that section of the base-spec should be
updated to require support of UTF-8 for header charsets, only falling
back to the weaker "No two strings..." text for charsets other than
UTF-8.


If that change were made, then I think the problem paragraph in the
body draft could be replaced with:
        Implementations MUST use the same rules for comparisons
        against body parts in charsets other than UTF-8 as they use
        for comparisons against header fields in such charsets (cf.
        [SIEVE] section 2.7.2).

(and the SIEVE reference would need to be updated to point at the
revised base spec)


Mark E. Mallett wrote:

4.2 Body Transform ":content"

  > The search for MIME parts matching the :content specification is
  > recursive and automatically descends into multipart and
  > message/rfc822 MIME parts.  Once a MIME part has been identified
  > as suitable for searching, only its direct contents are searched
  > for the key strings.

If a message contains more than one testable part, I assume that the
"body" result is the OR of the tests of all of them,
...
This may seem obvious but it probably needs to be made explicit, no?


Yeah.  To clarify, I've replaced the second/last sentence of that
paragraph with:
        All MIME parts with matching types are searched for the key
        strings.


with a short-circuit exit.
i.e., first match causes the body test to end and
return a true result, whereas a non-match causes the body test to
continue on to the next candidate mime part.

I don't see why short-circuiting needs to be mentioned, as it's
simply an obvious optimization and has no effect on the visible
behavior.  While the base spec does encourage implementations to
implement short-circuiting in evaluation of string lists, it didn't
seem necessary to mention that they should stop searching within a
header/address/whatever for a given string as soon as a match is
found.


[...]
  > For example, a document with "multipart" major content type only
  > directly contains the text in its epilogue and prologue section;
  > all the user-visible data inside it is directly contained in
  > documents with MIME types other than multipart.

I question the term "user-visible."  I'm a user, and the prolog and
epilog stuff is always visible to me in my mail reader.  Maybe just
say "other" ?

To clarify the matching against multipart and message/rfc822 parts,
I've replaced that paragraph with:

   If the :content specification matches a multipart MIME part,
   only the prologue and epilogue sections of the part will be searched
   for the key strings; the contents of nested parts are only
   searched if their respective types match the :content specification.

   If the :content specification matches a message/rfc822 MIME part,
   only the header of the nested message will be searched for the
   key strings; the body parts of the nested message are only
   searched if their content types match the :content specification.

and have dropped the "Nevertheless" from the following parenthetical
remark.

Furthermore, I've inserted an elaborate example of these rules,
building from Cyrus's suggestion, described below.


...
"words" not "worlds"
...
My name jumped out at me- if it's in there, it should be
spelled "Mallett"  :-)

Fixed and fixed.


5. Interaction with Other Sieve Extensions

  > Regular and wildcard expressions used with "body" are exempt
  > from the side effects described in [VARIABLES].  That is, they
  > do not set numbered variables ${1}, ${2}... to the input
  > values corresponding to wild card sequences in the matched
  > pattern.

I remember that this came up last fall, expressed this way:

 >  QUESTION: Is it okay to have body :matches and
 >  :regex scans not set variables?

and the (small) consensus was a "yes" answer to that question.  I took
that to mean that people thought it was OK for an implementation not to
set the numbered variables-- not that an implementation would be
prohibited from doing so.  This prohibition is unfriendly to
general-purpose match logic.

An implementation that wanted to support it could enable capturing
from 'body' matches into variables using another extension...


Also, if it is a prohibition, shouldn't
"MUST NOT" appear there?

   Regular and wildcard expressions used with "body" are exempt  
   from the side effects described in [VARIABLES].  That is, they
   MUST NOT set numbered variables ${1}, ${2}... to the input values
   corresponding to wild card sequences in the matched pattern.
   However, if the [VARIABLES] extension is present, variable references in the
   key strings or content type strings are evaluated as described
   in the draft.

(That takes into account a suggestion from Nigel Swinson as well).


Nigel Swinson wrote:
7. Security Considerations
I suggest:
-   replacement for a virus or spam filtering system.
+  replacement for a spam, virus or other security related filtering system.

Done.


Cyrus Daboo wrote:
--On March 28, 2005 15:36:30 +0100 Nigel Swinson 
<...comments on the lack of clarity in matching multipart
types and a suggestion of an example...>

I've inserted the following example into section 4.2:
-----
   Example:
        From: Whomever
        To: Someone
        Date: Whenever
        Subject: whatever
        Content-Type: multipart/mixed; boundary=outer

     &  This is a multi-part message in MIME format.
     &
        --outer
        Content-Type: multipart/alternative; boundary=inner

     &  This is a nested multi-part message in MIME format.
     &
        --inner
        Content-Type: text/plain; charset="us-ascii"

     $  Hello
     $
        --inner
        Content-Type: text/html; charset="us-ascii"

     %  <html><body>Hello</body></html>
     %
        --inner--
     &
     &  This is the end of the inner MIME multipart.
     &
        --outer
        Content-Type: message/rfc822

     !  From: Someone Else
     !  Subject: hello request

     $  Please say Hello
     $
        --outer--
     &
     &  This is the end of the outer MIME multipart.


   In the above example, the '&', '$', '%' and '!' characters at the
   start of a line are used to illustrate what portions of the
   example message are used in tests:

   - the lines starting with '&' are the ones that are tested when
     a 'body :content "multipart" :contains "MIME"'
     test is executed.

   - the lines starting with '$' are the ones that are tested when
     a 'body :content "text/plain" :contains "Hello"' test is
     executed.

   - the lines starting with '%' are the ones that are tested when
     a 'body :content "text/html" :contains "Hello"' test is executed.

   - the lines starting with '$' or '%' are the ones that are tested
     when a 'body :content "text" :contains "Hello"' test is executed.

   - the lines starting with '!' are the ones that are tested when
     a 'body :content "message/rfc822" :contains "Hello"' test is
     executed.
----


Cyrus Daboo wrote:
...
Header: Fix alignment of 'Philip Guenther'
1.3 : 'specifies it syntax' -> 'specifies its syntax'
2.2 : 'with extension' -> 'with the extension'
3.1 : reformat syntax
3.4 : 'all "body" tests fail' -> 'all "body" tests return false'
4.2 : reformat syntax
4.2 : the term 'document' is used to refer to a MIME 'part', I would 
      prefer using 'part' in all cases.
Appendix B: missing reference [REGEX]

All done


4.2p6 : 'decoded to prior' -> 'decoded prior'

Done.  I've added text to require support for the 7bit, 8bit, and
binary transfer encodings so that the MAY only applies to
not-yet-standardized encodings:
   MIME parts encoded in "quoted-printable" or "base64" content
   transfer encodings MUST be decoded prior to the match.  MIME
   parts in "7bit", "8bit", or "binary" content transfer encodings
   MUST be matched as they are.  MIME parts in content transfer
   encodings other than those MAY be decoded, omitted from the test,
   or processed as raw data.
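
A minimal Python sketch of that rule, for illustration only: the
function name bytes_to_match is hypothetical, and for encodings
outside the five named ones this sketch arbitrarily picks the "omit
from the test" option out of the three the text allows.

```python
import base64
import quopri
from typing import Optional


def bytes_to_match(payload: bytes, encoding: str) -> Optional[bytes]:
    """Return the octets to match against, or None to omit the part."""
    cte = encoding.strip().lower()
    if cte == "quoted-printable":
        return quopri.decodestring(payload)   # MUST be decoded
    if cte == "base64":
        return base64.b64decode(payload)      # MUST be decoded
    if cte in ("7bit", "8bit", "binary"):
        return payload                        # MUST be matched as-is
    # Any other encoding MAY be decoded, omitted, or processed as raw
    # data; this sketch omits the part.
    return None
```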


4.3 : just for completeness add an example.

Added:
   Example:
        require ["body", "fileinto"];

        # Save messages mentioning the project schedule in the
        # project/schedule folder.
        if body :text :contains "project schedule" {
                fileinto "project/schedule";
        }


My comments:

4.2 Body Transform ":content"
[...]
  If an individual content type contains a '/' (slash), it
  specifies a full <type>/<subtype> pair, and matches only
  that specific content type.  If it is the empty string, all
  MIME content types are matched.  Otherwise, it specifies a
  <type> only, and any subtype of that type matches it.

I would like to see ABNF for the content type and some text explaining 
what should be done if the user specified an invalid value here, e.g. 
"/". I suspect the answer to this can be: no runtime error, but no match.

I would rather not drag in ABNF just for this single paragraph.
Indeed, I suspect the result would be more difficult to comprehend
when specified that way.  As the only cases not covered by the
current text are values that begin or end with a slash or contain
multiple slashes, I've added an initial case to specify that they
match no content types:

   If an individual content type begins or ends with a '/' (slash) 
   or contains multiple slashes, it matches no content types.  
   Otherwise, if it contains a slash, then it specifies a full
   <type>/<subtype> pair, and matches only that specific content   
   type.  If it is the empty string, all MIME content types are
   matched.  Otherwise, it specifies a <type> only, and any subtype 
   of that type matches it.
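
To show how those cases fall out in order, here is a small Python
sketch (illustrative only, not draft text).  The function name
content_type_matches is hypothetical, part_type is assumed to be a
well-formed "type/subtype" string with parameters already stripped,
and the case-insensitive comparison is an assumption based on MIME
type names being case-insensitive.

```python
def content_type_matches(spec: str, part_type: str) -> bool:
    """Hypothetical sketch of matching one :content value against a part."""
    spec = spec.lower()
    part_type = part_type.lower()
    # Leading slash, trailing slash, or multiple slashes: matches nothing.
    if spec.startswith("/") or spec.endswith("/") or spec.count("/") > 1:
        return False
    # A single interior slash: a full <type>/<subtype> pair.
    if "/" in spec:
        return part_type == spec
    # The empty string matches every content type.
    if spec == "":
        return True
    # Otherwise a bare <type>: any subtype of that type matches.
    return part_type.split("/", 1)[0] == spec
```

With this ordering a value such as "/" is not a runtime error; it
simply never matches.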



At this point, I think there is only one unresolved issue: what is
required for charset conversion when using :content?  As stated
above, my preference would be to update the base spec revision's
section 2.7.2 to require support for UTF-8, and then simply refer
to that in section 4.2 of the body I-D.

Opinions?


Philip Guenther