My apologies to the working group for the long delay in responding
to the comments from the WGLC on the body and editheader drafts.
My thanks to the chairs for compiling the comments; here are my
responses for the body draft.
Ken Murchison wrote:
Section 4.1, paragraph 2, last phrase says "... and MUST NOT interpret
or skip MIME headers of enclosed body parts."
I don't think that the intent was to have "MUST NOT" refer to "skip",
but it reads that way. I'd recommend that either "or skip" be removed
(since not interpreting them seems to encompass ignoring them), or
reword the entire phrase to something like: "... and either MUST NOT
interpret MIME headers of enclosed body parts or MUST ignore them
entirely."
The intent is that under the :raw transformation, the message is
just a series of octets with no special interpretation. MIME headers
in enclosed body parts are therefore just more pieces of text to
match against. To make that clearer, I've changed the last clause of
that sentence to read:
<...> and MUST treat multipart boundaries
or the MIME headers of enclosed body parts as part of the text
being matched against instead of as MIME structures to interpret.
Bob Johannessen wrote:
4.1 Body Transform ":raw"
# This will match a message containing the words "MAKE MONEY FAST"
# in body or MIME headers other than the outermost RFC 822 header,
# but will not match a message containing the words in a
# content-transfer-encoded body.
Wouldn't it be more correct to say that it matches the string, or
even the character sequence "MAKE MONEY FAST"? Also I'm not sure I
understand what is meant by "a content-transfer-encoded body". It
*could* still match the character sequence in a quoted-printable
encoded body, couldn't it?
Correct. I've changed the comment to read:
# This will match a message containing the literal text
# "MAKE MONEY FAST" in body parts (ignoring any
# content-transfer-encodings) or MIME headers other than
# the outermost RFC 2822 header.
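As a rough illustration of the :raw behavior described above, here is a
minimal Python sketch. The function name raw_body_contains is
hypothetical, and the sketch assumes CRLF line endings and a plain
case-sensitive substring match (a real implementation would apply the
script's comparator). It treats everything after the outermost header as
uninterpreted octets:

```python
def raw_body_contains(message: bytes, key: bytes) -> bool:
    """Search the undecoded body octets of `message` for `key`.

    Hypothetical helper: assumes CRLF line endings and does a plain
    case-sensitive substring search instead of applying a comparator.
    """
    # The outermost header ends at the first blank line; everything
    # after it (boundaries, nested MIME headers, encoded data) is
    # treated as text to match against, not as MIME structure.
    _header, separator, body = message.partition(b"\r\n\r\n")
    if not separator:
        return False  # header only, so there is no body to search
    return key in body
```

Under these assumptions, the key matches when it appears in a nested
MIME header, but not when it appears only in the outermost Subject field
or only in base64-encoded form.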
4.2 Body Transform ":content"
If an implementation does not support conversion of a given
charset to UTF-8, it MAY compare against the US-ASCII subset
of the transfer-decoded character data instead.
Does the above rely on all current and future charsets having
a one-to-one mapping to US-ASCII for all characters with code
points 0-127? Is this a safe assumption? Is it even true of all
existing charsets? Maybe it would be better to explicitly
exclude all parts who can't be converted to UTF-8?
On reflection, I think this was intended to be similar to the
requirements of section 2.7.2 ("Comparisons Across Character Sets")
of the base-spec, but slightly stricter in that implementations
would be required to support UTF-8.
Speaking of which, perhaps that section of the base-spec should be
updated to require support of UTF-8 for header charsets, only falling
back to the weaker "No two strings..." text for charsets other than
UTF-8.
If that change were made, then I think the problem paragraph in the
body draft could be replaced with
Implementations MUST use the same rules for comparisons
against body parts in charsets other than UTF-8 as they use
for comparisons against header fields in such charsets (cf.
[SIEVE] section 2.7.2).
(and the SIEVE reference would need to be updated to be against the
revision)
Mark E. Mallett wrote:
4.2 Body Transform ":content"
> The search for MIME parts matching the :content specification is
> recursive and automatically descends into multipart and
> message/rfc822 MIME parts. Once a MIME part has been identified
> as suitable for searching, only its direct contents are searched
> for the key strings.
If a message contains more than one testable part, I assume that the
"body" result is the OR of the tests of all of them,
...
This may seem obvious but it probably needs to be made explicit, no?
Yeah. To clarify, I've replaced the second/last sentence of that
paragraph with:
All MIME parts with matching types are searched for the key
strings.
with a short-circuit exit.
i.e., first match causes the body test to end and
return a true result, whereas a non-match causes the body test to
continue on to the next candidate mime part.
I don't see why short-circuiting needs to be mentioned, as it's
simply an obvious optimization and has no effect on the visible
behavior. While the base spec does encourage implementations to
implement short-circuiting in evaluation of string lists, it didn't
seem necessary to mention that they should stop searching within a
header/address/whatever for a given string as soon as a match is
found.
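To illustrate why the short-circuit is invisible to the script, here is
a minimal Python sketch. The name body_contains is hypothetical, the
parts are assumed to be already selected and decoded to text, and a
plain substring match stands in for the comparator. The result is the OR
over all candidate parts, and any() happens to stop at the first match:

```python
from typing import Iterable


def body_contains(part_texts: Iterable[str], keys: Iterable[str]) -> bool:
    """Return True if any key occurs in any candidate part.

    Hypothetical helper: `part_texts` are the already-decoded contents
    of the MIME parts selected by the body transform.
    """
    keys = list(keys)  # keys are reused for every part
    # any() stops consuming parts at the first match; the boolean
    # result is the same as testing every part and OR-ing the results.
    return any(key in text for text in part_texts for key in keys)
```

Whether the remaining parts are examined after a match changes only the
amount of work done, not the value returned.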
[...]
> For example, a document with "multipart" major content type only
> directly contains the text in its epilogue and prologue section;
> all the user-visible data inside it is directly contained in
> documents with MIME types other than multipart.
I question the term "user-visible." I'm a user, and the prolog and
epilog stuff is always visible to me in my mail reader. Maybe just
say "other" ?
To clarify the matching against multipart and message/rfc822 parts,
I've replaced that paragraph with:
If the :content specification matches a multipart MIME part,
only the prologue and epilogue sections of the part will be searched
for the key strings; the contents of nested parts are only
searched if their respective types match the :content specification.
If the :content specification matches a message/rfc822 MIME part,
only the header of the nested message will be searched for the
key strings; the contents of the nested message body parts are
only searched if their content types match the :content specification.
and have dropped the "Nevertheless" from the following parenthetical
remark.
Furthermore, I've inserted an elaborate example of these rules,
building from Cyrus's suggestion, described below.
...
"words" not "worlds"
...
My name jumped out at me- if it's in there, it should be
spelled "Mallett" :-)
Fixed and fixed.
5. Interaction with Other Sieve Extensions
> Regular and wildcard expressions used with "body" are exempt
> from the side effects described in [VARIABLES]. That is, they
> do not set numbered variables ${1}, ${2}... to the input
> values corresponding to wild card sequences in the matched
> pattern.
I remember that this came up last fall, expressed this way:
> QUESTION: Is it okay to have body :matches and
> :regex scans not set variables?
and the (small) consensus was a "yes" answer to that question. I took
that to mean that people thought it was OK for an implementation not to
set the numbered variables-- not that an implementation would be
prohibited from doing so. This prohibition is unfriendly to
general-purpose match logic.
An implementation that wanted to support it could enable capturing
from 'body' matches into variables using another extension...
Also, if it is a prohibition, shouldn't
"MUST NOT" appear there?
Regular and wildcard expressions used with "body" are exempt
from the side effects described in [VARIABLES]. That is, they
MUST NOT set numbered variables ${1}, ${2}... to the input values
corresponding to wild card sequences in the matched pattern.
However, if the extension is present, variable references in the
key strings or content type strings are evaluated as described
in the draft.
(That takes into account a suggestion from Nigel Swinson as well).
Nigel Swinson wrote:
7. Security Considerations
I suggest:
- replacement for a virus or spam filtering system.
+ replacement for a spam, virus or other security related filtering system.
Done.
Cyrus Daboo wrote:
--On March 28, 2005 15:36:30 +0100 Nigel Swinson
<...comments on the lack of clarity in matching multipart
types and a suggestion of an example...>
I've inserted the following example into section 4.2:
-----
Example:
From: Whomever
To: Someone
Date: Whenever
Subject: whatever
Content-Type: multipart/mixed; boundary=outer
& This is a multi-part message in MIME format.
&
--outer
Content-Type: multipart/alternative; boundary=inner
& This is a nested multi-part message in MIME format.
&
--inner
Content-Type: text/plain; charset="us-ascii"
$ Hello
$
--inner
Content-Type: text/html; charset="us-ascii"
% <html><body>Hello</body></html>
%
--inner--
&
& This is the end of the inner MIME multipart.
&
--outer
Content-Type: message/rfc822
! From: Someone Else
! Subject: hello request
$ Please say Hello
$
--outer--
&
& This is the end of the outer MIME multipart.
In the above example, the '&', '$', '%' and '!' characters at the
start of a line are used to illustrate what portions of the
example message are used in tests:
- the lines starting with '&' are the ones that are tested when
a 'body :content "multipart" :contains "MIME"'
test is executed.
- the lines starting with '$' are the ones that are tested when
a 'body :content "text/plain" :contains "Hello"' test is
executed.
- the lines starting with '%' are the ones that are tested when
a 'body :content "text/html" :contains "Hello"' test is executed.
- the lines starting with '$' or '%' are the ones that are tested
when a 'body :content "text" :contains "Hello"' test is executed.
- the lines starting with '!' are the ones that are tested when
a 'body :content "message/rfc822" :contains "Hello"' test is
executed.
----
Cyrus Daboo wrote:
...
Header: Fix alignment of 'Philip Guenther'
1.3 : 'specifies it syntax' -> 'specifies its syntax'
2.2 : 'with extension' -> 'with the extension'
3.1 : reformat syntax
3.4 : 'all "body" tests fail' -> 'all "body" tests return false'
4.2 : reformat syntax
4.2 : the term 'document' is used to refer to a MIME 'part', I would
prefer using 'part' in all cases.
Appendix B: missing reference [REGEX]
All done
4.2p6 : 'decoded to prior' -> 'decoded prior'
Done. I've added text to require support for the 7bit, 8bit, and
binary transfer encodings so that the MAY only applies to
not-yet-standardized encodings:
MIME parts encoded in "quoted-printable" or "base64" content
transfer encodings MUST be decoded prior to the match. MIME
parts in "7bit", "8bit", "binary" content transfer encodings
MUST be matched as they are. MIME parts in content transfer
encodings other than those MAY be decoded, omitted from the test,
or processed as raw data.
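A minimal Python sketch of that rule, for illustration only. The
function name decode_for_match is hypothetical, and for the "other
encodings" case the sketch arbitrarily picks the "omitted from the test"
option by returning None:

```python
import base64
import quopri
from typing import Optional


def decode_for_match(raw: bytes, transfer_encoding: str) -> Optional[bytes]:
    """Return the octets to match against, or None to skip the part.

    Hypothetical helper; `transfer_encoding` is the value of the part's
    Content-Transfer-Encoding field.
    """
    encoding = transfer_encoding.strip().lower()
    if encoding == "base64":
        return base64.b64decode(raw)      # MUST be decoded
    if encoding == "quoted-printable":
        return quopri.decodestring(raw)   # MUST be decoded
    if encoding in ("7bit", "8bit", "binary"):
        return raw                        # MUST be matched as-is
    # Any other encoding: this sketch omits the part from the test,
    # which is one of the three behaviors the text allows.
    return None
```

An implementation could equally return raw unchanged, or decode the
part, in the final case.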
4.3 : just for completeness add an example.
Added:
Example:
require ["body", "fileinto"];
# Save messages mentioning the project schedule in the
# project/schedule folder.
if body :text :contains "project schedule" {
fileinto "project/schedule";
}
My comments:
4.2 Body Transform ":content"
[...]
If an individual content type contains a '/' (slash), it
specifies a full <type>/<subtype> pair, and matches only
that specific content type. If it is the empty string, all
MIME content types are matched. Otherwise, it specifies a
<type> only, and any subtype of that type matches it.
I would like to see ABNF for the content type and some text explaining
what should be done if the user specified an invalid value here, e.g.
"/". I suspect the answer to this can be: no runtime error, but no match.
I would rather not drag in ABNF just for this single paragraph.
Indeed, I suspect the result would be more difficult to comprehend
when specified that way. As the only cases not covered by the
current text are values that begin or end with a slash or contain
multiple slashes, I've added an initial case to specify that they
match no content types:
If an individual content type begins or ends with a '/' (slash)
or contains multiple slashes, it matches no content types.
Otherwise, if it contains a slash, then it specifies a full
<type>/<subtype> pair, and matches only that specific content
type. If it is the empty string, all MIME content types are
matched. Otherwise, it specifies a <type> only, and any subtype
of that type matches it.
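For illustration, those four cases can be sketched in Python as follows.
The function name content_type_matches is hypothetical, and the sketch
assumes the part's content type has already been reduced to a bare
"type/subtype" string with parameters stripped and is compared
case-insensitively:

```python
def content_type_matches(spec: str, content_type: str) -> bool:
    """Return True if the :content string `spec` matches `content_type`.

    Hypothetical helper: `content_type` is assumed to be a bare
    "type/subtype" value with any parameters already removed.
    """
    spec = spec.lower()
    content_type = content_type.lower()
    # Leading slash, trailing slash, or multiple slashes: no match.
    if spec.startswith("/") or spec.endswith("/") or spec.count("/") > 1:
        return False
    # A single interior slash: a full <type>/<subtype> pair.
    if "/" in spec:
        return spec == content_type
    # The empty string matches every content type.
    if spec == "":
        return True
    # Otherwise a bare <type>: any subtype of that type matches.
    return content_type.split("/", 1)[0] == spec
```

With this ordering, a value such as "/" or "text/" simply fails to match
anything and never produces a runtime error.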
At this point, I think there is only one unresolved issue: what is
required for charset conversion when using :content? As stated
above, my preference would be to update the base spec revision's
section 2.7.2 to require support for UTF-8, and then simply refer
to that in section 4.2 of the body I-D.
Opinions?
Philip Guenther