draft-ietf-sieve-spamtestbis-02.txt


Hi,

Regarding draft-ietf-sieve-spamtestbis-02.txt :

Mostly it looks pretty nice (despite the length of this note).  I have
one potential conflict, and then a nitpicky thing (with many instances)
that could be completely ignorable.

The conflict: There is text describing the two forms of result strings
that the underlying implementation provides for testing against.  One
form is a digit string and some optional text, with the digit string's
value used in relational comparisons, and the text used in string
comparisons.  There's a warning against using the string part as it's
non-portable.  The other form is a simple string "untested" that
indicates that no test has been done, and that also is used in string
comparisons.  So basically both forms can be used in string comparisons,
which is kind of ugly and ambiguous.  If the recommendation is to avoid
testing for "untested" and to use the non-:percent form instead, why not
just drop the "untested" result?  Either that, or you need a prohibition
against the underlying implementation returning the string "untested" as
the optional text when a digit string is returned.  Or, perhaps, make
the untested result "0 untested", as opposed to "0[ anything-else]" for
tested and clear.
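
To make the ambiguity concrete, here is a rough Python sketch.  The
helper name and the sample result strings are mine, not from the draft,
and the case-insensitive substring check only approximates a Sieve
":contains" match with the default comparator.

```python
def contains(result: str, key: str) -> bool:
    """Hypothetical stand-in for a Sieve ':contains' string match."""
    return key.lower() in result.lower()

# Second form described in the draft: no check was performed.
untested = "untested"

# First form: digit string plus optional text.  Nothing in the current
# text stops an implementation from choosing optional text like this.
tested = "2 untested attachment skipped"

# Both results satisfy the same string test, so a script using it
# cannot tell "never checked" from "checked, with a score of 2".
ambiguous = contains(untested, "untested") and contains(tested, "untested")
```

An exact-match comparison would separate these two particular strings,
but only because of the digit prefix, which is the point of the
"0 untested" suggestion above.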


The nits:  my big bother is the overloading of the words "spamtest" and
"virustest" to refer to both the new Sieve verbs and the underlying
implementation's analysis (and words about the "result" and the "return"
from the commands).  The Sieve-enabled application interprets the
underlying test results, normalizes them, and gives them as input to the
command; the command then applies some logic to that normalized
evaluation and produces, essentially, a true or false result.
E.g.:

3.1.  General Considerations

   The "spamtest" and "virustest" tests described below can both return
   a string that starts with a numeric value, followed by an optional
   space (%x20) character and optional arbitrary text.


I understand what this means, and it may not even be confusing except to
ultra-literal readers, but still: it talks about what "spamtest" and
"virustest" (the names of the two new commands) return.  The commands
themselves return (or evaluate to) true or false; their input is the
normalized result described in the quoted sentence.  I've mentioned this
before and perhaps some of it has been improved.  But still..
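
A small Python sketch of the distinction I mean, under my own
assumptions: the function names are hypothetical, the relation names are
the ones I recall from the relational extension, and the handling of
strings with no leading digits (treated as positive infinity) is my
reading of the "i;ascii-numeric" comparator rather than anything stated
in this draft.

```python
import math
import operator

# Relation names as I read them in the relational extension.
RELATIONS = {
    "gt": operator.gt, "ge": operator.ge,
    "lt": operator.lt, "le": operator.le,
    "eq": operator.eq, "ne": operator.ne,
}

def leading_number(s: str) -> float:
    """Value of the leading ASCII digit run; no digits means +infinity."""
    digits = ""
    for ch in s:
        if ch not in "0123456789":
            break
        digits += ch
    return int(digits) if digits else math.inf

def spamtest(normalized_result: str, relation: str, key: str) -> bool:
    """The normalized string is the input; the command yields a boolean."""
    compare = RELATIONS[relation]
    return compare(leading_number(normalized_result), leading_number(key))
```

So spamtest("3 some tool text", "ge", "3") evaluates to true, and the
string "3 some tool text" is what the implementation supplied, not what
the command returned.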

To be specific, here are the places that I think contribute to the
overloading of the terms, and some suggestions.  In each case the
draft's current text is quoted first, followed by my suggested
rewording.

Abstract

   The SIEVE email filtering language "spamtest", "spamtestplus" and
   "virustest" extensions permit users to use simple, portable commands
   for spam and virus tests on email messages.  Each extension provides
   a new test using matches against numeric 'scores'.  It is the
   responsibility of the underlying SIEVE implementation to do the
   actual checks that result in values returned by the tests.


   It is the responsibility of the underlying SIEVE implementation to do
   the actual checks that result in proper input to the tests.

1.  Introduction and Overview

   The purpose of this document is to introduce two SIEVE tests that can
   be used to implement 'generic' tests for spam and viruses in messages
   processed via SIEVE scripts.  These tests return a string containing
   a range of numeric values that indicate the severity of spam or
   viruses in a message, or a string that indicates the message has not
   passed through any spam or virus checking tools, or provides a direct
   indication of whether the message has been tested for spam or not.
   The spam and virus checks themselves are handled by the underlying
   SIEVE implementation in whatever manner is appropriate, and the
   implementation maps the results of these checks into the numeric
   ranges defined by the new tests.  Thus a SIEVE implementation can
   have a spam test that implicitly checks for third-party spam tool
   headers and determines how those map into the spamtest numeric range.


I would rearrange slightly and disambiguate.  And frankly I would move
some of the details down to the section 3 intro, and leave the overview
more overviewy, e.g. just say here that the new tests relieve the script
writer of knowing the intimate details of the spam tests.

   The purpose of this document is to introduce two SIEVE tests that can
   be used to implement 'generic' tests for spam and viruses in messages
   processed via SIEVE scripts.  The spam and virus checks themselves
   are handled by the underlying implementation in whatever manner is
   appropriate, so that the SIEVE spam and virus test commands can be
   used in a portable way.

And then move the specifics down to 3.1, q.v.

3.1.  General Considerations

   The "spamtest" and "virustest" tests described below can both return
   a string that starts with a numeric value, followed by an optional
   space (%x20) character and optional arbitrary text.  The numeric
   value can be compared to specific values using the SIEVE relational
   [I-D.ietf-sieve-3431bis] extension in conjunction with the "i;ascii-
   numeric" comparator [I-D.newman-i18n-comparator], which will test for
   the presence of a numeric value at the start of the string, ignoring
   any additional text in the string.  The additional text can be used
   to carry implementation specific details about the tests performed
   and descriptive comments about the result.  Tests can be done using
   standard string comparators against this text if it helps to refine
   behaviour, however this will break portability of the script as the
   text will likely be specific to a particular implementation.



    The "spamtest" and "virustest" tests described below evaluate the
    results of implementation-specific spam and virus checks in a
    portable way.  (The implementation may, for example, check for
    third-party spam tool headers and determine how those map into the
    way the test commands are used.)  To do this, the underlying SIEVE
    implementation provides a normalized result string as one of the
    inputs to each test command.  The normalized result string is
    considered to be the value on the left hand side of the test, and
    the comparison values given in the test command are considered to be
    on the right hand side.  [e.g., something like what RFC 3431 says.]

    The normalized result string may be provided in one of two formats:

     1. A digit string, with its value being within a range of numeric
        values used in the specific SIEVE command, indicating the
        severity of spam or viruses in a message or whether the check
        was done at all.  This may optionally be followed by a space
        (%x20) character and arbitrary text.  The numeric value will be
        used when a relational test is done.  The optional arbitrary
        text can be used to carry implementation-specific details about
        the tests, or for descriptive comments about the result.  This
        optional text will be used when standard string comparisons are
        used.

     2. A string indicating that the message has not passed through any
        spam or virus checking tools.  This string is used when
        standard string comparisons are used.
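
For what it's worth, the two formats above could be told apart along
these lines.  This is only a sketch: the type and function names are
hypothetical, and treating anything that is not "digits, then
optionally a space and text" as the second format is my assumption, not
something the draft specifies.

```python
import re
from typing import NamedTuple, Optional

class NormalizedResult(NamedTuple):
    score: Optional[int]  # None for format 2 (no check was performed)
    text: str             # optional text (format 1) or whole string (format 2)

# Format 1: a digit string, optionally followed by one space and free text.
_FORMAT1 = re.compile(r"(\d+)(?: (.*))?", re.DOTALL)

def parse_normalized_result(s: str) -> NormalizedResult:
    m = _FORMAT1.fullmatch(s)
    if m:
        return NormalizedResult(int(m.group(1)), m.group(2) or "")
    # Assumption: everything else is the "not checked" string.
    return NormalizedResult(None, s)
```

A relational test would then look only at the score, and a string test
only at the text, which is the split the two numbered items describe.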

3.2.  Test spamtest

  [...]

   The "spamtest" test evaluates to true if the spamtest result matches
   the value.


   The "spamtest" test evaluates to true if the normalized result
   matches the value.

3.2.1.  spamtest without :percent argument

   When the ":percent" argument is not present in the "spamtest" test,
   the result of the test is a string starting with a numeric value in
   the range "0" (zero) through "10", with meanings summarised below:


   When the ":percent" argument is not present in the "spamtest" test,
   the normalized result string provided for the left side of the
   test starts with a numeric value ...

   In this example, any message that has not passed through a spam check
   tool will be filed into the mailbox "INBOX.unclassified".  Any
   message with a spamtest value greater than or equal to "3" is filed
   into a mailbox called "INBOX.spam-trap" in the user's mailstore.


   Any message with a normalized result value ...

3.2.2.  spamtest with :percent argument

   When the ":percent" argument is present in the "spamtest" test, the
   result of the test is a string starting with a numeric value in the
   range "0" (zero) through "100", with meanings summarised below, or


   When the ":percent" argument is present in the "spamtest" test, the
   normalized result string provided for the left side of the test
   starts with a numeric value ...

   In this example, any message that has not passed through a spam check
   tool will be filed into the mailbox "INBOX.unclassified".  Any
   message with a spamtest percentage value greater than or equal to
   "30" is filed into a mailbox called "INBOX.spam-trap" in the user's
   mailstore.


   Any message with a normalized result value greater than or equal ...

3.3.  Test virustest

   [...]

   The "virustest" test evaluates to true if the virustest result
   matches the value.


   ... evaluates to true if the normalized result string matches the
   value ...

   The virustest result is a string starting with a numeric value in the
   range "0" (zero) through "5", with meanings summarised below:


   The normalized result string provided for the left side of the
   test starts with a numeric value ...

   In this example, any message that has not passed through a virus
   check tool will be filed into the mailbox "INBOX.unclassified".  Any
   message with a virustest value equal to "4" is filed into a mailbox
   called "INBOX.quarantine" in the user's mailstore.  Any message with
   a virustest value equal to "5" is discarded (removed) and not
   delivered to the user's mailstore.


   Any message with a normalized result value equal to "5" ...

mm