ietf-mta-filters

Re: [sieve] Issues with RFC 5260 - Date/Index (was: Issues with specifications)

2009-09-10 08:33:37
Hello!

On Wed, Sep 09, 2009 at 10:47:11AM -0700, Ned Freed wrote:
On Wed, Sep 09, 2009 at 12:04:01PM -0400, Cyrus Daboo wrote:
--On September 9, 2009 6:01:20 PM +0200 Hannah Schroeter
<hannah(_at_)schlund(_dot_)de> wrote:

[...]

I'm stalled with the Date/Index Extensions a bit because of a few things
that are unclear to me.

- Index extension: It seems unspecified what happens if the index is
  out of range, that is, if it is greater than the number of header fields
  actually found for the header names given to the test.

  if header :index 5 "Received" "..."

  if there are only 4 Received headers, for example.

The test is supposed to fail in this case. I agree the RFC could be clearer
about this.

Thanks for the heads-up. (I'm not fond of the wording "fail" for "return
false". Readers less familiar with the terminology could more easily
confuse it with "error" or "temporary failure" than if one said "return
false" or "does not match". That's a general language issue with the
Sieve test specifications, though; sorry for the side note if it's noise
for you.) As you say at the end of your mail, there might be a
clarification or revision of the RFC anyway, so this could be included.

  For this case, I've gone ahead and made the test fail (not match), even
  for :contains with an empty key, but the RFC doesn't explicitly say
  this (as opposed to, for example, yielding a run-time error).

That's the correct behavior.
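For illustration, here is a minimal sketch of the "out of range means no match" behaviour for a single header name. The helpers `select_fields` and `header_contains` are hypothetical names, the 1-based fieldno and the counting from the bottom for :last are my reading of RFC 5260, and the comparator is ignored: a plain, case-sensitive substring test stands in for :contains.

```python
from typing import List, Optional


def select_fields(values: List[str], index: Optional[int] = None,
                  last: bool = False) -> List[str]:
    """Return the field values a test should look at.

    `values` holds the values of one header name, in message order.
    `index` is the 1-based :index fieldno; `last` mirrors :last, which
    counts from the bottom instead of the top.
    """
    if index is None:
        return values
    if index < 1 or index > len(values):
        return []  # out of range: no field is selected at all
    return [values[-index]] if last else [values[index - 1]]


def header_contains(values: List[str], key: str, index: Optional[int] = None,
                    last: bool = False) -> bool:
    # An empty key matches any selected field, but when nothing was
    # selected there is nothing to match, so the test is simply false.
    return any(key in value for value in select_fields(values, index, last))


received = ["from a", "from b", "from c", "from d"]  # four Received fields
print(header_contains(received, "", index=5))  # False: only 4 fields
print(header_contains(received, "", index=4))  # True
```

Returning an empty selection rather than raising keeps the out-of-range case on the "return false" path, not the run-time error path.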

- The index extension doesn't specify its behaviour with respect to
  :count tests. (The date extension does.)
  For header, is it meant to yield 1 if the index is in range, 0
  otherwise? And for address, the number of addresses in the one
  selected header line (if the index is in range)?

The meaning of :count is tied to the test being performed. :index only
restricts the applicability of the test to a specific header field occurrence;
it does not change the interpretation of count in any way. So, when :index is
used with header, :count produces a 1 if the specific header with that index
exists, 0 otherwise. Address is more useful; it would return a count of all the
addresses in the header. Date would again be a 0 or 1 depending on the
existence of the header and whether or not it contains a valid date.

I don't see a need for further clarification here.

I do.

On one hand, yes, one can second-guess the authors of the RFCs, as you
did, and as I did, too (but still asked). We both reached the same
conclusion, so it looks like a very reasonable guess, one that can be
deduced from the text of the specs:

RFC 5231

   The COUNT match type first determines the number of the specified
   entities in the message and does a relational comparison of the
   number of entities, as defined below, to the values specified in the
   test expression.

And :index modifies the specification of fields, voilà.

So yes, the deduction looks right.
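The deduced :count semantics can be sketched as follows, assuming a hypothetical `select_fields` helper that applies :index/:last to the values of one header name. The standard library's `email.utils.getaddresses` and `email.utils.parsedate_tz` stand in for a real Sieve implementation's address and date parsers, and extracting the date after a semicolon is not handled here.

```python
from email.utils import getaddresses, parsedate_tz
from typing import List, Optional


def select_fields(values: List[str], index: Optional[int] = None,
                  last: bool = False) -> List[str]:
    # Hypothetical :index/:last selection (1-based); out of range selects nothing.
    if index is None:
        return values
    if index < 1 or index > len(values):
        return []
    return [values[-index]] if last else [values[index - 1]]


def count_header(values: List[str], index=None, last=False) -> int:
    # header: one entity per selected field, so 0 or 1 when :index is used.
    return len(select_fields(values, index, last))


def count_address(values: List[str], index=None, last=False) -> int:
    # address: every address inside the selected field counts.
    selected = select_fields(values, index, last)
    if not selected:
        return 0
    return sum(1 for _name, addr in getaddresses(selected) if addr)


def count_date(values: List[str], index=None, last=False) -> int:
    # date: 0 or 1, depending on existence and on holding a parsable date.
    return sum(1 for v in select_fields(values, index, last)
               if parsedate_tz(v) is not None)


to_fields = ["a@example.com, b@example.com", "c@example.com"]
print(count_header(to_fields, index=1))   # 1
print(count_address(to_fields, index=1))  # 2
print(count_address(to_fields, index=3))  # 0
```

The point of the sketch is that :index only narrows the set of fields; the entity being counted is still whatever the test itself counts.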

Still, I'm in favor of specs where one doesn't have to use deduction
steps like those in the first place. Even if most would probably come to
the same conclusion, it takes some thought to reach it. People less fluent
in English would have a harder time with it, and a spec that simply leaves
no room for speculation would ease the issue.

Also, I have some background in compiler construction. There, it's a
virtue of language specs to be clear, well-defined and unambiguous (even
if existing specs often don't fully reach that goal, even for "real"
programming languages) and, IMO, also not to force readers to deduce too
much about the intended syntax and semantics. (One might still have to
deduce a lot about how to *achieve*, i.e. *implement*, it, for example if
the syntax is difficult to parse with commonplace techniques (C++ comes to
mind) or if the semantic gap between source and target is large (some
functional or logic programming languages come to mind).)

I'd be in favor of a clarification, even if our shared result is the most
logical, or even the only logical, conclusion.

- Date extension:
  * Section 4.2 enumerates the possible values for the date-part argument.
    However, it does not specify any error handling for the case where the
    script uses an invalid value there, especially if that value is
    generated by use of the variables extension, possibly making static
    checking impossible.

      Should invalid date-part arguments be an error (static if possible,
    run-time otherwise)?

Yes, I think so. Our implementation certainly does.

It's not completely clear. There's precedent for tolerant behaviour, e.g.
for syntactically invalid header names in tests (section 2.4.2.2 of the
base Sieve spec, RFC 5228): such tests fail (return false) rather than
cause an error. That is the opposite of the precedent for strict behaviour
in the other places I already cited.

As the precedent is mixed, the conclusion is less clear than in the
above question, so this should in my eyes really be clarified in a
further revision of the RFC.

Does your implementation also flag an error if it can be detected only
at run time? (Causing an implicit keep and some kind of notification
to the script owner about the error, as RFC 5228 says...)

      A precedent would be the "envelope-part" argument for the envelope
    test where implementations SHOULD consider unknown envelope parts
    an error (but not "MUST"). For comparators (which are given as strings),
    it's even stronger: unknown comparators are always an error. Either they
    are declared with require, in which case require fails if the comparator
    isn't known, or they aren't, in which case it's an error to use a
    comparator that wasn't required (section 2.7.3 of the base spec, towards
    the end).

I'm not sure why it's a SHOULD as opposed to a MUST, but I have no
problem with it.
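The strict reading agreed on above (reject an unknown date-part statically when the argument is a literal, otherwise at run time) could look roughly like this. The set of names is my reading of the RFC 5260 Section 4.2 list. The exception classes, the function names, and the `"${"` test for variable references are hypothetical simplifications.

```python
DATE_PARTS = frozenset({
    "year", "month", "day", "date", "julian", "hour", "minute",
    "second", "time", "iso8601", "std11", "zone", "weekday",
})


class ScriptCompileError(Exception):
    """Hypothetical: the script is rejected before it ever runs."""


class ScriptRuntimeError(Exception):
    """Hypothetical: raised while running; per RFC 5228 the result would be
    an implicit keep plus some notification to the script owner."""


def check_date_part_at_compile_time(arg: str) -> None:
    # Simplified heuristic: "${" means the variables extension may expand
    # the argument later, so the check has to wait until run time.
    if "${" in arg:
        return
    if arg not in DATE_PARTS:
        raise ScriptCompileError(f"unknown date-part {arg!r}")


def check_date_part_at_run_time(expanded: str) -> str:
    # Called with the argument after variable expansion.
    if expanded not in DATE_PARTS:
        raise ScriptRuntimeError(f"unknown date-part {expanded!r}")
    return expanded
```

A tolerant implementation in the style of RFC 5228 section 2.4.2.2 would instead make the test return false, which is exactly the choice the RFC leaves open.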

  * Section 4.3 discusses comparator vs. date-part interactions.
    Am I reading it right that this section only contains recommendations
    for users?

Yes, that's all it is.

I.e., the implementation shouldn't reject a script that
    uses i;ascii-numeric in a non-recommended way (e.g. for "date",
    where it will in fact just use the year), but at most warn the user
    that the comparator may lose information (because it truncates
    the value at the dash separating year from month)?

I'm not sure how you'd issue such a warning and the specification certainly
doesn't require it.

*nods* We currently have no mechanism for warnings or style warnings, so
we probably wouldn't add one for this case anyway.
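To make the information loss concrete, here is a small model of how i;ascii-numeric (RFC 4790) values a string: the leading run of ASCII digits is taken as an integer, and a string with no leading digit counts as positive infinity. The function name is hypothetical. Applied to a "date" date-part value, only the year survives.

```python
import math
from typing import Union


def ascii_numeric_value(s: str) -> Union[int, float]:
    """Value of a string under i;ascii-numeric: its leading ASCII digits
    as an integer, or +infinity if it does not start with a digit."""
    digits = ""
    for ch in s:
        if "0" <= ch <= "9":
            digits += ch
        else:
            break
    return int(digits) if digits else math.inf


print(ascii_numeric_value("2009-09-10"))  # 2009
# Two different dates in the same year compare equal:
print(ascii_numeric_value("2009-09-10") == ascii_numeric_value("2009-12-31"))  # True
```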

  * Date extraction: The specification (in section 4, before 4.1) says
    the implementation must be able to extract a date from the entire
    field content or from the end of a field, following a semicolon.
    As the obsolete syntax is still not weeded out, even in RFC 5322,
    this extraction may prove difficult.

    Received: [... received-info ...]; Mon, 10 (This is a ridiculous
      comment containing a semicolon: see here: ; ) Aug 2009 ...

    Undoing the folding is probably easy, but finding the *relevant*
    semicolon is difficult if I want to implement/use only a *Date*
    parser and spare myself the work of parsing the structure of the
    Received header too.

I have to say I'm not seeing the difficulty here. A common trick is to parse
from the end backwards. That way all you have to contend with is comments,
which requires nothing but a simple counter.

Yes, that was the idea behind the "first pass in *reverse*" in my next
paragraph.

Alternately, you can preprocess the entire header and remove all the comments
first.

Probably more work, as one would have to take other quoting (quoted
strings) into account, where parentheses do *not* start comments. When
going backwards, that isn't an issue, IIRC, as dates don't contain quoted
strings.
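The forward preprocessing alternative has to track quoted strings as well as comment nesting, which is where the extra work comes from. This sketch (the function name is hypothetical) replaces each RFC 5322 comment with a single space and keeps quoted strings intact, after which the last remaining semicolon is the relevant one.

```python
def strip_comments(value: str) -> str:
    """Replace each RFC 5322 comment with one space, leaving quoted
    strings (where parentheses are ordinary characters) untouched."""
    out = []
    depth = 0         # comment nesting level
    in_quote = False  # inside a quoted-string (only possible at depth 0)
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and (in_quote or depth > 0) and i + 1 < len(value):
            # quoted-pair: keep it inside a quoted string, drop it in a comment
            if depth == 0:
                out.append(value[i:i + 2])
            i += 2
            continue
        if depth == 0 and ch == '"':
            in_quote = not in_quote
            out.append(ch)
        elif in_quote:
            out.append(ch)
        elif ch == "(":
            depth += 1
            if depth == 1:
                out.append(" ")  # a comment is equivalent to whitespace
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
        i += 1
    return "".join(out)


value = 'from "odd (name" by mx; Mon, 10 (a comment; with a semicolon) Aug 2009 12:00:00 +0000'
cleaned = strip_comments(value)
print(cleaned.rsplit(";", 1)[1].split())
# ['Mon,', '10', 'Aug', '2009', '12:00:00', '+0000']
```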

    The Date parser will be able to remove/skip
    the obsolete comment, but not the leading received-info. So we'd
    have to do a first pass in *reverse* to skip over CFWS until we hit
    a semicolon (outside the CFWS we skipped).

      Does the spec (RFC 5260) require that, or would a simple-minded
    implementation also satisfy the spec? (That is: strip everything up to
    the last semicolon regardless of structure, which would fail on my
    contrived example, i.e. make the test return false and yield a count
    of zero.)

It's a border case, but I'd have to say I'd regard it as noncompliant.

Okay.
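The backward scan discussed above needs only a nesting counter for comments. Here is a sketch, assuming the date sits either after the last semicolon that is outside any comment or, if there is none, fills the whole field. The function names are hypothetical. It unfolds the field, walks it from the end, and skips characters escaped by a quoted-pair.

```python
import re


def _is_escaped(s: str, i: int) -> bool:
    """True if s[i] is preceded by an odd number of backslashes."""
    n = 0
    j = i - 1
    while j >= 0 and s[j] == "\\":
        n += 1
        j -= 1
    return n % 2 == 1


def extract_date_text(field_value: str) -> str:
    """Return the text after the last semicolon that is not inside a
    comment, or the whole (unfolded) value if there is no such semicolon."""
    unfolded = re.sub(r"\r?\n(?=[ \t])", "", field_value)  # undo folding
    depth = 0  # comment nesting level, counted while walking backwards
    for i in range(len(unfolded) - 1, -1, -1):
        ch = unfolded[i]
        if _is_escaped(unfolded, i):
            continue
        if ch == ")":
            depth += 1
        elif ch == "(" and depth > 0:
            depth -= 1
        elif ch == ";" and depth == 0:
            return unfolded[i + 1:].strip()
    return unfolded.strip()


received = ("from a by b; Mon, 10 (This is a ridiculous\r\n"
            " comment containing a semicolon: see here: ; ) Aug 2009 12:00:00 +0000")
print(extract_date_text(received))
# Mon, 10 (This is a ridiculous comment containing a semicolon: see here: ; ) Aug 2009 12:00:00 +0000
print(received.rsplit(";", 1)[1].strip())  # the simple-minded approach
# ) Aug 2009 12:00:00 +0000
```

The remaining comment is then left to the date parser, which has to skip CFWS anyway.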

I should also note that we're planning on a revision to the date-index
specification to fix an issue with the example Julian date routines (they return
regular, not modified, Julian dates). We might as well clarify these issues at
the same time.

Yes, I saw the discussion about the (modified) Julian date code there in
the archives.
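For reference, the difference between the two Julian values comes down to a fixed offset. As I read RFC 5260, the "julian" date-part is the Modified Julian Day, with MJD = JD - 2400000.5. This sketch uses a common integer formula for the Julian Day Number (the JD at noon) of a Gregorian date. It is not the RFC's example code, and the function names are hypothetical.

```python
from datetime import date


def julian_day_number(year: int, month: int, day: int) -> int:
    """Julian Day Number (JD at 12:00 UT) of a Gregorian calendar date."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return (day + (153 * m + 2) // 5 + 365 * y
            + y // 4 - y // 100 + y // 400 - 32045)


def modified_julian_day(year: int, month: int, day: int) -> int:
    # At 00:00 UT, JD = JDN - 0.5, and MJD = JD - 2400000.5,
    # so MJD = JDN - 2400001.
    return julian_day_number(year, month, day) - 2400001


print(julian_day_number(2009, 9, 10))    # 2455085 (regular Julian day)
print(modified_julian_day(2009, 9, 10))  # 55084 (modified Julian day)
```

As a cross-check, MJD 0 is 1858-11-17, so `(date(2009, 9, 10) - date(1858, 11, 17)).days` gives the same 55084.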

                              Ned

Thanks.

Kind regards,

Hannah.
_______________________________________________
sieve mailing list
sieve(_at_)ietf(_dot_)org
https://www.ietf.org/mailman/listinfo/sieve