Re: draft-freed-sieve-in-xml status?

On Sun, Dec 14, 2008 at 9:59 PM, Ned Freed 
<ned(_dot_)freed(_at_)mrochek(_dot_)com> wrote:

On Sun, Dec 14, 2008 at 6:06 PM, Ned Freed 
<ned(_dot_)freed(_at_)mrochek(_dot_)com> wrote:

i note that the draft describes the infoset rather than defining it in
the standard way. is there a reason for this decision?


I don't know what "the standard way" is you're referring to. Perhaps you
could provide a reference to an RFC where this has been used?

AIUI XML is maintained by w3c (rather than IETF) so is a
recommendation. http://www.w3.org/TR/xml-infoset/ is the current
document.


Quite true. However, the IETF has its own specification for how XML is supposed
to be used in RFCs: RFC 3470. And while infosets are mentioned as one approach
to specifying things about an XML format, there's no recommendation, let alone
requirement, that they be used.

This document is a little unusual in that it's defining a mapping of, if you
will, a non-XML infoset onto XML. As such, the natural approach seemed to be to
first discuss the structure of the language being mapped, then explain the
mapping, and finish up with additional unique-to-XML semantics.

i agree that most of this arrangement is natural. it's just that jumping to
a schema seems - to me - a little premature and inflexible.


First of all, the use of XML Schema is in fact too inflexible to be allowed
to continue. The next revision will use Relax instead.

XML schema is flexible but the flexibility comes at the price of
readability. one of the relax variants would be a better choice.

however (in my experience) the generative tools commonly used for XML
and web service binding, and editor generation tend not to offer good
relax support. IMO the draft should offer secondary informative XML
Schema or Schemata to assist developers using these tools.


The problem is that the unique particle attribution limitation in XML Schema
effectively precludes using it without some compromises. I am therefore opposed
to continuing to include it.

But I'm still a little confused as to what you're asking for here. If you're
asking for removal of the explicit inline XML syntax examples in favor of a
more abstract approach, I'd be fine with that if there's a WG consensus to make
such a change.

no - i'm very happy with the syntax examples

i would like to see the approach used in RFC 5023 (and others)
adopted, adding a normative description of the XML and making the
schema only informative.


Personally, I find the RFC 5023 approach, like the XOPEN object descriptions it's
similar to, to be almost totally unreadable. Maybe it's the only reasonable way
to do it when the element structure is quite complex, but that's not the case
here.

So, absent some fairly strong support for this from others in the group, I'm
not going to pursue this.

sieve). there is a large and growing requirement for integration
between mail and enterprise systems (typically coding in Java and .NET
but also ruby and python). developers from enterprise backgrounds are
typically strong on web+xml but very weak on mail.


Yep, I've seen a lot of this as well. And the problem encompasses far more than
Sieve: For example, a lot of people who are unfamiliar with email don't
understand very basic concepts such as the separation between envelope and
message content. (This particular issue actually pokes through into Sieve in
the form of whether an envelope or header test is appropriate.)

i beg to differ slightly on this one

some enterprise mail processing may happen during the SMTP transaction
but it is more typical for the mail processing to happen after storage. not all
mail stored arrives through SMTP and so it is typical for any envelope
information to be reduced to simple MIME headers.


Robert, with all due respect, you may have substantial expertise on the XML
front, but your comments here are actually doing little more than illustrate
the validity of my argument that there's a general issue with people not
getting how email works that isn't going to get fixed by anything we do here.
If you want this addressed the place to look is the email architecture
specifications being worked on by Dave Crocker.

And it is NOT typical for envelope information to be stored as headers. There
are several reasons for this:

(1) Envelopes only exist between the time of submission and final delivery.
    Transport actions do record certain bits of envelope information in
    trace header fields and final delivery is supposed to copy some additional
    envelope information into a couple of header fields, but these are NOT
    a message envelope and it is a mistake to assume they are.

(2) During the time the envelope exists it is highly mutable, often changing
    form at every hop. This makes header storage of envelope information
    somewhat problematic.

(3) There are several SMTP extensions that add to the envelope in various ways,
    requiring negotiation of what envelope information can and cannot be
    passed from one system to another. This tends to interact badly with
    schemes that store envelope information as a static part of the message.

(4) The fact that protocols other than SMTP are used for various email
    operations doesn't necessarily impact header/envelope separability.
    Other protocols maintain this separation and at least one of them, X.400,
    actually has a far greater degree of separation than SMTP does.

(5) Because there are effectively no controls on what ends up in headers, it
    is fairly easy for the separation between "header" headers and "envelope"
    headers to get lost. Among other things, this can create serious
    security vulnerabilities.
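
To make the separation concrete, here is a minimal Python sketch. It is not
taken from any specification: the Envelope class and the header_recipients
function are names made up for illustration, and the addresses are invented.
It models the envelope as data carried outside the message and shows that an
envelope recipient need not appear anywhere in the header.

```python
from dataclasses import dataclass, field
from email.message import EmailMessage
from email.utils import getaddresses
from typing import List, Set


@dataclass
class Envelope:
    """Transport-level addressing (hypothetical type, for illustration)."""
    mail_from: str
    rcpt_to: List[str] = field(default_factory=list)


def header_recipients(msg: EmailMessage) -> Set[str]:
    """Return the addresses named in the To and Cc header fields."""
    values = msg.get_all("To", []) + msg.get_all("Cc", [])
    return {addr for _name, addr in getaddresses(values) if addr}


# The message content: what the author wrote, including the header.
msg = EmailMessage()
msg["From"] = "Alice <alice@example.org>"
msg["To"] = "announce@lists.example.org"
msg["Subject"] = "Monthly update"
msg.set_content("Hello, list.\n")

# The envelope for one hop after list expansion: kept apart from the message.
envelope = Envelope(
    mail_from="announce-bounces@lists.example.org",
    rcpt_to=["bob@example.net"],
)

# Envelope recipients that cannot be recovered from the header alone.
missing = [r for r in envelope.rcpt_to if r not in header_recipients(msg)]
print("envelope recipients absent from the header:", missing)
```

Anything that works only from the stored header, as in this example, has no
way of learning that the message was actually delivered to bob@example.net.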

Now, this is not to say there aren't various ad-hoc schemes in use where active
envelope information ends up getting stuffed into the header. Such schemes date
back to BITNET's use of X-Envelope-To: to work around the 8x8 limit and
probably long before. But in my experience at least these things invariably
fail to provide a full and correct mapping for all of the possible information
that can exist in an SMTP (or X.400) envelope. And as a consequence they
invariably cause problems because of their inability to truly express envelope
semantics.

Indeed, if you have to capture envelope information in a static form - the main
current use-case for this is compliance archiving - in most cases you're better
off NOT using header-based schemes. We even have a standard format defined for
this: Batch SMTP. Although the format that's probably used the most is the one
Exchange generates that they call "envelope journaling", which puts the
envelope in the first text part of a MIME multipart. (On a side note, if anyone
knows where there's a precise and complete specification of the syntax used for
envelope journaling, I'd appreciate a pointer.)
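
As a rough illustration of the Batch SMTP idea, the following Python sketch
writes an envelope and a message out as the sequence of SMTP commands that
would have carried them. It is a simplified sketch under stated assumptions,
not a complete implementation of the format: the function name to_batch_smtp
is made up, the addresses are invented, and SMTP extension parameters on the
MAIL FROM and RCPT TO commands are not handled at all.

```python
from typing import Iterable


def to_batch_smtp(helo_name: str, mail_from: str,
                  rcpt_to: Iterable[str], message_text: str) -> str:
    """Serialize one envelope plus message as a batch of SMTP commands.

    Hypothetical helper for illustration; extension parameters are omitted.
    """
    lines = [f"HELO {helo_name}", f"MAIL FROM:<{mail_from}>"]
    # One RCPT TO command per envelope recipient.
    lines.extend(f"RCPT TO:<{rcpt}>" for rcpt in rcpt_to)
    lines.append("DATA")
    for line in message_text.splitlines():
        # SMTP transparency: a content line starting with "." gets one more.
        lines.append("." + line if line.startswith(".") else line)
    lines.append(".")  # end of message data
    lines.append("QUIT")
    return "\r\n".join(lines) + "\r\n"


record = to_batch_smtp(
    "archive.example.org",
    "alice@example.org",
    ["bob@example.net", "carol@example.com"],
    "Subject: test\n\n.a line starting with a dot\nbody\n",
)
print(record)
```

The point of the design is that the envelope stays in its own commands ahead
of DATA, so nothing about it has to be squeezed into header fields.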

most developers in
these mail processing environments do not need to understand the
difference between envelope and message content because - for them -
there is no difference.


Yeah, that's what a lot of them think. The problem is they're quite simply
wrong, and it isn't a harmless thing to be wrong about. I get plenty of
support calls from customers who got screwed by this lack of understanding.

And it is NOT a minor detail when someone sets up a compliance archiving system
that ends up in many cases not being able to determine who actually sent or
received a given message. (I only wish I was making up this example.)

Sieve works very well as a general MIME document processing language.


Actually, that's not Sieve's purpose at all and it isn't something Sieve is
currently good at. In fact we've only recently taken the first fairly tentative
step down the MIME processing path with the MIME loops extension and possibly
the convert extension. We'll see how well that turns out, but I have to say I'm
not optimistic that it will replace existing ad-hoc MIME processing facilities
like MIMEdefang.

the envelope tests are - in many ways - peculiar since the rest of the
specification really isn't mail specific. there are potentially some
very interesting applications in this area so it would be a shame - i
think - for the expert group to focus too strongly on SMTP at the
expense of other IMHO equally valid Sieve use cases.


I don't object to the use of Sieve in other contexts - in principle. But the
devil is in the details. A good example of this is the use of Sieve in an IMAP
server as defined in draft-ietf-lemonade-imap-sieve-05.txt. This doesn't seem
like too much of a stretch from existing usage, but when I reviewed this
document a while back I found all sorts of semantic mismatches, some of them
quite serious.

But here's the dilemma: This stuff is complicated and in some cases fairly
subtle. This in turn means that the reiteration of even a subset of the
underlying design principles that implementors need to know takes up a lot of
space and will still fall short of the mark of giving the necessary guidance.
But it may lead to the belief that reading this specification (or for that
matter this one and RFC 5228) is in fact sufficient to understand how to use
Sieve. It quite simply isn't.

again, i beg to differ

sieve is very similar structurally to the guerrilla standards used in
enterprise mail systems for more than 5 years now. for most mail
processing applications, only the container builders need to have a
good understanding of the protocols. application developers are
offered a safe environment and an OOP interface. i see no reason why
sieve should be any different.


Understanding of the protocols isn't necessary, but I'm very much afraid that
there's no avoiding an understanding of email semantics if you want things to
work properly. We may wish it were otherwise, but it just isn't.

                                Ned