Re: Cullen's DISCUSS on draft-ietf-sieve-3028bis-12.txt, take 2

After talking to Cullen, Lisa thinks that one or two of the following
changes should address Cullen's DISCUSS (I assume that the text I've
suggested earlier should be included in some form in 3028bis anyway;
this is in addition to that):

1). Limit applicability of SIEVE, e.g. to deployments where
administrators can disable accounts abusing scripts.
2). Add text with helpful guidance about script analysis or forking
detection.
3). Add requirements for use of mail headers to more reliably detect
forking.

=======
So here is some background on the three choices, along with my comments on
them (strictly as an individual contributor).

1). This is really a no-brainer. Any deployed mail system should have a
way to disable accounts.


Of course they do. It's an absolute requirement for any real world system for
many reasons having nothing to do with Sieve. And most systems provide more
than one way to do it - ours has well over a dozen. I can even recall
discussions of the best ways to disable incoming email for specific accounts,
or for that matter how to disable .forward files specifically, dating back to
the early 80s.

I would suggest adding the following:

  Sieve implementations MUST provide facilities to allow administrators
to disable accounts abusing scripts.


I certainly have no problem with adding such a statement.

2). Ned wrote in a separate email about # 2:

> Script analysis is one of those tri-state things. It can conclude that:
>
> (1) A script is harmless.
> (2) A script is harmful.
> (3) The script cannot be analyzed.
>
> Now, in practice the _overwhelming_ majority of actual scripts will fall into
> one of the first two categories. This is especially true when scripts are
> created by a GUI - GUI tools tend to construct straightforward scripts
> without any of the complexities that hinder analysis.
>
> And even when the conclusion is (3), that actually tells you something. A
> really sophisticated system might even note the presence of a highly
> complicated script and watch even more carefully for abuse.
>
> Heck, even a very naive analysis can be useful. For example, to the extent
> redirect offers capabilities beyond those of a .forward file, they only arise
> when the address redirect sends the message to can be controlled by the
> message itself. For that you really need Sieve variables (and hence this is
> out of scope for the Sieve base specification). So one very simple thing you
> can do is look for the use of variables and the presence of ${} constructs in
> redirects. A setup that allows users to configure arbitrary sieves might want
> to check for this combination and either disallow or flag it in some way.
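A minimal sketch of the naive check described in the quoted text, assuming
scripts are available as plain text. The function name and both regular
expressions are illustrative assumptions; a text scan like this is not a
Sieve parser, so a real deployment would inspect the parsed script instead.

```python
import re

# Hypothetical heuristics: does the script require the "variables"
# extension, and does any redirect argument contain a ${...} reference?
REQUIRES_VARIABLES = re.compile(r'\brequire\b[^;]*"variables"')
REDIRECT_WITH_VARIABLE = re.compile(r'\bredirect\b[^;]*\$\{')


def flags_variable_redirect(script: str) -> bool:
    """Return True if the script combines the variables extension with a
    redirect whose target may be controlled by the message itself.

    Naive text scan only: it ignores comments, multi-line strings and
    escaping, so it can produce both false positives and false negatives.
    """
    return bool(REQUIRES_VARIABLES.search(script)
                and REDIRECT_WITH_VARIABLE.search(script))


risky = ('require ["variables"];\n'
         'if header :matches "Subject" "fwd *" { redirect "${1}"; }')
print(flags_variable_redirect(risky))                          # True
print(flags_variable_redirect('redirect "bob@example.com";'))  # False
```

A setup could then refuse such scripts outright or merely queue them for
closer monitoring, matching the "disallow or flag" choice above.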

I don't think the discussion about looking for variables in the redirect
address belongs in 3028bis, because 3028bis itself has no variables.
Apart from that, something along the lines of Ned's text can be included.


Agreed. I can work on a cut-down version if you want.

3). [I hope that the following is not too cryptic for people to
understand. If it is, I can try to provide a more detailed explanation
later.] Lisa has suggested in a private email a new header field
(Mail-Fork-Estimation) that can be used to convey the total number of
redirects that have happened so far, as estimated at the previous hop.
The value of this header is multiplied by the number of redirects that
happen at the current hop, and if the result is bigger than 100, the
message is going to be dropped at the next hop. So this is similar to
counting Received headers, but compensates for multiple redirects.
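A sketch of the check as described above, assuming an integer header value,
a missing header meaning that no fan-out has been estimated yet (treated as
1), and the threshold of 100. The function names are hypothetical; there is
no written specification behind this.

```python
from typing import Optional

FORK_LIMIT = 100  # threshold quoted in the proposal as described above


def next_estimate(incoming: Optional[int], redirects_here: int) -> int:
    """Value this hop would write into Mail-Fork-Estimation.

    Assumption: a message arriving without the field is treated as if
    it carried 1, i.e. no earlier fan-out.
    """
    if redirects_here < 1:
        raise ValueError("only meaningful when the message is redirected")
    previous = 1 if incoming is None else incoming
    return previous * redirects_here


def dropped_at_next_hop(estimate: int, limit: int = FORK_LIMIT) -> bool:
    """The next hop drops the message once the estimate exceeds the limit."""
    return estimate > limit


first = next_estimate(None, 11)    # first hop redirects to 11 addresses
second = next_estimate(first, 10)  # a later hop redirects that copy to 10
print(first, second, dropped_at_next_hop(second))  # 11 110 True
```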


First, given the plethora of autoforwarding mechanisms in near-universal use
today, any such mechanism has to be defined as a general email requirement in
order to be in any way effective. Simply imposing it on Sieve implementations,
which despite being widely deployed nevertheless account for only a tiny
fraction of the autoforwarding capabilities available to end users, isn't going
to cut it.

Now let me remind everyone that our purpose here is to work on Sieve, not to
define general mechanisms for email as a whole. And even if defining this for
Sieve only made sense, here's what our charter says about 3028bis:

(1) Revise the base sieve specification, RFC 3028, with the intention
   of moving it to draft standard. Substantive additions or revisions to
   the base specification are out of scope of this working group. However,
   the need to loosen current restrictions on side effects of tests as well
   as the need for a normative reference to the newly-defined comparators
   registry may necessitate a recycle at proposed.

It is quite clear that this would be a substantive change to the base
specification, and as such it is out of scope.

Now, as to the question of whether or not it makes sense to define this for
email in general: My main concern is that this mechanism deals with the
necessarily incomplete information a single autoforwarder has by trying to
figure out an upper bound on the number of recipients that have been added. The
problem with this is that it is incredibly easy for the upper bound to end up
being much too large. For example, suppose you have one autoforwarder that
spits out 11 recipients, one of which is itself an autoforwarder with 10
recipients. Despite the fact that the message is only going to be sent to 20
people at most, this case is going to trip the detector (11 x 10 = 110 > 100)
and result in the loss of messages.
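The arithmetic behind this example, as a small sketch (function names are
illustrative): the multiplicative scheme multiplies the fan-outs, while the
real recipient count adds the nested list to the rest of the outer list.

```python
from math import prod


def multiplicative_estimate(fanouts: list[int]) -> int:
    """Upper bound a per-hop multiplicative scheme would compute."""
    return prod(fanouts)


def actual_recipients(outer: int, nested: int) -> int:
    """Recipients when exactly one of `outer` addresses is itself an
    autoforwarder that expands to `nested` addresses."""
    return (outer - 1) + nested


LIMIT = 100
estimate = multiplicative_estimate([11, 10])
actual = actual_recipients(11, 10)
print(estimate, estimate > LIMIT, actual)  # 110 True 20
```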

The only way to address this is to bump or disable the limit, but the
multiplicative nature of the estimate means that the limit will have to be set
pretty darned high (likely well over 1000) before it ceases to cause trouble.
And at that point the limit is probably not going to be effective at stopping
actual abuse.

The multiplicative nature of the overestimate may even make it possible for
users to game the system to cause deliberate loss of messages without
themselves being blamed. For example, suppose I need to send mail to A and B
but I want to ensure that B's copy gets lost somewhere along the way. I happen
to know that B has autoforwarding in place to direct his mail to both his real
account and his pager. All I have to do is set up an autoforwarder with 50
dummy addresses in addition to A's and B's addresses. I send the mail, and B's
copy exceeds the limit (52 x 2 = 104 > 100) and is silently dropped.
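The same arithmetic applied to this scenario, assuming the limit of 100 and a
two-way forward (mailbox plus pager) at B; the variable names are
illustrative only.

```python
LIMIT = 100

first_hop = 50 + 2          # 50 dummy addresses plus A and B
a_estimate = first_hop * 1  # A does no further forwarding
b_estimate = first_hop * 2  # B forwards to his mailbox and his pager

for name, estimate in (("A", a_estimate), ("B", b_estimate)):
    verdict = "dropped" if estimate > LIMIT else "delivered"
    print(f"{name}: estimate {estimate}, {verdict}")
# A: estimate 52, delivered
# B: estimate 104, dropped
```

Only B's copy crosses the threshold, and it does so at a hop B controls, which
is what makes the loss hard to attribute to the sender.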

Similar fun and games can actually be had with Received: line limits. The
difference is that Received: line counts are additive, making the cutovers
between successful delivery and failure much harder to predict and control.
Multiplicative limits increase much more quickly and are therefore much easier
to target.

The mechanism as currently defined also fails to take into account the very
real case where complete information for multiple autoforwarding hops actually
is available. For example, large enterprises routinely employ complex
multilevel autoforwarders with a high degree of overlap between the various
recipient lists. They then depend on the fact that all levels are evaluated in
tandem so duplicate addresses can be eliminated. (I've seen real-world cases
where a bug in duplicate elimination logic caused single recipients to receive
around a hundred copies of each message - that's how much overlap there can
be.)

So for this to work in such cases the count either has to be based on the
actual number of recipients that pop out of alias expansion, which is going to
be very difficult to get at in some implementations, or else there has to be a
way to "back out" some number of recipients from the total when it is
determined that they don't actually exist. This may argue for representing the
value as a series of separate fanout estimates you multiply together to check
rather than collapsing it down to a single number. (Or it may not - I haven't
thought through all the implications and implementation issues fully.)
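To illustrate how far the count of what actually comes out of alias
expansion can fall below a product of per-level fan-outs, here is a small
sketch with a made-up alias table; the addresses, the table and the `expand`
helper are all hypothetical.

```python
def expand(address, aliases, seen=None):
    """Recursively expand `address` through `aliases`, eliminating
    duplicates (and alias loops) by tracking every address visited."""
    if seen is None:
        seen = set()
    if address in seen:
        return set()
    seen.add(address)
    members = aliases.get(address)
    if members is None:          # not an alias: a real mailbox
        return {address}
    result = set()
    for member in members:
        result |= expand(member, aliases, seen)
    return result


# Hypothetical two-level setup with heavily overlapping lists.
aliases = {
    "all": ["eng", "sales", "ceo"],
    "eng": ["alice", "bob", "ceo"],
    "sales": ["carol", "bob", "ceo"],
}

per_level_product = 3 * 3   # fan-out 3 at each of two levels
unique = expand("all", aliases)
print(per_level_product, sorted(unique))
# 9 ['alice', 'bob', 'carol', 'ceo']
```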

And there are also privacy implications to consider. Having this header in the
message unavoidably reveals information about the extent to which the message
has been forwarded/redirected. And since that information has to appear in all
copies at any given point, it can reveal aspects of one recipient's forwarding
activities to another. As it happens I just got off a call regarding a large
customer of ours where exactly this issue came up - this exposure would be a
MAJOR no-no for them. One way to address this would be for this to be done with
an SMTP extension rather than a header field, but that brings along its own
deployment issues.

Another argument against having this as a header is that what's being
proposed is actually a pretty serious layering violation. Headers are supposed
to be the province of user agents, and while MTAs routinely prepend header
fields to messages they aren't supposed to be mucking around with existing
headers. Of course this can be avoided by using the SMTP extension approach,
but another way to do it would be to use multiple header fields. Each
autoforwarder would then add an additional fork-count field and the check
would be to multiply the values in all the fields together and see if they
exceed the limit.
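A sketch of the multiple-field variant, assuming each autoforwarder prepends
its own field and never touches existing ones. "Fork-Count" is a
hypothetical field name (nothing like it is registered), and the limit of 100
is carried over from the proposal as described earlier.

```python
from email import message_from_string
from math import prod

FORK_FIELD = "Fork-Count"  # hypothetical field name
LIMIT = 100


def add_fork_count(raw_message: str, fanout: int) -> str:
    """Prepend a new field instead of modifying any existing header.
    (Lines end in \\n here for brevity; real messages use CRLF.)"""
    return f"{FORK_FIELD}: {fanout}\n" + raw_message


def fork_product(raw_message: str) -> int:
    """Multiply the values of every Fork-Count field (1 if there are none)."""
    msg = message_from_string(raw_message)
    values = msg.get_all(FORK_FIELD, [])
    return prod(int(v) for v in values)


raw = "From: a@example.com\nSubject: test\n\nbody\n"
forwarded = add_fork_count(add_fork_count(raw, 11), 10)
print(fork_product(raw), fork_product(forwarded))  # 1 110
```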

And once you get to the idea of using multiple fields for this, this becomes a
variant of another idea that has been around for a long time: Something X.400
calls DLExpansionHistory. In fact, if memory serves, while the X.400 variant of
this didn't have a way to track the number of recipients added by each
expansion operation, the Message Router variant did have it. So another
question that really should be asked and answered is why there has been
essentially no interest in adapting this approach to Internet mail up to this
point.

The bottom line is what I suspect people are thinking of as a simple addition
to Sieve specifically is in fact a fairly complex matter with lots of
ramifications and implications for email in general. It is therefore totally
inappropriate to consider defining this in the present context.

While I think this might be a neat research idea and should be published
as a draft, I have a problem with mandating this in 3028bis:


As do I - see above.

1). If this is mandated in 3028bis, it would take some time to deploy.
This would also render *all* existing implementations non-compliant,
which is not a good thing. While I can see that this might help in the
future, I don't think it is necessary, and the 2 other approaches (script
analysis and abuse detection+script disabling) already work quite
reasonably.


Agreed.

2). This proposal changes how redirect has to be implemented by many
implementations.


And not just redirect. Again, for this to be in any way effective it would have
to be part of every autoforwarder, not just Sieve.

Currently for some implementations a redirect can be a
fire-and-forget operation (after adding it to logs). The actual message
submission is performed asynchronously. Your proposal would require that
all redirects for all users (a single message can be destined to
multiple recipients and all of them can have Sieve scripts) be
performed by an MTA at one moment, because all of them will have to have
the same Mail-Fork-Estimation header field. In the Sieve WG we have many
different implementations and we were trying very carefully not to put
unnecessary requirements on implementations.


I actually don't think this objection is valid as stated. The entire point of
using the multiplicative overestimate is to deal with the fact that scripts
unavoidably operate in isolation and cannot see what is going on elsewhere.
Even if you couple the results of multiple sieves on the same system, this does
nothing to address the case where the sieves are evaluated on different
systems.

The one exception here is the case where autoforwarders are intentionally
operated in tandem in order to get proper duplicate elimination between
recipient lists with high degrees of overlap. In such cases the estimate
has to reflect the results of duplicate elimination as I discussed above. But
this really isn't a Sieve redirect issue, if for no other reason than it isn't
practical to use Sieve redirect to build such setups - redirect's definition is
sufficiently loose that it allows changes to be made to messages that would
preclude duplicate elimination.

====
To summarize: I personally disagree that # 3 is a solution suitable for
3028bis. I agree that some combination of 1 and 2 is reasonable. I would
need some help to come up with text covering # 2.


I agree with this assessment as well and would be happy to help with text
for 2.

                                Ned