ietf-smtp

Re: Mail Data termination

2011-08-21 03:53:52



--On Saturday, August 20, 2011 16:19 -0700 Bill McQuillan
<McQuilWP@pobox.com> wrote:

On Sat, 2011-08-20, Paul Smith wrote:
... 
I'm with Hector on this. I really don't like the idea of a
sender keeping a connection open 'just in case'.

If it keeps it open for 10 seconds 'just in case', and no new
mail arrives so it has to close the connection, what has that
achieved other than extra load on the receiver? Why wouldn't
it then say 'well, another message may arrive in the next few
seconds, so I'll keep it open just a bit longer', and so on.

If the sender doesn't like closing and opening connections,
then surely it's just as beneficial for the sender to wait 10
seconds before starting to send a message, then if another
message to the same destination arrives within that 10
seconds, it can batch them up without adding unnecessary load
to the receiver.
...
 
It seems to me that the measure is NOT the "number of clients"
but rather the number of messages handled.

Each client will have a certain number of messages to transmit
to the receiver over a time period, regardless of the number of
sessions used. When a client saves a session teardown and
reestablishment by delaying QUIT, so does the receiver. So it
would seem that a delay on the order of that time IS
warranted. I don't have a good feel for the amount of that
overhead, but I would guess that a second or two is reasonable
and ten seconds is approaching too long.
...
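
Bill's break-even point lends itself to a quick back-of-envelope
check.  The sketch below (Python; every number in it is a made-up
assumption, not a measurement from anyone's server) simply compares
the expected setup/teardown work a reused session saves against the
cost of holding it idle:

def worth_holding(p_next, reopen_cost, idle_cost_per_sec, hold_secs):
    # expected handshake work saved by session reuse, versus the
    # receiver-side cost of keeping the connection idle that long
    return p_next * reopen_cost > idle_cost_per_sec * hold_secs

print(worth_holding(0.5, 1.0, 0.05, 2))    # True: a second or two pays off
print(worth_holding(0.5, 1.0, 0.05, 10))   # False: ten seconds costs more

With those guessed inputs the arithmetic matches his instinct: a
second or two pays for itself, while ten seconds does not.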

Folks, while this discussion started as a question about the
requirements of SMTP, it seems to be edging past even one about
operational optimizations and into a philosophical debate.  Let
me make one more observation, after which I'm dropping out of
the thread.

Optimizations in queuing theory are a complex business,
especially if one tries to be really analytical about them.  In
a perfect world for the optimizer, a sending system would keep
track of traffic (both number and size of messages) to each
destination over a relatively long period of time, progressively
de-weighting older data (rather than just discarding them).  The
patterns of those data would then be used to build Bayesian
predictions of whether a new message to a given destination was
likely to be an isolated event or the beginning of a flow (to
that destination or others) with predictable characteristics.
Those predictions could, in turn, be used to condition decisions
about whether, and for how long, it was useful to keep a
connection open after sending the last queued message (noting
that the discussion has slid from "a few seconds" to "ten
seconds" and that difference, in todays's world, comes pretty
close to setting up and attacking a strawman).
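
As a very crude stand-in for that machinery (exponential smoothing
of per-destination inter-message gaps rather than real Bayesian
prediction; the class and its parameters below are hypothetical), a
client could keep something like this Python sketch and consult it
when deciding whether to hold a connection open at all:

import time

class DestinationHistory:
    def __init__(self, alpha=0.2):
        self.alpha = alpha      # weight given to the newest gap
        self.last_seen = {}     # destination -> time of last message
        self.avg_gap = {}       # destination -> smoothed gap, seconds

    def record(self, dest, now=None):
        now = time.time() if now is None else now
        if dest in self.last_seen:
            gap = now - self.last_seen[dest]
            old = self.avg_gap.get(dest, gap)
            # progressively de-weight older data rather than discard it
            self.avg_gap[dest] = self.alpha * gap + (1 - self.alpha) * old
        self.last_seen[dest] = now

    def hold_open_secs(self, dest, ceiling=2.0):
        gap = self.avg_gap.get(dest)
        if gap is None or gap > ceiling:
            return 0.0          # looks like an isolated event: just QUIT
        return gap              # another message is probably imminent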

One might use much the same data to predict whether delivery for
a message that is newly-arrived for sending should be tried
immediately (or as close to that as is feasible) or queued.  It
is an almost obvious optimization that systems that try immediate
delivery of a new message and succeed should drain any messages
queued for that destination in the same connection, but there
are scenarios (albeit usually unlikely ones) that actually make
that a bad strategy.  At least in the absence of deliberate
malicious attacks, one could diagnose or predict those
pathologies from an adequate database and set of predictive
procedures.  In addition, being able to drain the set of
messages queued for a particular destination implies that one's
queuing mechanisms are arranged to permit that: if, for example,
there is only a single queue and it is purely sequential, with
no destination threading or indexing, the amount of time and
effort required to find the messages relevant to a particular
destination and send them while the connection is open may not
be worthwhile.
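
The queue-organization point, at least, is cheap to get right.  Here
is a minimal Python sketch (the names are mine, not from any real
MTA) of a destination-threaded spool, in which draining becomes a
dictionary lookup instead of a scan of one sequential queue:

from collections import defaultdict, deque

class OutboundQueue:
    def __init__(self):
        self.by_dest = defaultdict(deque)   # destination -> its messages

    def enqueue(self, dest, message):
        self.by_dest[dest].append(message)

    def drain(self, dest):
        # everything queued for `dest`, e.g. to send down a connection
        # that is already open after a successful immediate delivery
        queue = self.by_dest.pop(dest, deque())
        while queue:
            yield queue.popleft()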

Now, I don't know of any mail system that carries that
optimization effort to an extreme that would amuse the theorists
by coming close to the possible limits of prediction.  Maybe
there is one out
there, but I doubt it because I doubt that the resources that
would go into database-keeping and computationally-intensive
rolling predictions would be nearly justified by the performance
improvements they would yield.  So, in the real world,
implementers make guesses as to what optimizations will buy them
enough to be worth the trouble.  Those guesses are informed by
the observation that, possible specialized submission servers
and authors of spam-sending bots aside, almost no one builds an
SMTP client who is not also building SMTP servers: I believe
that it is fairly well understood in practice that writing
abusive senders that servers see as DoS attackers is, at best, a
really short-term optimization.

So, to the extent it is useful and appropriate, let's share what
we know about things that work and things that don't.  But let's
do so with the understanding that systems differ widely in how
they are organized, what optimizations are possible, and what
resources they can reasonably assume are available when they are
being operated.

     john
