Re: DATA Pipelining

On 2 Jan 2010, at 22:00, Tony Finch wrote:

On Sat, 26 Dec 2009, Sabahattin Gucukoglu wrote:

As I see it, there is no reason not to pipeline both the message text
lines and the final dot, because nothing is expected from the server and
until the dot is received there is no change of state other than the
accumulation of the DATA buffer.


You can pipeline more than that. You can send one message per RTT if you
pipeline one message's data with the next message's envelope, i.e.
stream out all of headers, body, dot, RSET, MAIL, RCPT, DATA; then wait
for the server's replies; then repeat.
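
A minimal sketch of the byte stream described above, assuming the server
advertises PIPELINING and has already sent the 354 for the current message.
The function names, addresses and message text are hypothetical, for
illustration only. It builds one batch (data, dot, RSET, MAIL, RCPT, DATA)
that the client writes in one go before waiting for replies.

```python
def dot_stuff(body: bytes) -> bytes:
    """Normalize line endings to CRLF and double any leading dot
    (the transparency rule, RFC 5321 section 4.5.2)."""
    lines = body.replace(b"\r\n", b"\n").split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # drop the empty piece left by a trailing newline
    out = []
    for line in lines:
        if line.startswith(b"."):
            line = b"." + line
        out.append(line + b"\r\n")
    return b"".join(out)


def pipelined_batch(message: bytes, next_sender: str, next_rcpts: list[str]) -> bytes:
    """Bytes to write after the 354: this message's data and final dot,
    followed by the envelope of the next message, ending with DATA."""
    parts = [dot_stuff(message), b".\r\n", b"RSET\r\n"]
    parts.append(f"MAIL FROM:<{next_sender}>\r\n".encode("ascii"))
    for rcpt in next_rcpts:
        parts.append(f"RCPT TO:<{rcpt}>\r\n".encode("ascii"))
    parts.append(b"DATA\r\n")  # DATA ends the group; wait for replies after it
    return b"".join(parts)


batch = pipelined_batch(
    b"Subject: hi\n\n.hidden\nbye\n", "a@example.org", ["b@example.org"]
)
```

After writing the batch the client reads the replies in order: one for the
dot, one for RSET, one for MAIL, one per RCPT, and the 354 for DATA. Then it
repeats with the next message.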

Sorry, I meant unextended SMTP.  I was wanting to know whether, after the
354, I could send multiline data buffers followed by the dot, only single
lines, or lines then just the dot.


Yes you can. TCP is a _stream_ protocol, after all.

The difference is simply in how much gets put into the send buffer, and the
rationale is avoiding silly bugs that can neither be specified nor assumed
nonexistent.


Attempting to avoid such bugs on the client side is an exercise in futility,
due to the nature of TCP and IP underneath it. You cannot in general prevent
packet fragmentation, reassembly, rebuffering, and all sorts of other
entertaining stuff from occurring, so your attempts to align your buffering to
accommodate a broken receiver that doesn't handle its buffers properly are going
to fail - the only question is when, not if.
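
To illustrate the point, here is a rough sketch of a receiver that handles
its buffers properly. The class and its names are hypothetical and not taken
from any real MTA. It accumulates whatever bytes arrive and only acts on
complete CRLF-terminated lines, so the result is the same however the stream
happens to be split up in transit.

```python
class DataReceiver:
    """Collects DATA-phase lines until the terminating lone dot."""

    def __init__(self):
        self._buf = bytearray()
        self.lines = []    # message lines, with dot-stuffing undone
        self.done = False  # True once the final "." line has been seen

    def feed(self, chunk: bytes) -> None:
        """Accept an arbitrary slice of the TCP stream."""
        self._buf += chunk
        while not self.done:
            end = self._buf.find(b"\r\n")
            if end < 0:
                return  # incomplete line: wait for more bytes
            line = bytes(self._buf[:end])
            del self._buf[:end + 2]
            if line == b".":
                self.done = True
            else:
                if line.startswith(b"."):
                    line = line[1:]  # undo dot-stuffing
                self.lines.append(line)

    @property
    def leftover(self) -> bytes:
        """Bytes received after the final dot (e.g. pipelined commands)."""
        return bytes(self._buf)


stream = b"Hello\r\n..dot\r\nend\r\n.\r\nRSET\r\n"

whole = DataReceiver()
whole.feed(stream)  # everything in one segment

bytewise = DataReceiver()
for i in range(len(stream)):
    bytewise.feed(stream[i:i + 1])  # one byte per segment
```

Both receivers end up with the same lines and the same leftover bytes, which
is why the sender's choice of buffering cannot matter to a correct server.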

If the server supports the BDAT command then your client's sending and
receiving pipelines can be completely asynchronous. However if you do this
you massively increase the risk of message duplication caused by lost
connections. There's some discussion of this in
http://www-uxsup.csx.cam.ac.uk/~fanf2/hermes/doc/qsmtp/draft-fanf-smtp-rfc1845bis.html
However it's probably easier in practice to fix this problem using some
server-side deduplication cleverness without protocol extensions.
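
One possible shape for such deduplication, purely as an assumption on my
part and not something described in the draft above: the server fingerprints
each accepted transaction (envelope plus message data) and quietly drops a
repeat seen within some time window. The class name, the one-hour window and
the choice of SHA-256 are all hypothetical.

```python
import hashlib
import time


class DedupCache:
    """Remembers fingerprints of recently accepted messages."""

    def __init__(self, window_seconds: float = 3600.0):
        self.window = window_seconds
        self._seen = {}  # fingerprint -> time first accepted

    @staticmethod
    def fingerprint(sender: str, rcpts, data: bytes) -> str:
        h = hashlib.sha256()
        h.update(sender.encode("utf-8") + b"\0")
        for rcpt in sorted(rcpts):
            h.update(rcpt.encode("utf-8") + b"\0")
        h.update(data)
        return h.hexdigest()

    def is_duplicate(self, sender: str, rcpts, data: bytes, now=None) -> bool:
        """Return True if an identical transaction was accepted within the
        window; otherwise record this one and return False."""
        now = time.time() if now is None else now
        # Forget entries older than the window.
        self._seen = {k: t for k, t in self._seen.items() if now - t < self.window}
        key = self.fingerprint(sender, rcpts, data)
        if key in self._seen:
            return True
        self._seen[key] = now
        return False


cache = DedupCache(window_seconds=3600.0)
msg = b"Message-ID: <1@example.org>\r\n\r\nhello\r\n"
first = cache.is_duplicate("a@example.org", ["b@example.org"], msg, now=0)
retry = cache.is_duplicate("a@example.org", ["b@example.org"], msg, now=10)
later = cache.is_duplicate("a@example.org", ["b@example.org"], msg, now=4000)
```

Because the data includes the headers, two genuinely distinct messages
normally differ in Message-ID and Date and so are not confused. A client
resending after a lost connection sends the same bytes and is caught.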

This got me thinking on the precise nature of data duplication due to long
delays between the final dot of the DATA block and the 250.  Given that the 250
is the first send from the server after each client wait, would it not be
feasible for the server to treat as undelivered any transaction for which it
could not send the final acknowledgment,


No. There are cases where an error is returned but enough of the data made it
for the client to consider the message as having been processed.

Checkpointing, server session hashes, and similar tricks are the way to solve
the duplication problem. But it seems the market has decided that the benefits
of doing this stuff don't outweigh the costs.

thus sympathising with the client's view of things?  Are there any
difficulties, for instance, is there the possibility that the server could send
the acknowledgement but it not be delivered in spite of a lack of error from
the underlying transport?


It is actually pretty common, when the client has given up and gone away, for
the server ack to go through without an error and the next read is what fails.

But I also know there are implementations out there which don't like it
when a command is incomplete, and try to respond to whatever is in their
TCP buffer, and wondered if anybody knew how that might work for the
final dot if it were stuffed at the end of some text.


Such implementations are not allowed to advertise PIPELINING.

The worst example I have seen recently was a Cisco PIX or ASA with SMTP
fuxup mode turned on. It replaces unrecognized SMTP commands with XXXX,
but if a known command is split between packets it cannot recognize it so
XXXXs it out. (More details at http://fanf.livejournal.com/102206.html).
Perhaps that is what you are thinking of. This is just one of many bugs in
the PIX SMTP implementation. Avoid it at all costs.

Sounds terrible!  But no, I was talking about *commercial MTAs*, usually for
Windows, which if you connect to them with telnet in stream mode will happily
disgorge "500 Unrecognised" whenever you send them a character.  Yes, they're
out there, really!


Command processing is actually a rather different case from data processing,
mostly because commands tend to be small and therefore don't run into 
fragmentation and other TCP/IP stuff. This allows broken implementations that
mishandle commands enough leeway to work well enough to survive in the field,
which is why PIPELINING is a negotiated extension rather than an optimization
technique that clients can simply assume.

                                Ned