I have a somewhat different take on this. First of all, I have always thought
the admonition that you MUST minimize the amount of time spent before
responding to the trailing dot to the greatest extent possible was, well, bunk.
(It's also an effectively unenforceable MUST - who can say you've done all you
can or not? - which is bad in its own right.) While it is important not to
spend too much time, the difference between a millisecond delay and a 2 second
delay is, in this situation, not worth worrying about.
What *is* worth worrying about is deferring too much processing until later. Do
too much of that and you can end up accepting messages faster than you can
process them. This may not be a big deal at low transaction rates, but when
you're operating in the many hundreds of messages a second per MTA regime, it
can be a real killer.
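A back-of-the-envelope sketch of why this matters: if post-acceptance processing runs even a little slower than acceptance, the backlog grows linearly for as long as the imbalance lasts. The rates below are made-up illustrative figures, not measurements from any particular MTA, and the function name is hypothetical.

```python
def deferred_backlog(accept_rate: float, process_rate: float, seconds: float) -> float:
    """Messages left awaiting deferred processing after `seconds`.

    Assumes an initially empty queue and constant rates (messages per second).
    """
    surplus = max(0.0, float(accept_rate - process_rate))
    return surplus * seconds


# Illustrative only: 500 msg/s accepted, capacity for 400 msg/s of
# post-acceptance processing, sustained for one hour.
print(deferred_backlog(500, 400, 3600))  # 360000.0
```

A 20% shortfall in processing capacity leaves hundreds of thousands of messages queued within an hour.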
It can also lead, in a variety of ways, to generating excessive amounts of
Accordingly, my mantra has always been to do as much processing as possible
during the SMTP dialogue (not just after the trailing dot either - spread out
is better), but to avoid anything that could result in excessive (as in more
than a second or so) delays.
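One way to enforce the "no more than a second or so" limit is to give each inline check a time budget and fall back to deferred handling when it runs over. This is a minimal sketch under assumptions, not how any particular MTA does it: `run_inline_check`, the one-second `CHECK_BUDGET`, the pool size, and the "accept"/"reject"/"defer" verdicts are all hypothetical names chosen for illustration.

```python
import concurrent.futures
from typing import Callable

# "A second or so", per the text; the exact value is an assumption.
CHECK_BUDGET = 1.0

# Worker pool shared by all inline checks (size chosen arbitrarily).
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def run_inline_check(check: Callable[[bytes], bool], message: bytes,
                     budget: float = CHECK_BUDGET) -> str:
    """Run `check` while the SMTP client is still waiting for a reply.

    Returns "accept" or "reject" if the check finishes within `budget`
    seconds, otherwise "defer". A timed-out check is not cancelled and
    keeps running on the pool; handing it off to the MTA's deferred
    processing path is omitted from this sketch.
    """
    future = _pool.submit(check, message)
    try:
        passed = future.result(timeout=budget)
    except concurrent.futures.TimeoutError:
        return "defer"
    return "accept" if passed else "reject"


# Usage with a hypothetical fast check:
print(run_inline_check(lambda m: b"EICAR" not in m, b"hello"))  # accept
```

The design choice here is that a slow scanner degrades into deferred processing rather than into a stalled SMTP dialogue.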
This approach has worked well for us. The timeout issues we've had have almost
always been the result of externalities we cannot control. For example, even if
our MTA used no resources at all, we cannot make up for someone who likes to
put the MTA's queues on an NFS mount connected over a network that appears to
be constructed of tin cans and string.
In more recent times we've mostly traded hardware performance issues for
spam/virus checker performance issues. Mind you, there are AS/AV solutions
available that are amply performant, but not everyone uses them.
As a result, duplicates were such a staggeringly huge problem. To mitigate
that, Postini computed a SHA-1 hash for every message, and if there was
any evidence that the receiver had sent the 250 OK, but the sender never
got it, the hash was saved in a database. If the sender resent the
message, the proxy (knowing it had already been delivered) quietly ate it.
When that database broke, the customers noticed immediately. The effect
was that significant.
First of all, in the interests of full and proper disclosure, I should mention
that this particular trick is patented: US Patent #7080123. (I'm the author,
but Oracle owns the patent.)
There was a time when this trick was clearly of significant benefit at some
sites, but I honestly don't know to what extent it benefits us now. (As I type
this, the database of these hashes on my home system contains 979 entries, but
almost all of them are likely associated with spammers that blast and
disconnect without waiting for the dot response. Of course, they're just hashes,
so I have no way of knowing which ones are legit and which ones aren't.)
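The hash-based duplicate suppression described above can be sketched roughly as follows. This is an in-memory illustration under stated assumptions, not Postini's code and not the patented implementation: the class and method names are hypothetical, it hashes only the message bytes received after DATA (whether the envelope is also hashed isn't stated), and a real system would keep the hashes in a persistent database and expire them.

```python
import hashlib


class DuplicateSuppressor:
    """Hypothetical in-memory sketch of SHA-1 based duplicate suppression."""

    def __init__(self) -> None:
        # Hashes of messages that were delivered but whose 250 reply the
        # sender may never have received. A real system would persist
        # these and expire old entries; both are omitted here.
        self._possibly_unacked: set[str] = set()

    @staticmethod
    def digest(message: bytes) -> str:
        # Assumption: hash the raw message content as received after DATA.
        return hashlib.sha1(message).hexdigest()

    def note_lost_reply(self, message: bytes) -> None:
        """Call when there is evidence the 250 was sent but not seen."""
        self._possibly_unacked.add(self.digest(message))

    def should_deliver(self, message: bytes) -> bool:
        """False means this is a resend of a message already delivered:
        acknowledge it to the sender, but quietly drop it."""
        return self.digest(message) not in self._possibly_unacked


s = DuplicateSuppressor()
msg = b"Subject: hi\r\n\r\nhello\r\n"
print(s.should_deliver(msg))  # True: the first copy is delivered
s.note_lost_reply(msg)        # the connection dropped right after the 250
print(s.should_deliver(msg))  # False: the retry is eaten
```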
One final point. This is a situation where it's useful to distinguish between
submission and relay. In the case of submission, there's usually a person
sitting there waiting for the message to be sent, so a shorter timeout not only
makes sense, you're going to have very little luck convincing client authors to
use a longer one. Relays, OTOH, tend to be much more tolerant of delayed
responses. There are of course all sorts of ways to make submit operations
more responsive than relay operations.