ietf-smtp

Re: Proposal for Adjusted DATA Timeout

2008-05-28 15:18:04


> Robert A. Rosenberg wrote:
> > One way to eliminate the duplicates if the SMTP server has accepted
> > and sent the message but the MUA did not hang around to get the
> > acceptance ACK (or did not send the QUIT) is to save the Message-ID of
> > messages that the MUA did this for (I assume the retransmission will
> > have the same MID). If/when it attempts reinjection, compare the
> > supplied MID against the saved list and reject the attempt with a 5xx
> > status code (and text saying "You submitted this already and I
> > accepted and forwarded it"). You can prune the entry after a
> > designated wait time or when the message is resubmitted (the latter
> > possibly before the wait time expiration).

> Our server does something like this - it actually stores a set of data
> of message-id, date, sender & recipient(s) to (a) handle the situation
> where the same message comes in twice, once for one recipient and once
> for another, both with the same message-id, and (b) to try to handle the
> rare (but not impossible) possibility of a duplicate message-id. It
> doesn't reject the message if it's seen it before, but it silently
> accepts it then discards it. (Rejecting the message would cause
> confusion for the sender who might then think the message hadn't got
> through).

Our server has the ability to do this as well, but we're quite a bit more
cautious. We hash the entire transaction, not just some subset of it - there
are just too many corner cases where ids get duplicated, where messages are
very close but nevertheless distinct, and so on.
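
To make "hash the entire transaction" concrete, the check reduces to roughly
the following. This is a minimal illustrative sketch in Python only - the
names and the toy in-memory store are invented for the example and are not a
description of our code:

    import hashlib

    # Derive one digest over the whole transaction - envelope plus message
    # content - instead of trusting any header subset (message-id included)
    # to be unique.
    def transaction_digest(mail_from, rcpt_to, data):
        h = hashlib.sha256()
        h.update(mail_from.lower().encode())
        for rcpt in sorted(r.lower() for r in rcpt_to):
            h.update(b"\0" + rcpt.encode())
        h.update(b"\0" + data)      # the full message as received, headers and body
        return h.digest()

    _seen = {}                      # digest -> time first accepted

    def duplicate(mail_from, rcpt_to, data, now):
        key = transaction_digest(mail_from, rcpt_to, data)
        if key in _seen:
            return True             # accept-and-discard rather than 5xx the reinjection
        _seen[key] = now
        return False

Folding the envelope into the digest is also what keeps the same-message,
different-recipient case described above from being mistaken for a duplicate.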

There are various ways to speed up the hashes, such as using two-level hashes
or reusing hashes done for other purposes, which I won't go into in any detail
here.
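
But for what it's worth, the general shape of one such two-level arrangement
is something like this (again Python, again invented names, purely for
illustration):

    import hashlib
    import zlib

    _level1 = set()     # cheap 32-bit envelope hashes
    _level2 = set()     # full transaction digests

    def _envelope(mail_from, rcpt_to):
        return "\0".join([mail_from.lower()] + sorted(r.lower() for r in rcpt_to))

    def _cheap(mail_from, rcpt_to):
        return zlib.crc32(_envelope(mail_from, rcpt_to).encode())

    def _full(mail_from, rcpt_to, data):
        h = hashlib.sha256(_envelope(mail_from, rcpt_to).encode())
        h.update(data)
        return h.digest()

    def record(mail_from, rcpt_to, data):
        _level1.add(_cheap(mail_from, rcpt_to))
        _level2.add(_full(mail_from, rcpt_to, data))

    def seen_before(mail_from, rcpt_to, data):
        if _cheap(mail_from, rcpt_to) not in _level1:
            return False    # the common case: no expensive hashing at all
        return _full(mail_from, rcpt_to, data) in _level2

The cheap first level only has to be good enough to make the expensive digest
the exception rather than the rule; a miss there settles the question without
touching the message data at all.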

The other problem is hash storage. In an environment that handles hundreds of
messages a second or more (the kind at most risk of timeouts and
resubmissions) keeping all those hashes around for all received messages is
problematic in and of itself. We therefore only bother when there's an
indication that the client actually timed out and gave up - the status write
failed, or the subsequent read failed. A client that then proceeds to send a
QUIT, waits for a response, then closes the connection, is unlikely to be one
that resends unnecessarily. And this is another case where distinguishing
between initial submission and relay may be important.
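
The trigger, in other words, looks something like this (a sketch with
hypothetical helper names; a real server would do the read as part of its
normal command loop, with its own timeout):

    # Only remember the transaction when there is some evidence the client
    # gave up: the final status write fails, or the connection goes away
    # before the next command arrives.
    def finish_data(conn, mail_from, rcpt_to, data, remember):
        # conn is a connected socket; remember() squirrels away the
        # transaction hash for later duplicate detection.
        try:
            conn.sendall(b"250 2.0.0 Message accepted\r\n")
            nxt = conn.recv(2048)              # normally QUIT or the next MAIL FROM
            if nxt:
                return                         # client is still there: record nothing
        except OSError:
            pass                               # write or read failed: client likely timed out
        remember(mail_from, rcpt_to, data)     # candidate for a resubmitted duplicate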

This approach has basically eliminated the duplicate message problem for us -
when we're the server. But none of this works when you don't control the
server. Hector's proposal is mainly for when you have admin control over the
client but not the server. After all, if you control both ends, why not just
coax the server into presenting as little latency as possible and then adjust
the client timeouts to deal with whatever is left? (This IMO is why none of the
checkpointing schemes have caught on - if you control both ends you can usually
adjust things so they aren't necessary, and if you don't, it doesn't help,
since the guy on the other end is unlikely to implement checkpointing for you.)

> I must say that, personally, I've never come across a situation where a
> 10 minute timeout has caused a problem, but far too many will timeout at
> the end of DATA even at 1 minute, and 5 minutes is not unusual.

We have run into quite a few servers where 10 minutes wasn't enough - usually
weird gateways of one sort or another. But yes, the issue of clients with
overly short timeouts is much more common.

> Personally, I'd prefer either a 'keep-alive' or an analogue to the NNTP
> 'IHAVE', so the sending MTA can say to the receiving MTA 'I have message
> ID xxxxx' and the receiving MTA can say 'I've already seen that, don't
> bother sending it again'. The latter would be more complex, but would
> reduce resource consumption on the sending MTA.

I think this is worth considering, but I'm not sure it will help as much as you
might think. Leaving aside the DoS attack potential (you can usually
accomplish much the same thing now with a server that talks
R E A L L Y  S L O W L Y), I worry that the way people will implement this
on the server side now
that threads are readily available is by having a background thread that sits
there throwing out these responses. Done badly this could easily lead to
situations where the main thread is stuck in an infinite loop but the keepalive
thread keeps on saying "I'm here".
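
If something along these lines ever does get specified, the server side needs
to tie the reassurance to actual forward progress. A sketch of what I mean
(Python; the interface is hypothetical, since no such SMTP keepalive exists
today):

    import threading
    import time

    class Keepalive:
        # The worker bumps a counter as it gets real work done; the keepalive
        # thread goes quiet the moment the counter stops moving, so a wedged
        # main thread can't keep telling the client "I'm here".
        def __init__(self, send_line, interval=30.0):
            self._send = send_line      # e.g. writes a "still working" reply to the client
            self._interval = interval
            self._progress = 0
            self._done = False
            threading.Thread(target=self._run, daemon=True).start()

        def made_progress(self):        # called by the worker as the message moves along
            self._progress += 1

        def finished(self):             # called once the final reply has been sent
            self._done = True

        def _run(self):
            last = self._progress
            while not self._done:
                time.sleep(self._interval)
                if self._progress == last:
                    return              # no progress since the last tick: stop reassuring
                last = self._progress
                self._send()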

                                Ned

P.S. In the interests of full disclosure, the main reason why I have never
bothered to write up the approach we use for server-side duplicate suppression
is that Sun has a patent on it: 7,080,123.
