Re: After a 450, queue or try next MX?

----- Original Message -----
From: "Alex van den Bogaerdt" <alex(_at_)ergens(_dot_)op(_dot_)het(_dot_)net>
To: <ietf-smtp(_at_)imc(_dot_)org>
Sent: Wednesday, August 30, 2006 2:57 PM
Subject: Re: After a 450, queue or try next MX?

So I remove this particular host from the list to try and
end up trying something different _in the same sequence_.

So:
1: in that round I stop trying that host and continue at another host
2: I think I can even try this host again in the next round

I'm sorry but this doesn't convince me.

If this would be what I asked for, it would have read
something like "the SMTP client is discouraged from
continuing to attempt delivery of the message."

Keep in mind how you expand your MX list.  The remote system will
probably randomize it, but you might also want to keep track of whether
a particular host has been failing.


Both hosts and domains can be tracked, preferably separately.

In our system, we only go to the next record when there is a connection
failure. Otherwise we follow the wishes of the 45x or 550.  A 45x means try
again LATER, not within the same transaction attempt where you have a list
of MXs to try.


We used to do things this way. We found that there were too many cases where
you could connect but get nothing, so we switched to the approach of trying
until we either succeeded or got a permanent error. This also proved to be
problematic for a variety of reasons, including the one (first MX gives 4yz,
second gives bogus 5yz error) that started this thread. And as coordinated
greylists get more popular I expect to see this approach become increasingly
problematic.

We eventually settled on an intermediate strategy. Temporary failures before
the MAIL FROM cause us to retry using the next MX; failures at or after the
MAIL FROM do not. We have found that this works pretty well overall.
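
For illustration only, the intermediate strategy can be sketched as a small
decision function. This is not our actual code. The names Stage, Action and
next_action are made up for the example, and treating a 5yz reply as a
permanent failure is an assumption that the description above does not cover.

```python
from enum import Enum


class Stage(Enum):
    """Points in the SMTP dialogue, in order (hypothetical names)."""
    CONNECT = 0
    GREETING = 1
    EHLO = 2
    MAIL_FROM = 3
    RCPT_TO = 4
    DATA = 5


class Action(Enum):
    NEXT_MX = "try the next MX now"
    QUEUE = "queue and retry later"
    FAIL = "treat as a permanent failure"


def next_action(stage, reply_code):
    """Decide what to do after a failed step.

    reply_code is the three-digit SMTP reply, or None when the
    connection failed or timed out without any reply.
    """
    temporary = reply_code is None or 400 <= reply_code < 500
    if temporary:
        # Before MAIL FROM: move on to the next MX in this attempt.
        if stage.value < Stage.MAIL_FROM.value:
            return Action.NEXT_MX
        # At or after MAIL FROM: stop and retry the message later.
        return Action.QUEUE
    if 500 <= reply_code < 600:
        # Assumption: 5yz ends the attempt, whatever the stage.
        return Action.FAIL
    raise ValueError(f"not a failure reply: {reply_code}")
```

With this sketch, a 450 at the greeting moves on to the next MX, while the
same 450 in response to MAIL FROM or RCPT TO sends the message to the queue.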

We track the bad connections (per host), and if the count reaches a certain
point, the transaction is stopped and the host is blocked. Future messages for
the same host get "let's try again one time" logic; otherwise the host stays
permanently blocked until cleared by the admin.


We don't do the permanent admin block (at least not automatically), but
we do all the rest.
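
A rough sketch of the per-host tracking described above, for illustration
only. The class name HostTracker, its method names and the default threshold
of 3 are all assumptions; neither system's real limits are given in this
thread. A site that does not want the admin block could drop the probe
bookkeeping and expire the counters after some time instead.

```python
class HostTracker:
    """Counts failures per host and blocks a host past a threshold.

    Once blocked, a host gets exactly one more trial attempt. If that
    also fails, it stays blocked until clear() is called (the "admin"
    step in the description above).
    """

    def __init__(self, threshold=3):
        self.threshold = threshold   # assumed value, not from the thread
        self.failures = {}           # host -> failure count
        self.probed = set()          # blocked hosts that used their one retry

    def record_failure(self, host):
        self.failures[host] = self.failures.get(host, 0) + 1

    def record_success(self, host):
        # A successful delivery resets the host's history.
        self.clear(host)

    def may_try(self, host):
        if self.failures.get(host, 0) < self.threshold:
            return True
        if host not in self.probed:
            # "Let's try again one time" for an otherwise blocked host.
            self.probed.add(host)
            return True
        return False

    def clear(self, host):
        """Manual reset, e.g. by the administrator."""
        self.failures.pop(host, None)
        self.probed.discard(host)
```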

In short, there are no real rules on how a system molds its retries. But
there are some common strategies.  See RFC 1123. It talks about strategies
that I think are pretty common.


Agreed.

                                Ned