Case 1 - 4xx errors
Case 2 - connection failure
Well, I have to say that all of the modern MTAs I've seen have managed to find
strategies for handling this stuff that work very well in practice, so the
specifications for this seem to be "good enough". And if it ain't broke... I
have a hard time believing that it is worth investing the significant time that
would be needed to "clarify" our specifications in this regard.
I can add that for our own MTA, we use a connection failure as the
trigger to try the next entry in the expanded, priority-sorted MX IP
list. A 4xx re-queues the message for the next send attempt cycle, based
on the admin's local policy frequency table. A 5xx, of course, stops the
transaction attempt altogether and initiates a bounce process.
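
To make that dispatch concrete, here is a minimal sketch of the decision
logic, not our actual code. The names `Outcome`, `attempt_delivery` and
`try_host` are hypothetical, and re-queuing when every IP is unreachable
(or when an unexpected reply code arrives) is an assumption on my part
rather than something stated above.

```python
from enum import Enum
from typing import Callable, Iterable


class Outcome(Enum):
    DELIVERED = "delivered"
    REQUEUE = "requeue"   # wait for the next send attempt cycle
    BOUNCE = "bounce"     # stop and start the bounce process


def attempt_delivery(ip_list: Iterable[str],
                     try_host: Callable[[str], int]) -> Outcome:
    """Walk the priority-sorted IP list for one queue attempt.

    try_host(ip) is a hypothetical callback that runs the SMTP
    transaction and returns the final reply code, or raises
    ConnectionError when the host cannot be reached.
    """
    for ip in ip_list:
        try:
            code = try_host(ip)
        except ConnectionError:
            # Connection failure: the only case that moves on to the next IP.
            continue
        if 200 <= code < 300:
            return Outcome.DELIVERED
        if 500 <= code < 600:
            # Permanent failure: do not try any other IP.
            return Outcome.BOUNCE
        # 4xx: do not try the next IP now; retry on the next cycle.
        # Assumption: any other unexpected code is also treated as temporary.
        return Outcome.REQUEUE
    # Assumption: if no host accepted a connection, retry later.
    return Outcome.REQUEUE
```

Note that a 4xx from the first reachable host ends the attempt, even if
further IPs remain in the list.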
A few things come to mind:
a) For scalability reasons, there is an inherent presumption that the
hosts in a receiver farm are just workers for the same remote backend
server serving the target recipient account. So a 5xx/4xx response from
one receiver would yield the same from the other receivers. I think it
would be impractical to presume that you will get a different 4xx result
by simply going to the next IP during the same message queue attempt. I
think a remote server that believes an MTA will immediately try the next
IP based on a 4xx is making a wrong generalization.
b) With the help of DNS, it is prudent for larger MX systems to make
sure that a random round robin of IPs is presented, for their own sake
of maximizing efficient system availability and throughput.
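
For illustration only, the sending side's counterpart to (b) might look
like the sketch below: sort the MX records by preference, then shuffle
hosts that share a preference so load spreads across them even when the
DNS answer order does not rotate. The function name `order_mx` and the
`(preference, host)` tuple layout are hypothetical, not taken from any
particular MTA.

```python
import random
from itertools import groupby
from typing import List, Optional, Tuple

MxRecord = Tuple[int, str]  # (preference, host); lower preference is tried first


def order_mx(records: List[MxRecord],
             rng: Optional[random.Random] = None) -> List[MxRecord]:
    """Return records sorted by preference, randomized within equal preference."""
    rng = rng or random.Random()
    ordered: List[MxRecord] = []
    by_pref = sorted(records, key=lambda r: r[0])
    for _, group in groupby(by_pref, key=lambda r: r[0]):
        hosts = list(group)
        rng.shuffle(hosts)  # spread load across equal-preference hosts
        ordered.extend(hosts)
    return ordered
```

Passing a seeded `random.Random` makes the order reproducible for
testing; in production the default unseeded generator would be used.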