Re: SMTP delivery orders

Hoi folx,

Due to the discussion on another list about SMTP delivery orders I have
reread some parts of RFC 2821, and IMHO 821/2821/2821bis are very unclear
about what is meant to happen and how it should happen. IMHO some parts
even contradict each other.


I haven't seen any outright contradictions myself, at least not within a
single document, but there's no doubt the advice given is a bit vague. I'm
dubious as to the wisdom of trying to tighten it up, however - different
strategies make varying amounts of sense in different contexts, and trying to
come up with rules for exactly how this should work in various use cases
strikes me as being very difficult if not impossible. At best there could be
some additional general guidelines, and I expect that even those would be hard
to get consensus on.

Also there was a fundamental, controversial discussion about whether
"cannot connect" (i.e. no TCP connect) should be treated identically to
a 4xx error in the SMTP dialogue.


This in turn raises the question of whether all 4xy errors should be handled
the same. IMO they should not - I believe a 4xy error late in the dialogue
should be considered a delivery attempt, and some time should elapse before
trying again. A 4xy as a banner, or as a response to EHLO, IMO warrants moving
on to other MXes immediately. The exact breakover point between the two cases
is debatable, but (again IMO) it should probably be around the time the
transaction starts (i.e., MAIL FROM).
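
One way to sketch this distinction in Python is shown below. The Stage enum,
the function name and the two action strings are hypothetical names chosen
for illustration. Treating MAIL FROM itself as already "late" is an
assumption, since the exact breakover point is debatable.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Points in the SMTP dialogue, in the order they occur."""
    BANNER = 0
    EHLO = 1
    MAIL_FROM = 2
    RCPT_TO = 3
    DATA = 4


def action_for_4xy(stage: Stage, reply_code: int) -> str:
    """Decide what to do after a 4xy reply received at the given stage."""
    if not 400 <= reply_code <= 499:
        raise ValueError(f"not a 4xy reply: {reply_code}")
    # Assumption: the transaction starts at MAIL FROM. Anything earlier
    # does not count as a delivery attempt, so move on to the next MX.
    if stage < Stage.MAIL_FROM:
        return "try_next_mx"
    # From MAIL FROM onwards, count it as an attempt and wait before retrying.
    return "back_off"
```

With this rule, a 4xy banner or EHLO reply yields "try_next_mx", while a 4xy
reply to RCPT TO yields "back_off".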

Let's assume that there is an (external) mail from jane@example.net to
joe@example.com. example.com has the following setup:

    example.com.      IN      MX      10 a.mx.example.com.
                      IN      MX      10 b.mx.example.com.
                      IN      MX      20 c.mx.example.com.
                      IN      MX      30 d.mx.example.com.
    a.mx.example.com. IN      A       10.0.0.1
    b.mx.example.com. IN      A       10.0.10.1
                      IN      A       10.0.10.2
    c.mx.example.com. IN      A       10.0.20.1
    d.mx.example.com. IN      A       10.0.30.1

Case 1:
-------
Let's assume that each of the listed mailservers/addresses will return a
4xx error as an answer to the
    RCPT TO: <joe@example.com>
command.

IMHO the sending MTA should
    connect 10.0.0.1
    connect random(10.0.10.1, 10.0.10.2) or preferred(10.0.10.1, 10.0.10.2)
    backoff for some time


I believe the MTA is marginally better off only trying once in this case, but
trying a bit harder is certainly acceptable.
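
The difference between the two positions on Case 1 comes down to how many
late 4xx replies are tolerated before backing off. A minimal sketch follows,
assuming a hypothetical try_address callback that reports "ok",
"connect_failed" or "late_4xx" for each address, and a fixed attempt order
taken from the example setup. None of these names come from any real MTA.

```python
def run_attempts(plan, try_address, max_late_failures=1):
    """Walk the attempt plan; return (outcome, addresses actually tried).

    try_address(address) is a hypothetical callback returning one of
    "ok", "connect_failed" or "late_4xx".
    """
    late_failures = 0
    tried = []
    for address in plan:
        tried.append(address)
        result = try_address(address)
        if result == "ok":
            return "delivered", tried
        if result == "late_4xx":
            late_failures += 1
            if late_failures >= max_late_failures:
                break  # counts as a delivery attempt: stop and wait
        # "connect_failed" is not counted: fall through to the next address
    return "back_off", tried


# One address per MX host from the example, in preference order.
PLAN = ["10.0.0.1", "10.0.10.1", "10.0.20.1", "10.0.30.1"]

print(run_attempts(PLAN, lambda addr: "late_4xx"))
# ('back_off', ['10.0.0.1'])
print(run_attempts(PLAN, lambda addr: "late_4xx", max_late_failures=2))
# ('back_off', ['10.0.0.1', '10.0.10.1'])
```

With max_late_failures=1 only the first host is tried; with 2 the sender
tries both preference-10 hosts before backing off.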

Case 2:
-------
Let's assume that each of the listed mailservers/addresses gives a
"connection failure".
IMHO the sending MTA should
    connect 10.0.0.1
    connect random(10.0.10.1, 10.0.10.2) or preferred(10.0.10.1, 10.0.10.2)
    connect 10.0.20.1
    connect 10.0.30.1
    backoff for some time


The question here is whether or not to try all the b level MXes. Connection
failures can be due to path-specific issues, so I think it is marginally better
to try at least two A records. You might even want to be clever and try one
that's on a different subnet.
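
As a rough illustration of the "different subnet" idea, the sketch below
picks up to two addresses for a multihomed host, preferring ones on distinct
networks. The function name, the limit of two and the /24 prefix length are
all assumptions made for the example (IPv4 only). A real MTA has no reliable
way to know the remote site's actual subnet boundaries.

```python
import ipaddress


def pick_addresses(addresses, limit=2, prefix_len=24):
    """Pick up to `limit` addresses, preferring distinct /prefix_len networks."""
    if limit <= 0:
        return []
    chosen = []
    seen_networks = set()
    # First pass: take at most one address per network.
    for addr in addresses:
        net = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        if net not in seen_networks:
            seen_networks.add(net)
            chosen.append(addr)
            if len(chosen) == limit:
                return chosen
    # Second pass: not enough distinct networks, so fill up with the rest.
    for addr in addresses:
        if addr not in chosen:
            chosen.append(addr)
            if len(chosen) == limit:
                break
    return chosen
```

For b.mx.example.com in the example, both addresses fall in the same /24, so
the function simply returns both of them.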

Yes, I know about the pros/cons of connecting to all addresses of multihomed
hosts. For a little simplicity let's assume "cons" now ;)

Opinions?


Well, I have to say that all of the modern MTAs I've seen have managed to find
strategies for handling this stuff that work very well in practice, so the
specifications for this seem to be "good enough". And if it ain't broke...  I
have a hard time believing that it is worth investing the significant time that
would be needed to "clarify" our specifications in this regard.

                                Ned