Ned Freed <NED(_at_)innosoft(_dot_)com> writes:
Such problems can remain dormant for years before some type of outage
brings them to light. This proposal has the effect of increasing the
number of cases where fallback routing will be used, which will inevitably
result in an increase in this sort of behavior.
I'm having a hard time understanding your position. Some sites
implement MX records incorrectly, so you want to discourage the use of
MX records?
Most MX record problems show up immediately. But some do not. This change has
the effect of making certain more subtle problems much more severe. I'm not
sure that's a good thing -- it definitely offsets the gain in reliability
this change gives us.
As for my position, I have no problem with explicitly making it legal for hosts
to try a secondary MX in the event of a dialogue failure, or even recommending
such behavior. However, if we are going to recommend such things, there needs
to be a comprehensive discussion of the implications of doing so. I have
yet to see acceptable text for this.
I've had problems where the first-contacted MX for a domain was issuing
4XX replies for some reason and clients were not rolling over to some
other MX (either co-primary or secondary).
There is also the added complexity involved in caching errors.
This problem exists independently of MX fallbacks.
I don't understand. Sure, the issue of how to cache errors exists without
MX fallbacks. But MX fallbacks make it substantially more complex, since
before you simply had one basic error (host is not reachable or not
responding) and now you have a whole spectrum of errors, many of which
seem to demand somewhat different handling. A 4xx in response to a HELO,
for example, would seem to be something you want to cache, while a 4xx
in response to a trailing dot may or may not be -- a "spool is full" error
would be cached, but a "user is over quota" would not be.
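The distinction above (cache a 4xx to HELO, but treat a 4xx to the trailing dot according to what it means) can be sketched as a small decision function. This is a hypothetical illustration, not any mailer's actual code. `Phase` and `should_cache_host_error` are invented names. It also assumes the server sends RFC 3463 enhanced status codes, so that X.3.1 ("mail system full", the "spool is full" case) can be told apart from X.2.2 ("mailbox full", the "over quota" case). Replies that do not fit either case are deliberately not cached.

```python
from enum import Enum, auto
from typing import Optional


class Phase(Enum):
    """SMTP dialogue phase in which a reply arrived (hypothetical model)."""
    CONNECT = auto()      # greeting banner
    HELO = auto()
    MAIL = auto()
    RCPT = auto()
    DATA = auto()         # reply to the DATA command itself
    END_OF_DATA = auto()  # reply to the trailing "." line


def should_cache_host_error(phase: Phase, code: int,
                            enhanced: Optional[str] = None) -> bool:
    """Return True if a transient reply should be cached against the host,
    so other queued messages skip that MX for a while.

    Assumption: only host-wide conditions are cached; per-message and
    per-recipient conditions are not.
    """
    if not 400 <= code < 500:
        return False  # only transient (4xx) failures are considered here
    if phase in (Phase.CONNECT, Phase.HELO):
        return True  # the host refused the session itself: host-wide
    if phase is Phase.END_OF_DATA:
        # Assumes RFC 3463 codes: X.3.1 = mail system full (host-wide),
        # X.2.2 = mailbox full (per-user, so not cached).
        parts = enhanced.split(".") if enhanced else []
        return parts[1:] == ["3", "1"]
    return False  # MAIL/RCPT/DATA replies: treated as message-specific
```

A more conservative variant simply returns False for every reply received after MAIL has been issued. It caches less, but it never lets one message's problem hold up others.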
Without MX fallbacks, the mail gets delayed instead of suboptimally delivered.
See my previous message. Another possibility is that one message is delayed
while others go through much quicker.
I have some possible advice on this subject (don't cache anything
you get after issuing a MAIL command) but that's more informed
supposition than implementation experience.
This goes to the heart of the matter. I am dubious about recommending things
that we do not fully understand the implications of.
If we're going to run around fixing these old versions to handle
this right we need to take care of their problems handling proper
pruning of MX lists.
Sendmail 8 already falls over to the next MX when it gets a 4XX reply
code. It also does MX randomization and MX piggybacking. It does do
proper pruning of MX lists and I believe it has done so for quite some
time.
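The three behaviors mentioned here can be shown together in a short sketch: pruning of the MX list, randomization among equal-preference hosts, and falling over to the next MX on a 4XX. It is not sendmail's code. The pruning rule follows the RFC 974 approach: if the local host appears in the list, drop it and every record with an equal or higher preference value, so mail is never handed back to an equally or less preferred exchanger. `order_mx`, `deliver`, and the `try_host` callback are hypothetical names. Connection failures are assumed to be reported as a non-2xx/5xx code such as 0.

```python
import random
from typing import Callable, List, Optional, Set, Tuple

MX = Tuple[int, str]  # (preference, hostname)


def order_mx(records: List[MX], local_names: Set[str],
             rng: Optional[random.Random] = None) -> List[str]:
    """Return MX hosts in the order they should be tried."""
    rng = rng or random.Random()
    local = {n.lower() for n in local_names}
    own = [pref for pref, host in records if host.lower() in local]
    if own:
        # Pruning: keep only exchangers strictly more preferred than us.
        cutoff = min(own)
        records = [(p, h) for p, h in records if p < cutoff]
    # Randomization: a random tiebreak key shuffles equal-preference hosts.
    keyed = sorted((p, rng.random(), h) for p, h in records)
    return [h for _, _, h in keyed]


def deliver(hosts: List[str],
            try_host: Callable[[str], int]) -> Tuple[str, Optional[str]]:
    """Try each host in order; try_host returns an SMTP reply code."""
    for host in hosts:
        code = try_host(host)
        if 200 <= code < 300:
            return ("delivered", host)
        if 500 <= code < 600:
            return ("bounced", host)  # permanent failure: stop here
        # 4xx or connection failure: fall over to the next MX
    return ("deferred", None)  # every MX failed transiently: requeue
```

For example, if the first MX answers 451 and the second answers 250, `deliver` reports the message as delivered via the second host rather than leaving it queued.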
There are (obviously) more mailers in the world than sendmail, to say nothing
of the versions of sendmail out there that are being upgraded independent of
the changes Eric makes.
Ned