I'm not a DNS server expert, so I am not sure if this is mostly a configuration
issue, but my research reveals different information.
Currently our SMTP outgoing mail client is designed to check each DNS server
provided when doing a QUERY except for NXDOMAIN results.
I assume you mean you try the first server on your list, and if it has a
problem you move on to the second. Either a successful return or an NXDOMAIN
response should result in no further queries.
This seems like a reasonable strategy to me.
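
For what it's worth, here is a minimal sketch of that strategy as I read it. The
names resolve() and query() are made up for illustration, query() stands in for
whatever actually sends the question to one server, and the rcodes are plain
strings. Returning SERVFAIL when the server list is empty is my assumption, not
something from the original description.

```python
def resolve(name, rtype, servers, query):
    """Ask each server in turn and stop at the first definitive answer.

    `query(server, name, rtype)` is a hypothetical callable that returns
    (rcode, answers), with rcode given as a string such as "NOERROR".
    """
    # Assumption: with no servers to ask, report a temporary failure.
    last = ("SERVFAIL", [])
    for server in servers:
        rcode, answers = query(server, name, rtype)
        # A successful return or NXDOMAIN is definitive: ask nobody else.
        if rcode in ("NOERROR", "NXDOMAIN"):
            return rcode, answers
        # Anything else (SERVFAIL, a timeout, ...) means try the next server.
        last = (rcode, answers)
    return last
```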
This week, I ran across a particular new customer setup with an email domain
for which he hadn't yet created an MX record. He was using a subdomain,
db.usinterlink.com.
FWIW, www.dnsreport.com says that the timeouts associated with the DNS entries
for this domain are too high. This sort of thing makes it difficult to add such
an MX record and get it recognized quickly.
Of course, the SMTP client should:
- Do an MX query
- If none result, do an A record query
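
A rough sketch of those two steps, assuming a hypothetical lookup(name, rtype)
helper that returns (rcode, records), with MX records as (preference, host)
tuples. The function name mail_hosts() and the None return value are made up for
illustration. The point is that the A record fallback applies only when the MX
query itself succeeds with an empty answer, not when it fails.

```python
def mail_hosts(domain, lookup):
    """Return the hosts to try for `domain`, in order.

    Returns None if a lookup did not complete (for example SERVFAIL),
    and an empty list if the domain has neither MX nor A records.
    `lookup(name, rtype)` is a hypothetical callable.
    """
    rcode, mx_records = lookup(domain, "MX")
    if rcode != "NOERROR":
        return None  # no fallback on a failed query; the caller decides
    if mx_records:
        # Lowest preference value is tried first.
        return [host for _pref, host in sorted(mx_records)]
    # NOERROR with an empty answer section: fall back to the A record.
    rcode, a_records = lookup(domain, "A")
    if rcode != "NOERROR":
        return None
    return [domain] if a_records else []
```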
The problem is that I found while some other DNS servers return NOERROR, my
DNS server returned SERVFAIL to the MX query.
This sort of thing happens all the time. I'd guess it comes up in a customer
query for us about once every month or two on average, and has done so for well
over a decade. The usual cause is that some sort of infrastructural change has
occurred - a server's IP address has changed, some glue record has changed,
whatever - and some server in your setup has managed to cache some bogus
information that prevents it from completing the query.
Another situation you sometimes see is that there is now a configuration
error but some other server managed to cache the information before the
error was made. So the effect is that the other server works but yours doesn't.
My usual advice is to try clearing all the DNS caches you have access to.
In many cases this makes the problem go away.
Tracking this stuff down is very difficult, especially since these sorts of
problems tend to be transient and by the time you have all your debugging ducks
in a row the problem may have vanished. I also have to say I haven't found the
tools available for debugging these problems to be all that sharp.
Here is what I found on the net as four different answers:
1) Sendmail Configuration/New Behavior
http://www.brandonhutchinson.com/host_map__lookup_(domain)__deferred.html
"However, if the A or MX record lookup for the domain returns a
"SERVFAIL," Sendmail will queue the message, believing it has
encountered a transient DNS problem. For example, if a domain has a
valid A record but returns a "SERVFAIL" when queried for an MX record
(instead of "NOERROR" with an empty answer section), Sendmail will
queue the message. You should contact the remote name server
administrator in order to fix these problems."
IMO this is the correct thing to do.
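
To make the distinction concrete, here is a small sketch of how a client might
map a lookup result to an action. The function name and the action labels
("deliver", "no-data", "defer", "bounce") are mine, chosen for illustration;
they are not Sendmail's terms or anyone's actual implementation.

```python
def disposition(rcode, has_answers):
    """Map a DNS lookup result to what the SMTP client does next."""
    if rcode == "NXDOMAIN":
        return "bounce"   # the name does not exist: permanent failure
    if rcode == "NOERROR":
        # An empty answer section is not an error; the caller can go on
        # to its next step (such as the A record query after an empty MX).
        return "deliver" if has_answers else "no-data"
    # SERVFAIL and anything else is assumed to be transient:
    # queue the message and retry later.
    return "defer"
```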
2) Use Multiple DNS Servers
An IBM solution suggested making sure you have additional DNS servers
to query.
Well, sure. Having at least one secondary DNS server is pretty much essential,
and they need to be geographically separate. I note in passing that
interlink.com has two servers and they appear to be on separate networks.
ns5.ecsecure.com. [208.56.100.1] [TTL=86400]
ns6.ecsecure.com. [216.147.1.227] [TTL=86400]
But this really has nothing to do with how SERVFAIL is handled by an SMTP
client.
Sadly, one of the most irrational actions taken in the fight against spam has been
to go after people hosting secondary DNS servers when there's a problem with
some machines associated with the domain. For example, I used to provide
secondary DNS service for a university. They got careless and ended up with a
number of zombies on their network and I got blacklisted as a result, even though
all I did was provide secondary DNS service.
I don't know how widespread a practice this is or to what extent this has had a
chilling effect on people being willing to provide secondary DNS service, but
it certainly made me more reluctant to provide such services.
3) Lame Delegation
I saw other comments suggesting it is mostly a DNS configuration issue.
Lame delegation?
One lame delegation out of several is another case that can cause these sorts
of mixed results. If your server happens to follow the working delegation path
you win, if not you lose. And which one you get may be more or less random. And
once things are cached...
But this again is a cause, not a solution.
4) Ignore SERVFAIL?
Some just said that the SMTP client should treat SERVFAIL the same as
NXDOMAIN, etc.
Bad idea IMO. Configuration glitches happen, and when they do you don't want to
bounce mail to the domain unnecessarily. Most of the time these problems get
fixed and the mail goes on through with only a small delay.
Ned