
Re: Call for action vs. lost opportunity (Was: Re: Renumbering)

2007-09-13 20:37:15

To my small mind, forcing a new DNS lookup in the event of a
TCP session failure and restart would be a good thing.
      
perhaps, but it won't work reliably as long as there can be more than
one host associated with a DNS name, nor will it work as long as DNS
name-to-address mapping is used to distribute load over a set of hosts.
    

      We already have the DNS hooks to distinguish services from
      hosts.  We've had them for the last 8 years.
  
Yes, but SRV records weren't really meant to handle this case either.
And they actually can make applications less reliable, because they
introduce a new dependency on DNS (another lookup that can fail, in a
different zone and potentially on a different server, another piece of
configuration data that can be incorrect).  What we'd really need is an
RR type specifically intended to map service names onto instance
ID+address pairs, plus a special query type that wasn't defined to
return all of the matching RRs but would instead return a random subset
or a subset chosen by heuristics, and finally an instance-ID-to-address
mapping service.  But arguably DNS isn't the right place to do that at
all - there should instead be a generic referral service at layer 3 or
4.
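
To make the extra dependency concrete, here's a rough sketch of the
lookup chain an SRV-style client ends up doing (Python, using the
dnspython library; the service name below is just a placeholder, and a
real client would weight its choice by priority and weight per RFC
2782):

    import random
    import dns.resolver

    def resolve_service(service_name):
        # first lookup: the SRV record, in the service's own zone - one
        # more thing that can fail or be misconfigured
        srv = dns.resolver.resolve(service_name, "SRV")
        # naive selection; RFC 2782 says to honor priority and weight
        chosen = random.choice(list(srv))
        # second lookup: the target's address records, possibly in a
        # different zone and served by different servers
        addrs = dns.resolver.resolve(chosen.target, "A")
        return str(chosen.target), chosen.port, [a.address for a in addrs]

    print(resolve_service("_imap._tcp.example.net"))

Two lookups, two zones, twice as many places for the configuration to
be wrong.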

Of course, part of the reason that people started using A records to
refer to multiple hosts was that a number of applications "just worked"
when they did that.  And I remember when people used to object loudly to
such things, and insist that a DNS name and a host name had to be the
same thing.  Anyway, this kind of overloading of A records has been such
a widespread practice for so long that I don't see it changing.  And
it's not as if we came up with a better way of doing things for IPv6
addresses.

in other words, doing another DNS lookup of the original DNS name only
looks like a good way to solve the problem if you don't look very
deeply.
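
You can see why with a couple of lines of Python - one name, several
hosts, and a fresh lookup gives no hint of which of them you were
talking to before (the name below is just a placeholder):

    import socket

    def addresses_for(name, port=80):
        # one DNS name can map to several hosts; a re-lookup after a
        # connection failure may hand back any of them
        infos = socket.getaddrinfo(name, port, proto=socket.IPPROTO_TCP)
        return sorted({info[4][0] for info in infos})

    print(addresses_for("www.example.com"))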
 
now if you somehow got a host-specific (or narrower) identifier as a
result of setting up the initial connection (maybe via a TCP option),
and you had a way to map that host-specific identifier to its current IP
address (assume for now that you're using DNS, though there are still
other problems with that) - then you could do a different kind of lookup
to get the new IP address and use that to do a restart.
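
A minimal sketch of what that restart path might look like -
everything here is hypothetical: standard TCP has no such option
today, and the dictionary just stands in for whatever
identifier-to-address mapping service would actually exist:

    import socket

    # hypothetical map from a host-specific identifier (learned at
    # connection setup) to that host's *current* address; in reality
    # this would be a lookup service, not a local dict
    host_directory = {"host-1234": "192.0.2.10"}   # 192.0.2.x is TEST-NET

    def restart_connection(host_id, port):
        # after a failure, look up the identifier rather than the
        # original DNS name, so the restart goes back to the same host
        # even if that host has been renumbered in the meantime
        current_addr = host_directory[host_id]
        return socket.create_connection((current_addr, port), timeout=10)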

even then, it wouldn't help the numerous applications which don't have a
way to cleanly recover from dropped TCP connections.  (remember, TCP
was supposed to make sure data were retransmitted as necessary, sort
out duplicated data, provide a clean close, that sort of thing.  once
you expect apps to handle dropped connections, they have to
re-implement TCP functionality at a higher layer.)
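
Here's roughly what that re-implementation looks like one layer up -
the "RESUME <offset>" framing is invented for illustration, and a real
protocol would wait for the peer to acknowledge how much it actually
received rather than assuming sendall() succeeded:

    import socket
    import time

    def send_with_resume(host, port, payload, max_retries=5):
        acked = 0                      # bytes we believe the peer has
        for attempt in range(max_retries):
            try:
                with socket.create_connection((host, port), timeout=10) as s:
                    # tell the peer where we left off, then send the
                    # rest - bookkeeping TCP already does within a
                    # single connection
                    s.sendall(b"RESUME %d\n" % acked)
                    s.sendall(payload[acked:])
                    acked = len(payload)   # naive: assume it all arrived
                    return
            except OSError:
                time.sleep(2 ** attempt)   # back off and retry
        raise RuntimeError("gave up after %d attempts" % max_retries)

Retransmission, acknowledgement, and retry logic, all duplicated above
the transport.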
    

      Applications need to deal with TCP connections breaking for
      all sorts of reasons.  Renumbering should be a relatively
      infrequent event compared to all the other possible ways a
      TCP connection can fail.
  
Mumble.  Seems like the whole point of TCP was to recover from such
failures at a lower level.  And I remember how people used to say that
TCP was better than X.25 VCs (in part) because TCP would recover from
temporary network outages that would cause hangups in X.25.

I also don't have a lot of faith in "should be", not when I've seen DHCP
servers routinely refuse to renew leases after very short times, nor
when I've heard people say that a site should be able to renumber every
day.  

I used to try to get people to specify a minimum amount of time that a
non-deprecated address should be expected to be valid - say a day.  Then
application writers and application protocol designers would have an
idea about whether they needed a strategy for recovery from a
renumbering event, and what kind of strategy they needed.  But the only
people who seemed to like this idea were application area people.

      Until applications deal nicely with the other failure modes,
      complaints about renumbering causing problems at the
      application level are just noise.
  
in other words, one design error can be used to justify another?  sort
of like the blind leading the blind?

I see a significant difference between a design flaw in a particular
application that cripples that application, and a design flaw in a lower
layer that cripples all applications.

Keith


