
Re: [dnsop] [dean@av8.com: Mismanagement of the DNSOP list]

2005-09-27 20:20:57
I'm rather reticent to add real technical discussion to the issue of list
mismanagement.

On Tue, 27 Sep 2005, Bill Sommerfeld wrote:

On Tue, 2005-09-27 at 10:06, Robert Elz wrote:
    Date:        Mon, 26 Sep 2005 15:41:56 -0400 (EDT)
    From:        Dean Anderson <dean@av8.com>
    Message-ID:  <Pine.LNX.4.44.0509261531270.32513-100000@cirrus.av8.net>

  | It is not DNSSEC that is broken.

I have not been following dnsop discussions, but from this summary, there
is nothing broken beyond your understanding of what is happening.

It's worse.  The reasoning is broken on other points, as well.

In these arguments, RFC 1812 has been cited repeatedly as a
specification for load-splitting.  By my reading, 1812 is extremely
vague about the topic, and does not require a specific spreading
algorithm.  

Yes. It gives the implementor tremendous latitude. But plainly, it is
appropriate to do per-packet load balancing (as Cisco did), where successive
packets can be expected to take different paths.

Its strongest recommendation is that there be a way to turn
it off if it doesn't work for you, which should by itself be a clue that
load-spreading should be used with caution; it also cautions that
load-splitting was an area of active research at the time 1812 was
published.

And now there are implementations and users that use it. 
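
To make the per-packet behavior above concrete, here is a minimal sketch
(hypothetical code, not any vendor's implementation): the next hop rotates
with a packet counter, so successive packets of the same flow can take
different paths.

---------------------------------------------------
# Hypothetical sketch of per-packet round-robin load splitting (PPLB).
# The next hop depends only on a running packet counter, so consecutive
# packets of a single flow are expected to take different paths.
from itertools import count

class PerPacketSplitter:
    def __init__(self, next_hops):
        self.next_hops = list(next_hops)   # e.g. ["10.0.0.1", "10.0.1.1"]
        self._counter = count()

    def select(self, packet=None):
        # The packet contents are ignored; only the counter matters.
        i = next(self._counter) % len(self.next_hops)
        return self.next_hops[i]
---------------------------------------------------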

But to make anycast work with TCP, or with large UDP and fragments, one needs
to guarantee that two successive packets (actually an entire session) use
exactly the same path.  That means no load balancing, or only very
coarse-grained load balancing.  The prescription given in RFC1546 needs to be
changed:

RFC1546 page 5:
---------------------------------------------------
How UDP and TCP Use Anycasting

   It is important to remember that anycasting is a stateless service.
   An internetwork has no obligation to deliver two successive packets
   sent to the same anycast address to the same host.
---------------------------------------------------

RFC1546 also gives a prescription for alterations to TCP so that TCP can work
with anycast under the condition on successive packets above. So far as I
know, no one has implemented this prescription in a TCP stack.
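
For contrast with per-packet splitting, an implementation can keep an entire
session on one path by selecting the next hop from a hash of the flow
identifiers instead of a packet counter.  A minimal sketch, with made-up
addresses and CRC32 standing in for whatever hash a real implementation
would use:

---------------------------------------------------
# Sketch of per-flow next-hop selection: every packet of one TCP/UDP
# session hashes to the same next hop, so the whole session follows a
# single path (the property anycast needs for TCP or fragmented UDP).
import zlib

def select_next_hop(src_ip, dst_ip, proto, src_port, dst_port, next_hops):
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]

# Every packet of this session maps to the same next hop.
hops = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
assert select_next_hop("198.51.100.7", "203.0.113.5", "tcp", 40001, 53, hops) \
    == select_next_hop("198.51.100.7", "203.0.113.5", "tcp", 40001, 53, hops)
---------------------------------------------------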

Moreover, load-splitting which results in the sort of flow-shredding
which would disrupt multi-packet anycast exchanges also causes
significant difficulties for unicast.  To quote from rfc2991 section 2:

RFC2991 is an Informational document, and is wrong in some of its assertions.
This was discussed on the GROW list.

   Variable Path MTU
         Since each of the redundant paths may have a different MTU,
         this means that the overall path MTU can change on a packet-
         by-packet basis, negating the usefulness of path MTU discovery.

This is not a real problem.  The path MTU is simply reduced to the smallest
MTU of any path.

If PMTUD is turned off (an option rarely used), the DF bit is also not set,
and so packets will simply be fragmented.  While the smaller packet size might
be sub-optimal on the larger-MTU paths, this is just a (tiny) performance
consideration.

It is not the case that the usefulness of path MTU discovery is negated.
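
A worked illustration with made-up MTU values: if the two paths have MTUs of
1500 and 1400 bytes, PMTUD converges on 1400, and the only cost is the slack
left unused on the larger-MTU path.

---------------------------------------------------
# Illustration (made-up MTU values): with per-packet splitting, the
# sender's path MTU settles at the smallest MTU among the paths, so
# packets that survive PMTUD fit every path.
path_mtus = {"path-A": 1500, "path-B": 1400}

effective_pmtu = min(path_mtus.values())
print(effective_pmtu)                        # 1400

# The cost is only the slack on the larger-MTU path:
slack = 1 - effective_pmtu / max(path_mtus.values())
print(f"{slack:.1%}")                        # about 6.7% per packet, at worst
---------------------------------------------------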

   Variable Latencies
         Since each of the redundant paths may have a different latency
         involved, having packets take separate paths can cause packets
         to always arrive out of order, increasing delivery latency and
         buffering requirements.

         Packet reordering causes TCP to believe that loss has taken
         place when packets with higher sequence numbers arrive before
         an earlier one.  When three or more packets are received before
         a "late" packet, TCP enters a mode called "fast-retransmit" [6]
         which consumes extra bandwidth (which could potentially cause
         more loss, decreasing throughput) as it attempts to
         unnecessarily retransmit the delayed packet(s).  Hence,
         reordering can be detrimental to network performance.

RFC2991 also mis-states the TCP issue. RFC2581 describes the fast retransmit
behavior as follows:

   "The TCP sender SHOULD use the "fast retransmit" algorithm to detect
   and repair loss, based on incoming duplicate ACKs.  The fast
   retransmit algorithm uses the arrival of 3 duplicate ACKs (4
   identical ACKs without the arrival of any other intervening packets)
   as an indication that a segment has been lost.  After receiving 3
   duplicate ACKs, TCP performs a retransmission of what appears to be
   the missing segment, without waiting for the retransmission timer to
   expire."

RFC2991 mis-states this as follows:

         When three or more packets are received before
         a "late" packet, TCP enters a mode called "fast-retransmit"

This is not the case. [However, even if it were the case, it would still only
affect 6% of the packets.] A fast retransmit is made only after 4 identical
ACK packets are received, which means that 4 packets have to be received
before the late packet.
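
To make the RFC2581 trigger concrete, here is a minimal sketch (just a
duplicate-ACK counter, nothing else from a real TCP stack) showing that the
original ACK plus 3 duplicates -- 4 identical ACKs in all -- is what fires
the fast retransmit.

---------------------------------------------------
# Minimal sketch of the RFC2581 trigger: fast retransmit fires on the
# arrival of 3 duplicate ACKs, i.e. 4 identical ACKs in total.
DUP_ACK_THRESHOLD = 3

def first_fast_retransmit(ack_numbers):
    """Return the ACK number that triggers a fast retransmit, if any."""
    last_ack, dup_count = None, 0
    for ack in ack_numbers:
        if ack == last_ack:
            dup_count += 1
            if dup_count == DUP_ACK_THRESHOLD:
                return ack           # retransmit the segment starting here
        else:
            last_ack, dup_count = ack, 0
    return None

# Segment 1000 is "late"; each later segment provokes another ACK of 1000.
print(first_fast_retransmit([1000, 1000, 1000, 1000]))   # 1000 (4 identical)
print(first_fast_retransmit([1000, 1000, 1000]))         # None (only 3)
---------------------------------------------------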

A more thorough reading of RFC2581 reveals when an ACK should be sent:

   A TCP receiver SHOULD send an immediate duplicate ACK when an out-
   of-order segment arrives.  The purpose of this ACK is to inform the
   sender that a segment was received out-of-order and which sequence
   number is expected.  From the sender's perspective, duplicate ACKs
   can be caused by a number of network problems.  First, they can be
   caused by dropped segments.  In this case, all segments after the
   dropped segment will trigger duplicate ACKs.  Second, duplicate ACKs
   can be caused by the re-ordering of data segments by the network (not
   a rare event along some network paths [Pax97]).

While out-of-order packets could trigger a fast retransmit, that occurs
just 3% of the time.  So just 3% of packets are unnecessarily
retransmitted.  Not a great performance impact.

But again, at worst, this is merely a performance issue that may be more than
compensated for by the additional performance and availability of multiple
diverse links.

But let's not forget the benefits of load balancing over diverse paths:

For example, when a path fails, it can be immediately removed from the
router's FIB, and another path can be used immediately, without waiting for
routing processes to select the next best route and install it in the FIB.
[No more blackholes until the next BGP scan after a link failure.]  While
this is of little benefit to SMTP, it greatly benefits VOIP and streaming
audio and video.
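
A minimal sketch (a hypothetical data structure, not any router's actual FIB)
of why the repair is immediate when both next hops are already installed:

---------------------------------------------------
# Hypothetical FIB entry with two equal-cost next hops already installed.
# When one link fails, its next hop is simply dropped from the set and
# forwarding continues on the survivor, with no wait for BGP to rescan.
fib = {"192.0.2.0/24": ["10.0.0.1", "10.0.1.1"]}

def link_failed(next_hop):
    for prefix, hops in fib.items():
        if next_hop in hops:
            hops.remove(next_hop)    # immediate, local repair

link_failed("10.0.0.1")
print(fib)    # {'192.0.2.0/24': ['10.0.1.1']} -- traffic keeps flowing
---------------------------------------------------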

VOIP RTP buffers have no such performance issues with multipath. As long
as each packet arrives before it is to be consumed, it does not matter
in what order packets arrive.  PPLB would greatly improve VOIP performance
characteristics.
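
A minimal sketch (made-up sequence numbers) of why a playout buffer is
indifferent to arrival order, so long as each packet is in hand before its
playout time:

---------------------------------------------------
# Sketch of an RTP-style playout buffer: packets may arrive in any
# order; all that matters is that each one is buffered before its
# playout deadline.
import heapq

class PlayoutBuffer:
    def __init__(self):
        self._heap = []              # ordered by RTP sequence number

    def arrive(self, seq, payload):
        heapq.heappush(self._heap, (seq, payload))

    def play_next(self):
        return heapq.heappop(self._heap) if self._heap else None

buf = PlayoutBuffer()
for seq in (3, 1, 4, 2):             # out-of-order arrival under PPLB
    buf.arrive(seq, f"frame-{seq}")
print([buf.play_next()[0] for _ in range(4)])   # [1, 2, 3, 4]
---------------------------------------------------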


And folks I know who build gear which does load-splitting seem to be
scrupulously careful to avoid these sorts of problems. 

The equipment cannot do anything to avoid these problems, except turn off
load balancing if necessary.

                --Dean


-- 
Av8 Internet   Prepared to pay a premium for better service?
www.av8.net         faster, more reliable, better service
617 344 9000   

_______________________________________________
Ietf mailing list
Ietf@ietf.org
https://www1.ietf.org/mailman/listinfo/ietf