I reviewed draft-ietf-sipping-overload-reqs-02 at the request of the transport
area directors. Note that my area of expertise is TCP, congestion control and
bulk data transport. I am not a SIP expert, and have not been following the
SIP documents.
I have serious concerns about this document because it explicitly excludes the
only approach for coping with overload that is guaranteed to be robust under
all conditions. Although I know it is considered bad form to describe
solutions while debating requirements, I think a sketch of a solution will
greatly clarify the discussion of the requirements.
The only robust overload signal is the natural implicit signal - silently
discarding excess requests. Explicit overload messages (code 503) should be
optional, and must have an explicit rate limit. The error message may be
cached (e.g. in proxies), but caching must not be required. All
retransmissions in all parts of the protocol must back off exponentially
(which I am told is already true for SIP).
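
To make the proposal concrete, here is a minimal sketch of the admission
policy described above: serve up to capacity, answer a bounded number of the
excess requests with a 503, and silently discard the rest. The class name, the
per-tick accounting and both parameters are hypothetical, chosen only for
illustration; nothing here is taken from the SIP specifications.

```python
class OverloadedServer:
    """Toy admission control: serve, send a rate-limited 503, or drop silently."""

    def __init__(self, capacity_per_tick, max_503_per_tick):
        # Both limits are illustrative assumptions, not values from any standard.
        self.capacity_per_tick = capacity_per_tick
        self.max_503_per_tick = max_503_per_tick

    def handle_tick(self, n_requests):
        """Classify the requests that arrive during one time interval."""
        served = min(n_requests, self.capacity_per_tick)
        excess = n_requests - served
        # Explicit errors are optional and bounded by an explicit rate limit.
        rejected = min(excess, self.max_503_per_tick)
        # Everything beyond the limit gets the implicit signal: no reply at all.
        dropped = excess - rejected
        return {"served": served, "rejected_503": rejected, "dropped": dropped}


if __name__ == "__main__":
    server = OverloadedServer(capacity_per_tick=100, max_503_per_tick=5)
    for load in (50, 103, 1000):
        print(load, server.handle_tick(load))
```

Under light load the behavior is unchanged. Under heavy load the work spent on
error messages is bounded by the 503 limit and does not grow with the offered
load.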
Sending additional messages to explicitly indicate overload is intrinsically
fragile. If the overload management mechanism consumes any shared resource
that might be needed to complete other calls, then there exists some operating
point where any additional requests will cause a decline in the number of
successfully completed calls. This is likely to be regenerative, with each
successive error using more resources and preventing more calls, until the
throughput crashes to zero. This phenomenon was readily apparent in all of the
plots shown in the tsvwg meeting at IETF 71.
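
The shape of this collapse can be illustrated with a toy model. Assume a
server with a fixed budget of work per second, where completing a call costs
one unit and explicitly rejecting a request costs some fraction of a unit drawn
from the same budget. The function name, the cost model and the numbers are
assumptions made for this sketch; they are not derived from the plots mentioned
above.

```python
def completed_calls(offered, capacity, reject_cost):
    """Calls completed per second when every excess request is explicitly rejected.

    offered     -- requests arriving per second
    capacity    -- work units available per second (one unit completes a call)
    reject_cost -- work units consumed by one rejection, 0 <= reject_cost < 1
                   (0 models an idealized silent discard)
    """
    if not 0.0 <= reject_cost < 1.0:
        raise ValueError("reject_cost must be in [0, 1)")
    if offered <= capacity:
        return float(offered)
    # Budget constraint: x + reject_cost * (offered - x) = capacity, solved for x.
    x = (capacity - reject_cost * offered) / (1.0 - reject_cost)
    return max(0.0, x)


if __name__ == "__main__":
    for load in (50, 100, 200, 300, 400, 500):
        explicit = completed_calls(load, capacity=100, reject_cost=0.25)
        silent = completed_calls(load, capacity=100, reject_cost=0.0)
        print(f"offered={load:4d}  explicit={explicit:6.1f}  silent={silent:6.1f}")
```

In this model any nonzero rejection cost gives the throughput curve a negative
slope beyond the capacity point, reaching zero at offered = capacity /
reject_cost. With a zero-cost discard the curve is flat at capacity.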
Note that if the explicit overload management mechanism is very complicated,
the situation that triggers this failure might also be very complicated.
Asserting that this hazard does not exist is probably equivalent to proving
that explicit overload notifications never cause additional calls to fail, for
all combinations of implementations under all operating conditions. It would
not be an easy task to prove that the standards are sufficient to guarantee
this for all possible implementations.
My specific objections to the document are as follows: Requirement 6 calls for
explicit overload messages and forbids silently discarding requests, on the
grounds that a silent discard is ambiguous. Requirement 15 seems to provide a
loophole (allowing complete failures) but seems to forbid using it as the
preferred mechanism. Requirement 8 does not make sense without explicit
notification. Requirements 7, 8 and 9 should note that they can be (are
already?) equivalently satisfied by properly structured exponential
retransmission backoff timers in SIP itself.
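
As an illustration of how backoff timers alone reduce the load a client places
on an unresponsive server, the following sketch computes a retransmission
schedule that doubles the interval up to a cap and gives up after a total
timeout. The function name is hypothetical, and the values 0.5 s, 4 s and 32 s
are assumptions chosen to resemble typical SIP timer defaults; check the SIP
specification before relying on them.

```python
def backoff_schedule(base, cap, total_timeout):
    """Return the waits (in seconds) between successive transmissions.

    The wait doubles after each retransmission, is never larger than `cap`,
    and the schedule stops once the next wait would exceed `total_timeout`.
    """
    intervals = []
    elapsed = 0.0
    interval = base
    while elapsed + interval <= total_timeout:
        intervals.append(interval)
        elapsed += interval
        interval = min(interval * 2, cap)
    return intervals


if __name__ == "__main__":
    schedule = backoff_schedule(base=0.5, cap=4.0, total_timeout=32.0)
    print(schedule)
    print("retransmissions:", len(schedule), "over", sum(schedule), "seconds")
```

A server that silently discards a request therefore sees a retransmission rate
from that client that falls quickly and then stops, with no explicit
notification required.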
I would like to point out that TCP, IP and several other transport protocols
have evolved in the same direction as I am advocating for SIP: the only robust
indication that an error has occurred is connection failure. Error messages
are cached and sometimes accelerate timers (e.g. retransmit now, or go to the
next IP address now), but do not change basic protocol behavior. Error
messages are most often rate limited at the sender and the saved error codes
are used to provide a clue why something failed, but the fact that it failed
most likely comes from a timer, not the message itself. The number of error
messages that are required for correct operation is declining (note that
RFC 4821 makes ICMP "can't fragment" messages optional), and may be zero.
Rate limiting all error messages and treating them as advisory improves
robustness in several ways: fraudulent messages have less impact, error
messages cannot be used as DDoS attack magnifiers, and overload is addressed
implicitly by silently discarding requests.
Note that the normal, non-crisis, behavior has not changed significantly:
error messages are sent, cached and reported to the application. However, in a
crisis, the error reporting degrades gracefully, while the throughput goes
flat, without any negative slope. This is where SIP (and all other protocols)
should strive to be.
Treating all errors as soft should have been an Internet Architectural
Principle.
Thanks,
--MM--
-------------------------------------------
Matt Mathis http://staff.psc.edu/mathis
Work:412.268.3319 Home/Cell:412.654.7529
-------------------------------------------