Re: Last Call: draft-ietf-rmt-bb-norm-revised (Multicast Negative-Acknowledgment (NACK) Building Blocks) to Proposed Standard

Pekka,

I appreciate your comments here.  I plan to issue a new version of 
the draft that addresses these to the extent I can.  I have some 
questions about your concerns with comments in-line below:

At 11:01 AM +0300 4/7/08, Pekka Savola wrote:

On Thu, 3 Apr 2008, The IESG wrote:

The IESG has received a request from the Reliable Multicast Transport WG
(rmt) to consider the following document:

- 'Multicast Negative-Acknowledgment (NACK) Building Blocks '
  <draft-ietf-rmt-bb-norm-revised-04.txt> as a Proposed Standard

The IESG plans to make a decision in the next few weeks, and solicits
final comments on this action.  Please send substantive comments to the
ietf(_at_)ietf(_dot_)org mailing lists by 2008-04-17. Exceptionally,
comments may be sent to iesg(_at_)ietf(_dot_)org instead. In either case,
please retain the beginning of the Subject line to allow automated sorting.


Meta-level comments
-------------------

Looking at the document, my main question is, "is this ripe for standards
track?".  Looking at it, my inclination would be to say "probably not, at
least in parts *)".  As Section 6 says, there have not been substantial changes
since the preceding experimental RFC 3941 was published in 2004.  All the
cited material (research etc.) predates RFC 3941.  So it seems that either
1) there has not been significant experience since the Experimental document
was published, 2) the experiences have been fully aligned with the earlier
document ("the document was already good enough"), or 3) the lessons learned
have not been reflected in this document revision.



I will see what updated references I can find.  As you mention later 
below, there are definitely updated references on SSM that would be 
better!

I think one aspect here is the fact that this is one of our RMT 
"Building Block" documents and the _general_ techniques that it 
describes with respect to NACK-based reliable multicast protocol 
design have stood the test of time, including some that predate the 
Experimental (RFC 3941) specification.  There have been some 
long-term deployments of these protocols, one of which I mention 
later below.  I think it is accepted within the RMT community that 
these are mature techniques.  I think your #2 above "the experiences 
have been fully aligned with the earlier document" is the case.  We 
actually had some considerable history with these types of protocols 
even prior to Experimental RFC publication.


The one thing I'd have been interested in seeing is an applicability
statement of reliable multicast and its different bits and pieces (beyond
what's in Section 3.11) but that seems out of scope of this document.
For example, it is not obvious to me which (if any) RMT mechanisms would be
applicable in a context where I want to distribute video or voice where it
isn't acceptable to buffer the stream too long to accommodate for data
resends; it seems this NACK mechanism is geared towards bulk file transfer
where this is not applicable.

*) the parts I'm mostly concerned with are router assistance and 
security (also touching the protocol/ops aspects when some receivers 
are misconfigured or behind slow links).



The _focus_ of the current RMT protocols was purposefully scoped to 
address "bulk transfer".  I think this is described in RFC 3048 
(which this proposed document _should_ reference but fails to). 
While "bulk transfer" was the focus here, the Nack-Oriented Reliable 
Multicast (NORM) protocol (RFC 3941, a "Protocol Instantiation" that 
was derived from the earlier version of this "Building Block" 
document) does, in fact, provide _optional_ "stream" support that we 
have used for video and voice streams.  This feature was made 
"optional" so that one could have a compliant implementation that 
solely provided "bulk transfer" capability.


I agree that "router assistance" has not been followed through.  It 
was originally part of the RMT WG charter but we were unable to 
sustain activity in that area.

My _personal_ opinion is that, since this is a "Building Block" 
document, it would have been incomplete without some mention of the 
_potential_ of intermediate system "assistance" (along the lines that 
were discussed in the working group at one point) to improve 
scalability and/or performance of NACK-based reliable multicast.  In 
fact, I have some work in progress in the context of wireless 
networks to re-examine what sort of "assistance" to end-to-end 
reliable flows intermediate systems may be able to provide.  But 
this is certainly not a "fully-baked" area.  So this discussion 
_could_ be removed ... I suppose it will persist in RFC 3941 for 
historical purposes until there is further interest in the area?

Similarly, since this is a "Building Block" document, it does not do 
an extremely deep dive on security solutions.  We have another 
document in development within the RMT WG that may serve to describe 
security vulnerabilities, etc., for RMT protocols.  And we (under the 
good guidance of our area director) have strived in the revised 
"Protocol Instantiation" documents (which are detailed protocol 
specifications) to fully address security, providing a description of 
how to secure the protocols with IPsec, etc.


Substantial
-----------

I was expecting to see some discussion of MTU and application framing issues
with multicast.  Specifically, in a big multicast tree with dynamic
membership, it could very well happen that when a new member joins, the
lowest common denominator MTU decreases.  How is this scenario expected to
be handled?   It may be that this issue has already been discussed somewhere
else as it isn't specific to this document.


I think MTU discovery for multicast was not in the scope of the RMT 
working group.  In my personal opinion, I do think MTU discovery for 
multicast in general has not been well addressed, and there is more 
work that could be done here by someone.  Practical deployment tends 
to count on multicast apps being properly preconfigured for MTU.  (The 
RMT protocols do allow for configurable packet payload sizes to 
accommodate the MTUs of different deployments.)

It is _possible_ that the scalable feedback mechanisms described here 
_could_ be applied to find the lowest MTU (the techniques are used to 
get scalable feedback of group-wide minima/maxima for purposes of 
congestion control, NACK suppression, etc as mentioned in this 
document and some of the other RMT building blocks (e.g., TFMCC)).


I doubt router/intermediate system assistance has seen very wide deployment
and I don't think it is very feasible to expect to see that.  As this
document is moving to Standards Track I would very much like to remove any
recommendations for router assistance because I don't see those being
implemented in any significant router implementation.  That means removing
and rewording e.g. sections 2.7, 2.4, 3.10 and some others.



See my comments above on this.  This change could be made.


   The sender's transmissions SHOULD make good utilization
   of the available capacity (which may be limited by the application
   and/or by congestion control).

How do you figure out what is the available capacity?  Are you 
referring to the capacity on sender's uplink or the collective 
capacity of the receivers or both?  Do you adapt to the lowest 
common denominator of all receivers (e.g., document previously 
quoted 56Kbit/s modems..)?  Does this have security impact? (Similar 
comment would apply to MTU/application framing aspects already 
mentioned above.)


The TFMCC (congestion control) building block addresses automated 
rate adjustment; we have made congestion control distinct from 
reliability.  TFMCC is a "single-rate" scheme and is subject to the 
lowest common denominator of all receivers.  If this is not 
sufficient or acceptable for an application, additional mechanisms to 
eject poor-performing receivers from the group may be needed.
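
As a rough illustration of what "single-rate" implies here (a sketch 
only, not the TFMCC algorithm itself; the function name, the receiver 
names and the `floor_bps` ejection threshold are all hypothetical), the 
sending rate tracks the slowest receiver unless receivers below some 
application-chosen floor are dropped from the group:

```python
def single_rate(receiver_rates_bps, floor_bps=None):
    """Return (send_rate_bps, ejected_ids) for a single-rate session.

    receiver_rates_bps: dict of receiver id -> rate that receiver sustains.
    floor_bps: hypothetical application policy; receivers below it are
               ejected instead of slowing down the whole group.
    """
    ejected = set()
    if floor_bps is not None:
        ejected = {r for r, bps in receiver_rates_bps.items() if bps < floor_bps}
    remaining = [bps for r, bps in receiver_rates_bps.items() if r not in ejected]
    if not remaining:
        return 0, ejected
    # Single-rate: everyone is served at the pace of the slowest member.
    return min(remaining), ejected


# Made-up example group, including a 56 kbit/s modem receiver.
rates = {"a": 2_000_000, "b": 1_500_000, "modem": 56_000}
print(single_rate(rates))                     # slowest receiver sets the rate
print(single_rate(rates, floor_bps=500_000))  # modem ejected, rate recovers
```

Whether and when to eject a receiver is an application policy decision; 
nothing in this sketch is meant to suggest how that should be decided.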

The intent of the sentence above is that the protocol should strive 
to avoid "dead-air" time to the extent possible.  In the past, 
some reliable multicast protocols (incl. NACK-based) have conducted 
sender/receiver interaction in distinct "rounds" (i.e., the 
sender sends some data and then waits for feedback before continuing, 
or some variant of that), and this has resulted in poor goodput.

I agree that some clarification of that statement above is needed to 
make this point clear.
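
To put rough numbers on the "dead-air" point, here is a small sketch 
under simplifying assumptions: the sender transmits a fixed burst at a 
fixed rate and then idles for one feedback interval before the next 
round.  The function name and all of the figures are made up for 
illustration.

```python
def round_based_utilization(burst_bytes, rate_bps, feedback_wait_s):
    """Fraction of time a round-based sender is actually transmitting.

    Each round is modeled as: send burst_bytes at rate_bps, then sit
    idle for feedback_wait_s waiting for receiver feedback.
    """
    send_time_s = burst_bytes * 8 / rate_bps
    return send_time_s / (send_time_s + feedback_wait_s)


# Illustrative only: 125 kB bursts at 1 Mbit/s take 1 s to send.
for wait_s in (0.0, 0.25, 1.0):
    print(wait_s, round_based_utilization(125_000, 1_000_000, wait_s))
```

With no waiting the channel is fully used; once the wait is as long as 
the burst itself, half of the available capacity sits idle.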


   In absence of a group size determination mechanism
   a default group size value of 10,000 is RECOMMENDED for reasonable
   management of feedback given the scalability of expected NACK-based
   reliable multicast usage.

What is the impact of this recommendation?  Is it safer to recommend 
too small or big?  Given that this would likely be close to a world 
record in production multicast group size, I'm not sure if this 
recommendation is reasonable; if it is deemed reasonable, it would 
be nice to have a citation justifying the number.  This is one area 
where figures based on experimentation would have helped. However, 
if recommending too big doesn't cause a problem even when the 
typical default size would be 10, 50 or 100 receivers, then it would 
be OK.



With the timer-based feedback suppression mechanism described, the 
"group size" estimate doesn't have to be very accurate to work and it 
is "safer" (with respect to impact on the network) to err on a larger 
group size.

In retrospect, the recommended value of 10,000 is closer to the 
maximum group size for which these protocols may be useful.

In fact, the U.S. Postal Service has used a NACK-based protocol to 
deliver bulk data content to a group of 10,000 - 20,000 receivers in 
a single multicast group over a fairly limited IP-based VSAT delivery 
system.  This system has been (and, as far as I know, still is) used 
operationally on a daily basis for more than 5 years.

One of the references for this document is some work I did to assess 
(and to predict) the volume of feedback of these types of protocols 
with group sizes through this scale.

I can probably provide more clarification on the impact ... erring on 
the large side may add some extra latency to the NACK-based 
reliability process and require more buffering in the implementation 
to maintain state.
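
For concreteness, here is a minimal sketch of a truncated-exponential 
backoff timer of the kind used for timer-based NACK suppression.  The 
formula is written from my reading of the building block and should be 
checked against the draft text; the function names, the `k_backoff` 
default of 4 and the 0.5 second round-trip figure are illustrative 
assumptions.

```python
import math
import random


def nack_backoff(grtt_s, group_size, k_backoff=4.0, rng=random):
    """Draw a NACK backoff delay in [0, k_backoff * grtt_s] seconds.

    The delay follows a truncated exponential distribution whose skew
    grows with ln(group_size): the larger the assumed group, the more
    receivers wait toward the end of the window, so only a few NACK
    early and the rest can be suppressed on hearing them.
    """
    t_max = k_backoff * grtt_s
    lam = math.log(group_size) + 1.0
    base = lam / (t_max * (math.exp(lam) - 1.0))
    x = rng.uniform(base, base + lam / t_max)
    t = (t_max / lam) * math.log(x * (math.exp(lam) - 1.0) * (t_max / lam))
    return min(max(t, 0.0), t_max)  # clamp away floating-point rounding


def mean_backoff(group_size, samples=5000, seed=1):
    """Average backoff over many draws (assumed 0.5 s round-trip time)."""
    rng = random.Random(seed)
    total = sum(nack_backoff(0.5, group_size, rng=rng) for _ in range(samples))
    return total / samples


# A larger assumed group size shifts the average wait later in the window.
print(mean_backoff(100), mean_backoff(10_000))
```

The window length does not depend on the group size estimate, only the 
shape of the distribution does.  That is why an overestimate costs some 
added repair latency rather than extra feedback traffic.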


   NACK-based reliable multicast is compatible with IP security (IPsec)
   authentication mechanisms [RFC4301] that are RECOMMENDED for
   protection against session intrusion and denial of service attacks.

The details how one might apply IPsec to the unicast channel are absent.
I'm not commenting on the multicast delivery part because that is somewhat
covered though details are fuzzy.  Unicast has two major issues that I did
not see clearly addressed:

 1) malicious, misconfigured or under-performing (beyond small capacity
    links etc.) receivers.  Is there even a way to differentiate between
    these classes of receivers?  When these send a lot of NACK feedback,
    progress of the stream is deterred.  How do you deal with this issue
    (this is partly operations, protocol, and security problem)?



This is an issue.  I did try to point that out (but perhaps still too 
subtly) in the "Security Considerations" section.  The idea in the 
text there was to point out that SSM operation eliminates direct 
receiver<->receiver messaging, simplifying security such that only 
the sender needs to authenticate/trust receiver operations.  For the 
case of IPsec, that means the sender implementation may have a lot of 
Security Association state depending upon group size.  But I thought 
it beyond the scope of this "Building Block" document to go into the 
details of this.  It should be more thoroughly addressed in any 
"Protocol Instantiations" that are made.


 2) receiver authentication for the feedback back-channel; how could you
    do it?  This seems unfeasible in practise if the expected default
    group sizes (e.g. the recommended default of 10,000 receivers) would
    be realized.


There may indeed be practical limitations on group size due to 
security.  I suppose it is comparable to a server that would need to 
maintain a lot of simultaneous secure TCP connections?  But again I am 
not sure it is in the scope of the NACK Building Block document to 
predict scalability limits of IPsec implementations.  But I suppose 
that some language could be added to point out these issues.  Would 
that address your concerns here?


editorial
---------

The document header should have "Obsoletes: 3941" or similar; likewise in
abstract/introduction.

[McastModel] refers to a (good) SSM PhD dissertation, but I'd say reference
to either RFC3569 or RFC4607 is probably more readily available and more
appropriate in the IETF context.

   1.  Multicast Sender Transmission

   2.  NACK Repair Process

   3.  Multicast Receiver Join Policies

   1.  Node (member) Identification
...

In section 3, the building block numbering wraps around; there are two
instances of building blocks 1-3.



I will fix _all_ of these.



-- 
Brian
__________________________________
Brian Adamson
<mailto:adamson(_at_)itd(_dot_)nrl(_dot_)navy(_dot_)mil>
