Re: Review of draft-ietf-teas-gmpls-resource-sharing-proc-06

Hi Dale,

Thank you for the detailed review of the document. 

Please see the updated document and diff file attached to this email, which
address your comments.

I have addressed your comments inline below, marked with <RG>.


On 2017-01-12, 4:29 PM, "Dale Worley" <worley(_at_)ariadne(_dot_)com> wrote:

    Reviewer: Dale Worley
    Review result: Ready with Nits
    
    I am the assigned Gen-ART reviewer for this draft.  The General Area
    Review Team (Gen-ART) reviews all IETF documents being processed
    by the IESG for the IETF Chair.  Please treat these comments just
    like any other last call comments.
    
    For more information, please see the FAQ at
    <http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.
    
    Document:  draft-ietf-teas-gmpls-resource-sharing-proc-06
    Reviewer:  Dale R. Worley
    Review Date:  12 Jan 2017
    IETF LC End Date:  17 Jan 2017
    IESG Telechat date:  2 Feb 2017
    
    Summary:
    
           This draft is basically ready for publication, but has nits
           that should be fixed before publication.
    
    There are various places where the wording of the draft is unclear.
    The draft would benefit from a careful editing for clarity.
    Particularly, there are a considerable number of places where the use
    of "the" and "a" and of plurals is not standard or leaves the text
    somewhat uncertain.
    
    There are various references to ASSOCIATION objects, SESSION_ATTRIBUTE
    objects, etc.  The text leaves it unclear where these objects live; it
    talks as if they exist in an abstract sense.  I think I managed to
    track down what is going on in RFC 4872, which is that the Path
    message that sets up an LSP contains an array of objects and all of
    the objects described are parts of the respective LSP setup Path
    messages.
    
    I also suspect that the Path message objects are retained by the
    various nodes as permanent information about the LSPs that they have
    configured, so one can speak unambiguously of "the ASSOCIATION object
    of the LSP" long after the LSP is set up.
    
    If all of this is correct, it would help the naive reader if this was
    spelled out at the beginning of the document and/or the wording was
    changed in places to provide this context.  E.g.,
    
       GMPLS LSPs can share resources during LSP setup if they have Shared
       Explicit (SE) flag set in their SESSION_ATTRIBUTE objects and:
    
    could be clarified as
    
       GMPLS LSPs can share resources during LSP setup if they have Shared
       Explicit (SE) flag set in the SESSION_ATTRIBUTE objects in the Path
       messages that create them and:
    

<RG> Edited the document in multiple places to clarify this, using the text
suggested above.

    There are a number of terms that are unclear to me.  It's possible
    that they have very standard meanings in GMPLS or traffic engineering,
    though.  Is there a terminology section in a referenced RFC that could
    be pointed to to define these various words?
    
<RG> Added Section 2. [RFC4427] defines the terminology for GMPLS recovery
(protection and restoration).


    1.  Introduction
    
       to setup Label Switched Paths (LSPs) in non-packet transport
    
    The form "set up" is a verb, whereas "setup" is a noun (naming an
    instance of the action of setting up) or an adjective (specifying that
    something has to do with setting up).  So in this instance, the
    wording should be "set up".  Other uses of "setup/set up" should be
    checked also.

<RG> Edited in multiple places.
    
       As described in [RFC6689], an ASSOCIATION object can be
       used to identify the LSPs for restoration using Association Type set
       to "Recovery" [RFC4872] and also identify the LSPs for resource
       sharing using Association Type set to "Resource Sharing" [RFC4873].
    
    
    The ordering of the phrases in this sentence is somewhat confusing
    because "using Association Type set to xxx" is a qualifier of "an
    ASSOCIATION object", yet the phrase "can be used to yyy" is between
    them.  Clearer to say:
    
       As described in [RFC6689], an ASSOCIATION object with Association
       Type "Recovery" [RFC4872] can be used to identify the LSPs for
       restoration.  Also, an ASSOCIATION object with Association Type
       "Resource Sharing" [RFC4873] can be used to identify the LSPs for
       resource sharing.
    

<RG> Edited.

    --
    
       Generally GMPLS end-to-end recovery schemes have the restoration LSP
       signaled after the failure has been detected and notified on the
       working LSP.
    
    Is "signaled" used here in a standard way for GMPLS?  It seems that
    "the LSP is signaled" is to mean "the LSP is set up", but it took me
    some time to realize that.  I am used to "X is signaled" meaning "a
    signal is sent to X".  (There are many instances of this usage.)
    
<RG> Used the term “set up” in most places for consistency.

    It would also be useful for the reader to know the difference between
    "protection", "restoration", and "recovery".  I think that
    "protection" is anti-failure paths set up *before* any failure,
    "restoration" is anti-failure paths set up *after* a failure, and
    "recovery" includes both "protection" and "restoration".  Is this
    standard terminology within GMPLS, or should the reader be warned
    about it?
    
<RG> Added Section 2. [RFC4427] defines the terminology for GMPLS recovery
(protection and restoration).


       In non-packet transport networks, as
       working LSPs are typically signaled over a nominal path, 
    
    What is the meaning of "nominal" here?  ("nominal" has a number of
    meanings, some of which are largely contradictory.)
    
       can be reverted to the nominal path when the failure is repaired
    

<RG> Replaced nominal with preferred.

    In this context, the meaning of "reverted" is made clear by the clause
    "when the failure is repaired..." -- as opposed to other uses of
    "reverted".
    
       In this document, procedures are reviewed for
    
    It's probably better to say "we review procedures for...".
    
<RG> Edited.

       o  When using end-to-end recovery with revertive mode, methods for
          LSP reversion and resource sharing are summarized in this
          document.
    
    A definition of "revert/revertive/reversion" would be useful.
    

<RG> This is now elaborated in Section 3.2. RFC4427, Section 4.11 has details.

    2.  Overview
    
       The GMPLS end-to-end recovery scheme, as defined in [RFC4872] and
       being considered in this document, "fully dynamic rerouting switches
       normal traffic to an alternate LSP that is not even partially
       established only after the working LSP failure occurs.  The new
       alternate route is selected at the LSP head-end node, it may reuse
       resources of the failed LSP at intermediate nodes and may include
       additional intermediate nodes and/or links".
    
    It is awkward to visually coordinate the quotation marks in this
    paragraph.  If it is important that the text is quoted from RFC 4872,
    given its length, it should be presented as a block-quote.  If not,
    the quotation marks should be omitted and just the reference given.
    
    If the intention is to quote this text, it should be corrected so that
    it matches the passage from RFC 4872.  In particular, the difference
    between "fully dynamic rerouting" (in the draft) and "Full LSP
    rerouting (or restoration)" needs to be resolved, as there might be a
    difference in meaning.
    
    The grammar does not join "The GMPLS end-to-end recovery scheme ..."
    and "... fully dynamic rerouting switches normal traffic".
    
    Perhaps something like:
    
       The GMPLS end-to-end recovery scheme, as defined in [RFC4872] and
       being considered in this document, switches
       normal traffic to an alternate LSP that is not even partially
       established only after the working LSP failure occurs.  The new
       alternate route is selected at the LSP head-end node, it may reuse
       resources of the failed LSP at intermediate nodes and may include
       additional intermediate nodes and/or links.
    

<RG> Edited the text.

    --
    
       Two examples, 1+R and 1+1+R are described in the following sections.
    
    At this point in the text, it's not clear what category these items
    are examples *of*.  They aren't single recovery situations, as one
    would expect of something labeled "example".  They seem to be
    sub-categories of "The GMPLS end-to-end recovery scheme".  So it would
    be better to use phrasing like "Two forms of end-to-end recovery, ...,
    are described in the following sections." or "Two end-to-end recovery
    schemes/situations ...".
    
    I assume that other variants of end-to-end recovery exist, and this
    draft is applicable to some/many/all of them.  To guard against
    misunderstanding, it would be worth saying so by adding something like
    "Many other forms of end-to-end recovery exist, many of which [or
    whatever] can use these RSVP-TE signaling techniques."
    
<RG> Edited text with above suggestions.


    Given that sections 2.1 and 2.2 form a pair of examples, it might be
    useful to distinguish them from "Resource Sharing By Restoration LSP"
    (which is not an example, and is not somehow an alternative to 1+R and
    1+1+R) by renumbering the sections to:
    
        2.  Overview
        2.1.  Examples
        2.1.1.  1+R Restoration
        2.1.2.  1+1+R Restoration
        2.2.  Resource Sharing By Restoration LSP
    
    In that case, the introductory sentence "Two examples..." would move
    to the new section 2.1.
    
<RG> Updated sections.


    Where do the names "1+R" and "1+1+R" come from and do they have
    meaning beyond being arbitrary labels?
    
<RG> They are defined in this document.


    Also, given that the 1+1+R case is split into four sub-cases, it's not
    clear that the split between 1+R and 1+1+R is fundamental.  It seems
    that there is an array of semi-independent choices:  whether there is
    an ongoing protection LSP, how many restoration LSPs may be
    established (no more than the number of ongoing LSPs), how many
    failures of original LSPs must happen before restoration LSPs are
    established; various combinations of these choices yield various
    restoration techniques.
    
    Looked at that way, it might be worth combining both examples into
    one.  But that has the problem that figure 2 looks considerably
    different from figure 1.
    
    OTOH, figure 2 isn't particularly accurate for the situation with two
    restoration LSPs, and perhaps those two cases should be split into
    another section with its own figure.

<RG> Created section 3.1.2.1 and moved text there.

    
    2.1.  1+R Restoration
    
       Unlike a protection LSP, a restoration LSP is signaled per need
       basis.
    
    Is "restoration" a standard word in this field?  If not, there should
    be some sort of terminology section that states clearly the difference
    between "protection" and "restoration".
    
<RG> Yes, as per [RFC4427].

    2.2.  1+1+R Restoration
    
    This paragraph could use rewording to be clearer:
    
       After a failure detection and
       notification on a working LSP or protecting LSP, a third LSP on path
       A-H-I-J-Z is established as a restoration LSP.
    
    Since the working LSP has already been described, this should be "the
    working LSP".
    
<RG> Edited the text.

       The restoration LSP
       in this case provides protection against a second order failure. 
    
    It would probably be better to explain what the "second order failure"
    is:
    
       The restoration LSP in this case provides protection against
       failure of both the working and protecting LSPs.
    
<RG> Edited the text.

    --
    
       During failure switchover with 1+1+R recovery scheme, in general,
       failed LSP resources are not released so that working, protecting and
       restoration LSPs coexist in the network.  Nonetheless, a restoration
       LSP with the working LSP it is restoring as well as a restoration LSP
       with the protecting LSP it is restoring can share network resources.
    
    For ease of reading, better to split the two cases apart, and not use
    "it is restoring" as we haven't introduced "restore" as a transitive
    verb:
    
       The restoration LSP can share network resources with the working
       LSP, and it can share network resources with the protecting LSP.
    

<RG> Edited the text.

    --
    
       Typically, restoration LSP is torn down when the failure on the
       original (working or protecting) LSP is repaired and the traffic is
       reverted to the original LSP.
    
    Strictly,
    
       Typically, the restoration LSP is torn down when both the working
       and protecting LSPs are repaired and the traffic is reverted to the
       original LSP.
    
    Except that's not correct, either.  Probably the practice is that a
    restoration LSP is torn down when enough original LSPs are repaired to
    bring the failure count below the threshold that triggered the setting
    up of the restoration LSP (which varies among the four models).  But
    that's awkward to write, even though that is the correct statement.

<RG> Edited the text.
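
For illustration only, the teardown rule conjectured in the comment above
can be sketched as follows. This is not text from the draft: the function
name restoration_needed and the trigger_threshold parameter are
hypothetical, and the threshold value used for each model is an assumption.

```python
def restoration_needed(failed_originals: int, trigger_threshold: int) -> bool:
    """Return True while enough original LSPs are failed to need restoration.

    failed_originals: number of original (working/protecting) LSPs that
        are currently in a failed state.
    trigger_threshold: hypothetical per-model failure count at which a
        restoration LSP is set up.
    """
    return failed_originals >= trigger_threshold


# Assumed example: a model that sets up the restoration LSP only after
# both the working and the protecting LSP have failed (threshold of 2).
keep = restoration_needed(failed_originals=2, trigger_threshold=2)
tear_down = not restoration_needed(failed_originals=1, trigger_threshold=2)
print(keep, tear_down)
```

Under this reading, the restoration LSP is torn down as soon as repairs bring
the failure count below the model's threshold.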
    
    --
    
       In all models discussed, if the restoration LSP also fails, it is
       torn down and a new restoration LSP is signaled.
    
    You can't say "the restoration LSP" because some of the models have
    more than one.  Better
    
       In all these models, if a restoration LSP also fails, it is torn
       down and a new restoration LSP is signaled.
    
<RG> Edited the text.

    2.3.  Resource Sharing By Restoration LSP
    
       it allows for resource sharing when the LSP
       traffic is dynamically restored after the link failure
    
    The significance of this phrase isn't clear to me.  One possible sense
    is that since the failure that is being discussed is the C-D link
    failure, then necessarily the resources from A to C can be reused.
    But that meaning doesn't work well here, because we haven't introduced
    what the failure is.  (Also, you use the phrase "the link failure"
    before introducing what the link failure is.)
    
    It seems like the potential for resource sharing is a property of the
    LSP that it might not have, but the text doesn't point that out
    clearly as an assumption of the example.  Perhaps
    
       Using the network shown in Figure 3 as an example, LSP1 (A-B-C-D-E)
       is the working LSP, and assume it allows for resource sharing when
       the LSP traffic is dynamically restored.
    

<RG> Edited the text.

    --
    
       In this case, A-B-C-F-G-E is
       chosen as the restoration LSP path and the resources on the path
       segment A-B-C are re-used by this LSP when the working LSP is not
       torn down (e.g. in 1+R recovery scheme).
    
    "when" isn't the right word here, because the re-using the resources
    doesn't wait for the working LSP to be not torn down.  Perhaps:
    
       In this case, A-B-C-F-G-E is
       chosen as the restoration LSP path and the resources on the path
       segment A-B-C are re-used by this LSP.  The working LSP is not
       torn down.
    
<RG> Edited the text.

    3.1.  Restoration LSP Association
    
       For example, when a restoration
       LSP is signaled for a failed working LSP, the ASSOCIATION object in
       the restoration LSP contains the Association ID and Association
       Source set to the Association ID and Association Source signaled in
       the working LSP for the "Recovery" Association Type.
    
    As a general question, where does the association object live?
    Clearly it isn't "in the restoration LSP".  It would be useful to
    mention this for readers who aren't fully familiar with the
    background:
    
       For example, when a restoration LSP is signaled for a failed
       working LSP, the ASSOCIATION object in the Path message that
       establishes the restoration LSP contains ...
    
<RG> Edited the text at multiple places.
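
As a rough sketch of the association rule quoted above (illustrative only,
not taken from the draft), the ASSOCIATION object carried in the restoration
LSP's Path message copies the Association ID and Association Source from the
working LSP's Path message. The Association class, its field names, and the
sample values below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    """Hypothetical, simplified view of an ASSOCIATION object's fields."""
    association_type: str    # e.g. "Recovery" or "Resource Sharing"
    association_id: int
    association_source: str  # address of the node that originated the ID


def restoration_association(working: Association) -> Association:
    """Build the "Recovery" association for a restoration LSP's Path message.

    The Association ID and Association Source are copied from the
    association signaled for the working LSP.
    """
    return Association("Recovery", working.association_id,
                       working.association_source)


# Made-up example values for the working LSP's association.
working = Association("Recovery", association_id=7,
                      association_source="192.0.2.1")
restoration = restoration_association(working)
print(restoration == working)
```

Matching all three fields is what lets a node along the path identify the
restoration LSP as belonging to the same recovery group as the working LSP.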

    3.2.  Resource Sharing-based Restoration LSP Setup
    
       As described in [RFC3209], Section 2.5, the purpose of make-before-
       break is "not to disrupt traffic, or adversely impact network
       operations while TE tunnel rerouting is in progress".  In non-packet
       transport networks, the label has a mapping into the data plane
       resource used and the nodes along the LSP need to send triggering
       commands to data plane for setting up cross-connections accordingly
       during the RSVP-TE signaling procedure.  Due to the nature of the
       non-packet transport networks, a node may not be able to fulfill this
       purpose when sharing resources in some scenarios.
    
    I can understand this paragraph, but I think it could benefit from a
    number of edits.  The first is to remove the quotation marks, since
    the purpose is not to emphasize that RFC 3209 said those words, but
    rather that 3209 stated the same concept.  And I think some of the
    explanation can be omitted without losing clarity.
    
       As described in [RFC3209], Section 2.5, the purpose of make-before-
       break is not to disrupt traffic, or adversely impact network
       operations while TE tunnel rerouting is in progress.  In non-packet
       transport networks during the RSVP-TE setup procedure, the
       nodes along the LSP set up cross-connections accordingly.  Because a
       cross-connection cannot simultaneously connect a shared resource to
       different resources in two alternative LSPs, nodes may not be able to
       fulfill this promise when LSPs share resources.
    
<RG> Edited the text.

    --
    
      
       ---------+---------------------------------------------------------
       Category |       Node Behavior during Restoration LSP Setup
       ---------+---------------------------------------------------------
          C1    + Reusing existing resource on both input and output
                + interfaces (nodes A & B in Figure 3).
                +
                + This type of node needs to book the existing
                + resources and no cross-connection setup
                + command is needed.
       ---------+---------------------------------------------------------
    
    This would be prettier if most of the +'s were turned into |'s:
    
<RG> Edited the table.
      
       ---------+---------------------------------------------------------
       Category |       Node Behavior during Restoration LSP Setup
       ---------+---------------------------------------------------------
          C1    | Reusing existing resource on both input and output
                | interfaces (nodes A & B in Figure 3).
                |
                | This type of node needs to book the existing
                | resources and no cross-connection setup
                | command is needed.
       ---------+---------------------------------------------------------
    
    Note that the items in the second column of the table are composed of
    two parts:  The first part is the condition that defines which nodes
    are in that category, and the second part is the actions that will be
    taken by such nodes.  Ideally, these would be broken out as separate
    columns.  (The current first column provides the labels C1, C2, and
    C3, but those aren't referenced anywhere in the document, and could be
    omitted to save space.)  That revises the table to look like this:
    
      
       ------------------------------------+------------------------------
           Situation                       |     Actions
       ------------------------------------+------------------------------
       Reusing existing resources          | Book the existing resources.
       on both input and output interfaces | No cross-connection setup is
       (nodes A & B in Figure 3).          | needed.
       ------------------------------------+------------------------------
       Reusing existing resource only on   | Book the resources.
       one of the interfaces (either input | Re-configure the cross-connection
       or output) and uses new resource on | to connect the re-used resource
       the other interface.                | to the new resource.
       (nodes C & E in Figure 3).          |
       ------------------------------------+------------------------------
       Using new resources on both         | Book the new resources.
       interfaces.                         | Send the cross-connection setup
       (nodes F & G in Figure 3).          | command on both interfaces.
       ------------------------------------+------------------------------
    
<RG> Edited the table.
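
As an illustration only (not text from the draft), the three situations in
the table above can be sketched as a small classification function. The
names NodeAction and classify_node and the boolean inputs are hypothetical.
The actions mirror the table rows, using "reserve" in place of "book".

```python
from enum import Enum


class NodeAction(Enum):
    # Hypothetical labels for the three rows of the table.
    RESERVE_ONLY = "reserve existing resources; no cross-connection setup"
    RECONFIGURE = "reserve resources; re-configure the cross-connection"
    SETUP = "reserve new resources; send cross-connection setup command"


def classify_node(reuses_input: bool, reuses_output: bool) -> NodeAction:
    """Map resource reuse on a node's two interfaces to its action."""
    if reuses_input and reuses_output:
        return NodeAction.RESERVE_ONLY
    if reuses_input or reuses_output:
        return NodeAction.RECONFIGURE
    return NodeAction.SETUP


# Nodes as categorized by the table for Figure 3, given as
# (reuses_input, reuses_output) for the restoration LSP A-B-C-F-G-E.
figure3 = {
    "A": (True, True), "B": (True, True),
    "C": (True, False), "E": (False, True),
    "F": (False, False), "G": (False, False),
}
for node, (reuse_in, reuse_out) in figure3.items():
    print(node, classify_node(reuse_in, reuse_out).name)
```

Only the nodes where exactly one interface is re-used (C and E in this
example) need to change an existing cross-connection.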


    Is the meaning of "book" well-known?  I find no use of it elsewhere in
    this document or in any of the references.
    
<RG> Replaced “book” with “reserve”.

       Depending on whether the resource is re-used or not, the node
       behaviors differ.
    
    Of course, the different behavior is only because we are here
    optimizing the establishment of the new LSP.  A node could send a
    command to cross-connect two resources that are already connected.
    
       This deviates from normal LSP setup since some
       nodes do not need to re-configure the cross-connection, and it should
       not be viewed as an error.
    
    Why would this (not sending a command to connect things that are
    already connected) be considered an error under any circumstances?
    
<RG> Removed the line to avoid confusion.


    3.3.  LSP Reversion
    
    Is "reversion" a standard term?
    
<RG> Yes. RFC4427, Section 4.11.


       If the end-to-end LSP recovery is revertive, as described in
       Section 2 ...
    
    I'm not sure how the phrase "If the end-to-end LSP recovery is
    revertive" works.  "Recovery" seems to be a general term for
    techniques to recover from link failures and the like.  Is this
    describing a "revertive" recovery method, or is it describing an
    instance of recovery which is somehow "revertive"?
    
    Compare to "revert", which seems to be the action of putting the
    traffic back on the original/protection LSP once its functionality is
    restored.  I would expect that behavior to be universal.
    
<RG> Edited the text.

       1. Make-while-break Reversion, where resources associated with a
          working or protecting LSP are reconfigured while removing
          reservations for the restoration LSP.
    
    It's not clear to me what sort of reconfiguring is being discussed.
    Assuming that "reversion" means "when the working/protecting LSP
    starts working again, traffic is restored to that path", it's not clear
    what sort of reconfiguration would be needed, as the
    working/protecting LSP already exists.
    
    I suspect that this issue shows up when the working/protecting LSP
    shares resources with the restoration LSP, and moving traffic to the
    restoration LSP may require reconfiguring resources, and so moving
    traffic back to working/protecting LSP may require reversing that
    reconfiguration.  But the initial reconfiguration has not been
    mentioned.  Should some sort of general description be put in
    "Resource Sharing By Restoration LSP" of the possible need to
    reconfigure when moving traffic to or from a restoration LSP?
    
    (This is all rather obvious, but it would help if it was clearly
    described.)


<RG> Added text in Section 3.2.
    
    3.3.1.  Make-while-break Reversion
    
       Removing reservations for restoration LSP
       triggers reconfiguration of resources associated with a working or
       protecting LSP on every node where resources are shared.
    
    Could you add an explanation or pointer why this is so?  It seems that
    for this to be true, the reservation process must broadcast an
    explicit prioritization between the new (restorative) reservation and
    the old (working) reservation, because the node that is reconfigured
    has to remember both reservations, and revert to the working one when
    the restorative one is deleted.  It'd be useful for the naive reader
    to know where in RSVP-TE that information is broadcast and/or how
    RSVP-TE specified that nodes have to remember that information.
    
<RG> Added text stating that the working LSP state is not torn down.

       Deletion of restoration LSPs is not a revertive process.
    
    What is the meaning of "revertive process" here?  It doesn't seem to
    match the sense of "revertive" as used elsewhere.
    
<RG> Removed this line to avoid confusion.


       In
       particular, if RSVP packets are lost due to nodal or DCN failures it
       is possible for an LSP to be only partially deleted.
    
    "nodal" should probably be "node".
    
    What is "DCN"?  I can't find it in any of the referenced RFCs.  Does
    "link" work as a replacement?
    
<RG> Corrected the text.

    3.3.2.  Make-before-break Reversion
    
       Instead of relying on deletion of
       restoration LSP, the head-end chooses to establish a new LSP to
       reconfigure resources on the working or protection LSP path, and uses
       identical ASSOCIATION and PROTECTION objects from the LSP it is
       replacing.
    
    This could be made clearer by consistently labeling the new LSP as the
    "reversion" LSP.  Also, state explicitly that its resources exactly
    duplicate the resources of the working/protection LSP that is being
    reverted:
    
       Instead of relying on deletion of the
       restoration LSP, the head-end chooses to establish a new
       "reversion" LSP that duplicates the configuration of the
       resources on the working or protection LSP, and uses
       identical ASSOCIATION and PROTECTION objects for that LSP.
    
<RG> Edited the text.

    --
    
       Reversion LSP is sharing resources both with working and
       restoration LSPs.
    
    Better
    
       The reversion LSP shares all of the resources of the
       working/protection LSP and may share resources with the restoration
       LSP.
    
<RG> Edited the text.

    --
    
       Hence, after reversion LSP
       is created, data plane configuration essentially reflects working or
       protecting LSP reservations.
    
    It seems like "essentially" is not needed, because the data plane
    configuration will *exactly* reflect the working/protecting LSP
    reservations.  Or are there minor variations in how reservations are
    done that may not be exactly duplicated by the reversion LSP?

<RG> Edited the text.    


       After "make" part is finished, working and restoration LSPs are torn
       down.
    
    Perhaps emphasize "the original working/protection and restoration
    LSPs are torn down", as the reversion LSP becomes the new
    working/protection LSP.
    
<RG> Edited the text.

       o  Rollback
    
       If "make" part fails, (existing) restoration LSP will still be used
       to carry existing traffic.  Same logic applies here as for any MBB
       operation failure.
    
    The reasoning here is not clear to me.  If the "make" operation fails,
    some of the nodes may be configured for the reversion LSP, while
    others will be configured for the restoration LSP.  Or is it implicit
    that creating LSPs is an atomic operation network-wide, that
    incomplete LSP creations will be completely purged from the network?
    
    If the latter is true, then the core of this discussion is that
    creating LSPs is atomic across the network, but *deleting* LSPs is not
    (and so make-while-break can fail to work).  If that difference is
    true, it should be said explicitly somewhere near the beginning of
    section 3.3, as that fact is what is driving the whole discussion.
    
<RG> This is because the original restoration LSP is not torn down in the
MBB case (as opposed to the MWB case).  But yes, nodes will need to be
reconfigured where necessary.


Thanks,
Rakesh (for authors and contributors)



    [END]
draft-ietf-teas-gmpls-resource-sharing-proc-07.txt