Hi Dale,
Thank you for the detailed review of the document.
Please see the updated document and diff files attached to this email,
which address your comments.
Our responses are inline below, marked with <RG>.
On 2017-01-12, 4:29 PM, "Dale Worley" <worley(_at_)ariadne(_dot_)com> wrote:
Reviewer: Dale Worley
Review result: Ready with Nits
I am the assigned Gen-ART reviewer for this draft. The General Area
Review Team (Gen-ART) reviews all IETF documents being processed
by the IESG for the IETF Chair. Please treat these comments just
like any other last call comments.
For more information, please see the FAQ at
<http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.
Document: draft-ietf-teas-gmpls-resource-sharing-proc-06
Reviewer: Dale R. Worley
Review Date: 12 Jan 2017
IETF LC End Date: 17 Jan 2017
IESG Telechat date: 2 Feb 2017
Summary:
This draft is basically ready for publication, but has nits
that should be fixed before publication.
There are various places where the wording of the draft is unclear.
The draft would benefit from a careful editing for clarity.
Particularly, there are a considerable number of places where the use
of "the" and "a" and of plurals is not standard or leaves the text
somewhat uncertain.
There are various references to ASSOCIATION objects, SESSION_ATTRIBUTE
objects, etc. The text leaves it unclear where these objects live; it
talks as if they exist in an abstract sense. I think I managed to
track down what is going on in RFC 4872, which is that the Path
message that sets up an LSP contains an array of objects and all of
the objects described are parts of the respective LSP setup Path
messages.
I also suspect that the Path message objects are retained by the
various nodes as permanent information about the LSPs that they have
configured, so one can speak unambiguously of "the ASSOCIATION object
of the LSP" long after the LSP is set up.
If all of this is correct, it would help the naive reader if this was
spelled out at the beginning of the document and/or the wording was
changed in places to provide this context. E.g.,
GMPLS LSPs can share resources during LSP setup if they have Shared
Explicit (SE) flag set in their SESSION_ATTRIBUTE objects and:
could be clarified as
GMPLS LSPs can share resources during LSP setup if they have Shared
Explicit (SE) flag set in the SESSION_ATTRIBUTE objects in the Path
messages that create them and:
<RG> Edited the document to clarify, using the suggested text above in
multiple places.
There are a number of terms that are unclear to me. It's possible
that they have very standard meanings in GMPLS or traffic engineering,
though. Is there a terminology section in a referenced RFC that could
be pointed to to define these various words?
<RG> Added Section 2. [RFC4427] defines terminology for the GMPLS recovery
(protection and restoration).
1. Introduction
to setup Label Switched Paths (LSPs) in non-packet transport
The form "set up" is a verb, whereas "setup" is a noun (naming an
instance of the action of setting up) or an adjective (specifying that
something has to do with setting up). So in this instance, the
wording should be "set up". Other uses of "setup/set up" should be
checked also.
<RG> Edited at multiple places.
As described in [RFC6689], an ASSOCIATION object can be
used to identify the LSPs for restoration using Association Type set
to "Recovery" [RFC4872] and also identify the LSPs for resource
sharing using Association Type set to "Resource Sharing" [RFC4873].
The ordering of the phrases in this sentence is somewhat confusing
because "using Association Type set to xxx" is a qualifier of "an
ASSOCIATION object", yet the phrase "can be used to yyy" is between
them. Clearer to say:
As described in [RFC6689], an ASSOCIATION object with Association
Type "Recovery" [RFC4872] can be used to identify the LSPs for
restoration. Also, an ASSOCIATION object with Association Type
"Resource Sharing" [RFC4873] can be used to identify the LSPs for
resource sharing.
<RG> Edited.
--
Generally GMPLS end-to-end recovery schemes have the restoration LSP
signaled after the failure has been detected and notified on the
working LSP.
Is "signaled" used here in a standard way for GMPLS? It seems that
"the LSP is signaled" is to mean "the LSP is set up", but it took me
some time to realize that. I am used to "X is signaled" meaning "a
signal is sent to X". (There are many instances of this usage.)
<RG> Used the term “set up” in most places for consistency.
It would also be useful for the reader to know the difference between
"protection", "restoration", and "recovery". I think that
"protection" is anti-failure paths set up *before* any failure,
"restoration" is anti-failure paths set up *after* a failure, and
"recovery" includes both "protection" and "restoration". Is this
standard terminology within GMPLS, or should the reader be warned
about it?
<RG> Added Section 2. [RFC4427] defines terminology for the GMPLS recovery
(protection and restoration).
In non-packet transport networks, as
working LSPs are typically signaled over a nominal path,
What is the meaning of "nominal" here? ("nominal" has a number of
meanings, some of which are largely contradictory.)
can be reverted to the nominal path when the failure is repaired
<RG> Replaced nominal with preferred.
In this context, the meaning of "reverted" is made clear by the clause
"when the failure is repaired..." -- as opposed to other uses of
"reverted".
In this document, procedures are reviewed for
It's probably better to say "we review procedures for...".
<RG> Edited.
o When using end-to-end recovery with revertive mode, methods for
LSP reversion and resource sharing are summarized in this document.
A definition of "revert/revertive/reversion" would be useful.
<RG> This is now elaborated in Section 3.2. RFC4427, Section 4.11 has details.
2. Overview
The GMPLS end-to-end recovery scheme, as defined in [RFC4872] and
being considered in this document, "fully dynamic rerouting switches
normal traffic to an alternate LSP that is not even partially
established only after the working LSP failure occurs. The new
alternate route is selected at the LSP head-end node, it may reuse
resources of the failed LSP at intermediate nodes and may include
additional intermediate nodes and/or links".
It is awkward to visually coordinate the quotation marks in this
paragraph. If it is important that the text is quoted from RFC 4872,
given its length, it should be presented as a block-quote. If not,
the quotation marks should be omitted and just the reference given.
If the intention is to quote this text, it should be corrected so that
it matches the passage from RFC 4872. In particular, the difference
between "fully dynamic rerouting" (in the draft) and "Full LSP
rerouting (or restoration)" needs to be resolved, as there might be a
difference in meaning.
The grammar does not join "The GMPLS end-to-end recovery scheme ..."
and "... fully dynamic rerouting switches normal traffic".
Perhaps something like:
The GMPLS end-to-end recovery scheme, as defined in [RFC4872] and
being considered in this document, switches
normal traffic to an alternate LSP that is not even partially
established only after the working LSP failure occurs. The new
alternate route is selected at the LSP head-end node, it may reuse
resources of the failed LSP at intermediate nodes and may include
additional intermediate nodes and/or links.
<RG> Edited the text.
--
Two examples, 1+R and 1+1+R are described in the following sections.
At this point in the text, it's not clear what category these items
are examples *of*. They aren't single recovery situations, as one
would expect of something labeled "example". They seem to be
sub-categories of "The GMPLS end-to-end recovery scheme". So it would
be better to use phrasing like "Two forms of end-to-end recovery, ...,
are described in the following sections." or "Two end-to-end recovery
schemes/situations ...".
I assume that other variants of end-to-end recovery exist, and this
draft is applicable to some/many/all of them. To guard against
misunderstanding, it would be worth saying so by adding something like
"Many other forms of end-to-end recovery exist, many of which [or
whatever] can use these RSVP-TE signaling techniques."
<RG> Edited text with above suggestions.
Given that sections 2.1 and 2.2 form a pair of examples, it might be
useful to distinguish them from "Resource Sharing By Restoration LSP"
(which is not an example, and is not somehow an alternative to 1+R and
1+1+R) by renumbering the sections to:
2. Overview
2.1. Examples
2.1.1. 1+R Restoration
2.1.2. 1+1+R Restoration
2.2. Resource Sharing By Restoration LSP
In that case, the introductory sentence "Two examples..." would move
to the new section 2.1.
<RG> Updated sections.
Where do the names "1+R" and "1+1+R" come from and do they have
meaning beyond being arbitrary labels?
<RG> They are defined in this document.
Also, given that the 1+1+R case is split into four sub-cases, it's not
clear that the split between 1+R and 1+1+R is fundamental. It seems
that there is an array of semi-independent choices: whether there is
an ongoing protection LSP, how many restoration LSPs may be
established (no more than the number of ongoing LSPs), how many
failures of original LSPs must happen before restoration LSPs are
established; various combinations of these choices yield various
restoration techniques.
Looked at that way, it might be worth combining both examples into
one. But that has the problem that figure 2 looks considerably
different from figure 1.
OTOH, figure 2 isn't particularly accurate for the situation with two
restoration LSPs, and perhaps those two cases should be split into
another section with its own figure.
<RG> Created section 3.1.2.1 and moved text there.
2.1. 1+R Restoration
Unlike a protection LSP, a restoration LSP is signaled per need basis.
Is "restoration" a standard word in this field? If not, there should
be some sort of terminology section that states clearly the difference
between "protection" and "restoration".
<RG> Yes as per [RFC4427].
2.2. 1+1+R Restoration
This paragraph could use rewording to be clearer:
After a failure detection and
notification on a working LSP or protecting LSP, a third LSP on path
A-H-I-J-Z is established as a restoration LSP.
Since the working LSP has already been described, this should be "the
working LSP".
<RG> Edited the text.
The restoration LSP
in this case provides protection against a second order failure.
It would probably be better to explain what the "second order failure"
is:
The restoration LSP in this case provides protection against
failure of both the working and protecting LSPs.
<RG> Edited the text.
--
During failure switchover with 1+1+R recovery scheme, in general,
failed LSP resources are not released so that working, protecting and
restoration LSPs coexist in the network. Nonetheless, a restoration
LSP with the working LSP it is restoring as well as a restoration LSP
with the protecting LSP it is restoring can share network resources.
For ease of reading, better to split the two cases apart, and not use
"it is restoring" as we haven't introduced "restore" as a transitive
verb:
The restoration LSP can share network resources with the working
LSP, and it can share network resources with the protecting LSP.
<RG> Edited the text.
--
Typically, restoration LSP is torn down when the failure on the
original (working or protecting) LSP is repaired and the traffic is
reverted to the original LSP.
Strictly,
Typically, the restoration LSP is torn down when both the working
and protecting LSPs are repaired and the traffic is reverted to the
original LSP.
Except that's not correct, either. Probably the practice is that a
restoration LSP is torn down when enough original LSPs are repaired to
bring the failure count below the threshold that triggered the setting
up of the restoration LSP (which varies among the four models). But
that's awkward to write, even though that is the correct statement.
<RG> Edited the text.
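The teardown condition described above is awkward in prose but short when written as a predicate. The following is only a sketch: `trigger_threshold` (the number of failed original LSPs that causes a restoration LSP to be set up in a given model) and both function names are illustrative assumptions, not terms from the draft.

```python
def restoration_needed(failed_originals: int, trigger_threshold: int) -> bool:
    """True while enough original LSPs are failed to justify a restoration LSP.

    trigger_threshold is a hypothetical per-model parameter: e.g. 1 if a
    restoration LSP is set up after either original LSP fails, 2 if it is
    set up only after both the working and protecting LSPs have failed.
    """
    return failed_originals >= trigger_threshold


def should_tear_down(restoration_up: bool, failed_originals: int,
                     trigger_threshold: int) -> bool:
    """Tear down once repairs bring the failure count below the threshold."""
    return restoration_up and not restoration_needed(
        failed_originals, trigger_threshold)


# Model with threshold 2: repairing one of two failed LSPs allows teardown.
print(should_tear_down(True, failed_originals=1, trigger_threshold=2))
# Model with threshold 1: one original LSP is still failed, so keep it.
print(should_tear_down(True, failed_originals=1, trigger_threshold=1))
```

This covers only the single-restoration-LSP case; models with more than one restoration LSP would need a count per restoration LSP.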
--
In all models discussed, if the restoration LSP also fails, it is
torn down and a new restoration LSP is signaled.
You can't say "the restoration LSP" because some of the models have
more than one. Better
In all these models, if a restoration LSP also fails, it is torn
down and a new restoration LSP is signaled.
<RG> Edited the text.
2.3. Resource Sharing By Restoration LSP
it allows for resource sharing when the LSP
traffic is dynamically restored after the link failure
The significance of this phrase isn't clear to me. One possible sense
is that since the failure that is being discussed is the C-D link
failure, then necessarily the resources from A to C can be reused.
But that meaning doesn't work well here, because we haven't introduced
what the failure is. (Also, you use the phrase "the link failure"
before introducing what the link failure is.)
It seems like the potential for resource sharing is a property of the
LSP that it might not have, but the text doesn't point that out
clearly as an assumption of the example. Perhaps
Using the network shown in Figure 3 as an example, LSP1 (A-B-C-D-E)
is the working LSP, and assume it allows for resource sharing when
the LSP traffic is dynamically restored.
<RG> Edited the text.
--
In this case, A-B-C-F-G-E is
chosen as the restoration LSP path and the resources on the path
segment A-B-C are re-used by this LSP when the working LSP is not
torn down (e.g. in 1+R recovery scheme).
"when" isn't the right word here, because re-using the resources
doesn't wait for the working LSP to be not torn down. Perhaps:
In this case, A-B-C-F-G-E is
chosen as the restoration LSP path and the resources on the path
segment A-B-C are re-used by this LSP. The working LSP is not
torn down.
<RG> Edited the text.
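The shared segment in this example can be derived mechanically from the two paths. A minimal sketch, assuming that a "resource" can be approximated by a directed link between adjacent nodes (in a real network it would be a label or timeslot on that link); the helper names are illustrative only:

```python
def links(path: str) -> list[tuple[str, str]]:
    """Split a hop list such as 'A-B-C' into directed links."""
    nodes = path.split("-")
    return list(zip(nodes, nodes[1:]))


def shared_links(working: str, restoration: str) -> list[tuple[str, str]]:
    """Links of the restoration path that the working path already uses."""
    in_use = set(links(working))
    return [link for link in links(restoration) if link in in_use]


working = "A-B-C-D-E"        # LSP1 from the Figure 3 example
restoration = "A-B-C-F-G-E"  # path chosen after the C-D link failure
print(shared_links(working, restoration))  # [('A', 'B'), ('B', 'C')]
```

The result is the A-B-C segment named in the text; the links C-F, F-G and G-E are new.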
3.1. Restoration LSP Association
For example, when a restoration
LSP is signaled for a failed working LSP, the ASSOCIATION object in
the restoration LSP contains the Association ID and Association
Source set to the Association ID and Association Source signaled in
the working LSP for the "Recovery" Association Type.
As a general question, where does the association object live?
Clearly it isn't "in the restoration LSP". It would be useful to
mention this for readers who aren't fully familiar with the
background:
For example, when a restoration LSP is signaled for a failed
working LSP, the ASSOCIATION object in the Path message that
establishes the restoration LSP contains ...
<RG> Edited the text at multiple places.
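To make the association rule concrete: the Path message that sets up the restoration LSP carries an ASSOCIATION object whose ID and Source are copied from the working LSP's "Recovery" association. The sketch below models only the three fields named in the text; the class, the function names, and the sample values are assumptions for illustration, and the on-the-wire encoding is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    """Simplified stand-in for an ASSOCIATION object in a Path message."""
    association_type: str    # "Recovery" or "Resource Sharing"
    association_id: int
    association_source: str  # address of the node that created the association


def restoration_association(working: Association) -> Association:
    """Build the object for the restoration LSP's Path message."""
    if working.association_type != "Recovery":
        raise ValueError("expected the working LSP's Recovery association")
    return Association("Recovery", working.association_id,
                       working.association_source)


def same_recovery_group(a: Association, b: Association) -> bool:
    """Two LSPs are associated when type, ID and source all match."""
    return a == b


working = Association("Recovery", 17, "192.0.2.1")  # sample values only
restoration = restoration_association(working)
print(same_recovery_group(working, restoration))  # True
```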
3.2. Resource Sharing-based Restoration LSP Setup
As described in [RFC3209], Section 2.5, the purpose of
make-before-break is "not to disrupt traffic, or adversely impact
network operations while TE tunnel rerouting is in progress". In
non-packet transport networks, the label has a mapping into the data
plane resource used and the nodes along the LSP need to send
triggering commands to data plane for setting up cross-connections
accordingly during the RSVP-TE signaling procedure. Due to the nature
of the non-packet transport networks, a node may not be able to
fulfill this purpose when sharing resources in some scenarios.
I can understand this paragraph, but I think it could benefit from a
number of edits. The first is to remove the quotation marks, since
the purpose is not to emphasize that RFC 3209 said those words, but
rather that 3209 stated the same concept. And I think some of the
explanation can be omitted without losing clarity.
As described in [RFC3209], Section 2.5, the purpose of
make-before-break is not to disrupt traffic, or adversely impact
network operations while TE tunnel rerouting is in progress. In
non-packet transport networks during the RSVP-TE setup procedure, the
nodes along the LSP set up cross-connections accordingly. Because a
cross-connection cannot simultaneously connect a shared resource to
different resources in two alternative LSPs, nodes may not be able to
fulfill this promise when LSPs share resources.
<RG> Edited the text.
--
---------+---------------------------------------------------------
Category | Node Behavior during Restoration LSP Setup
---------+---------------------------------------------------------
C1 + Reusing existing resource on both input and output
+ interfaces (nodes A & B in Figure 3).
+
+ This type of node needs to book the existing
+ resources and no cross-connection setup
+ command is needed.
---------+---------------------------------------------------------
This would be prettier if most of the +'s were turned into |'s:
<RG> Edited the table.
---------+---------------------------------------------------------
Category | Node Behavior during Restoration LSP Setup
---------+---------------------------------------------------------
C1 | Reusing existing resource on both input and output
| interfaces (nodes A & B in Figure 3).
|
| This type of node needs to book the existing
| resources and no cross-connection setup
| command is needed.
---------+---------------------------------------------------------
Note that the items in the second column of the table are composed of
two parts: the first part is the condition that defines which nodes are
in that category, and the second part is the actions that will be
taken by such nodes. Ideally, these would be broken out as separate
columns. (The current first column provides the labels C1, C2, and
C3, but those aren't referenced anywhere in the document, and could be
omitted to save space.) That revises the table to look like this:
------------------------------------+----------------------------------
Situation                           | Actions
------------------------------------+----------------------------------
Reusing existing resources          | Book the existing resources.
on both input and output interfaces | No cross-connection setup is
(nodes A & B in Figure 3).          | needed.
------------------------------------+----------------------------------
Reusing existing resource only on   | Book the resources.
one of the interfaces (either input | Re-configure the cross-connection
or output) and uses new resource on | to connect the re-used resource
the other interface.                | to the new resource.
(nodes C & E in Figure 3).          |
------------------------------------+----------------------------------
Using new resources on both         | Book the new resources.
interfaces.                         | Send the cross-connection setup
(nodes F & G in Figure 3).          | command on both interfaces.
------------------------------------+----------------------------------
<RG> Edited the table.
Is the meaning of "book" well-known? I find no use of it elsewhere in
this document or in any of the references.
<RG> Replaced “book” with “reserve”.
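The three situations in the table amount to a per-node decision on which interfaces are reused. A rough sketch of that decision for the Figure 3 example follows; the paths are taken from the example (working A-B-C-D-E, restoration A-B-C-F-G-E), while the function names, the category labels, and the "client" placeholder for the add/drop side of the end nodes are assumptions made for illustration.

```python
WORKING = ["A", "B", "C", "D", "E"]
RESTORATION = ["A", "B", "C", "F", "G", "E"]

ACTIONS = {
    "reuse-both": "reserve existing resources; no cross-connection setup",
    "reuse-one": "reserve resources; re-configure the cross-connection",
    "new-both": "reserve new resources; send cross-connection setup command",
}


def interfaces(path, node):
    """Return the (input, output) interfaces of node on path.

    'client' is a placeholder for the add/drop side of an end node.
    """
    i = path.index(node)
    upstream = path[i - 1] if i > 0 else "client"
    downstream = path[i + 1] if i < len(path) - 1 else "client"
    return (upstream, node), (node, downstream)


def classify(node, working=WORKING, restoration=RESTORATION):
    """Classify a restoration-path node by how many interfaces it reuses."""
    in_use = {iface for n in working for iface in interfaces(working, n)}
    reused = [iface in in_use for iface in interfaces(restoration, node)]
    if all(reused):
        return "reuse-both"
    if any(reused):
        return "reuse-one"
    return "new-both"


for node in RESTORATION:
    category = classify(node)
    print(node, category, "->", ACTIONS[category])
```

With these paths, A and B fall in the first row of the table, C and E in the second, and F and G in the third.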
Depending on whether the resource is re-used or not, the node
behaviors differ.
Of course, the different behavior is only because we are here
optimizing the establishment of the new LSP. A node could send a
command to cross-connect two resources that are already connected.
This deviates from normal LSP setup since some
nodes do not need to re-configure the cross-connection, and it should
not be viewed as an error.
Why would this (not sending a command to connect things that are
already connected) be considered an error under any circumstances?
<RG> Removed the line to avoid confusion.
3.3. LSP Reversion
Is "reversion" a standard term?
<RG> Yes. RFC4427, Section 4.11.
If the end-to-end LSP recovery is revertive, as described in
Section 2 ...
I'm not sure how the phrase "If the end-to-end LSP recovery is
revertive" works. "Recovery" seems to be a general term for
techniques to recover from link failures and the like. Is this
describing a "revertive" recovery method, or is it describing an
instance of recovery which is somehow "revertive"?
Compare to "revert", which seems to be the action of putting the
traffic back on the original/protection LSP once its functionality is
restored. I would expect that behavior to be universal.
<RG> Edited the text.
1. Make-while-break Reversion, where resources associated with a
working or protecting LSP are reconfigured while removing
reservations for the restoration LSP.
It's not clear to me what sort of reconfiguring is being discussed.
Assuming that "reversion" means "when the working/protecting LSP
starts working again, traffic is restored to that path", it's not clear
what sort of reconfiguration would be needed, as the
working/protecting LSP already exists.
I suspect that this issue shows up when the working/protecting LSP
shares resources with the restoration LSP, and moving traffic to the
restoration LSP may require reconfiguring resources, and so moving
traffic back to working/protecting LSP may require reversing that
reconfiguration. But the initial reconfiguration has not been
mentioned. Should some sort of general description be put in
"Resource Sharing By Restoration LSP" of the possible need to
reconfigure when moving traffic to or from a restoration LSP?
(This is all rather obvious, but it would help if it was clearly
described.)
<RG> Added text in Section 3.2.
3.3.1. Make-while-break Reversion
Removing reservations for restoration LSP
triggers reconfiguration of resources associated with a working or
protecting LSP on every node where resources are shared.
Could you add an explanation or pointer why this is so? It seems that
for this to be true, the reservation process must broadcast an
explicit prioritization between the new (restorative) reservation and
the old (working) reservation, because the node that is reconfigured
has to remember both reservations, and revert to the working one when
the restorative one is deleted. It'd be useful for the naive reader
to know where in RSVP-TE that information is broadcast and/or how
RSVP-TE specified that nodes have to remember that information.
<RG> Added text to state that working LSP states not torn down.
Deletion of restoration LSPs is not a revertive process.
What is the meaning of "revertive process" here? It doesn't seem to
match the sense of "revertive" as used elsewhere.
<RG> Removed this line to avoid confusion.
In particular, if RSVP packets are lost due to nodal or DCN failures it
is possible for an LSP to be only partially deleted.
"nodal" should probably be "node".
What is "DCN"? I can't find it in any of the referenced RFCs. Does
"link" work as a replacement?
<RG> Corrected the text.
3.3.2. Make-before-break Reversion
Instead of relying on deletion of
restoration LSP, the head-end chooses to establish a new LSP to
reconfigure resources on the working or protection LSP path, and uses
identical ASSOCIATION and PROTECTION objects from the LSP it is
replacing.
This could be made clearer by consistently labeling the new LSP as the
"reversion" LSP. Also, state explicitly that its resources exactly
duplicate the resources of the working/protection LSP that is being
reverted:
Instead of relying on deletion of the
restoration LSP, the head-end chooses to establish a new
"reversion" LSP that duplicates the configuration of the
resources on the working or protection LSP, and uses
identical ASSOCIATION and PROTECTION objects for that LSP.
<RG> Edited the text.
--
Reversion LSP is sharing resources both with working and
restoration LSPs.
Better
The reversion LSP shares all of the resources of the
working/protection LSP and may share resources with the restoration
LSP.
<RG> Edited the text.
--
Hence, after reversion LSP
is created, data plane configuration essentially reflects working or
protecting LSP reservations.
It seems like "essentially" is not needed, because the data plane
configuration will *exactly* reflect the working/protecting LSP
reservations. Or are there minor variations in how reservations are
done that may not be exactly duplicated by the reversion LSP?
<RG> Edited the text.
After "make" part is finished, working and restoration LSPs are torn
down.
Perhaps emphasize "the original working/protection and restoration
LSPs are torn down", as the reversion LSP becomes the new
working/protection LSP.
<RG> Edited the text.
o Rollback
If "make" part fails, (existing) restoration LSP will still be used
to carry existing traffic. Same logic applies here as for any MBB
operation failure.
The reasoning here is not clear to me. If the "make" operation fails,
some of the nodes may be configured for the reversion LSP, while
others will be configured for the restoration LSP. Or is it implicit
that creating LSPs is an atomic operation network-wide, that
incomplete LSP creations will be completely purged from the network?
If the latter is true, then the core of this discussion is that
creating LSPs is atomic across the network, but *deleting* LSPs is not
(and so make-while-break can fail to work). If that difference is
true, it should be said explicitly somewhere near the beginning of
section 3.3, as that fact is what is driving the whole discussion.
<RG> This is because the original restoration LSP is not torn down in this
(MBB) case (as opposed to the MWB case). But yes, the node will need to be
reconfigured if needed.
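The rollback behavior described in this exchange can be summarized as a small state transition. This is a simplified sketch under stated assumptions: `make_reversion_lsp` is a hypothetical callable that returns True when the reversion LSP is fully set up, and cleanup of any partially signaled state on individual nodes is not modeled.

```python
def mbb_reversion(make_reversion_lsp) -> dict:
    """Make-before-break reversion: tear down the old LSPs only after 'make'."""
    state = {"traffic_on": "restoration",
             "lsps": {"working", "restoration"}}
    if make_reversion_lsp():
        # "Make" succeeded: move traffic, then tear down the original LSPs.
        state["lsps"].add("reversion")
        state["traffic_on"] = "reversion"
        state["lsps"] -= {"working", "restoration"}
    # On failure nothing was torn down, so the restoration LSP still
    # carries the traffic.
    return state


print(mbb_reversion(lambda: True)["traffic_on"])   # reversion
print(mbb_reversion(lambda: False)["traffic_on"])  # restoration
```

The point of the ordering is that the restoration LSP is removed only after the "make" step succeeds, which is why a failed "make" leaves traffic where it was.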
Thanks,
Rakesh (for authors and contributors)
[END]
draft-ietf-teas-gmpls-resource-sharing-proc-07.txt