
Review of draft-ietf-pals-endpoint-fast-protection-04

2016-12-05 18:30:33
Reviewer: David Black
Review result: Ready with Issues

I've reviewed this document as part of TSV-ART's ongoing effort to
review key IETF documents. These comments were written primarily for
the transport area directors, but are copied to the document's authors
for their information and to allow them to address any issues raised.
When done at the time of IETF Last Call, the authors should consider
this review together with any other last-call comments they receive.
Please always CC tsv-art@ietf.org if you reply to or forward this
review.

This draft specifies local pseudowire (PW) repair mechanisms to
quickly react to PW egress failures by rerouting traffic around the
failure until slower-to-react repair mechanisms at larger scope are
able to effect longer term repairs, e.g., via network topology
changes.

-- TSV-ART review comments:

I found a couple of minor transport-related issues, both of which
should be resolvable with modest amounts of additional explanation:

* ECMP: The ECMP discussion in Section 4.1 on Applicability takes a
conservative approach to avoiding packet reordering by recommending
(SHOULD) that the entire ECMP set be rerouted as part of local repair.
It's not clear what sort of ECMP is involved, as that acronym is used
without a reference (or even expansion), so I'd suggest citing a
reference.  If the ECMP used is flow-aware so that reordering across
ECMP branches within an ECMP set does not cause reordering within any
of the flows involved, then it ought to be safe from a reordering
perspective to reroute an ECMP branch or set of branches that are less
than the full ECMP set, although such partial rerouting could cause
potentially undesirable forwarding latency differences within the ECMP
set.  This ought to be discussed, as situations in which rerouting the
entire ECMP bundle is overly conservative seem likely to arise in
practice.

* Traffic Engineering: Considering the intended speed of local repair,
"order of tens of milliseconds" in the abstract, the bandwidth used by
the repair paths has to be provisioned in advance of any failure that
causes repair path usage - traffic engineering is a likely means of
provisioning that bandwidth.  I see "TE domain," "TE metric" and "TE
path," which I assume refer to Traffic Engineering, but that TE
acronym is not expanded, and I did not find text requiring traffic
engineering and/or advance (bandwidth) provisioning of repair paths. 
I assume that this advance bandwidth provisioning of repair paths is
intended as part of local repair, as not doing that invites immediate
repair path failure due to lack of forwarding resources, which is
definitely not desired.  A sentence or two ought to be added to point
this bandwidth provisioning requirement out, possibly in Section 4.1
(Applicability).  Adding that text would also reinforce the conclusion
in the Security Considerations section that local repair reroutes are
not a security threat, as the new text would add the rationale that
local repair reroutes are anticipated and planned for by the network
operator's traffic engineering.
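
To illustrate the ECMP comment above, here is a minimal sketch of
flow-aware branch selection.  It is not taken from the draft: the
hash, the flow tuple layout, and the names ecmp_branch and
forwarding_path are assumptions made purely for illustration.  It
shows that when each flow is pinned to one branch by a hash of its
flow identifier, rerouting a single branch changes the path only for
the flows on that branch and never splits a flow across paths.

```python
import hashlib


def ecmp_branch(flow, n_branches):
    """Pick a branch from a hash of the flow identifier (hypothetical scheme)."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_branches


def forwarding_path(flow, n_branches, rerouted):
    """Return a path label; branches listed in `rerouted` use a repair path."""
    branch = ecmp_branch(flow, n_branches)
    return f"repair-{branch}" if branch in rerouted else f"branch-{branch}"


# Hypothetical flows: (src, dst, protocol, src port, dst port).
flows = [("10.0.0.1", "10.0.1.1", 6, 1000 + i, 80) for i in range(200)]

before = {f: forwarding_path(f, 4, set()) for f in flows}  # no failure
after = {f: forwarding_path(f, 4, {1}) for f in flows}     # branch 1 repaired

# Only flows that were hashed to branch 1 change path.
moved = [f for f in flows if before[f] != after[f]]
```

Packets in flight at the moment of the switch can still be reordered
within a moved flow, and the repair path may have a different latency
from the untouched branches, which is the latency concern raised in
the ECMP comment.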

--  Other comments:

* Having found two acronyms that were not expanded, I'd suggest a
general check for other unexpanded acronyms.  On the other hand, this
is an area of network technology where many acronyms are in common
use, and hence expansion of every acronym on first use may be
excessive.  One way to avoid this would be to cite, at the start of
Section 3, a reference where commonly used PW terms and acronyms are
defined.

