Hi Lionel
Many thanks for a very thorough review. I'm picking up this thread and
replying as PCE working group chair, as the authors are unavailable. I
apologise for the delay.
Please see my proposed resolutions inline below, marked with "Jon>"
Best regards
Jon
-----Original Message-----
From: Lionel Morand [mailto:lionel(_dot_)morand(_at_)orange(_dot_)com]
Sent: 16 March 2017 09:46
To: ops-dir(_at_)ietf(_dot_)org
Cc: draft-ietf-pce-stateful-pce(_dot_)all(_at_)ietf(_dot_)org;
pce(_at_)ietf(_dot_)org; ietf(_at_)ietf(_dot_)org
Subject: Review of draft-ietf-pce-stateful-pce-18
Reviewer: Lionel Morand
Review result: Has Issues
I have reviewed this document as part of the Operational directorate's ongoing
effort to review all IETF documents being processed by the IESG. These
comments were written with the intent of improving the operational aspects of
the IETF drafts. Comments that are not addressed in last call may be included
in AD reviews during the IESG review. Document editors and WG chairs should
treat these comments just like any other last call comments.
I've pointed out some "issues" that might not be real issues for people
involved in PCE but that could be at least clarified for readers of this
document. Detailed comments are provided below.
Regards,
Lionel
********************
2. Terminology
This document uses the following terms defined in [RFC5440]: PCC,
PCE, PCEP Peer, PCEP Speaker.
This document uses the following terms defined in [RFC4655]: TED.
This document uses the following terms defined in [RFC3031]: LSP.
This document uses the following terms defined in
[I-D.ietf-pce-stateful-pce-app]: Stateful PCE, Passive Stateful PCE,
Active Stateful PCE, Delegation, LSP State Database.
[LM] I didn't find a clear definition of the LSP state, except through the
definition of the LSP State database in ietf-pce-stateful-pce-app, i.e.
The second is the LSP
State Database (LSP-DB), in which a PCE stores attributes of all
active LSPs in the network, such as their paths through the network,
bandwidth/resource usage, switching types and LSP constraints.
As this document heavily relies on this LSP state concept, it could be useful
to (re-)define it in this document or to find a correct reference for it.
Jon> The definition of LSP State Database given by draft-ietf-pce-stateful-app
(now RFC 8051) does contain a pretty good description of the LSP State ("paths
through the network, bandwidth/resource usage, switching types and LSP
constraints "). I know it's written informally ("such as...") but I think it's
good enough.
=====
3.2. Objectives
The objectives for the protocol extensions to support stateful PCE
described in this document are as follows:
[...]
o Allow a PCC to delegate control of its LSPs to an active
stateful
PCE such that a given LSP is under the control of a single PCE
at
any given time. A PCC may revoke this delegation at any time
during the lifetime of the LSP. If LSP delegation is revoked
while the PCEP session is up, the PCC MUST notify the PCE about
the revocation. A PCE may return an LSP delegation at any
point
during the lifetime of the PCEP session.
[LM] I'm assuming that the PCE returning the delegation has also to
notify the PCC. If so, maybe:
"the If LSP delegation is returned by the PCE while the PCEP
session is up, the PCE MUST notify the PCE about the
revocation."
[LM] the bullet point could be then splitted into two bullets, one for
PCC, one for PCE.
Jon> ACK. This would probably be best as two sub-bullets.
====
5.4. Capability Advertisement
During PCEP Initialization Phase, PCEP Speakers (PCE or PCC)
advertise their support of stateful PCEP extensions. A PCEP
Speaker
includes the "Stateful PCE Capability" TLV, described in
Section 7.1.1, in the OPEN Object to advertise its support for
PCEP
stateful extensions. The Stateful Capability TLV includes the
'LSP
Update' Flag that indicates whether the PCEP Speaker supports LSP
parameter updates.
The presence of the Stateful PCE Capability TLV in PCC's OPEN
Object
indicates that the PCC is willing to send LSP State Reports
whenever
LSP parameters or operational status changes.
The presence of the Stateful PCE Capability TLV in PCE's OPEN
message
indicates that the PCE is interested in receiving LSP State
Reports
whenever LSP parameters or operational status changes.
[LM] is it "willing/interested for" or just a capability indication?
It is not the same thing from a procedure point of view.
Jon> It is "willing / interested for" as written - it's not just a capability.
If the TLV is included in each OPEN message then the LSP state is synchronized
automatically. If a device has the capability but does not want to do it for
some reason then it omits the TLV.
=====
The PCEP extensions for stateful PCEs MUST NOT be used if one or
both
PCEP Speakers have not included the Stateful PCE Capability TLV in
their respective OPEN message. If the PCEP Speaker on the PCC
supports the extensions of this draft but did not advertise this
capability, then upon receipt of PCUpd message from the PCE, it
MUST
generate a PCErr with error-type 19 (Invalid Operation),
error-value
2 (Attempted LSP Update Request if the stateful PCE capability was
not advertised)(see Section 8.5) and it SHOULD terminate the PCEP
session. If the PCEP Speaker on the PCE supports the extensions
of
this draft but did not advertise this capability, then upon
receipt
of a PCRpt message from the PCC, it MUST generate a PCErr with
error-
type 19 (Invalid Operation), error-value 5 (Attempted LSP State
Report if active stateful PCE capability was not advertised) (see
Section 8.5) and it SHOULD terminate the PCEP session.
[LM] why the recommendation for closing the session if an explicit
error is sent back? The session could remain open i.e. except stateful
PCEP extensions,everything else would be fine. If the session
termination is the right thing to do, as we are in a clear error case
(as the PCEP speaker should not send the report or the update), the
SHOULD should probably be a MUST hee.
Jon> As you say, the session could limp on if the implementation deems it
necessary, but this might lead to lots of spurious messages and error logs at
both ends, so it is probably not a good idea. We don't want to outright forbid
this, hence the SHOULD.
=====
[LM] Is there any reason to use a specific error in such a case? The
PCE could just behave as a PCE not supporting the feature. Why is it
important for the supporting PCEP speaker to make the difference? The
generic behavior described in RFC5440 is not enough?
Jon> I think it's just that the specific error code may aid in diagnosing the
problem quickly.
=====
[LM] active/passive mode are not advertized in PCEP.
s/if active stateful PCE capability was not advertised/if stateful PCE
capability was not advertised
Jon> ACK
=====
From RFC5440: "6.9. Reception of Unknown Messages
A PCEP implementation that receives an unrecognized PCEP message
MUST
send a PCErr message with Error-value=2 (capability not
supported).
If a PCC/PCE receives unrecognized messages at a rate equal or
greater than MAX-UNKNOWN-MESSAGES unknown message requests per
minute, the PCC/PCE MUST send a PCEP CLOSE message with close
value="Reception of an unacceptable number of unknown PCEP
message".
A RECOMMENDED value for MAX-UNKNOWN-MESSAGES is 5. The PCC/PCE
MUST
close the TCP session and MUST NOT send any further PCEP messages
on
the PCEP session."
[LM] If it is the case, the error handling would be the same between a
PCEP speaker supporting the feature but not advertizing it and a PCEP
speaker not supporting the feature. And it would not be possible
possible to use the specific error responses to infer the capability
supported by the other node.
Jon> I'm not sure which part of the draft you are commenting on, but I disagree
with the premise. If a PCEP speaker supports these extensions but does not
advertise the capability, then the messages are not unknown to it, it just does
not want to see them. So it is free to handle the error differently to the
default RFC 5440 case.
=====
Note that even if the update
capability has not been advertised, a PCE can still accept LSP
Status
Reports from a PCC and build and maintain an up to date view of
the
state of the PCC's LSPs.
[LM] I don't undersand. Is it not in contradiction with
"If the PCEP Speaker on the PCE supports the extensions of
this draft but did not advertise this capability, then upon
receipt
of a PCRpt message from the PCC, it MUST generate a PCErr with
error-
type 19 (Invalid Operation), error-value 5 (Attempted LSP State
Report if active stateful PCE capability was not advertised) (see
Section 8.5) and it SHOULD terminate the PCEP session."
Or does it mean that there is another way than PCRpt message for the
PCC to send LSP status reports to the PCE?
Jon> ACK. I think that the statement in the draft is bogus and I propose to
delete this sentence from it.
=====
[LM] In this section, it is not described how non-supporting PCEP
speaker will hanlded the new TLV. It could be said that unrecognized
TLVs will be ignored (as per RFC5440) and OPEN messages will be
handled as per RFC5440 (if the comment above is accepted).
Jon> But I don't think we need to specify how a non-supporting speaker will
behave. RFC 5440 already does that.
=====
[LM] Would it be useful to discover (using another TLV) whether the
PCE is an active/passive stateful PCE, as in IGP-based capabilities
discovery mechanism?
Jon> This can be inferred immediately from the U flag in the
STATEFUL-PCE-CAPABILITY TLV. Passive mode is synonymous with not sending /
handling PCUpd messages.
=====
5.5. IGP Extensions for Stateful PCE Capabilities Advertisement
When PCCs are LSRs participating in the IGP (OSPF or IS-IS), and
PCEs
are either LSRs or servers also participating in the IGP, an
effective mechanism for PCE discovery within an IGP routing domain
consists of utilizing IGP advertisements.
[...]
Note that while active and passive stateful PCE capabilities may
be
advertised during discovery, PCEP Speakers that wish to use
stateful
PCEP MUST negotiate stateful PCEP capabilities during PCEP session
setup, as specified in the current document. A PCC MAY initiate
stateful PCEP capability negotiation at PCEP session setup even if
it
did not receive any IGP PCE capability advertisements.
[LM] As said above, an as stated in section 5.4:
"The PCEP extensions for stateful PCEs MUST NOT be used if one or
both
PCEP Speakers have not included the Stateful PCE Capability TLV in
their respective OPEN message."
What is the real added-value of this IGP-based mechanism? Only to be
able to identify active PCE/passive PCE mode?
Jon> Yes, correct. The PCC may prefer to use an active stateful PCE, or may
want to avoid using one.
=====
5.6. State Synchronization
[LM] "State" could be misleading. "LSP State Sync" would be more
approriate/accurate I think.
Jon> But the very first sentence below clears this up immediately, so I don't
think it's worth changing now.
=====
The purpose of State Synchronization is to provide a
checkpoint-in-
time state replica of a PCC's LSP state in a PCE. State
Synchronization is performed immediately after the Initialization
phase ([RFC5440]).
During State Synchronization, a PCC first takes a snapshot of the
state of its LSPs state, then sends the snapshot to a PCE in a
sequence of LSP State Reports. Each LSP State Report sent during
State Synchronization has the SYNC Flag in the LSP Object set to
1.
The set of LSPs for which state is synchronized with a PCE is
determined by advertised stateful PCEP capabilities and PCC's
local
configuration (see more details in Section 9.1).
[LM] It seems that :
"The set of LSPs for which state is synchronized"
should be:
"The set of LSPs for which state is to be synchronized"
Jon> To my mind they carry the same meaning.
=====
[LM] I don't see how the "advertised stateful PCEP capabilities" have
an effect on the set of LSP to synchronize. The capabilities adv is
only used to discover stateful PCE capabilities and see if the related
PCEP extensions can be used. The identification of the LSPs to
synchronize seems to be only based on policies configured in the PCC.
Jon> Agree this is confusing. Other documents do define capabilities that
affect the set of LSP to synchronize e.g.
[I-D.ietf-pce-stateful-sync-optimizations] which allows a subset to be resynced
if state is retained across a PCEP session reset. How about we change this:
OLD
The set of LSPs for which state is synchronized with a PCE is
determined by advertised stateful PCEP capabilities and PCC's local
configuration (see more details in Section 9.1).
NEW
The set of LSPs for which state is synchronized with a PCE is
determined by the PCC's local configuration (see more details in Section 9.1)
and MAY also be determined by stateful PCEP capabilities defined
in other documents, such as [I-D.ietf-pce-stateful-sync-optimizations].
END NEW
=====
The end of synchronization marker is a PCRpt message with the SYNC
Flag set to 0 for an LSP Object with PLSP-ID equal to the reserved
value 0 (see Section 7.3). In this case, the LSP Object SHOULD
NOT
include the SYMBOLIC-PATH-NAME TLV and SHOULD include the LSP-
IDENTIFIERS TLV with the special value of all zeroes. The PCRpt
message MUST include an empty ERO as its intended path and SHOULD
NOT
include the optional RRO object for its actual path.
[LM] What is the reason for the use of "SHOULD" here, instead of
"MUST"?
If the PCC has
no state to synchronize, it SHOULD only send the end of
synchronization marker.
[LM] Is it when there is no active LSP? If it is, it could be useful
to clarify it. And it is maybe not so related to the text just before.
It could be clarified in a separate paragraph.
Jon> As discussed above, local policy plus capabilities may affect the set of
LSPs that are synchronized. So I think "no state to synchronize" is more
accurate that "no active LSPs".
====
A PCE SHOULD NOT send PCUpd messages to a PCC before State
Synchronization is complete. A PCC SHOULD NOT send PCReq messages
to
a PCE before State Synchronization is complete. This is to allow
the
PCE to get the best possible view of the network before it starts
computing new paths.
[LM] it is obviously assumed that this requirement is true for PCC/PCE
explicitly advertizing support of stateful PCE capability.
If the PCC encounters a problem which prevents it from completing
the
state transfer, it MUST send a PCErr message with error-type 20
(LSP
State Synchronization Error) and error-value 5 (indicating an
internal PCC error) to the PCE and terminate the session.
[LM] s/state transfer/LSP state synchornization
Jon> ACK
=====
5.7.1. Delegating an LSP
A PCC delegates an LSP to a PCE by setting the Delegate flag in
LSP
State Report to 1. If the PCE does not accept the LSP Delegation,
it
MUST immediately respond with an empty LSP Update Request which
has
the Delegate flag set to 0. If the PCE accepts the LSP
Delegation,
it MUST set the Delegate flag to 1 when it sends an LSP Update
Request for the delegated LSP (note that this may occur at a later
time). The PCE MAY also immediately acknowledge a delegation by
sending an empty LSP Update Request which has the Delegate flag
set
to 1.
[LM] if the delegation is not aceepted by the PCE, I assume that the
PCC should not set the Delegate flag in subsequent LSP state report.
If it is, this could be clarified somewhere in this section.
Jon> I don't think this means the delegation is permanently refused - I the PCC
could still retry the delegation.
=====
[..]
Note that for an LSP to remain delegated to a PCE, the PCC MUST
set
the Delegate flag to 1 on each LSP Status Report sent to the PCE.
[LM] s/each LSP Status Report/each LSP State Report for this LSP.
Jon> ACK
=====
5.7.2.1. Explicit Revocation
When a PCC decides that a PCE is no longer permitted to modify an
LSP, it revokes that LSP's delegation to the PCE. A PCC may
revoke
an LSP delegation at any time during the LSP's life time. A PCC
revoking an LSP delegation MAY immediately clear the LSP state
provided by the PCE, but to avoid traffic loss, it SHOULD do so in
a
make-before-break fashion.
[LM] After PCE delegation revocation, is there really a requirement
for LSP state clean-up? The revocation is used to remove the
authorization of LSP state update to the PCE but I don't see why the
PCC would clear LSP state after revocation. The LSP state can be
updated using operator-defined default parameters, right?
Jon> Correct, this text means to say that the PCC "MAY remove the updated
parameters provided by the PCE and revert to the operator-defined parameters".
I will update the text.
=====
If the PCC has received but not yet acted
on PCUpd messages from the PCE for the LSP whose delegation is
being
revoked, then it SHOULD ignore these PCUpd messages when
processing
the message queue. All effects of all messages for which
processing
started before the revocation took place MUST be allowed to
complete
and the result MUST be given the same treatment as any LSP that
had
been previously delegated to the PCE (e.g. the state MAY be
immediately cleared).
[LM] If I correctly understood the text above, after revocation of the
PCE delegation,
- any PCUpd should be rejected and not ignored
- the LSP state(s) that was delegated to the PCE cannot be changed
until all the messages in the message queue have been processed.
If I'm correct, the text above could be clarified.
Jon> No, the statement is:
- any PCUpd which the PCC started processing before revocation took place
SHOULD be processed to completion before the PCE-provided state is removed
- all other PCUpd SHOULD be ignored.
=====
Any further PCUpd messages from the PCE are
handled according to the PCUpd procedures described in this
document.
[LM] Not understood. Further PCUpd "will result in the PCC sending a
PCErr message" as said after the figure.
Jon> You're correct.
=====
5.7.2.2. Revocation on Redelegation Timeout
When a PCC's PCEP session with a PCE terminates unexpectedly, the
PCC
MUST wait the time interval specified in Redelegation Timeout
Interval before revoking LSP delegations to that PCE and
attempting
to redelegate LSPs to an alternate PCE. If a PCEP session with
the
original PCE can be reestablished before the Redelegation Timeout
Interval timer expires, LSP delegations to the PCE remain intact.
[LM] In case of PCEP session interruption, is there a requirement for
the PCE to update delegated LSP states in order to ensure the
synchronization between states in the PCE and in the PCC?
Jon> Once the PCEP session is re-established, the PCC will audit all current
state to the PCE, and then the PCE is free to update any delegated LSP if it
finds a mismatch between its desired state and the current LSP state.
=====
Likewise, when a PCC's PCEP session with a PCE terminates
unexpectedly, the PCC MUST wait for the State Timeout Interval
before
flushing any LSP state associated with that PCE.
[LM] it should be clarified that the "flushing" applies only if there
is no other PCE. Otherwise, see section 5.7.4
Jon> ACK
=====
Note that the State
Timeout Interval timer may expire before the PCC has redelegated
the
LSPs to another PCE, for example if a PCC is not connected to any
active stateful PCE or if no connected active stateful PCE accepts
the delegation.
[LM] Not sure to understand. Is it "if a PCC is not connected to any
OTHER active stateful PCE or if no OTHER connected active stateful PCE
accepts the delegation?
Jon> Yes that's right, but the word "other" seems redundant to me. The
original session has disconnected already so it is automatically excluded by
this sentence.
======
In this case, the PCC MUST flush any LSP state set
by the PCE upon expiration of the State Timeout Interval and
revert
to operator-defined default parameters or behaviors. This
operation
SHOULD be done in a make-before-break fashion.
[LM] I think it is important to make the distinction between PCC with
only one PCE and PCC with N PCE. The "Note that" could be put in a
separate section.
Jon> This sentence applies to both cases - if the LSP can't be redelegated
(either because there was only 1 PCE or because the other N-1 PCEs have refused
the delegation) then this is what the PCC must do. I don't see why we need to
distinguish these cases.
=====
The State Timeout Interval MUST be greater than or equal to the
Redelegation Timeout Interval and MAY be set to infinity (meaning
that until the PCC specifically takes action to change the
parameters
set by the PCE, they will remain intact).
[LM] reading "the State Timeout Interval timer may expire before the
PCC has redelegated the LSPs to another PCE" could be understood as
STI timer < RTI timer. And here, it is stated STI timer >= RTI timer.
Depending on the clarification on the previous comment, this text
could be also clarified (or not)
Jon> The PCC starts redelegating the LSP when the RTI timer expires but this is
not instantaneous. It is possible that the STI timer expires before
redelegation has succeeded. So I don't think there's a contradiction.
=====
5.7.3. Returning a Delegation
In order to keep a delegation, a PCE MUST set the Delegate flag to
1
on each LSP Update Request sent to the PCC. A PCE that no longer
wishes to update an LSP's parameters SHOULD return the LSP
delegation
back to the PCC by sending an empty LSP Update Request which has
the
Delegate flag set to 0. If a PCC receives an LSP Update Request
with
the Delegate flag set to 0 (whether the LSP Update Request is
empty
or not), it MUST treat this as a delegation return.
+-+-+ +-+-+
|PCC| |PCE|
+-+-+ +-+-+
| |
|---PCRpt, Delegate=1--->| LSP delegated
| . |
| . |
| . |
|<--PCUpd, Delegate=0----| Delegation returned
| |
|---PCRpt, Delegate=0--->| No delegation for
LSP
| |
Figure 6: Returning a Delegation
If a PCC cannot delegate an LSP to a PCE (for example, if a PCC is
not connected to any active stateful PCE or if no connected active
stateful PCE accepts the delegation), the LSP delegation on the
PCC
will time out within a configurable Redelegation Timeout Interval
and
the PCC MUST flush any LSP state set by a PCE at the expiration of
the State Timeout Interval.
[LM] same comment: is it "for example, if a PCC is not connected to
any OTHER active stateful PCE or if no OTHER connected active stateful
PCE accepts the delegation"?
Jon> As before, yes, and I don't think the word "other" adds anything.
=====
[LM] "the PCC MUST flush any LSP state set" should be completed by
"and revert to operator-defined default parameters or behaviors",
right?
Jon> Correct, will update.
=====
5.7.5. Redelegation on PCE Failure
[...]
If the PCE crashes but recovers within the Redelegation Timeout,
both
the delegation state and the LSP state are kept intact.
[LM] "Recover" here seems to be associated with the PCEP session (as
linked to the Redelegation Timeout), not with the LSP states
maintained by the PCE.
Jon> Correct
=====
[LM] How can the PCC be ensured that the PCE has not been impacted by
the stop/restart and that LSP states are intact? Is there any need for
a sanity check?
Jon> The sanity check happens when the PCC audits all its LSP state to the PCE
after the session restarts.
=====
If the PCE crashes but does not recover within the Redelegation
Timeout, the delegation state is returned to the PCC. If the PCC
can
redelegate the LSPs to another PCE, and that PCE accepts the
delegations, there will be no change in LSP state. If the PCC
cannot
redelegate the LSPs to another PCE, then upon expiration of the
State
Timeout Interval, the state set by the PCE is flushed, which may
cause change in the LSP state.
[LM] the last sentence is difficult to understand. How to understand
"flush the state MAY cause change in the LSP state"? It would like
talking about two different states in the PCC.
Jon> I think it just means that the LSP reverts to operator configured
parameters, as discussed elsewhere. I'll clarify it.
=====
[LM] about "flushed", should we add "and revert to operator-defined
default parameters or behaviors"?
Jon> Yes.
=====
[...]
If there is a standby PCE, the Redelegation Timeout may be set to
0
through policy on the PCC, causing the LSPs to be redelegated
immediately to the PCC, which can delegate them immediately to the
standby PCE. Assuming the State Timeout Interval is greater than
the
Redelegation Timeout, and assuming the standby PCE takes similar
decisions as the failed PCE, the LSP state will be kept intact.
[LM] the first "assuming" is not an assumption but a requirement,
according to "The State Timeout Interval MUST be greater than or equal
to the Redelegation Timeout Interval"
Jon> We could have STI = RTI, or STI very close to RTI, in which case it is a
race between redelegation completing and state timeout expiring.
To make this clearer I will change it to "Assuming that the PCC can redelegate
the LSP to the standby PCE within the State Timeout Interval, and..."
=====
5.8.1. Passive Stateful PCE Path Computation Request/Response
+-+-+ +-+-+
|PCC| |PCE|
+-+-+ +-+-+
| |
1) Path computation |----- PCReq message --->|
request sent to | |2) Path computation
PCE | | request received,
| | path computed
| |
|<---- PCRep message ----|3) Computed paths
| (Positive reply) | sent to the PCC
| (Negative reply) |
4) LSP Status change| |
event | |
| |
5) LSP Status Report|----- PCRpt message --->|
sent to all | . |
stateful PCEs | . |
| . |
6) Repeat for each |----- PCRpt message --->|
LSP status change| |
| |
Figure 7: Passive Stateful PCE Path Computation Request/Response
[LM] in the figure, please use "state" instead of "status"
Jon> ACK
=====
7.1. OPEN Object
This document defines one new optional TLVs for use in the OPEN
Object.
s/one new optional TLVs/one new optional TLV
Jon> ACK
=====
7.2. SRP Object
The SRP (Stateful PCE Request Parameters) object MUST be carried
within PCUpd messages and MAY be carried within PCRpt and PCErr
messages. The SRP object is used to correlate between update
requests sent by the PCE and the error reports and state reports
sent
by the PCC.
[LM] Should we add the following text at the end of the section?
"Optional TLVs may be included within the SRP object body. The
specification of such TLVs is outside the scope of this document."
Jon> Yes - will add it with a "MAY".
=====
7.3. LSP Object
[...]
TLVs that may be included in the LSP Object are described in the
following sections.
[LM] I think that it is possible to add extra optional TLV, not
defined in this document. This should be clarified if I'm correct.
Jon> Yes.