Re: Review of draft-ietf-pce-monitoring-04.txt

Dear Matt,

On Apr 28, 2009, at 5:44 AM, Matt Mathis wrote:

I've reviewed draft-ietf-pce-monitoring-04.txt as part of the transport area directorate's ongoing effort to review key IETF documents. These comments were written primarily for the transport area directors, but are copied to the document's authors for their information and to allow them to address any issues raised. The authors should consider this review together with any other last-call comments they receive. Please always CC tsv-dir(_at_)ietf(_dot_)org if you reply to or forward this review.
draft-ietf-pce-monitoring-04.txt describes procedures and extensions to the Path Computation Element Protocol (PCEP) for monitoring the state of the path computation chain for troubleshooting and performance monitoring purposes.
It is designed specifically to carry information about PCE liveness, processing time and congestion.
However this draft does not define any of these metrics.
As a transport person, I have several comments about the congestion metric.
First it wasn't clear from the document if "congestion" was referring to the PCE itself or the corresponding LSPs. For clarity of discussion, I will assume LSP congestion. Even if that is not correct, my comments are general and there are equivalent problems for PCE case.


This is, in fact, the wrong assumption. The congestion metric refers
to the congestion of the PCE itself.

We will add a clarification of this point to the top of section 4.4 as follows:


Note that "congestion" as indicated by this object refers to the
processing state of the PCE and its ability to handle new PCEP
requests.

Second, there is not a universal definition of congestion. The relevant feature of congestion is that it perturbs transit flows, by causing some sort of back-pressure. This back-pressure generally comes in the form of raised RTT and/or increased loss probability, which reduces the data rate for elastic flows. In the operational Internet normal values for these parameters can span many orders of magnitude. For example on research and education backbones, loss probabilities as high as 1E-6 would be considered massively congested. In other parts of the world loss probabilities as low as 1E-2 might be considered extremely good. There is not a standard way to determine when the load is high enough to affect service or when the users would perceive the network as "congested".


Your discussion certainly applies to traffic congestion, but is not
applicable in this case.

PCE congestion is much easier to quantify since the measurements are
restricted to a single server. Congestion state is reported by a PCE
as a simple state, and an expected duration.

Here is the new text added to the document:

"A PCE is congested when it has a backlog of PCEP requests such that it cannot immediately start to process a new request thus leading to waiting times. The congestion duration is quantified as being the (estimated) time until the PCE expects to be able to immediately process a new PCEP request."

Without a definition of what congested means the metric is useless for such things as choosing alternative paths. One implementation's uncongested state might be lower performance than another implementation's congested state.


This should be clear from the definition above.

Even if you are thinking in terms of admission control (where the back-pressure is to reject calls), your success probability might be higher on a very congested heavily multiplexed path than on another path where a single user is using most of the capacity, but not quite filling the link.


No, we are not thinking in terms of admission control. PCEP requests
are queued, not rejected. Thus knowledge of congestion is very
important to a PCC so as to potentially select another PCE.
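
As an illustration only (this is not text from the draft, which does not mandate any selection policy), a PCC could use the reported state along the following lines. The names PceStatus and select_pce are hypothetical, and the policy shown (prefer an uncongested PCE, otherwise the one with the shortest expected congestion duration) is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PceStatus:
    """Congestion state as reported by one PCE (hypothetical container)."""
    name: str
    congested: bool
    # Expected congestion duration in seconds; only meaningful if congested.
    expected_duration: float = 0.0


def select_pce(candidates: List[PceStatus]) -> Optional[PceStatus]:
    """Pick a PCE for the next request under the assumed policy.

    Uncongested PCEs are preferred. If every candidate is congested, the
    one expecting to clear its backlog soonest is chosen. Requests are
    queued rather than rejected, so a congested PCE is still usable.
    """
    if not candidates:
        return None
    uncongested = [p for p in candidates if not p.congested]
    if uncongested:
        return uncongested[0]
    return min(candidates, key=lambda p: p.expected_duration)
```

A PCC with other preferences (for example a preference ordering among PCEs) would weigh the reported duration differently.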

Although my examples are somewhat contrived, my point still stands: without a definition of "congested" there is no value to sharing a congestion indication. I can't imagine any global definition of congestion that would work, and suspect that you need to add a mechanism to define a local, organization/topology specific definition of congestion.


The issue here is probably that the definition of congestion was so
"obvious" to the people working on this that the concerns you raise
did not occur to them. Hopefully, the addition of the definition
set out above will clarify this.

Third, the only parameter carried by the congestion object is "expected congestion duration", as though the network can anticipate when the congestion will subside. It can't. It may be that this parameter would be better identified by something like "recommended polling interval", e.g. "please don't ask again for x seconds."


The details of a PCE implementation are not in scope. A PCE is in no
position to give advice to a PCC on this, but it can judge the
existing queue size and the current arrival rate of new requests.

It should be clear that "expected congestion duration" is not a
guarantee. Congestion might clear sooner, or might persist longer.
It should be seen as an indication not a guarantee.
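
To make the idea concrete, here is a minimal sketch of how an implementation might derive such an estimate from the queue size and the arrival rate. It is an assumption for illustration, not something the draft specifies: the function name, the service_rate parameter and the simple fluid model (the backlog drains at the service rate minus the arrival rate) are all hypothetical.

```python
from typing import Optional


def estimate_congestion_duration(
    backlog: int,
    arrival_rate: float,
    service_rate: float,
) -> Optional[float]:
    """Estimate the seconds until a new request could start immediately.

    backlog      -- number of PCEP requests currently queued
    arrival_rate -- observed new requests per second
    service_rate -- requests per second the PCE can process (assumed known)

    Returns 0.0 when there is no backlog (not congested), and None when
    the backlog is not shrinking, since no finite estimate exists then.
    """
    if backlog <= 0:
        return 0.0
    drain_rate = service_rate - arrival_rate
    if drain_rate <= 0:
        return None
    return backlog / drain_rate
```

The result is only as good as the rate measurements that feed it, which is why the value is an indication rather than a commitment.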

In a similar vein neither processing time nor liveness is sufficiently well defined.


Section 4.3 seems to be perfectly clear on processing time.
RFC 4655 describes liveness.

Although this is perhaps a nit, the IANA directions are structured in a way that forces somebody else to rewrite your text, possibly introducing errors, and preventing full review in last call. E.g. where you have "The MONITORING Object-Class is to be assigned by IANA (recommended value=19)" It would be better to say "The MONITORING Object-Class is XX [Value to be provided by IANA, recommended value=1]" The point is to clearly distinguish between 3 classes of text:
- Stuff that IANA adjusts in a clearly specified way while the document is at
 the RFC editor.
- Instructions to the IANA that should be removed while at the RFC editor,
 generally about the above.
- Instruction to the IANA that should be preserved in the final RFC (Registry
 creation, etc), which might include some details in the previous two
 categories.
It should be clear to everyone (especially the reviewers) how the IANA text is expected to appear in the final RFC, even when it can't match the ID.


We have already had discussions with IANA on the content of this
section, and will reach agreement with them. Our main requirement
has been to show exactly the text that we want included in the
registry.

This draft has serious issues, described in the review, and needs some rethinking.


Thanks for your comments.

JP.

Thanks,
--MM--
-------------------------------------------
Matt Mathis     http://staff.psc.edu/mathis
Work:412.268.3319    Home/Cell:412.654.7529
-------------------------------------------
_______________________________________________
Ietf mailing list
Ietf(_at_)ietf(_dot_)org
https://www.ietf.org/mailman/listinfo/ietf