Re: Review of draft-ietf-pce-monitoring-04.txt

2009-12-15 14:34:06
Dear Matt,

On Apr 28, 2009, at 5:44 AM, Matt Mathis wrote:

I've reviewed draft-ietf-pce-monitoring-04.txt as part of the transport area directorate's ongoing effort to review key IETF documents. These comments were written primarily for the transport area directors, but are copied to the document's authors for their information and to allow them to address any issues raised. The authors should consider this review together with any other last-call comments they receive. Please always CC tsv-dir(_at_)ietf(_dot_)org if you reply to or forward this review.

draft-ietf-pce-monitoring-04.txt describes procedures and extensions to the Path Computation Element Protocol (PCEP) for monitoring the state of the path computation chain for troubleshooting and performance monitoring purposes.

It is designed specifically to carry information about PCE liveness, processing time and congestion.

However this draft does not define any of these metrics.

As a transport person, I have several comments about the congestion metric.

First, it wasn't clear from the document if "congestion" was referring to the PCE itself or the corresponding LSPs. For clarity of discussion, I will assume LSP congestion. Even if that is not correct, my comments are general and there are equivalent problems for the PCE case.


This is, in fact, the wrong assumption. The congestion metric refers
to the congestion of the PCE itself.

We will add a clarification of this point to the top of section 4.4 as follows:

Note that "congestion" as indicated by this object refers to the
processing state of the PCE and its ability to handle new PCEP
requests.

Second, there is not a universal definition of congestion. The relevant feature of congestion is that it perturbs transit flows by causing some sort of back-pressure. This back-pressure generally comes in the form of raised RTT and/or increased loss probability, which reduces the data rate for elastic flows. In the operational Internet, normal values for these parameters can span many orders of magnitude. For example, on research and education backbones, loss probabilities as high as 1E-6 would be considered massively congested. In other parts of the world, loss probabilities as low as 1E-2 might be considered extremely good. There is not a standard way to determine when the load is high enough to affect service or when the users would perceive the network as "congested".

Your discussion certainly applies to traffic congestion, but is not
applicable in this case.

PCE congestion is much easier to quantify since the measurements are
restricted to a single server. Congestion state is reported by a PCE
as a simple state, and an expected duration.

Here is the new text added to the document:

"A PCE is congested when it has a backlog of PCEP requests such that it cannot immediately start to process a new request thus leading to waiting times. The congestion duration is quantified as being the (estimated) time until the PCE expects to be able to
immediately process a new PCEP request."


Without a definition of what congested means the metric is useless for such things as choosing alternative paths. One implementation's uncongested state might be lower performance than another implementation's congested state.


This should be clear from the definition above.

Even if you are thinking in terms of admission control (where the back-pressure is to reject calls), your success probability might be higher on a very congested, heavily multiplexed path than on another path where a single user is using most of the capacity but not quite filling the link.


No, we are not thinking in terms of admission control. PCEP requests
are queued, not rejected. Thus knowledge of congestion is very
important to a PCC so as to potentially select another PCE.
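
To illustrate how a PCC might use this information, here is a minimal
sketch in Python. It is not taken from the draft: the PceStatus record,
its field names, and the selection rule (prefer an uncongested PCE,
otherwise the one reporting the shortest expected congestion duration)
are assumptions made only for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class PceStatus:
    """Hypothetical record of the last congestion state a PCE reported."""
    name: str
    congested: bool
    congestion_duration: float = 0.0  # expected duration in seconds (an estimate)


def select_pce(candidates: Sequence[PceStatus]) -> Optional[PceStatus]:
    """Pick a PCE to send the next request to.

    Uncongested PCEs sort first; among congested ones, the PCE with the
    shortest reported congestion duration is preferred.
    """
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda p: (p.congested, p.congestion_duration if p.congested else 0.0),
    )


pces = [
    PceStatus("pce-a", congested=True, congestion_duration=12.0),
    PceStatus("pce-b", congested=False),
    PceStatus("pce-c", congested=True, congestion_duration=3.0),
]
chosen = select_pce(pces)
```

A real PCC would likely weigh other factors as well (PCE capabilities,
policy, scope), so the rule above should be read as one possible policy.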

Although my examples are somewhat contrived, my point still stands: without a definition of "congested" there is no value to sharing a congestion indication. I can't imagine any global definition of congestion that would work, and suspect that you need to add a mechanism to define a local, organization/topology specific definition of congestion.


The issue here is probably that the definition of congestion was so
"obvious" to the people working on this that the concerns you raise
did not occur to them. Hopefully, the addition of the definition
set out above will clarify this.

Third, the only parameter carried by the congestion object is "expected congestion duration", as though the network can anticipate when the congestion will subside. It can't. It may be that this parameter would be better identified by something like "recommended polling interval", e.g. "please don't ask again for x seconds."


The details of a PCE implementation are not in scope. A PCE is in no
position to give advice to a PCC on this, but it can judge the
existing queue size and the current arrival rate of new requests.

It should be clear that "expected congestion duration" is not a
guarantee. Congestion might clear sooner, or might persist longer.
It should be seen as an indication not a guarantee.
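
As a rough illustration of how such an estimate could be formed from the
queue size and the arrival rate, consider the sketch below. It is only
one possible approach and is not specified by the draft: the function
name, its parameters, and the simple drain-rate model (backlog divided
by service rate minus arrival rate) are assumptions for this example.

```python
from typing import Optional


def estimate_congestion_duration(
    backlog: int, service_rate: float, arrival_rate: float
) -> Optional[float]:
    """Estimate the seconds until a new request could be processed immediately.

    backlog      -- number of queued requests (hypothetical input)
    service_rate -- requests the PCE completes per second
    arrival_rate -- new requests arriving per second

    Returns 0.0 when there is no backlog, and None when the backlog is not
    expected to drain at the current rates (no meaningful estimate).
    """
    if backlog <= 0:
        return 0.0
    drain_rate = service_rate - arrival_rate
    if drain_rate <= 0:
        return None
    return backlog / drain_rate


# 30 queued requests, 10 served per second, 4 arriving per second.
estimate = estimate_congestion_duration(30, 10.0, 4.0)
```

Because the arrival rate can change at any moment, the value is an
indication only, which is consistent with the point made above.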

In a similar vein neither processing time nor liveness is sufficiently well defined.


Section 4.3 seems to be perfectly clear on processing time.
RFC 4655 describes liveness.

Although this is perhaps a nit, the IANA directions are structured in a way that forces somebody else to rewrite your text, possibly introducing errors, and preventing full review in last call. E.g., where you have "The MONITORING Object-Class is to be assigned by IANA (recommended value=19)", it would be better to say "The MONITORING Object-Class is XX [Value to be provided by IANA, recommended value=1]". The point is to clearly distinguish between 3 classes of text:

- Stuff that IANA adjusts in a clearly specified way while the document is at
 the RFC editor.

- Instructions to the IANA that should be removed while at the RFC editor,
 generally about the above.

- Instruction to the IANA that should be preserved in the final RFC (Registry
 creation, etc), which might include some details in the previous two
 categories.

It should be clear to everyone (especially the reviewers) how the IANA text is expected to appear in the final RFC, even when it can't match the ID.


We have already had discussions with IANA on the content of this
section, and will reach agreement with them. Our main requirement
has been to show exactly the text that we want included in the
registry.

This draft has serious issues, described in the review, and needs some rethinking.


Thanks for your comments.

JP.

Thanks,
--MM--
-------------------------------------------
Matt Mathis     http://staff.psc.edu/mathis
Work:412.268.3319    Home/Cell:412.654.7529
-------------------------------------------
_______________________________________________
Ietf mailing list
Ietf(_at_)ietf(_dot_)org
https://www.ietf.org/mailman/listinfo/ietf