Re: Gen-art LC review: draft-mm-netconf-time-capability-05

Hi Robert,

Thanks for the comments.

We have submitted an updated version of the draft, which addresses the comments 
we received from you and other reviewers in IETF last call.
https://tools.ietf.org/html/draft-mm-netconf-time-capability-06 

Our responses to your comments can be found below.
Please let us know if you have further comments or questions.

Thanks,
Tal and Yoram.

-----Original Message-----
From: ietf [mailto:ietf-bounces(_at_)ietf(_dot_)org] On Behalf Of Robert Sparks
Sent: Thursday, July 09, 2015 12:40 AM
To: General Area Review Team; ietf(_at_)ietf(_dot_)org;
draft-mm-netconf-time-capability(_dot_)all(_at_)ietf(_dot_)org
Subject: Gen-art LC review: draft-mm-netconf-time-capability-05

I am the assigned Gen-ART reviewer for this draft. For background on 
Gen-ART, please see the FAQ at

<http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.

Please resolve these comments along with any other Last Call comments 
you may receive.

Document: draft-mm-netconf-time-capability-05
Reviewer: Robert Sparks
Review Date: 8 Jul 2015
IETF LC End Date: 29 Jul 2015
IESG Telechat date: not yet scheduled

Summary: This draft has open issues to address before publication

This draft adds two separable concepts to netconf
* Asking for and receiving knowledge of when a command was executed
* Requesting that a command be executed at a particular time

The utility of the first is obvious, and I have no problems with the 
specification of that part of this extension. Would it be better to 
pull these apart and progress them separately?



We believe there is a great benefit to defining these two features together, 
although each of them can be used independently. The second certainly gains 
from the first, since the execution-time provides feedback to the client about 
the actual time of execution compared to the scheduled time of execution.

The utility of the second would be more obvious if the draft didn't 
limit the time to be "near future scheduling". It punts on most of the 
hard problems with scheduling things outside a very tight range (15 
seconds in the future by default), without motivating the advantages of 
saying "wait until 5 seconds from now before you do this".

So:

Why was 15 seconds chosen? Could you add a motivating example that 
shows why being able to say "now is not good, but 5 seconds from now is 
better" is useful? (Something like having a series of things happen as 
close to simultaneously without the network delay of sending the 
requests impacting how they are separated perhaps?)


Point well taken. We have added the following example, motivating why near 
future scheduling (<15 seconds) can be useful:

A typical example of using near-future scheduling is a coordinated commit; a 
client needs to trigger a commit at n servers, so that the n servers perform 
the commit as close as possible to simultaneously. Without the time capability, 
the client sends a sequence of n commit messages, and thus each server performs 
the commit at a different time. By using the time capability, the client can 
send commit messages that are scheduled to take place at time Ts, which is 5 
seconds in the future, causing the servers to invoke the commit as close as 
possible to time Ts.
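
The coordinated commit described above can be sketched as follows. This is a minimal illustration in Python, not text from the draft: the function name, the dictionary layout, and the server names are hypothetical, and the actual NETCONF transport is left out. It only shows that every request carries the same scheduled time Ts, so the order in which the client sends the messages no longer determines when each server commits.

```python
from datetime import datetime, timedelta, timezone


def build_coordinated_commits(servers, now, delay_seconds=5):
    """Return one scheduled commit request per server, all sharing the same Ts.

    The dictionary layout is a hypothetical stand-in for the real RPC encoding.
    """
    ts = now + timedelta(seconds=delay_seconds)  # Ts: 5 seconds in the future
    return [
        {"server": server, "operation": "commit", "scheduled-time": ts.isoformat()}
        for server in servers
    ]


# Fixed "now" so the example is reproducible; a real client would read its clock.
now = datetime(2015, 7, 9, 12, 0, 0, tzinfo=timezone.utc)
requests = build_coordinated_commits(["server-a", "server-b", "server-c"], now)

for request in requests:
    print(request["server"], request["scheduled-time"])
```

All three requests carry the identical scheduled-time value, regardless of when each one is actually transmitted.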

We have also added an explanation of why 15 seconds was chosen as the default 
value:

The default value of sched-max-future is defined to be 15 seconds. This 
duration is long enough to allow the scheduled RPC to be sent by the client, 
potentially to multiple servers, and in some cases to send a cancellation 
message, as described in Section 3.2. On the other hand, the 15 second 
duration yields a very low probability of a reboot or a permission change.

Given the punt, why isn't there a statement that sched-max-future MUST 
NOT be configured for more than some small value (twice the default, or
30 seconds, perhaps), especially while this is targeted for 
Experimental? Without something like that, I think the document needs 
to talk about more of the issues it is trying to avoid with longer term 
scheduling, even if it doesn't solve those issues. (If I have a fast 
pipe, I can make a server keep a lot of queued requests, eating a lot 
of state, even if the window is only 15 seconds. Pointing to how 
netconf protects against state-exhaustion abuse might be useful).


Note that we did not define a maximal value for sched-max-future, since one of 
the goals was to define a generic tool that can be used for various different 
environments. The draft clearly states the intention of using 
near-future-scheduling, but the requirements and constraints of different 
environments may require the sched-max-future to have a different value, 
potentially higher than 30 seconds. Hence, we prefer not to define a maximal 
value. Draft 06 does, however, include a more detailed discussion of the 
issues we are trying to prevent by using near-future scheduling (Section 3.6).

The security considerations section talks about malicious parties 
attempting to cause sched-max-future to be configured to "a small 
value". Could you more clearly characterize  "small", given that the 
default is 15 seconds?


Agreed.
We rephrased this paragraph to be more clear about the "small" value:

This YANG module defines <sched-max-future> and <sched-max-past>, which are 
writable/creatable/deletable. These data nodes may be considered sensitive or 
vulnerable in some network environments. An attacker may attempt to maliciously 
configure these parameters to a low value, thereby causing all scheduled RPCs 
to be discarded. For instance, if a client expects <sched-max-future> to be 15 
seconds, but in practice it is maliciously configured to 1 second, then a 
legitimate scheduled RPC that is scheduled to be performed 5 seconds in the 
future will be discarded by the server.
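
The effect of a maliciously lowered <sched-max-future> can be illustrated with a small check. This is a sketch under stated assumptions: the function name and signature are hypothetical, and it models only the future bound described in the paragraph above, not <sched-max-past> or any other server behavior.

```python
from datetime import datetime, timedelta, timezone


def within_sched_max_future(now, scheduled_time, sched_max_future_seconds=15):
    """Return True if scheduled_time is at most sched-max-future ahead of now.

    Hypothetical helper: it covers only the future-side limit.
    """
    return scheduled_time - now <= timedelta(seconds=sched_max_future_seconds)


now = datetime(2015, 7, 9, 12, 0, 0, tzinfo=timezone.utc)
five_seconds_ahead = now + timedelta(seconds=5)

# With the expected 15-second limit, the RPC is inside the window.
accepted_with_default = within_sched_max_future(now, five_seconds_ahead)

# With the limit maliciously set to 1 second, the same RPC falls outside it.
accepted_after_attack = within_sched_max_future(now, five_seconds_ahead, 1)
```

Here accepted_with_default is True and accepted_after_attack is False, matching the example in the paragraph.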

Even with the near-future limit, there are issues to discuss introduced 
with the ability to cancel a request:

* What prevents a 3rd party from cancelling a request? I think it's 
only that the 3rd party would have to obtain the right id to put in the 
cancel message. If so, the document should talk about how you keep 
eavesdroppers from seeing those ids, and that the servers that generate 
them should make ids that are hard to guess.


We understand this needs further clarification. As noted by Andy Bierman in a 
corresponding mail:

Since the scheduled rpc event is sent to every client that is 
listening for notifications, there is no possibility for security 
through hard-to-guess token, as is done with the "persist-id"  for cancelling 
a confirmed-commit.



We rephrased the paragraph to clarify these issues:

This YANG module defines the <cancel-schedule> RPC. This RPC may be considered 
sensitive or vulnerable in some network environments. Since the value of the 
<schedule-id> is known to all the clients that are subscribed to notifications 
from the server, the <cancel-schedule> RPC may be used maliciously to attack 
servers by canceling their pending RPCs. This attack is addressed in two 
layers: (i) security at the transport layer, limiting the attack only to 
clients that have successfully initiated a secure session with the server, and 
(ii) the authorization level required to cancel an RPC should be the same as 
the level required to schedule it.

* Especially given the near-future limitation, you run a high risk that 
the cancel arrives after the identified request has been executed. It's 
not clear in the current text what the server should do. I assume you 
want the server to reply to the cancel with a "I couldn't cancel that"
rather than to do something like try to undo the request. The document 
should be explicit.

* The document should explicitly disallow adding <scheduled-time> to 
<cancel-schedule>




Agreed.
We have addressed these two comments by adding the following paragraph:

A cancel-schedule message MUST NOT include the scheduled-time parameter. A 
server that receives a cancel-schedule should try to cancel the schedule as 
soon as possible. If the server is unable to cancel the scheduled RPC, for 
example because it has already been executed, it should respond with an 
rpc-error [RFC6241], in which the error-type is 'protocol', and the error-tag 
is 'operation-failed'.
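
The cancellation behavior above can be sketched as follows. This is an illustration only: the function name, the dictionary-based request and reply shapes, and the in-memory table of pending RPCs are all hypothetical. The paragraph does not say how a server reacts to a cancel-schedule that carries scheduled-time, so raising ValueError in that case is an assumption of this sketch.

```python
# Error reply named in the paragraph above (error-type and error-tag per RFC 6241).
OPERATION_FAILED = {
    "rpc-error": {"error-type": "protocol", "error-tag": "operation-failed"}
}


def handle_cancel_schedule(pending, request):
    """Cancel a pending scheduled RPC, or report that it cannot be cancelled.

    pending maps schedule-id to the RPC still waiting to run (hypothetical).
    """
    if "scheduled-time" in request:
        # Assumption: the sketch simply refuses such a message.
        raise ValueError("cancel-schedule MUST NOT include scheduled-time")
    schedule_id = request["schedule-id"]
    if schedule_id not in pending:
        # Already executed (or never scheduled): nothing left to cancel.
        return OPERATION_FAILED
    del pending[schedule_id]
    return {"ok": True}


pending = {7: "commit"}
first_reply = handle_cancel_schedule(pending, {"schedule-id": 7})
second_reply = handle_cancel_schedule(pending, {"schedule-id": 7})
```

The first call cancels the pending RPC. The second call finds nothing to cancel and returns the rpc-error, which is also what a client would see if the RPC had already been executed when the cancel arrived.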

One editorial comment: It would help to move the concept of the 
near-future limitation much earlier in the document, perhaps even into 
the introduction and abstract.




Agreed.
We added the following to the introduction:

The NETCONF time capability is intended for scheduling RPCs that should be 
performed in the near future, making it possible to coordinate simultaneous 
configuration changes or to specify an order of configuration updates. 
Time-of-day-based policies and far-future scheduling, e.g., [Cond], are outside 
the scope of this memo.

[Cond] Watsen, K., "Conditional Enablement of Configuration Nodes", 
draft-kwatsen-conditional-enablement-00 (expired), 2013.

And for the shepherding AD: The document has no shepherd or shepherd 
writeup. While a writeup is not required, one would have been useful in 
this case to discuss the history of (lack of) discussion of the 
document on the group's list and the group's reaction to progressing as 
Experimental as an Individual Submission.

Time Capability in NETCONF diff 06 vs 05.pdf