RE: Gen-art LC review: draft-mm-netconf-time-capability-05

Hi Robert,

Thanks for the detailed comments.

We have posted an updated draft. 
https://datatracker.ietf.org/doc/draft-mm-netconf-time-capability/

We believe this draft addresses all the issues you raised.

Please let us know if you have further comments about the current draft.

I still think, especially while this is at experimental, you should scope
this with an absolute max. But I'm just one reviewer. Work it out with
your AD.


Regarding an absolute max of sched-max-future: as suggested, we consulted our 
AD about this issue. Since the intended status is experimental, we prefer not 
to define an upper bound, so as not to limit the experimental scope of the time 
capability. 
However, we added a new subsection "The Tradeoff in Setting the 
sched-max-future Value", and we believe the tradeoff is currently clear to the 
reader.

Well, those are just a subset of the things that could change in a
command's context that would cause the command to be erroneous or even
damaging if it were run, and you're not addressing the other security
issues that come with very long scheduling (overflowing buffers, or
having lots of time to schedule a massive number of commands to all try
to happen at once). I suspect there are other things that pressured adding
the "near future" restriction that haven't been captured well yet.


Well, the thing is that 15 seconds (or 'a few seconds' for that matter) is
a long enough time to send thousands (or more) of scheduled RPCs, so I am
not sure that sched-max-future mitigates the buffer overflow threat.
Generally speaking, Section 3.6 discusses erroneous scenarios, not
security threats.

I would suggest adding some text to the security considerations section
that discusses the overflow attack you mentioned here. Would this address
your concern?


We updated the security considerations section, and it now covers the
buffer overflow and multiple-RPCs-at-the-same-time attacks.


Thanks,
Tal.

-----Original Message-----
From: Tal Mizrahi
Sent: Wednesday, August 05, 2015 2:41 AM
To: 'Robert Sparks'
Cc: ietf(_at_)ietf(_dot_)org; General Area Review Team;
draft-mm-netconf-time-capability(_dot_)all(_at_)ietf(_dot_)org
Subject: RE: Gen-art LC review: draft-mm-netconf-time-capability-05

Hi Robert,

Thanks again for the prompt responses.

Well, those are just a subset of the things that could change in a
command's context that would cause the command to be erroneous or even
damaging if it were run, and you're not addressing the other security
issues that come with very long scheduling (overflowing buffers, or
having lots of time to schedule a massive number of commands to all try
to happen at once). I suspect there are other things that pressured adding
the "near future" restriction that haven't been captured well yet.


Well, the thing is that 15 seconds (or 'a few seconds' for that matter) is
a long enough time to send thousands (or more) of scheduled RPCs, so I am
not sure that sched-max-future mitigates the buffer overflow threat.
Generally speaking, Section 3.6 discusses erroneous scenarios, not
security threats.

I would suggest adding some text to the security considerations section
that discusses the overflow attack you mentioned here. Would this address
your concern?

I think you're saying that in production deployments today, the
authorization policy is "the peer was able to send me a packet". Is
that wrong?


I can't comment about what is deployed in production today, although I am
sure there are operators out there who can comment about that. RFC 6536,
which defines a NETCONF access control model, is cited by 6 other RFCs, so I
do not think access control has been overlooked by the community.
Nevertheless, I believe that (much like RFC 6241) the access control specifics
are not within the scope of the current draft.


Thanks,
Tal.

-----Original Message-----
From: Robert Sparks [mailto:rjsparks(_at_)nostrum(_dot_)com]
Sent: Tuesday, August 04, 2015 9:01 PM
To: Tal Mizrahi
Cc: ietf(_at_)ietf(_dot_)org; General Area Review Team;
draft-mm-netconf-time-capability(_dot_)all(_at_)ietf(_dot_)org
Subject: Re: Gen-art LC review: draft-mm-netconf-time-capability-05



On 8/4/15 11:19 AM, Tal Mizrahi wrote:

Hi Robert,

Thanks for the comments.

A typical example of using near-future scheduling is a coordinated
commit; a client needs to trigger a commit at n servers, so that
the n servers perform the commit as close as possible to simultaneously.
Without the time capability, the client sends a sequence of n
commit messages, and thus each server performs the commit at a
different time. By using the time capability, the client can send
commit messages that are scheduled to take place at time Ts, which
is 5 seconds in the future, causing the servers to invoke the
commit as close as possible to time Ts.

I'm interested in your response to Andy's point on this paragraph.

Okay, so here is Andy's point:

You should pick a different example because the NETCONF
confirmed-commit procedure is designed to be loose-coupled.  The
default timeout is 10 minutes.

Since the client needs sessions open with all servers involved in
the network-wide commit, there is no advantage in staging the
<commit> operations 15 sec. in advance, to make sure the servers
are reachable.

And here is our response from 02-Aug-2015:

Right, confirmed-commit is loose-coupled. But the example quoted
above (Example 1 in the draft) is not intended to replace the
confirmed commit. The purpose in this example is different: the client
wants the commit RPCs to be executed at the same time in all servers.
The confirmed-commit serves a different purpose, which is to make
sure that everyone either commits or rolls back. BTW, a confirmed
commit can be sent with the scheduled-time element, giving the best
of both worlds.
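
To illustrate the difference between the two cases in Example 1, here is a
minimal sketch (not taken from the draft). It assumes perfectly
synchronized clocks, no execution jitter, and that every server receives
the RPC before Ts. The function name execution_spread and the arrival
offsets are hypothetical, chosen only for illustration.

```python
def execution_spread(arrival_offsets, scheduled_offset=None):
    """Return the gap (in seconds) between the earliest and latest commit.

    arrival_offsets: when each server receives its commit RPC, in seconds
        relative to the moment the client starts sending.
    scheduled_offset: the scheduled time Ts as an offset, or None for
        conventional (unscheduled) commits that run on arrival.
    """
    if scheduled_offset is None:
        # Unscheduled: each server commits as soon as the RPC arrives.
        exec_times = list(arrival_offsets)
    else:
        # Simplifying assumption: every RPC arrives before Ts.
        if max(arrival_offsets) > scheduled_offset:
            raise ValueError("an RPC would arrive after its scheduled time")
        # Scheduled: every server waits until Ts.
        exec_times = [scheduled_offset for _ in arrival_offsets]
    return max(exec_times) - min(exec_times)


# Four servers reached 0.5 s apart (illustrative numbers only).
arrivals = [0.0, 0.5, 1.0, 1.5]
print(execution_spread(arrivals))       # 1.5
print(execution_spread(arrivals, 5.0))  # 0.0
```

In a real deployment the scheduled spread is bounded by clock
synchronization accuracy rather than being exactly zero.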


Please let us know if you have further concerns about this point.

The default value of sched-max-future is defined to be 15 seconds.
This duration is long enough to allow the scheduled RPC to be sent
by the client, potentially to multiple servers, and in some cases
to send a cancellation message, as described in Section 3.2. On
the other hand, the 15 second duration yields a very low
probability of a reboot or a permission change.

I'm not finding the explanation terribly persuasive, but it's at
least _some_ explanation - thanks for that.  I'll leave it to the
ADs and other reviewers in the field to see if it's sufficient for
an experimental protocol.

(*) Please see comment (**) below.

Note that we did not define a maximal value for sched-max-future,
since one of the goals was to define a generic tool that can be
used for various different environments. The draft clearly states
the intention of using near-future-scheduling, but the requirements
and constraints of different environments may require the
sched-max-future to have a different value, potentially higher than
30 seconds. Hence, we prefer not to define a maximal value. Indeed,
in draft 06 there is a more detailed discussion about the issues we
are trying to prevent by using near-future scheduling (Section 3.6).

Without a maximal value, I think you need more of a discussion
guiding the choice of sched-max-future. Otherwise, you are just
waving your hands at not addressing the problems with far-future
scheduling, and potentially well-meaning but uninformed people are
going to go step in them anyway. There was a point to choosing the
near-future limit.

Enforce it or explain it with more vigor please.

(**) Your point is well taken. What we suggest, regarding this point
and the previous point (*), is that we add more text explaining the
factors that affect sched-max-future to Section 3.6.


Here is the new text we suggest. Please let us know if this addresses
your comment:



The challenge in far future scheduling is that during the long period
between the time at which the RPC is sent and the time at which it is
scheduled to be executed, the following erroneous events may occur:

- The server may restart.
- The client's authorization level may be changed.
- The client may restart and send a conflicting RPC.
- A different client may send a conflicting RPC.

Well, those are just a subset of the things that could change in a
command's context that would cause the command to be erroneous or even
damaging if it were run, and you're not addressing the other security
issues that come with very long scheduling (overflowing buffers, or
having lots of time to schedule a massive number of commands to all try
to happen at once). I suspect there are other things that pressured adding
the "near future" restriction that haven't been captured well yet.


In these cases, if the server performs the scheduled operation, it may
perform an action that is inconsistent with the current network policy,
or inconsistent with the currently active clients.


Near future scheduling guarantees that external events such as the
examples above have a low probability of occurring during the
sched-max-future period, and even when they do, the period of
inconsistency is limited to sched-max-future, which is a short period
of time.


Hence, sched-max-future should be configured to a value that is high
enough to allow the client to:

1. Send the scheduled RPC, potentially to multiple servers.
2. Receive notifications or rpc-error messages from the server(s), or
   wait for a timeout and decide that if no response has arrived then
   something is wrong.
3. If necessary, send a cancellation message, potentially to multiple
   servers.


On the other hand, sched-max-future should be configured to a value
that is low enough to allow a low probability of the erroneous events
above,
typically on the order of a few seconds. Note that even if
sched-max-future is configured to a low value, it is still possible
(with a low probability) that an erroneous event will occur. However,
this short potentially hazardous period is not significantly worse than
in conventional (unscheduled) RPCs, as even a conventional RPC may in
some cases be executed a few seconds after it was sent by the client.
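
As an illustration of the bound discussed above, here is a minimal sketch
(not text from the draft) of the kind of check a server could apply to an
incoming scheduled RPC. The names accept_scheduled_rpc and
DEFAULT_SCHED_MAX_FUTURE are hypothetical, the 15 second default is the
value quoted in this thread, and handling of scheduled times in the past
and of the resulting error reply is deliberately left out.

```python
from datetime import datetime, timedelta, timezone

# Default quoted in this thread; the value is configurable per server.
DEFAULT_SCHED_MAX_FUTURE = timedelta(seconds=15)


def accept_scheduled_rpc(now, scheduled_time,
                         sched_max_future=DEFAULT_SCHED_MAX_FUTURE):
    """Return True if scheduled_time is at most sched_max_future after now.

    Hypothetical helper: it only enforces the near-future bound and does
    not model what the server does with a rejected request.
    """
    return scheduled_time - now <= sched_max_future


now = datetime(2015, 8, 5, 12, 0, 0, tzinfo=timezone.utc)
print(accept_scheduled_rpc(now, now + timedelta(seconds=5)))  # True
print(accept_scheduled_rpc(now, now + timedelta(hours=1)))    # False
```

A larger sched_max_future widens the window in which a restart or a
permission change can invalidate the pending RPC, which is the tradeoff
described in the text.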


The default value of sched-max-future is defined to be 15 seconds.
This duration is long enough to allow the scheduled RPC to be sent by
the client, potentially to multiple servers, and in some cases to send a
cancellation message, as described in Section 3.2. On the other hand,
the 15 second duration yields a very low probability of a reboot or a
permission change.

I still think, especially while this is at experimental, you should
scope this with an absolute max. But I'm just one reviewer. Work it out
with your AD.

This YANG module defines the <cancel-schedule> RPC. This RPC may be
considered sensitive or vulnerable in some network environments.
Since the value of the <schedule-id> is known to all the clients
that are subscribed to notifications from the server, the
<cancel-schedule> RPC may be used maliciously to attack servers by
canceling their pending RPCs.

This attack is addressed in two layers: (i) security at the
transport layer, limiting the attack only to clients that have
successfully initiated a secure session with the server, and (ii)
the authorization level required to cancel an RPC should be the same as
the level required to schedule it.
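
As a rough illustration of layer (ii), here is a minimal sketch, not taken
from the draft or from any NETCONF implementation. It assumes
authorization can be modeled as a single numeric level, which is a
simplification; PendingRpc, may_schedule and may_cancel are hypothetical
names.

```python
from dataclasses import dataclass


@dataclass
class PendingRpc:
    schedule_id: int
    # Hypothetical numeric level a client needs in order to schedule this RPC.
    required_level: int


def may_schedule(client_level: int, required_level: int) -> bool:
    """A client may schedule an RPC if its level meets the requirement."""
    return client_level >= required_level


def may_cancel(client_level: int, pending: PendingRpc) -> bool:
    """Canceling requires the same level as scheduling the RPC did."""
    return may_schedule(client_level, pending.required_level)


pending = PendingRpc(schedule_id=1, required_level=3)
print(may_cancel(3, pending))  # True
print(may_cancel(1, pending))  # False
```

Under this model, knowing the <schedule-id> from a notification is not
enough to cancel: the canceling client must also hold the level that was
needed to schedule the RPC.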

To help me along, point me to the specifics of what you use to set
and verify such an authorization level?

Indeed, there is a need for an authorization scheme that is able to
set and verify the authorization level.

NETCONF (RFC 6241) does not explicitly define an authorization
scheme, and it is probably not within the scope of the current draft to
define such a scheme either.

Quoting RFC 6241:

    This document does not specify an authorization scheme, as such a
    scheme will likely be tied to a meta-data model or a data model.
    Implementors SHOULD provide a comprehensive authorization scheme
    with NETCONF.
    ...
    Different environments may well allow different rights prior to and
    then after authentication.  Thus, an authorization model is not
    specified in this document.  When an operation is not properly
    authorized, a simple "access denied" is sufficient.

I think you're saying that in production deployments today, the
authorization policy is "the peer was able to send me a packet". Is
that wrong?




Please let us know if you have further comments or concerns about any
of the issues above.


Thanks,
Tal.