
Re: [sidr] Last Call: <draft-ietf-sidr-origin-ops-21.txt> (RPKI-Based Origin Validation Operation) to Best Current Practice

2013-09-25 21:24:13
On Wed, Sep 25, 2013 at 12:38 PM, George, Wes 
<wesley(_dot_)george(_at_)twcable(_dot_)com> wrote:
From: christopher(_dot_)morrow(_at_)gmail(_dot_)com 
[mailto:christopher(_dot_)morrow(_at_)gmail(_dot_)com]

[CLM]
In the RPKIcache example, 'consumer' is 'routers in your network'.
'Close' is 'close enough that bootstrapping isn't a problem', balanced
with 'gosh, maybe I don't want to put one on top of each router! plus
associated management headaches to deal with these new
systems/appliances'.

[WEG] that's part of my issue - the only way that you get "close enough that
bootstrapping isn't a problem" is when the cache and router are directly

there's some baseline that's acceptable, you intimate that IGP comes
up before EGP below. that makes some sense, and thus maybe the target
is 'in your igp, close enough that fiber failures won't be a problem'
then?

connected. Otherwise there *is* going to be some amount of time while
the router is coming up that it can't talk to its configured caches e.g. while

but the data in the cache only REALLY matters for bgp validation... so
your IGP clue below isn't unreasonable.

it learns the route(s) to the cache(s). I think that supports a recommendation
to put the caches in your IGP instead of BGP, so that you get faster

I actually didn't note a [ie]GP recommendation in the doc.

convergence of those routes and therefore have access to the cache
when BGP comes up and starts converging, rather than once BGP is
partially converged. But the draft doesn't say that.

ok

The question is, does the propagation/convergence delay for an IGP in an
average network (let's call it somewhere between subsecond and 5 seconds)
make a non-trivial difference in RPKI's bootstrap behavior, especially since
BGP convergence is also dependent on IGP convergence? Can we make a
clearer recommendation of the performance envelope we're shooting for so
that people can design accordingly? I'm not sure I buy a general "faster(or
closer) is always better" recommendation - at some point, we hit diminishing
returns, given that this is mostly a human time-scale system. The document
doesn't provide clear guidance on how to balance that tradeoff.

i think a bunch of this really also depends on the operator deploying
though... 'it's hard to get server people to do X for me' or 'gosh,
these appliances can be managed by network-operations! and they are
cheap-ish' or 'gosh, we don't have 1gbps ports anymore in general,
crap...'

I do think the original intent was to not dictate: "Must be 5ms from
the router, or else!!" and rely upon the operator to do the tradeoff
you just made above. Each network is different in its expectations
from the infra, and each has different igp/egp designs as well as
fiber plant restrictions. I think it's going to be rough going making
a recommendation much more than:
  1) make sure the cache is available before BGP starts to converge for a device

and I actually can't come up with something else that's super helpful
:( even the above might be 'too much advice', if your plan is to
accept all routes and simply de-pref until validation might happen
then re-evaluate as you can.
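
To make the 'accept everything, de-pref until validation can happen,
then re-evaluate' plan concrete, here is a minimal sketch. It is not
from the draft. The 'pending' state, the local-preference numbers, and
the function names are assumptions made up for illustration; only the
valid / not-found / invalid matching follows the usual origin
validation rules (a covering ROA with the same origin AS and a prefix
length no longer than maxLength means valid).

```python
import ipaddress
from enum import Enum


class State(Enum):
    PENDING = "pending"      # illustrative only: no cache data loaded yet
    VALID = "valid"
    NOT_FOUND = "not-found"
    INVALID = "invalid"


# Hypothetical local-preference values; real numbers are operator policy.
LOCAL_PREF = {
    State.VALID: 110,
    State.NOT_FOUND: 100,
    State.PENDING: 90,
    State.INVALID: 80,
}


def classify(prefix, origin_as, roas):
    """Classify one route.

    roas is None until the router has reached a cache, otherwise a list
    of (roa_prefix, max_length, asn) tuples.
    """
    if roas is None:
        return State.PENDING
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_length, asn in roas:
        roa_net = ipaddress.ip_network(roa_prefix)
        # Skip ROAs of the other address family or that do not cover the route.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        if asn == origin_as and route.prefixlen <= max_length:
            return State.VALID
    return State.INVALID if covered else State.NOT_FOUND


def local_pref(prefix, origin_as, roas):
    """Map a route to a local preference; call again once roas changes."""
    return LOCAL_PREF[classify(prefix, origin_as, roas)]
```

Calling local_pref() with roas=None during bootstrap and again once the
cache has answered is the 're-evaluate as you can' step: the route is
accepted either way and only its preference changes.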

[CLM]
I guess one way is to say: "People should understand the dependencies
and engineer appropriately" ... which you kind of asked to not say in
the original comment. (or is the issue that the dependencies aren't
clear?)

[WEG] The issue is that the dependencies aren't clear. I'm not expecting the
text to be too prescriptive here, because all networks are different, but I need
enough technical discussion to properly "understand the dependencies and
engineer accordingly". This is an operational considerations document, so it
needs to tell operators what breaks if they don't do it as recommended. If this

ok...

is about bootstrapping, then we need to be clearer about the relationship
between bootstrapping and network convergence (since recommending
that the cache is directly connected to the router is impractical) and how
it impacts RPKI cache-router communication and performance. If it's about
reducing latency via proximity, then we need to explain how much latency is
too much latency and why. If it's about proper geographic diversity within a
network's topology, then we need to say that. If we don't actually know if it
makes a difference, and so are defaulting to recommendations that most folks
agree are generally a good idea, we should say that. But right now we're
assuming too much, IMO.

ok, the current text is:
"   As RPKI-based origin validation relies on the availability of RPKI
   data, operators SHOULD locate caches close to routers that require
   these data and services.  'Close' is, of course, complex.  One should
   consider trust boundaries, routing bootstrap reachability, latency,
   etc"

Maybe something like:
"   As RPKI-based origin validation relies on the availability of RPKI
   data, operators SHOULD locate caches close enough to routers that
   require these data and services such that failures in local device
   routing domain do not impact cache availability.  One should consider
   trust boundaries, routing bootstrap reachability, latency, etc"


-chris

(content warning removed.. since it didn't come from TWC, and my words
are not as restricted)
