Re: Last Call: <draft-ietf-sidr-rpki-rtr-19.txt> (The RPKI/Router Protocol) to Proposed Standard

On 28/01/12 3:02 PM, "Rob Austein" <sra(_at_)hactrn(_dot_)net> wrote:

At Wed, 21 Dec 2011 17:43:23 -0800, Terry Manderson wrote:


Apologies for my lack of attention to date on this topic, so speaking only
for myself here.


Similar apologies for not having answered this more promptly.  Somehow
we missed seeing this until our AD asked us about it.

Please see draft-ietf-sidr-rpki-rtr-25, just posted, which we hope
addresses most of your concerns (there are a few points on which I
think we're just going to have to agree to disagree).


I will read -25 soon and raise any concerns should they remain.

[..]

RADIUS doesn't have a bulk transfer operation, and bulk transfer of
data is the main task of this protocol, particularly at start-up.


Is that function of the protocol now highlighted in -25?


You are certainly entitled to your opinion, but it comes a bit late.
This work was done in the public view, with regular progress reports
to the SIDR WG, and we have multiple interoperable implementations
including several of the major router vendors.  So, with all due
respect, I don't think the folks who have put work into this will be
all that interested in abandoning running code at this point.


My example was to highlight that, without the rationale for why *this*
protocol was desired, any number of options would/could seem perfectly
reasonable and attractive.

Glossary:

Global RPKI:
I disagree with this definition for two reasons. 1) I'm not aware of a
unified definition for 'distributed system' so this is all rather vague.


The term has been used to describe DNS for decades.  Also see:

  http://en.wikipedia.org/wiki/Distributed_computing


Citing wikipedia - the end is nigh!

Perhaps you could say 'published at a disparate set of systems'.


I don't find that any clearer.  Readers who can't understand the words
"distributed set" aren't likely to understand "disparate set" either.


I guess we remain in disagreement :).

2) Limiting
the servers to be "at" the "IANA, RIRs, NIRs, and ISPs" is also premature.
It's not clear to me that these entities will run their own repositories,
nor are they going to be the only repository operators in the lifecycle of
the RPKI.


This is essentially the same list as appears in section 1.1 of
draft-ietf-sidr-arch, with the term "LIR" replaced by "ISP".

I suppose we could add "or other service providers".


I think that would satisfy me.

Cache:
The words surrounding the fetch/refresh mechanisms of the RPKI are limiting.
Both draft-ietf-sidr-repos-struct and draft-ietf-sidr-res-certs allow for
other (future) retrieval mechanisms as defined by the repository operator
beyond RSYNC (loosely documented in RFC5781).


Terry, you've made it quite clear that you disagree with the SIDR WG's
decision to make rsync the mandatory-to-implement RPKI retrieval
protocol, but you lost that argument a long time ago, and I fail to
see the point of bringing it up here yet again.


That wasn't the intent, Rob. Please re-read the paragraph: my point is that
this document still needs to be flexible SHOULD a future retrieval
mechanism develop. If you still think that it shouldn't be flexible, then
we remain in disagreement.

Last sentence. "Trusting this cache further is a matter between the provider
of the cache and a relying party". In my mind the Relying Party was the one
that did the RPKI validation - would this not be better stated as "Trusting
this cache further is a matter between the provider of the cache and the
router operator".


If a router is making decisions based on data given to it by a server,
the router is the relying party in that relationship.  That the server
in question was itself the relying party in another relationship does
not change this.

The picture here is not all that different from the way that some
vendors have chosen to implement DNSSEC.  It's a two-tier security
relationship: an end-to-end relationship between the publisher of
signed objects and the validator of those signed objects, then a
separate security relationship between the entity that validated the
signed objects and the end entity that actually uses the data.


I think then we remain in disagreement on the phrasing. Spelling out
precisely that the relying party identified here has a trust relationship
only with the cache, and not with the larger RPKI, is important.

Deployment Structure:

Why repeat the definition of "Global RPKI"? It's superfluous.


Because it's not a definition?

I agree that the text here is similar to the definition, but this
section is trying to describe the roles in the system.


Then I think the text needs work.

Local Cache: Again. 'Relying party' seems to be borrowed from the
CA/identity world. Unless you redefine that term here it seems as if the
"router" is making RPKI validation decisions. Which it is not. The router is
acting more like a NAS (see RADIUS, RFC 2865) when talking to a local cache.

The definition of "routers" seems to get this right, e.g. "a client of the
cache".


See above.  "Relying party" is a security relationship term, not just
a PKI term.

Operational Overview

when you first use "ROA", please expand the TLA, and provide a reference.


Done.


Thanks.

Serial Query

I don't remember seeing a recommendation for how often a client (router)
sends a serial query. Is there a Min/Max? Surely doing it every second would
be excessive..


Maximum is covered in section 6.2: the router must send a Serial or
Reset Query no less frequently than once per hour.

Minimum is a good question.  We had been assuming that, as this is an
in-POP relationship with cache and router operated by the same party,
there would likely be a knob in the router (router guys live for
knobs) and setting it would be a matter of local policy.  If you want
your router to beat up your cache server every minute, who am I to
stop you?

We needed to set a maximum because that affects the architecture of
the cache (how long does it need to hold onto old data -- given the
potential size of the data sets involved, one might implement the
cache very differently if one needed to hold old data for a week
rather than an hour).


Thus some recommendation text would be helpful.
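
To make the timing point concrete, here is a minimal sketch of how a router
might bound its query interval. The one-hour ceiling is the section 6.2 limit
described above; the 60-second floor and the function name are purely
hypothetical, since the thread leaves the minimum to local policy.

```python
# Ceiling from the thread: a Serial or Reset Query at least once per hour.
MAX_QUERY_INTERVAL_S = 3600

# Hypothetical floor; the draft leaves the minimum to local policy.
DEFAULT_MIN_INTERVAL_S = 60


def effective_query_interval(configured_s, floor_s=DEFAULT_MIN_INTERVAL_S):
    """Clamp an operator-configured interval (seconds) into [floor, ceiling]."""
    if configured_s <= 0:
        raise ValueError("interval must be positive")
    return max(floor_s, min(configured_s, MAX_QUERY_INTERVAL_S))
```

An operator who sets the knob to one second would end up at the floor, and one
who sets it to two hours would be pulled back to the one-hour ceiling.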

IPv4 Prefix:

"and nothing prohibits the existence of two identical
   route: or route6: objects in the IRR."

Why even mention the IRR here? It just doesn't seem at all relevant. (and
isn't defined)


Good catch.  Done.


Thanks

" IPvX PDUs" expand to IPv4 or IPv6. Globbing into one is a misdirection
under a heading of 'IPv4 Prefix'

IPv6 Prefix

Some text here to say that the IPv6 data structure follows the same
semantics as the IPv4 data structure would be good.. or alternatively
restructure the document to Semantics, then describe the IPv4 and IPv6 data
structures as subheadings to Prefix PDUs.


Done.


Thanks

Error Report

What is "excessive length" of a PDU? at what point do you say "o.k, now I
can truncate".


Too long to be any valid PDU other than an Error Report.  Done.


Thanks

Fields of a PDU

For all types, instead of using "ordinal" can you use the exact description
of the number? eg unsigned integer? For me I always relate ordinals to set
theory.


Done.


Thanks

PDU type: the e.g. is incomplete; shouldn't it be "IPv4 Prefix = 4" with a
forward reference to the IANA Considerations section?


I think this is a matter of stylistic preference.


Yep. I can let that be.
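
For readers following the PDU type discussion, a rough sketch of encoding an
IPv4 Prefix PDU (type 4) follows. Only the type value comes from this thread;
the 20-byte field layout, the version value 0, and the meaning of the low flag
bit (1 = announce, 0 = withdraw) are assumptions based on the protocol as later
published, and should be checked against the specification. The function name
is hypothetical.

```python
import ipaddress
import struct

PDU_TYPE_IPV4_PREFIX = 4   # the value discussed in the thread
IPV4_PREFIX_PDU_LEN = 20   # assumed total PDU length in bytes


def pack_ipv4_prefix(prefix, max_len, asn, announce=True, version=0):
    """Pack one IPv4 Prefix PDU in network byte order (assumed layout)."""
    net = ipaddress.IPv4Network(prefix)
    if not net.prefixlen <= max_len <= 32:
        raise ValueError("max_len must be between the prefix length and 32")
    flags = 1 if announce else 0
    # version, type, zero(16), length(32), flags, prefix len, max len, zero,
    # 4-byte prefix, 32-bit unsigned AS number
    return struct.pack(
        "!BBHIBBBB4sI",
        version, PDU_TYPE_IPV4_PREFIX, 0, IPV4_PREFIX_PDU_LEN,
        flags, net.prefixlen, max_len, 0,
        net.network_address.packed, asn,
    )
```

Note that every numeric field is a plain unsigned integer, which is the
wording the review asks for in place of "ordinal".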

Serial Number. "for example via rcynic", Is not defined and implementation
specific!


Please read the words "for example".

I suppose we could add a reference, but the last time we did that
somebody objected to having a reference pointing to the source code
for a particular implementation.


Do you need the example? Perhaps just remove it. (I may have missed it, but
I don't recall seeing bind, or any other reference code base mentioned in
any of the DNS documents.)

and there is a typo "completing an rigorously validated"..while
there, consider why you use the term 'rigorously'..


Sigh.  Next time, please be explicit about the typo you're seeing; our
eyes repeatedly bounced off the "an" here until after we'd posted
version -25.  It's not worth yet another rev just to fix that.


ok. Sorry I wasn't explicit at the time.

are there situations when a validation is less rigorous? If so
explain.


I suspect that my co-author was trying to say that one can't just
retrieve the data, pull the ASNs and prefixes out of the ROAs, and
feed them into the router, one has to do the RPKI validation first.

I guess we can remove the word if it offends you, but it seems
harmless.


I just want it to be clear that there is only one level of validation as per
the various RPKI object validation rules.

Session ID

What is the risk of a cache server starting/restarting with the same session
ID and serial number as before, but with different cache contents? Is this
an entropy concern? Just thinking of a potential scenario where a router is
cache-wedged. Is this at all probable? and why not - some words here to
cover this would be good.


We added several paragraphs on exactly this topic sometime around IETF
Last Call, I suspect the version you reviewed did not have that text.
I think we've addressed this point, please check the current text and
let us know if there's a further issue here.


I will read.

Flags

Can you reword the binary choice here? Do you actually need to delve into
'right to announce'? This is really about RIB entry behaviors yeah?


The semantics here are closely related to ROAs, which, as you no doubt
recall, are Route Origin Authorizations, so the text here follows that
model.

With all due respect, I do not think that a discussion of RIB entry
behavior here would be simpler.


fine.

Expand "IPvX".


Done.


Thanks

Start or Restart:

I think the terms for when a router needs to send a serial query or a reset
query need to be tighter. Saying MAY here is too loose. I would much prefer
to see a structure where if the router does not have a recorded serial for a
cache from a previous session, the router MUST send a reset query. Logically
you assume that to be the case, so be specific.


I think this is a stylistic matter again.  The router MAY do two
things here, one of which is only applicable if it has data from a
previous broken session.

The only real difference I see here between the current formulation
and the MUST formulation you prefer is that, as currently written, the
router could choose not to send anything at all initially; this option
doesn't seem particularly useful, so I don't mind removing it, but
neither do I see the difference between the current text and your
suggested change as a big deal.


Perhaps choose whichever has the lower chance of confusion for the router.
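
As an illustration only, the stricter formulation suggested above (no recorded
serial means a reset query, otherwise a serial query) could be sketched like
this. It reflects the reviewer's proposal, not necessarily the draft text, and
the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheState:
    """What the router remembers about one cache (hypothetical structure)."""
    serial: Optional[int] = None  # None: nothing recorded from an earlier session


def initial_query(state):
    """Choose the first query to send when a session starts or restarts."""
    if state.serial is None:
        # No recorded serial for this cache: ask for the full data set.
        return ("reset_query",)
    # A serial survives from a previous session: ask only for the changes.
    return ("serial_query", state.serial)
```

Written this way, the "send nothing at all" option disappears, which is the
only behavioral difference identified in the reply above.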

Thereafter the router MAY send a reset query, and SHOULD send a serial
query. I suspect this is what the vendors (who have chimed in on the list)
have coded.

This then corroborates section 4 where you suggest the router only send
serial queries for efficiency.


Section 6.2 already says that the typical exchange is for the cache to
send a Serial Notify, in the expectation that the router will schedule
an immediate Serial Query.  We didn't make it any stronger than that
because the folks implementing the router side of this expressed
concern at the notion that the cache could tell them to do something
(read: they understand that the notification mechanism will help speed
convergence, but they're worried that the dinky CPUs they're stuck
with in some of the relevant hardware will be swamped if they try too
hard, which is why routers are allowed to ignore notifications and
caches are rate-limited in sending them).

ok.

Transport:

MiTM is Man in the middle as I and many others know it. 'Monkey/piggy/pickle
in the middle' is a child's ball game.


Monkey-in-the-middle is a common non-sexist variant of this term.
Welcome to the 21st century.


Going back to a gender-neutral section of a professional writing text from
my MBA, it highlights that arbitrarily changing the linguistic definition of
certain gender inclusive scenarios is poor form. If the language were
"Men-in-The-Middle" or "A-Man-in-The-Middle", then certainly change it.

Otherwise Man-in-the-middle is perfectly gender ambiguous. - But that may
also be my style and I will let the RFC Editor handle as appropriate.

" Therefore, as of this document, there is no mandatory to
   implement transport which provides authentication and integrity
   protection."

if this is the case.. then why? what is the gain?


OK, this is the elephant in the living room.


[..]


Nobody is happy with this, but it's the least bad compromise we could
find between what the IETF would prefer and reality in the field.


O.K.

why not then make the router fetch the signed objects and do the
validation internal - this again seems to be the 'missing
requirements' problem.


See "currently shipping routing hardware", above.

SSH Transport

State up front that you MUST use SSHv2 (instead of hinting at it in the
third paragraph).


Done.


Thanks.

TLS Transport
"Man in The Middle (MiTM)" please.


Above.

Router Cache setup

"When a more preferred cache becomes available, if resources allow, it
   would be prudent for the client to start fetching from that cache."

How does the client (I assume router) know when to do this as caches are
not synchronized?? How does a router tell if any particular cache has more
current data over another cache? what if two caches contradict each other?


The document repeatedly states that the router has an ordered
preference list of the caches it uses.  The text you quote here
doesn't say "has more current data", it says "becomes available", ie,
it stops rejecting connection attempts, signalling errors, or
otherwise failing to be useful.


o.k.
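
A small sketch of the selection rule described above, assuming the router
keeps its caches in preference order and has some way to test whether a cache
is currently usable. The `is_available` callback and the function name are
hypothetical; nothing here compares how current the caches' data is.

```python
def pick_cache(preference_list, is_available):
    """Return the most preferred cache that is currently usable, or None.

    preference_list: cache identifiers, most preferred first.
    is_available: callable that returns True if a cache accepts connections
                  and is not signalling errors (hypothetical health check).
    """
    for cache in preference_list:
        if is_available(cache):
            return cache
    return None
```

When a more preferred cache starts answering again, the next call simply
returns it, which matches the "becomes available" reading given in the reply.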

Error codes

6: Withdrawal of Unknown Record (fatal), why drop the session? (which
presumably causes a restart) to a cache, assuming the cache is corrupt,
which will then send another Unknown Record, which is fatal... (repeat)??

Why not mark the cache as corrupt at the client?


This is one of several loss-of-synchronization problems.  The
assumption is that the router may have (somehow) lost synchronization
with the cache.  We don't really know which party is confused at this
point, all we know is that the session itself is no longer useful
because the router and cache are not communicating clearly.  So the
router's data isn't necessarily corrupt.

The router won't necessarily restart with this cache right away
either; it has several options: it might try another cache, it might
switch to another set of data it has already loaded, or it might try a
reset query to this cache.


o.k.

Security Considerations:

Transport Security. There are multiple valid options for a root trust anchor
including the structure from the IAB aligning it to the IANA. Perhaps
instead of saying " the IANA root trust anchor" say "Global RPKI root trust
anchor". Otherwise you might accidentally find your validated cache only
covers unallocated and reserved blocks.


I think you're saying that using the term IANA here is politically
incorrect.


No. I'm saying that while discussions are underway, precisely which trust
anchor covers what is still on the table. At this stage one option has
IANA's RPKI CA being authoritative for only unallocated and reserved INRs.
It may be that there is a unified trust anchor above that, known loosely as
the global trust anchor. However tying the document to one particular TA
might result in a gross inaccuracy. Ultimately then, if the global trust
anchor is the IANA TA, you haven't lost.

Cheers
Terry

