Re: O&M Directorate Review of draft-ietf-hokey-erx-09

Hi Bernard,

Thanks for the followup.  Some notes inline:

On 2/8/2008 8:47 AM, Bernard Aboba wrote:

Comments below.

 > Date: Fri, 8 Feb 2008 01:38:30 -0800
 > From: ldondeti(_at_)qualcomm(_dot_)com
 > To: bernard_aboba(_at_)hotmail(_dot_)com
 > CC: ietf(_at_)ietf(_dot_)org
 > Subject: Re: O&M Directorate Review of draft-ietf-hokey-erx-09
 >
 > Hi Bernard,
 >
 > Many thanks for your review. Please see inline for some thoughts and
 > proposals for improvement of erx-09:
 >
 > On 2/6/2008 4:07 PM, Bernard Aboba wrote:
 > > Review of draft-ietf-hokey-erx-09
 > >
 > > I have reviewed this document as part of the Operations and Management
 > > directorate effort. These comments were primarily written for the
 > > benefit of the O&M area directors. Document editors and WG chairs
 > > should treat these comments just like any other last call comments.
 > >
 > > Detailed review comments are available here:
 > > http://www.drizzle.com/~aboba/EAP/erx-review.txt
 > >
 > > An answer to typical O&M issues is included below:
 > >
 > > 1. Is the specification complete? Can multiple interoperable
 > > implementations
 > > be built based on the specification?
 > >
 > > There are a few areas of the document which are unclear to me, such
 > > as how AAA routing is accomplished, and how/when peers require the
 > > local realm, and if so, how it is to be obtained. Also, clarity with
 > > respect to algorithm agility could be improved. There are also some
 > > issues with respect to the required behavior of ERX peers and servers
 > > (use of normative language).
 > >
 > > There are also situations in which multiple approaches can be chosen
 > > (such as the various bootstrap options), without one being chosen as
 > > mandatory or default. Choosing one approach would seem to be better.
 > >
 > > In my judgement, addressing these issues would improve the
 > > likelihood of being able to build multiple interoperable
 > > implementations.
 >
 > I agree. This has been brought up by Joe and we'll clarify the text.
 > Some of the confusion has to do with the evolution of the draft; Vidya
 > and I spent a good amount of time cleaning up around the WGLC time, but
 > it appears that we can do better.
 >
 > Pasi suggested adding a section on lower layer considerations. That
 > should help as well.
 >
 > >
 > > 2. Is the proposed specification deployable? If not, how could it be
 > > improved?
 > >
 > > Based on my reading of the document, it would appear that the ERX
 > > proposal requires changes to EAP peers, authenticators and servers,
 > > as well as RADIUS clients, proxies and servers. It also appears
 > > possible that changes to the lower layer protocols will be required
 > > in at least some cases, such as to make the local domain available
 > > to the peer.
 > >
 > > Given my experience in designing and operating wireless networks,
 > > deployments requiring changes only to peers and authenticators (but
 > > not servers or RADIUS infrastructure) can take as long as 3-5 years
 > > to complete. For example, WPA2 is still not universally deployed,
 > > even though the specification was finished in 2004.
 >
 > WPA2 compliance requires hardware upgrade in many cases and that may
 > have been the reason for the delay. In addition, some enterprises found
 > an alternative solution, i.e., IPsec VPNs, and so were not as motivated
 > to move to WPA2.
 >
 > In case of ERX, a firmware upgrade should be sufficient, which is much
 > easier.

[BA] One thing that we've learned from the WEP/WPA experience is that
changes that can be delivered via firmware/software upgrade are often
linked to new hardware for efficiency reasons.  For example, while TKIP was
designed for backward compatibility with WEP, few vendors offered
upgrades to existing WEP APs; most introduced the changes on new
models instead, out of the desire to only continue development on
newer branches of the code tree.  Similar examples exist for peer
updates (e.g. IPv6 support on legacy operating system versions).


I guess we can go on about this forever, as I have a different take on
this and so far haven't seen any new information to change my mind.

In case of ERP, there are fewer messages, whereas in case of TKIP and 
IPv6 the upgrade involves additional processing requirements.  I do 
realize that processing logic is slightly more complex, but with some of 
the proposed optimizations in the recent reviews, the complexity is 
quite low.


So in practice, making changes to a component will often result in the
need for new hardware, even if hardware changes were not required by
the design.


Generally speaking, yeah, I do agree.  I think the one barrier for ERP 
deployment is local ER server deployment.  Everywhere else, while there 
is a need for a software/firmware upgrade, there is no need for hardware 
upgrades, in my opinion.  One caveat is potential memory upgrade 
requirements on home ER servers.


 >
 > >
 > > By also requiring changes to AAA infrastructure, it seems to me that
 > > ERP deployment will be made more difficult than upgrades to the lower
 > > layer (such as IEEE 802.11r), which appear to achieve a similar
 > > objective. This puts the ERX proposal at a competitive disadvantage,
 > > and makes it unlikely that it will be widely deployed in its current
 > > form.
 >
 > In the context of WLANs, I can understand your argument, but in the
 > context of a foo wireless network, much of the work of 11r security
 > needs to be repeated.

[BA] The Problem Statement document made it seem that the focus was
solely on intra-media handoff, not inter-media.  Also, at various points
in the document, it appeared that link layer changes were being required.
So if the intent is for the solution to apply to inter-media handoff, then
that needs to be clarified and there may also be a need to address
potential backward compatibility issues.


I meant handover within a foo wireless network, not between 11 and foo. 
That said, given that there is support for inter-domain (and 
inter-technology does not introduce any new issues) handover within the 
charter, I need to take a closer look at the PS where it may be 
inconsistent with the charter.


 > 11r also requires firmware upgrades to APs and STAs;
 > furthermore, when physical threats to edge devices are considered, the
 > R0-KH needs to be in a safer location and that may mean more L2
 > architectural considerations. The problems don't go away; they go to a
 > different standards organization :).
 >
 > When considering new wireless network standards, I think ERP along with
 > the EMSK key hierarchy is better. Keys for other usages can also be
 > derived (the current alternative is static key provisioning).

[BA] While I would agree that the EMSK hierarchy enables use of
EAP for application layer security, I'm not sure you want to make the
argument for that in the ERP document.


No :).


 > > 3. Does the proposed approach have any scaling issues that could affect
 > > usability for large scale operation?
 > >
 > > The proposed approach introduces state into NASes, as well as RADIUS
 > > proxies and servers. This state is typically of two types: routing
 > > state and key state. In terms of key state storage, it would appear
 > > that the RADIUS server needs to store key state for each authenticated
 > > user within the Session-Id lifetime, regardless of where they are
 > > located. Local ERX servers store state for all local users, regardless
 > > of their home realms.
 > >
 > > In order to scale to handle a large user population, additional RADIUS
 > > servers are typically deployed, going against a replicated backend
 > > store (such as an LDAP directory). Similarly, additional RADIUS
 > > proxies are deployed to handle the forwarding load.
 >
 > To support the concept of local ER servers, I agree that additional
 > servers need to be deployed. However, in case of ERP with the home
 > ER server, no additional devices/hardware resources are necessary.
 > Consider the
 > alternative: in the absence of ERP, the peer would be running EAP each
 > time, and in fact, taking up more resources than in case of ERP.

[BA] I think the issue is how ERP interacts with existing AAA failover
mechanisms, which allow paths to change on a dynamic basis.  The
document seems to assume the ability to do "route pinning" in various
places.

 > > In conventional RADIUS deployments, proxies act much like routers,
 > > so that the failure of a RADIUS proxy will not necessarily result in
 > > failure of an EAP authentication in progress. For example, a NAS
 > > could switch over from use of one proxy to another one and as long
 > > as the same RADIUS server remained reachable, the conversation could
 > > complete normally.
 > >
 > > Similarly, while failure of a RADIUS server during a conversation will
 > > require re-starting the EAP conversation, that conversation could
 > > complete normally if restarted with a new server, since all servers
 > > presumably have access to the same backend credential store.
 > >
 > > Some of these assumptions no longer apply with ERX, since RADIUS
 > > proxies and servers now store key state which is not replicated
 > > between them. Therefore RADIUS failover would disrupt the functioning
 > > of ERX in a way that it does not disrupt operation of RADIUS today.
 > >
 > > For example, if a RADIUS proxy or server goes down, all key state at
 > > that proxy/server may be lost (the document does not talk about use
 > > of stable storage to preserve keys), and therefore ERP requests will
 > > fail.
 >
 > Sure. If the state from an EAP authentication is lost, a new EAP
 > authentication run is required the next time the peer needs to
 > authenticate to the network. I do understand that the peer will try,
 > fail and then fall back to EAP. But, that comes with having the option.
 >
 > >
 > > With respect to the resource requirements required to store key state,
 > > I believe that they are manageable for the most part.
 > > Typically RADIUS servers have substantial resources
 > > associated with them, so that they are more capable of handling this
 > > kind of state than NASes which are embedded devices. In terms of NAS
 > > state, it would appear to me that the proposed approach scales better
 > > than existing proposals such as IEEE 802.11r, since an authenticator
 > > will only hold state for connected devices, as opposed to devices
 > > that *might* connect in the future.
 > >
 > > My only concern would be about RADIUS proxies. In my experience,
 > > proxies are often installed in co-location facilities where repairs
 > > can be expensive and difficult, and so they are often installed on
 > > stripped-down hardware; with the current move toward flash, they
 > > may not even have a hard disk in the near future. Such stripped
 > > down boxes may not be capable of maintaining large key caches.
 >
 > If a local domain wants to support ERP functionality, they'll make the
 > upgrades; otherwise, the peer would have to go to the home ER server for
 > re-authentication (the roundtrips will be fewer, but each roundtrip may
 > have higher latency).

[BA] On reading the document, my impression was that the peer would
typically be pre-configured for use of ERP with the home server.  That is,
the home domain would upgrade to ERP, and the peer would be configured
so as to enable use of ERP whenever the home NAI was to be presented.


Sure.

However, whether the peer could actually use ERP would be determined
by whether the local network supported it or not.  That might be
pre-configured (e.g. use ERP whenever connecting to a given SSID), or it
might be learned from the network (e.g. receipt of ERP packets).

Not quite.  It is conceivable that ERP runs with the home ER server; but
I wouldn't run it that way, so I agree with you.


 > > 4. Are there any backward compatibility issues?
 > >
 > > There seem to be some issues with respect to backward
 > > compatibility with EAP as defined in RFC 3748 and RFC 4137. For
 > > example, the document appears to enable two packets to be in flight
 > > at the same time, and there seems to be an assumption that ERP
 > > implementations will not respond to EAP-Request/Identity packets.
 > >
 > > A bigger problem may exist with respect to RFC 2284 implementations
 > > which represent the bulk of existing EAP deployments. Since RFC 2284
 > > does not specify how peers and servers behave when encountering new
 > > EAP message types or peer-initiated messages, the behavior in the
 > > field will be implementation dependent.
 > >
 > > Hopefully, this does not include unanticipated ill effects (crashes,
 > > security compromises) but it's not possible to rule this out without
 > > testing.
 > >
 > > There also may be issues with respect to compatibility with existing
 > > EAP lower layers. For example, it would appear to me that IEEE
 > > 802.1X-2001 (which represents the bulk of existing 802.1X deployments)
 > > does not support peer-initiated messages.
 >
 > 802.1X, I thought, already supports the notion of peer-initiated
 > messages: EAPoL-Start and EAPoL-Logoff. In fact, we are trying to mimic
 > that in some ways with EAP-Initiate/Reauth-Start (in the other
 > direction, of course).

[BA] The EAPoL-Start and EAPoL-Logoff messages are handled differently
than EAP messages within IEEE 802.1X.  Since IEEE 802.1X-2001 combined
the 802.1X and EAP state machines (prior to RFC 4137), it
assumed that EAP messages were authenticator-initiated. 
IEEE 802.1X-2004 separates 802.1X from EAP processing, so I think
it is better.  However, the bulk of implementations are still IEEE
802.1X-2001 (including most WPA2 implementations), as far as I know.

 > > In order to minimize the backward compatibility issues, it probably
 > > makes sense for the peer not to utilize ERP unless it has an
 > > indication that it is supported on a given network and AAA server
 > > (e.g. based on pre-configuration). Currently the document does not
 > > require this.
 >
 > Let me propose some text here. I understand this concern.
 >
 > >
 > > Sections of the document relating to AAA packet routing are somewhat
 > > unclear, and may introduce changes to the way that RADIUS
 > > clients route packets. However, discussion of AAA routing seems
 > > somewhat orthogonal to the purpose of this document, so one way
 > > forward would be to move this material to the RADIUS ERP document
 > > instead.
 >
 > Yes, that is the plan. Alan is suggesting that we cover some of it in
 > this document. I am going to talk to him and see what we might come up
 > with as a way forward.

[BA] On reading the document, it seemed to me that many of the routing
issues could be addressed by having the authenticator put the appropriate
NAI into the User-Name attribute.  This would cause the packets to be
routed to the right entity without having to change the routing algorithm.
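
As a rough illustration of the point above, realm-based forwarding can be
sketched as a plain lookup on the realm portion of the NAI carried in
User-Name. This is a minimal sketch, not anything taken from the draft: the
function names, the routing table and the host names are all hypothetical,
and real proxies apply considerably more policy than this.

```python
# Hypothetical sketch: choose a next hop from the realm part of the NAI
# that the authenticator placed in User-Name. Names and hosts are invented.

def realm_of(nai: str) -> str:
    """Return the lower-cased realm of an NAI like 'user@realm', or '' if none."""
    _, sep, realm = nai.rpartition("@")
    return realm.lower() if sep else ""


def next_hop(user_name: str, routes: dict[str, str], default: str) -> str:
    """Look up the next hop for a request by the realm in its User-Name."""
    return routes.get(realm_of(user_name), default)


# Assumed example table: one entry for a home realm, one for a local realm.
ROUTES = {
    "home.example": "radius.home.example",
    "visited.example": "er.visited.example",
}
```

With such a table, a request whose User-Name carries the local realm goes to
the local ER server and one carrying the home realm goes to the home server,
without any change to the forwarding logic itself.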


Ok, thanks.  I think this is what Joe may be proposing as well.


 > > 5. Do you anticipate any manageability issues with the specification?
 > >
 > > In today's carrier deployments, we are seeing the need for
 > > facilities such as "Hotlining", which require the ability to modify
 > > authorizations or remove key state created by a user session.
 > >
 > > RFC 5137 typically uses the User-Name as the key which the NAS uses in
 > > order to locate the state which is to be affected. However, ERP
 > > introduces state within the local ERX server as well as on the NAS,
 > > and it is not clear how this state can be removed. For example, the
 > > local ERX server may not have access to the actual User-Name, since
 > > this could be hidden within the EAP conversation. As a result,
 > > I think that there is an implication that a user identifier such
 > > as the CUI is used to identify key state on the ERX server; however,
 > > this is not stated.
 >
 > Dan suggested a mechanism for this. His idea is to have the notion of a
 > root key name (emskname) and refer to the rest of the keys with the root
 > key name and the context of the key usage. With that in place, it would
 > be possible to delete all keys associated with "emskname." Would that
 > address this issue?

[BA] I'm not sure; I think it would be necessary to go through RFC 5137
usage scenarios to make sure.


I will do that.

thanks Bernard.

regards,
Lakshminath



 >
 > >
 > > 6. Does the specification introduce new potential security risks or
 > > avenues for fraud?
 > >
 > > One of the issues introduced by "fast handoff" specifications that
 > > bypass the AAA server is that this can result in accounting packets
 > > being sent without corresponding evidence of user presence. For
 > > example, when the user is required to authenticate at each
 > > authenticator,
 > > the home server has evidence that the user was in fact present at
 > > those locations and times, even though the session times could be
 > > inflated.
 >
 > Right, exactly; so there is some amount of inherent trust here. If the
 > visited domain lies about accounting records, it's a problem. Likewise,
 > the proxies could also modify accounting records without detection.
 >
 > >
 > > With ERP, it is required for the user to authenticate once within the
 > > local domain, and then for it to remain there until the keys expire.
 > > This could involve a continuous session, or the user could go to
 > > another domain and come back without having to re-authenticate.
 > >
 > > To some extent, the risk can be controlled by the home server
 > > administrator by changing the key lifetime so as to require
 > > re-authentication within a given time frame. However, the document
 > > does not describe how rIK key lifetime will relate to other lifetimes
 > > such as the Session-Id in order to accomplish this.
 >
 > We could introduce some text on this. In essence, the home server does
 > have some control on key lifetimes, but that is really up to individual
 > servers' local policies. We can provide some guidance.
 >
 > >
 > > A more serious issue appears to arise in the "implicit bootstrap"
 > > exchange, where the DSRK request is inserted by the local ERX server
 > > in a normal EAP conversation. As specified in the document, the AAA
 > > server does not appear to have the ability to verify this request.
 > > For example, there is no requirement that the "local domain"
 > > correspond to the domain that would be returned from a PTR RR query
 > > on the NAS-IP-Address. This would seem to imply that any intermediate
 > > proxy can obtain a DSRK, and with it, the ability to submit
 > > unverifiable accounting records.
 >
 > One possible solution here is to have a local policy at a home ER server
 > to allow/deny implicit bootstrapping. The peer can know the home's
 > policy due to configuration and run explicit bootstrapping 1) when the
 > home requires it and 2) when it does not know the local domain name.
 > Would that alleviate the concern?

[BA] I think the issue is about the home server trust of the ER server, so
the policy would probably need to exist at the home server or on the peer
(as controlled by the home administrator).
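
One way to picture a home-server-side policy of this kind is an allowlist
check made before a DSRK request is honored. This is only a sketch under
stated assumptions: the function name, the allowlist and the domain names
are hypothetical, and the draft does not define any such mechanism.

```python
# Hypothetical sketch: the home server honors an implicit-bootstrap DSRK
# request only for local domains its administrator has listed as trusted.

def allow_implicit_bootstrap(local_domain: str, trusted_domains: frozenset[str]) -> bool:
    """Return True if the requesting local domain is on the home allowlist."""
    # Normalize case and a trailing root dot before comparing.
    normalized = local_domain.strip().lower().rstrip(".")
    return normalized in trusted_domains


# Assumed example allowlist configured by the home administrator.
TRUSTED = frozenset({"visited.example", "partner.example"})
```

A request naming a domain outside the list would be refused, leaving the
peer to fall back to explicit bootstrapping or to full EAP.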

 >
 > Thanks again for your review Bernard.
 >
 > best regards,
 > Lakshminath
 >
 > >
 > > This would seem to introduce a fraud risk that is not
 > > present in existing fast handoff proposals.
 > >
 > >
 > > ------------------------------------------------------------------------
 > >
 > > _______________________________________________
 > > Ietf mailing list
 > > Ietf(_at_)ietf(_dot_)org
 > > http://www.ietf.org/mailman/listinfo/ietf
