RE: O&M Directorate Review of draft-ietf-hokey-erx-09


Comments below.

Date: Fri, 8 Feb 2008 01:38:30 -0800
From: ldondeti(_at_)qualcomm(_dot_)com
To: bernard_aboba(_at_)hotmail(_dot_)com
CC: ietf(_at_)ietf(_dot_)org
Subject: Re: O&M Directorate Review of draft-ietf-hokey-erx-09

Hi Bernard,

Many thanks for your review.  Please see inline for some thoughts and 
proposals for improvement of erx-09:

On 2/6/2008 4:07 PM, Bernard Aboba wrote:

Review of draft-ietf-hokey-erx-09

I have reviewed this document as part of the Operations and Management
directorate effort.  These comments were primarily written for the
benefit of the O&M area directors.  Document editors and WG chairs
should treat these comments just like any other last call comments.

Detailed review comments are available here:
http://www.drizzle.com/~aboba/EAP/erx-review.txt

An answer to typical O&M issues is included below:

1. Is the specification complete?  Can multiple interoperable
implementations be built based on the specification?

There are a few areas of the document which are unclear to me, such as how
AAA routing is accomplished, and how/when peers require the local realm, and
if so, how it is to be obtained.  Also, clarity with respect to algorithm
agility could be improved.  There are also some issues with respect to the
required behavior of ERX peers and servers (use of normative language).

There are also situations in which multiple approaches can be chosen
(such as the various bootstrap options), without one being chosen as
mandatory or default.  Choosing one approach would seem to be better.

In my judgement, addressing these issues would improve the likelihood of
being able to build multiple interoperable implementations.


I agree.  This has been brought up by Joe and we'll clarify the text. 
Some of the confusion has to do with the evolution of the draft; Vidya 
and I spent a good amount of time cleaning up around the WGLC time, but 
it appears that we can do better.

Pasi suggested adding a section on lower layer considerations.  That 
should help as well.


2. Is the proposed specification deployable?  If not, how could it be
improved?

Based on my reading of the document, it would appear that the ERX proposal
requires changes to EAP peers, authenticators and servers, as well as
RADIUS clients, proxies and servers.  It also appears possible that changes
to the lower layer protocols will be required in at least some cases, such
as to make the local domain available to the peer.

Given my experience in designing and operating wireless networks,
deployments requiring changes only to peers and authenticators (but not
servers or RADIUS infrastructure) can take as long as 3-5 years to
complete.  For example, WPA2 is still not universally deployed, even though
the specification was finished in 2004.


WPA2 compliance requires hardware upgrade in many cases and that may 
have been the reason for the delay.  In addition, some enterprises found 
an alternative solution, i.e., IPsec VPNs, and so were not as motivated 
to move to WPA2.

In the case of ERX, a firmware upgrade should be sufficient, which is much
easier.


[BA] One thing that we've learned from the WEP/WPA experience is that
changes that can be delivered via firmware/software upgrade are often
linked to new hardware for efficiency reasons.  For example, while TKIP was
designed for backward compatibility with WEP, few vendors offered
upgrades to existing WEP APs; most introduced the changes on new
models instead, out of the desire to only continue development on
newer branches of the code tree.  Similar examples exist for peer
updates (e.g. IPv6 support on legacy operating system versions). 

So in practice, making changes to a component will often result in the 
need for new hardware, even if hardware changes were not required by
the design.


By also requiring changes to AAA infrastructure, it seems to me that ERP
deployment will be made more difficult than upgrades to the lower layer
(such as IEEE 802.11r), which appear to achieve a similar objective. 
This puts the ERX proposal at a competitive disadvantage, and makes it
unlikely that it will be widely deployed in its current form.


In the context of WLANs, I can understand your argument, but in the
context of a foo wireless network, much of the work of 11r security needs
to be repeated.


[BA] The Problem Statement document made it seem that the focus was
solely on intra-media handoff, not inter-media.  Also, at various points
in the document, it appeared that link layer changes were being required.
So if the intent is for the solution to apply to inter-media handoff, then
that needs to be clarified and there may also be a need to address
potential backward compatibility issues.

11r also requires firmware upgrades to APs and STAs; 
furthermore, when physical threats to edge devices are considered, the 
R0-KH needs to be in a safer location and that may mean more L2 
architectural considerations.  The problems don't go away; they go to a 
different standards organization :).

When considering new wireless network standards, I think ERP along with 
the EMSK key hierarchy is better.  Keys for other usages can also be 
derived (the current alternative is static key provisioning).


[BA] While I would agree that the EMSK hierarchy enables use of 
EAP for application layer security, I'm not sure you want to make the argument 
for that in the ERP document.

3.  Does the proposed approach have any scaling issues that could affect
usability for large scale operation?

The proposed approach introduces state into NASes, as well as RADIUS
proxies and servers.  This state is typically of two types:  routing
state and key state.  In terms of key state storage, it would appear
that the RADIUS server needs to store key state for each authenticated
user within the Session-Id lifetime, regardless of where they are
located.  Local ERX servers store state for all local users, regardless
of their home realms. 

In order to scale to handle a large user population, additional RADIUS
servers are typically deployed, going against a replicated backend
store (such as an LDAP directory).  Similarly, additional RADIUS
proxies are deployed to handle the forwarding load.


To support the concept of local ER servers, I agree that additional
servers need to be deployed.  However, in the case of ERP with the home
server, no additional devices/hardware resources are necessary.  Consider
the alternative: in the absence of ERP, the peer would be running EAP each
time, and in fact, taking up more resources than in the case of ERP.


[BA] I think the issue is how ERP interacts with existing AAA failover
mechanisms, which allow paths to change on a dynamic basis.  The
document seems to assume the ability to do "route pinning" in various
places.

In conventional RADIUS deployments, proxies act much like routers,
so that the failure of a RADIUS proxy will not necessarily result in
failure of an EAP authentication in progress.  For example, a NAS
could switch over from use of one proxy to another one and as long
as the same RADIUS server remained reachable, the conversation could
complete normally. 

Similarly, while failure of a RADIUS server during a conversation will
require re-starting the EAP conversation, that conversation could
complete normally if restarted with a new server, since all servers
presumably have access to the same backend credential store.

Some of these assumptions no longer apply with ERX, since RADIUS
proxies and servers now store key state which is not replicated
between them.  Therefore RADIUS failover would disrupt the functioning
of ERX in a way that it does not disrupt operation of RADIUS today.

For example, if a RADIUS proxy or server goes down, all key state at that
proxy/server may be lost (the document does not talk about use of stable
storage to preserve keys), and therefore ERP requests will fail.


Sure.  If the state from an EAP authentication is lost, a new EAP 
authentication run is required the next time the peer needs to 
authenticate to the network.  I do understand that the peer will try, 
fail and then fall back to EAP.  But, that comes with having the option.
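
As a rough illustration of the try-ERP-then-fall-back behavior described
above, here is a minimal sketch.  The function and parameter names are
hypothetical and are not taken from the draft; the two callables stand in
for whatever actually runs each exchange.

```python
def authenticate(have_key_state, try_erp, run_full_eap):
    """Return the name of the exchange that succeeded, or "FAILED".

    have_key_state: True if the peer still holds re-authentication keys.
    try_erp, run_full_eap: hypothetical callables returning True on success.
    """
    # Attempt ERP only if the peer has keys from an earlier EAP run.
    if have_key_state and try_erp():
        return "ERP"
    # ERP was unavailable or failed (e.g. the server lost its key state):
    # fall back to a full EAP authentication.
    if run_full_eap():
        return "EAP"
    return "FAILED"


# Example: the server has lost its key state, so ERP fails and EAP succeeds.
result = authenticate(True, lambda: False, lambda: True)
```

The cost of the fallback is the failed ERP attempt plus a full EAP run,
which is the tradeoff being discussed here.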


With respect to the resources required to store key state,
I believe that they are manageable for the most part.
Typically RADIUS servers have substantial resources
associated with them, so that they are more capable of handling this kind
of state than NASes which are embedded devices. In terms of NAS state,
it would appear to me that the proposed approach scales better than
existing proposals such as IEEE 802.11r, since an authenticator will only
hold state for connected devices, as opposed to devices that *might*
connect in the future.

My only concern would be about RADIUS proxies.  In my experience,
proxies are often installed in co-location facilities where repairs
can be expensive and difficult, and so they are often installed on
stripped-down hardware;  with the current move toward flash, they
may not even have a hard disk in the near future.  Such stripped
down boxes may not be capable of maintaining large key caches.


If a local domain wants to support ERP functionality, they'll make the 
upgrades; otherwise, the peer would have to go to the home ER server for 
re-authentication (the roundtrips will be fewer, but each roundtrip may 
have higher latency).


[BA] On reading the document, my impression was that the peer would
typically be pre-configured for use of ERP with the home server.  That is,
the home domain would upgrade to ERP, and the peer would be configured
so as to enable use of ERP whenever the home NAI was to be presented.
However, whether the peer could actually use ERP would be determined
by whether the local network supported it or not.  That might be pre-configured
(e.g. use ERP whenever connecting to a given SSID), or it might be
learned from the network (e.g. receipt of ERP packets).

4. Are there any backward compatibility issues?

There seem to be some issues with respect to backward
compatibility with EAP as defined in RFC 3748 and RFC 4137.  For example,
the document appears to enable two packets to be in flight at the same
time, and there seems to be an assumption that ERP implementations will not
respond to EAP-Request/Identity packets. 

A bigger problem may exist with respect to RFC 2284 implementations
which represent the bulk of existing EAP deployments.  Since RFC 2284
does not specify how peers and servers behave when encountering new
EAP message types or peer-initiated messages, the behavior in the
field will be implementation dependent.

Hopefully, this does not include unanticipated ill effects (crashes,
security compromises) but it's not possible to rule this out without
testing.

There also may be issues with respect to compatibility with existing
EAP lower layers.  For example, it would appear to me that IEEE 802.1X-2001
(which represents the bulk of existing 802.1X deployments) does not support
peer-initiated messages.


802.1X, I thought, already supports the notion of peer-initiated
messages: EAPoL-Start and EAPoL-Logoff.  In fact, we are trying to mimic
that in some ways with EAP-Initiate/Reauth-Start (in the other
direction, of course).


[BA] The EAPoL-Start and EAPoL-Logoff messages are handled differently
than EAP messages within IEEE 802.1X.  Since IEEE 802.1X-2001 combined
the 802.1X and EAP state machines (prior to RFC 4137), it
assumed that EAP messages were authenticator-initiated.  
IEEE 802.1X-2004 separates 802.1X from EAP processing, so I think
it is better.  However, the bulk of implementations are still IEEE 802.1X-2001
(including most WPA2 implementations), as far as I know.

In order to minimize the backward compatibility issues, it probably makes
sense for the peer not to utilize ERP unless it has an indication that it
is supported on a given network and AAA server (e.g. based on
pre-configuration).  Currently the document does not require this.
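
One way to picture such a gate on the peer is sketched below.  This is only
an illustration under assumed names: the network identifiers and the two
sets are hypothetical, and the draft does not define this check.

```python
def may_use_erp(network_id, preconfigured, learned):
    """Allow ERP only where support has been indicated.

    preconfigured: networks (e.g. SSIDs) set up in advance to use ERP.
    learned: networks on which ERP packets have been received.
    Both sets are hypothetical illustrations, not part of the draft.
    """
    return network_id in preconfigured or network_id in learned


preconfigured = {"corp-wlan"}
learned = set()

before = may_use_erp("cafe", preconfigured, learned)   # no indication yet
learned.add("cafe")                                    # ERP packet seen on "cafe"
after = may_use_erp("cafe", preconfigured, learned)
```

On any network that fails the check, the peer would simply run EAP as it
does today, so legacy authenticators never see ERP messages.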


Let me propose some text here.  I understand this concern.


Sections of the document relating to AAA packet routing are somewhat
unclear, and may introduce changes to the way that RADIUS
clients route packets.  However, discussion of AAA routing seems somewhat
orthogonal to the purpose of this document, so one way forward would be to
move this material to the RADIUS ERP document instead.


Yes, that is the plan.  Alan is suggesting that we cover some of it in
this document.  I am going to talk to him and see what we might come up
with as a way forward.


[BA] On reading the document, it seemed to me that many of the routing
issues could be addressed by having the authenticator put the appropriate
NAI into the User-Name attribute.  This would cause the packets to be
routed to the right entity without having to change the routing algorithm.

5. Do you anticipate any manageability issues with the specification?

In today's carrier deployments, we are seeing the need for facilities
such as "Hotlining", which require the ability to modify authorizations
or remove key state created by a user session.

RFC 5176 typically uses the User-Name as the key which the NAS uses in
order to locate the state which is to be affected.  However, ERP
introduces state within the local ERX server as well as on the NAS,
and it is not clear how this state can be removed.  For example, the
local ERX server may not have access to the actual User-Name, since
this could be hidden within the EAP conversation.  As a result,
I think that there is an implication that a user identifier such
as the CUI is used to identify key state on the ERX server; however,
this is not stated.


Dan suggested a mechanism for this.  His idea is to have the notion of a 
root key name (emskname) and refer to the rest of the keys with the root 
key name and the context of the key usage.  With that in place, it would 
be possible to delete all keys associated with "emskname."  Would that 
address this issue?
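
To make the naming idea concrete, here is a small sketch of a key cache in
which every derived key is filed under its root key name plus a usage
context.  The class, its methods, and the sample names are all hypothetical
and are not defined in the draft.

```python
from dataclasses import dataclass, field


@dataclass
class KeyCache:
    """Toy key store; each derived key is filed under its root key name."""
    # Maps (root key name, usage context) -> key bytes.  Hypothetical layout.
    _keys: dict = field(default_factory=dict)

    def store(self, emskname, context, key):
        self._keys[(emskname, context)] = key

    def lookup(self, emskname, context):
        return self._keys.get((emskname, context))

    def delete_root(self, emskname):
        """Remove every key derived under emskname; return the count removed."""
        doomed = [name for name in self._keys if name[0] == emskname]
        for name in doomed:
            del self._keys[name]
        return len(doomed)


cache = KeyCache()
cache.store("emsk-A", "rRK", b"\x01")
cache.store("emsk-A", "rIK", b"\x02")
cache.store("emsk-B", "rRK", b"\x03")
removed = cache.delete_root("emsk-A")   # keys under "emsk-B" are untouched
```

With this layout, whoever wants the state removed only needs to know the
root key name, not the User-Name.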


[BA] I'm not sure; I think it would be necessary to go through RFC 5176
usage scenarios to make sure.


6. Does the specification introduce new potential security risks or
avenues for fraud?

One of the issues introduced by "fast handoff" specifications that
bypass the AAA server is that this can result in accounting packets
being sent without corresponding evidence of user presence.  For
example, when the user is required to authenticate at each authenticator,
the home server has evidence that the user was in fact present at
those locations and times, even though the session times could be
inflated.


Right, exactly; so there is some amount of inherent trust here.  If the 
visited domain lies about accounting records, it's a problem.  Likewise, 
the proxies could also modify accounting records without detection.


With ERP, it is required for the user to authenticate once within the
local domain, and then for it to remain there until the keys expire.
This could involve a continuous session, or the user could go to
another domain and come back without having to re-authenticate.

To some extent, the risk can be controlled by the home server
administrator by changing the key lifetime so as to require
re-authentication within a given time frame.  However, the document
does not describe how rIK key lifetime will relate to other lifetimes
such as the Session-Id in order to accomplish this.


We could introduce some text on this.  In essence, the home server does 
have some control on key lifetimes, but that is really up to individual 
servers' local policies.  We can provide some guidance.


A more serious issue appears to arise in the "implicit bootstrap" exchange,
where the DSRK request is inserted by the local ERX server in a normal
EAP conversation.  As specified in the document, the AAA server does
not appear to have the ability to verify this request.  For example,
there is no requirement that the "local domain" correspond to the
domain that would be returned from a PTR RR query on the NAS-IP-Address.
This would seem to imply that any intermediate proxy can obtain a
DSRK, and with it, the ability to submit unverifiable accounting
records.


One possible solution here is to have a local policy at a home ER server 
to allow/deny implicit bootstrapping.  The peer can know the home's 
policy due to configuration and run explicit bootstrapping 1) when the 
home requires it and 2) when it does not know the local domain name. 
Would that alleviate the concern?


[BA] I think the issue is about the home server trust of the ER server, so
the policy would probably need to exist at the home server or on the peer
(as controlled by the home administrator).


Thanks again for your review, Bernard.

best regards,
Lakshminath


This would seem to introduce a fraud risk that is not
present in existing fast handoff proposals.


------------------------------------------------------------------------

_______________________________________________
Ietf mailing list
Ietf(_at_)ietf(_dot_)org
http://www.ietf.org/mailman/listinfo/ietf
