After a few days of decompressing, I have been considering
what to say that is helpful without unnecessarily prolonging
this conversation. I have been involved in the delivery of
wireless for six IETF meetings (#s 46, 56, 58, 60, 61, and 62),
some less painful than others and four without hosts. To those
commenting that a familiar venue should have helped, and that it was
better the last time we were in Minneapolis: I distinctly remember
sitting in the health club hot tub at the end of IETF58 swearing
I would never do the wireless again. Believe me, it wasn't
better last time. For whatever reason, Minneapolis hasn't been
kind to IETF wireless recently.
As I wrote this, it got longer and longer, so the abridged version is:
- We had problems on Monday, but from Tuesday onward we believed
the WLAN to be operational (albeit without IPv6), with only a few
obscure problem reports. If people were
indeed experiencing debilitating problems all week, then it
is unfortunate that we were not aware of the issues.
- We can document lessons learned and recommendations
for future hosts, but I believe the current model for
providing wireless to attendees is broken and needs to be
fixed. I would be happy to participate in the discussion on
how to fix this. These discussions should probably take place
off this already overloaded list.
So, unless you want gory details and rambling... you can stop here.
- We had no wireless hardware one week before we were
scheduled to install the wireless. We twisted arms to get
hardware and support from two companies who really
stepped up at the last moment. We started the wireless
install on Friday, March 4th.
- We did not deploy anything new or experimental. We
deployed what we had available. In the case of the
wireless, the alternative was about a dozen Cisco 350s
the secretariat had stashed away in case of emergency.
We did what we have done for the past two IETF
meetings - only with a different combination of
equipment in a different venue. The addition of 802.11a
did not add complexity and if anything improved the
situation by moving some of our wireless users out of
the 802.11b/g range. I would agree that we could drop the WEP
and 802.1X portions, but again, this worked fine for the
previous two events. Believe me, we are very risk averse.
- After a surprisingly easy install, we had a meltdown
Monday morning at the beginning of the first session.
This meltdown and the following shockwaves on
Monday resulted from some less than optimal
deployment decisions, a configuration issue, a bug in
the deployed code, and some unforeseen interactions
with the infrastructure. In an attempt to stabilize things,
we shed a number of capabilities culminating in a
downgrade of controller code on Monday night. I
would like to say that we did this in a careful and
reasoned fashion, but I will admit there was a fair
amount of chaos.
- Tuesday morning we wandered around trying to see
how things were going. Most people we talked to
seemed happy enough at the time but willing to tell us
war stories from Monday. (Thank you, but we already
knew Monday was bad.) We were getting sporadic
reports of IP connectivity problems, but we couldn't
seem to catch anyone actively experiencing the
problem. I sent out email soliciting input from people
experiencing or having experienced the problem
sometime after Monday. I received a total of four
responses over the next two days.
- At this point, we considered moving to more current
code, but the reports we were getting indicated that
things were working. Based on that perception, we
decided on Tuesday evening not to upgrade.
- We don't really have a good way to measure the user
experience. In this case, we thought the wireless was
mostly working. I asked for gentle feedback during the
meeting. With the exception of Steve Casner who
patiently reported back to us, we received basically
nothing. The helpdesk was also tracking problems for
us and again after Monday received very few reports.
- Until the flood of email started on Friday, we believed
that we had delivered a working wireless network (after
Monday), with an occasional problem that affected
a small subset of users but was not debilitating.
- I do not see much motivation for hosts (or vendors for a
meeting without a host) to support the IETF network.
The risk-to-benefit ratio is just too high. They can get
much better exposure in environments that are less
stressful and that they have more control over.
- Advance staging helps to reduce configuration issues
and allows more time for operational troubleshooting
onsite. This can't happen when you don't have a host or
contract out the service.
So where do we go from here? Well, we have been asked to
document our lessons learned. While we can do this, it seems
to me that there are new issues that bite us each time (e.g. one
time we had APs that rebooted every time a certain threshold
of clients was reached. The code was eventually fixed. The
problem doesn't exist anymore.) We can provide guidelines
and experiences, but war stories from the world of IETF
wireless deployments don't seem that useful. I would be happy
to work to document the basic guidelines that we use, but
generally there is a set of constraints that complicate things -
like having no hardware. The key to improving reliability is
continuity between meetings, and the current model does not provide it.
Given the trouble the IETF has with getting sponsors for the
meetings, perhaps it is time to revisit our model of operation. If
we want the network (including wireless) as a production
service, then perhaps we should contract out that service to an
entity that would be responsible for it and could sustain more
continuity than the current model allows. This naturally costs
money. There are people out there who will do this for the right
price. I would be happy to hand over my green dot to someone
properly resourced to do the job.
Ietf mailing list