
Re: reflections from the trenches of ietf62 wireless

2005-03-15 11:23:27
Hi Karen,

Thanks for your feedback. I was one of those having problems and not
reporting them, but I didn't see any email asking for more feedback after
Monday, so I assumed the "fixing" work was ongoing. In fact I had problems
reading my email, and I believe I even lost some messages under strange
circumstances, which I tend to attribute to the IETF connectivity: I
changed nothing on my server or my laptop, and when I came back to Madrid
everything was working fine again (and, as said, NOTHING changed).

The sad thing is that for me everything was working fine on Monday: even
though IPv4 was not working very well, I was using IPv6 and it worked very
well :-))) So from my point of view disabling it was a big mistake, since
that didn't solve the problems, as we learned afterwards.
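
Just to illustrate what I mean about the two stacks behaving differently: a
rough sketch (assuming Python on the laptop, and using www.ietf.org purely
as an example of a dual-stacked target) of checking each stack on its own
could be something like this:

import socket

TARGET = "www.ietf.org"   # illustrative dual-stacked host, not an official test point
PORT = 80
TIMEOUT = 5               # seconds, picked arbitrarily

def check(family, label):
    """Resolve TARGET for one address family and try a plain TCP connect."""
    try:
        addrs = socket.getaddrinfo(TARGET, PORT, family, socket.SOCK_STREAM)
        fam, stype, proto, _, sockaddr = addrs[0]
        s = socket.socket(fam, stype, proto)
        s.settimeout(TIMEOUT)
        s.connect(sockaddr)
        s.close()
        print(label, "OK via", sockaddr[0])
    except OSError as exc:   # covers both DNS failures and connect errors
        print(label, "FAILED:", exc)

check(socket.AF_INET, "IPv4")
check(socket.AF_INET6, "IPv6")

On Monday a check like that would presumably have shown what I saw by hand:
IPv4 failing while IPv6 still answered.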

In *short*, my conclusion from your account is that we lack proper
planning, something we already know is a problem for the meeting
arrangements, but which is even more critical when it comes to ensuring a
proper network deployment (you need time to think about what can go wrong,
what has already gone wrong on previous occasions, and how to avoid it as
much as possible).

Maybe it would be interesting to set up a mail exploder dedicated to
meeting planning and preparation? It could cover not only the technical
side but also the logistics of the meetings (or two different mail
exploders).

Regards,
Jordi





De: "Odonoghue, Karen F CIV B35-Branch" 
<karen(_dot_)odonoghue(_at_)navy(_dot_)mil>
Responder a: <ietf-bounces(_at_)ietf(_dot_)org>
Fecha: Tue, 15 Mar 2005 11:08:51 -0600
Para: <ietf(_at_)ietf(_dot_)org>
Asunto: reflections from the trenches of ietf62 wireless

Folks, 

After a few days of decompressing, I have been considering
what to say that is helpful without unnecessarily prolonging
this conversation. I have been involved in the delivery of
wireless for six IETF meetings (#s 46, 56, 58, 60, 61, and 62),
some less painful than others and four without hosts. For those
commenting on how a familiar venue should help and wasn't it
better last time we were in Minneapolis, I distinctly remember
sitting in the health club hot tub at the end of IETF58 swearing
I would never do the wireless again. Believe me, it wasn't
better last time. For whatever reason, Minneapolis hasn't been
kind to IETF wireless recently.

As I wrote this it got longer and longer, so the abridged version is:
-  We had problems on Monday, but from Tuesday onward we
believed the wlan to be operational (albeit without IPv6),
with only a few obscure problem reports. If people were
indeed experiencing debilitating problems all week, then it
is unfortunate that we were not aware of the issues.
-  We can document lessons learned and recommendations
for future hosts, but I believe the current model for
providing wireless to attendees is broken and needs to be
fixed. I would be happy to participate in the discussion on
how to fix this. These discussions should probably take place
off this already overloaded list.

So, unless you want gory details and rambling... you can stop
reading here...

Technical Summary/Issues:
-  We had no wireless hardware one week before we were
scheduled to install the wireless. We twisted arms to get
hardware and support from two companies who really
stepped up at the last moment. We started the wireless
install on Friday, March 4th.
-  We did not deploy anything new or experimental. We
deployed what we had available. In the case of the
wireless, the alternative was about a dozen Cisco 350s
the secretariat had stashed away in case of emergency.
We did what we have done for the past two IETF
meetings, only with a different combination of
equipment in a different venue. The addition of 802.11a
did not add complexity and if anything improved the
situation by moving some of our wireless users out of
the b/g range. I would agree that we could drop the wep
and .1x portions, but again, this worked fine for the
previous two events. Believe me, we are very risk
averse.
-  After a surprisingly easy install, we had a meltdown
Monday morning at the beginning of the first session.
This meltdown and the following shockwaves on
Monday resulted from some less than optimal
deployment decisions, a configuration issue, a bug in
the deployed code, and some unforeseen interactions
with the infrastructure. In an attempt to stabilize things,
we shed a number of capabilities culminating in a
downgrade of controller code on Monday night. I
would like to say that we did this in a careful and
reasoned fashion, but I will admit there was a fair
amount of chaos. 
-  Tuesday morning we wandered around trying to see
how things were going. Most people we talked to
seemed happy enough at the time but willing to tell us
war stories from Monday. (Thank you, but we already
knew Monday was bad.) We were getting sporadic
reports of IP connectivity problems, but we couldn't
seem to catch anyone actively experiencing the
problem. I sent out email soliciting input from people
experiencing or having experienced the problem
sometime after Monday. I received a total of four
responses over the next two days.
-  At this point, we considered moving to more current
code, but the reports we were getting indicated that
things were working. Because our perception at that
point was that things were working, we made a decision
on Tuesday evening not to upgrade.
-  We don't really have a good way to measure the user
experience. In this case, we thought the wireless was
mostly working. I asked for gentle feedback during the
meeting. With the exception of Steve Casner who
patiently reported back to us, we received basically
nothing. The helpdesk was also tracking problems for
us and again after Monday received very few reports (a sketch
of the kind of client-side probe I have in mind follows this list).
-  Until the flood of email started on Friday, we believed
that we had delivered a working wireless network (after
Monday) and with an occasional problem that impacted
a small subset of users but was not debilitating.
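
To give a concrete, and purely hypothetical, idea of what measuring the
user experience could mean: even a tiny voluntary probe running on attendee
laptops would tell us more than we had this time. The sketch below is only
an illustration and assumes Python is available; the reference host, probe
interval, and log file name are made up for the example:

#!/usr/bin/env python3
"""Hypothetical client-side connectivity probe (illustration only).

Once a minute, attempt DNS resolution and a TCP connect to a well-known
host, and append a timestamped result to a local log that an attendee
could later hand to the NOC or helpdesk.
"""

import datetime
import socket
import time

TARGET = "www.ietf.org"    # assumed reachable reference host
PORT = 80
INTERVAL = 60              # seconds between probes, arbitrary
LOGFILE = "wlan-probe.log" # local log the user can mail in

def probe():
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    try:
        # create_connection() resolves the name and tries each address in turn
        s = socket.create_connection((TARGET, PORT), timeout=5)
        peer = s.getpeername()[0]
        s.close()
        return "%s OK %s" % (stamp, peer)
    except OSError as exc:
        return "%s FAIL %s" % (stamp, exc)

if __name__ == "__main__":
    while True:
        with open(LOGFILE, "a") as log:
            log.write(probe() + "\n")
        time.sleep(INTERVAL)

Even a few dozen such logs dropped off at the helpdesk would have told us
whether the post-Monday reports were isolated or widespread.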

Structural/Administrative Issues:
-  I do not see much motivation for hosts (or vendors for a
meeting without a host) to support the IETF network.
The risk to benefit ratio is just too high. They can get
much better exposure in environments that are less
stressful and that they have more control over.
-  Advanced staging helps to reduce configuration issues
and allows more time for operational troubleshooting
onsite. This can't happen when you don't have a host or
contract out the service.

So where do we go from here? Well, we have been asked to
document our lessons learned. While we can do this, it seems
to me that there are new issues that bite us each time (e.g. one
time we had APs that rebooted every time a certain threshold
of clients was reached. The code was eventually fixed. The
problem doesn't exist anymore.) We can provide guidelines
and experiences, but war stories from the world of IETF
wireless deployments don't seem that useful. I would be happy
to work to document the basic guidelines that we use, but
generally there is a set of constraints that complicate things,
like having no hardware. The key to improving the reliability is
continuity between meetings and the current model does not
support that. 

Given the trouble the IETF has with getting sponsors for the
meetings, perhaps it is time to revisit our model of operation. If
we want the network (including wireless) as a production
service, then perhaps we should contract out that service to an
entity that would be responsible for it and could sustain more
continuity than the current model allows. This naturally costs
money. There are people out there that will do this for the right
price. I would be happy to hand over my green dot to someone
properly resourced to do the job.

Karen O'Donoghue 


_______________________________________________
Ietf mailing list
Ietf(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/ietf
