On Fri, Feb 6, 2015 at 8:47 AM, Jim Gettys <jg(_at_)freedesktop(_dot_)org>
wrote:
On Thu, Feb 5, 2015 at 6:34 PM, Richard Shockey
<richard(_at_)shockey(_dot_)us>
wrote:
OK yes, My Netflix download is going to kill your VOIP call.
Yes, it may, though pacing traffic may help (see the sched_fq work in
Linux).
RS > Well, it's always been more subtle than you think. You have to
distinguish between a Voice OVER IP call (aka Skype etc.) and a “managed
service” like Voice USING IP, or VuIP, which is an entirely different beast.
Modern VuIP, which is all of Cable Voice in Europe and the US (VZ FIOS etc.)
and VoLTE, is managed and uses IP technologies such as SIP/IMS, but the
routing may or may not have anything to do with BGP. And BTW you still
have to do the first order number translation as well, aka RFC 6116 ENUM
or something new, which we are actually debating in RAI.
It's segmented managed traffic. It's not only Netflix killing Skype, it's
Microsoft, Apple and Android updates as well. We have no visibility into how
the OS actually queues application packets, if at all.
Yup; this is a problem in the upstream direction (so it is not the case you
state above, but the inverse case), typified by events such as someone
emailing a pile of images to friends/family, backups, and similar
operations.
One of the most common bottlenecks is the WiFi or cellular hop, and if the
operating system uses a single bloated FIFO drop tail queue discipline
there, you get into trouble.
On Linux, the default queue discipline has historically been one called
PFIFO_FAST, which implements a large (typically 1024 packet) FIFO drop tail
queue (with a little bit of diffserv thrown in for good measure).
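
To put a rough number on why a queue that size hurts, here is a back-of-the-envelope sketch. The 1500-byte packet size and the three link rates are my assumptions for illustration, not measurements; only the 1024 packet depth comes from the text above. It computes how long a completely full FIFO takes to drain, which is the delay a newly arriving VoIP packet would sit through.

```python
def queue_drain_seconds(packets: int, packet_bytes: int,
                        link_bits_per_second: float) -> float:
    """Time for a full FIFO to drain: total queued bits / link rate."""
    if link_bits_per_second <= 0:
        raise ValueError("link rate must be positive")
    return packets * packet_bytes * 8 / link_bits_per_second


# Assumed figures: 1024 queued packets of 1500 bytes each,
# at three hypothetical uplink rates.
for label, rate in [("1 Mbit/s", 1e6), ("10 Mbit/s", 10e6), ("100 Mbit/s", 100e6)]:
    delay = queue_drain_seconds(1024, 1500, rate)
    print(f"{label}: {delay * 1000:.1f} ms of queueing delay when full")
```

Under those assumptions a full queue on a 1 Mbit/s uplink holds a bit over 12 seconds of traffic, far beyond what an interactive call can tolerate. Real queues are rarely completely full of full-size packets, so treat this as an upper bound.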
Turning on a different queue discipline is a single configuration line,
and it appears that people have been deciding to make fq_codel the default
in various Linux distributions as of last fall (it has been the default in
OpenWrt on routers for a while).
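
For anyone who wants to check what their own machine is doing: as far as I know, kernels that expose the net.core.default_qdisc sysctl make the current default readable under /proc, and the single configuration line is a sysctl setting of the same name. The sketch below assumes that path exists on your kernel; the helper names are mine, and it only reads the value, it changes nothing.

```python
from pathlib import Path
from typing import Optional

# Assumed location of the sysctl on kernels that support it.
DEFAULT_QDISC_PATH = Path("/proc/sys/net/core/default_qdisc")


def read_default_qdisc(path: Path = DEFAULT_QDISC_PATH) -> Optional[str]:
    """Return the kernel's default qdisc name, or None if it cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def sysctl_line(qdisc: str) -> str:
    """Build the one-line sysctl.conf entry that would select `qdisc`."""
    return f"net.core.default_qdisc = {qdisc}"


current = read_default_qdisc()
print("current default qdisc:", current if current else "unknown")
print("line to request fq_codel:", sysctl_line("fq_codel"))
```

Interfaces that are already up keep whatever qdisc they had, so a change of default generally only shows up on interfaces brought up afterwards.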
At some point, it may make sense to use a different queue discipline
(sched_fq) as the default, but I think more testing is needed. That's a
bit of a discussion better left to a different message.
OK we have a technical fix. But the problem is how to get it deployed by
the broadband providers.
There is a very good paper on computer security that might be relevant
here. 'Folk models of home computer security'
http://www.rickwash.com/papers/rwash-homesec-soups10-final.pdf
One of the things we have found at Comodo is that virtually every home
computer issue is almost automatically attributed to being 'a virus'. What
the customer actually wants is for their damn machine to work. Or
alternatively, they want their relatives to stop calling them to ask them
to fix their computers.
The point is that people tend to leap to external attack as being the most
likely cause of any computer failure issue that isn't obviously dead
hardware. Which is really weird when I spent most of the 90s being told I
was an evil scaremonger for suggesting those cuddly-wuddly hackers might
actually be rather nasty thieving types.
So people see their Netflix or their Vonage suddenly sputter and the first
explanation that comes into their heads is 'my ISP is trying to kill them
to sell me their stuff instead'.
We have the potential here for a really bitter dispute. Instead of picking
up the phone to complain to their ISP, people have been picking up the
phone to complain to their Senator or the guy in the White House.
And it is going to be really difficult to explain to a lot of people who
have taken up arms on either side of this dispute that this might be the
cause of the slowdowns. One side is going to accuse us of being shills for
the corporate interests. And on the other side there are a lot of lobbyists
licking their chops at the thought of fat billable hours for as long as
they can make the fight worse.
So how do we de-escalate the situation?
One part is that we need more than a technical fix for this problem. We
need to be able to tell Joe or Jane Consumer how often these slowdowns
occur and what the cause is. The difficulty is that the cause might be on
the broadband provider side or the home user side of the network.
So maybe have the residential gateway collect some data and expose it to
the consumer in some fashion. This could also help debug other connection
issues. I was having unexplained network slowdowns for a month that were
eventually found to have been caused by a falling tree snapping the fiber
but not the sheath round it. So the result was a flaky connection that had
the peculiar property of working for some frequencies but not others.
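
As a very rough sketch of what the gateway might report, assume it periodically records a round-trip time to some fixed probe target. The Sample type, the probe itself and the "three times the median" threshold are all hypothetical choices of mine, not anything a real gateway does today. The summary answers the "how often" question only.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class Sample:
    timestamp: float  # seconds since the epoch
    rtt_ms: float     # round-trip time to an assumed fixed probe target


def summarize(samples: List[Sample], slow_factor: float = 3.0) -> dict:
    """Count samples whose RTT exceeds slow_factor times the median RTT."""
    if not samples:
        return {"count": 0, "baseline_ms": None,
                "slow_count": 0, "slow_fraction": 0.0}
    baseline = median([s.rtt_ms for s in samples])
    slow = [s for s in samples if s.rtt_ms > slow_factor * baseline]
    return {
        "count": len(samples),
        "baseline_ms": baseline,
        "slow_count": len(slow),
        "slow_fraction": len(slow) / len(samples),
    }


# Made-up readings, one per minute, with two obvious slowdowns.
rtts = [20, 22, 21, 400, 19, 23, 350, 20]
report = summarize([Sample(60.0 * i, r) for i, r in enumerate(rtts)])
print(report)
```

Telling the provider side from the home side would take more than this, for example separate probes to the gateway's own LAN address and to the first hop inside the provider's network, which again is an assumption about what the gateway can reach.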
Another is to point out that the fact that buffer bloat explains some
network slowdowns does not mean it explains every slowdown, just as tracing
a printing failure to a wrong printer port assignment rather than a virus
does not mean viruses do not exist. Nor does it mean that malice is never a
cause.