The Bufferbloat Crisis
jducoeur
[Warning: this one's mainly for the techies]

I confess that I'm way behind on my technical blogs, so everyone may already know this one. But I only found out about it today, when my mother (the least techie member of the family) sent me a link to a brief Mother Jones article which boiled down to "Death of the Internet -- news at 11". That led to a slightly overwrought and confusing blog post from Robert Cringely on the subject of "bufferbloat", which he predicted is going to be next year's big online problem. Fortunately, *that* led me to a long series of blog posts by Jim Gettys, who was the one who put all the pieces together, and who actually explains the problem in ridiculously gory detail.

The above link to the Gettys series picks up in the middle, but it seems to be where he transitions from lots of posts about experiments and begins to talk about what's going on. I commend the series to anybody who is inclined toward the technical side of the Internet, understands terms like "TCP", "router", "buffering", and suchlike, and has some attention span -- it's not brief.

The upshot, though, is that the Internet appears to be Pretty Damned Broken at the moment. The problem mostly shows up in weird latency lagginess, especially when you're sharing a line with a high-bandwidth connection (such as video). The crux of the issue is a well-intentioned tragedy of the commons, an accident of Moore's Law. It appears that, since memory is so cheap, everyone is building Massively Huge Memory Buffers into pretty much every piece of networking equipment -- and unless those buffers are integrated with smart traffic shaping, they totally screw up TCP's traffic management. The result is that everyone is slamming way more traffic onto the Net than they should be, over-saturating a lot of connections and causing surprisingly bad packet loss and latency.
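
To make that concrete, here's a toy back-of-the-envelope sketch in Python -- my own numbers and code, nothing from the Gettys posts -- of a sender offering three times what a 1 Mbit/s bottleneck can carry. The size of the buffer in front of the bottleneck determines both how much latency piles up and how long it takes before the first dropped packet finally tells a TCP-style sender to slow down:

    PACKET_BITS = 12_000         # one 1500-byte packet
    LINK_BPS = 1_000_000         # 1 Mbit/s bottleneck link
    SEND_BPS = 3_000_000         # sender offers three times what the link can carry

    def queue_delay_when_full(buffer_packets):
        """Seconds a newly arriving packet waits once the buffer has filled."""
        return buffer_packets * PACKET_BITS / LINK_BPS

    def time_until_first_drop(buffer_packets):
        """Seconds before the buffer overflows and the sender finally sees a loss."""
        growth_bps = SEND_BPS - LINK_BPS   # net rate at which the queue grows
        return buffer_packets * PACKET_BITS / growth_bps

    for buf in (20, 256, 2048):            # small, generous, and bloated buffers
        print(f"{buf:5d}-packet buffer: {queue_delay_when_full(buf)*1000:8.1f} ms of added latency, "
              f"first drop after {time_until_first_drop(buf):6.2f} s")

With the 2048-packet buffer, the sender doesn't hear "slow down" for over twelve seconds, and by then everything sharing that link is stuck behind roughly 25 seconds of queue.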

Or to put it more simply: there's no *good* reason why people see as much lagginess on the Internet as they do today -- it's heavily the result of building buffers into network equipment that individually make some sense, but in context make things worse.

At least, I think that's what it's saying -- I won't kid you, I haven't spent anywhere near enough time to really internalize his argument yet. But it does sound like there's a subtle but pervasive bug in the way people are building equipment, which is causing the Net to badly underperform relative to what it *should* be able to do, and the problem is getting steadily worse...

In the World At Large, I'm techier-than-the-average-bear, but among your friends I suspect I'm one of the least aware. I just have this deep-seated faith, though, that when the 'Net is about to explode, someone out there will invent the reverse-pulse frannistanium flux capacitor which will allow data packets engraved on the inner surface of neutrinos, thus increasing capacity by 24 orders of magnitude, which will take another dozen years to Not Be Enough. Isn't that the way things have always worked?

Typically, yes. The problem in this case is that it's not a single thing that's broken -- it's that a large fraction of the equipment on the Net is separately broken, all in mostly the same way. Gettys is actually mildly pessimistic about it getting fixed, simply because the problem is so dispersed and there isn't a single screwup to point to and fix.

(In that respect, it's somewhat like the IPv4/IPv6 problem -- it's clear what's wrong, it's clear how to fix it, but since it requires *everybody* to work together to fix it, we've come right to the brink of Absolute Disaster and it's still not fixed.)

I do suspect things will gradually be improved, but it may well be a slow and painful process...

It does seem to take people losing money for things to get fixed, and even then not always (viz. AT&T's network in NYC, which is horrendous, but not unprofitable enough to really remedy properly)...but I share baron_steffan's inherent optimism about this.


In that respect, it's somewhat like the IPv4/IPv6 problem

The difference is that it doesn't do any good for one person to switch to IPv6, but one router getting its buffers reduced might help everybody a little bit.



And as a problem, 'too much memory' is one of the better ones to have. Just pull/reallocate it.

Is coordination (working together) required, or couldn't everybody just reduce their buffers independently? Are we talking about herding cats, or getting cats to dance in a chorus line?

Honestly, it's unclear to me. It seems like working independently should capture much of the benefit, but Gettys seems to imply that he believes coordination is needed. (Which is, obviously, a good deal more difficult.) This is one of the places where the series lost me, so I'm not sure that I am interpreting it correctly...

It looks to me as if he doesn't think we need coordination, exactly; we need education, so that the various players will fix or replace their equipment.

What might need coordination is adopting ECN (explicit congestion notification), which has been implemented for 10 years, but has never been widely deployed, because it can lead some old firewalls to drop packets. Even there, the coordination required is less "everybody do this on Friday" and more "everybody take your ancient routers to the dump by Friday".
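
For the curious: on a Linux box you can see whether your own stack will even negotiate ECN without any special tools, since the kernel exposes the policy as net.ipv4.tcp_ecn. A minimal Python check -- assuming a Linux host with /proc mounted; the value meanings come from the kernel's sysctl documentation, not from me:

    from pathlib import Path

    MEANINGS = {
        "0": "never negotiate ECN",
        "1": "request ECN on outgoing connections and accept it on incoming ones",
        "2": "use ECN only when the remote peer requests it (often the default)",
    }

    raw = Path("/proc/sys/net/ipv4/tcp_ecn").read_text().strip()
    print(f"net.ipv4.tcp_ecn = {raw}: {MEANINGS.get(raw, 'unknown value')}")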

I think he thinks coordination is required because this is a tragedy of the commons problem. Normally, those only fix themselves by altruistic behavior when the domain is small, and the reputation boost for doing so is significant.

because this is a tragedy of the commons problem

But it isn't, really. The large buffers don't actually help anybody, and at least some people can benefit by shrinking the buffers in their own hardware.


Ah, sorry, I meant the work involved to fix it, not necessarily the large buffers themselves. Analogous to why elections get low voter turnout, even though the cost for voting, at least inexpertly, is small.

My understanding at this point is thus:

1. The problem is that one of the assumptions of TCP (the protocol that ensures that data gets delivered, for the non-tech audience) is that packets aren't sitting in huge buffers along the way. But they are, and that breaks some of TCP's congestion-control algorithms in ways that cause extra delay and re-transmissions. So, yes, I think you understand it.

2. For any given pair of points on the Internet (say, www.livejournal.com and your computer), the problem can only be entirely solved if all of the individual links between the two points fix the problem. It's not enough for you to swap out your router for a new one if every other router between you and the far end still shows the problem.

the problem can only be entirely solved if all of the individual links between the two points fix the problem

No, not really. Er...strictly speaking, yes; but each link that removes bufferbloat reduces the problem. The problem is that, under load, a buffer increases the latency on a link. If a link's bitrate is N bits/second, and the buffer is M bits, then a full buffer adds M/N seconds of latency. That extra latency on each link adds up over the whole path, so any link that reduces its buffers reduces the total latency of the path.
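
To put illustrative numbers on that M/N arithmetic (mine, made up for the example), here's the per-link delay summed over an imaginary three-hop path; each hop's contribution is independent of the others, which is why any single link shrinking its buffer lowers the worst case for the whole path:

    def full_buffer_delay(buffer_bits, link_bps):
        """M/N: seconds of queuing delay a full M-bit buffer adds on an N-bit/s link."""
        return buffer_bits / link_bps

    # An imaginary three-hop path: (buffer size in bits, link rate in bits/second).
    path = [
        (4_000_000,     1_000_000),   # bloated buffer on a 1 Mbit/s uplink -> 4 s
        (  800_000,    10_000_000),   # cable headend                       -> 80 ms
        (  100_000, 1_000_000_000),   # well-provisioned core link          -> 0.1 ms
    ]

    total = sum(full_buffer_delay(m, n) for m, n in path)
    for m, n in path:
        print(f"{m:>10,d}-bit buffer on a {n:>13,d} bit/s link adds up to "
              f"{full_buffer_delay(m, n)*1000:8.1f} ms")
    print(f"worst-case added latency over the whole path: {total*1000:.1f} ms")

The first hop dominates, which fits the observation in the next paragraph about the edge being where the bloat lives -- but nobody has to wait for anyone else before starting to fix it.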

In practice, I suspect bufferbloat is almost entirely a function of edge equipment—home routers, cable modems, and the stuff at the ISP's POP. Core routers have generally been built to route at wire speed, with no need for buffers. At least, that's how things were done when I was participating in the IETF around 2001. Bit rates have not been keeping up with Moore's Law, so I expect wire-speed routing is easier these days, not harder.

