Google networking error caused outage

Google's attempt to roll out new networking addresses as part of the IPv6 transition did not work out as planned, causing widespread service outages for about an hour.

Tom Krazit Former Staff writer, CNET News
Tom Krazit writes about the ever-expanding world of Google, as the most prominent company on the Internet defends its search juggernaut while expanding into nearly anything it thinks possible. He has previously written about Apple, the traditional PC industry, and chip companies.

Updated at 12:25 p.m. PDT with word that Google has confirmed an error on its end caused the outage, and at 3:30 p.m. PDT with Google's comment on McAfee's description of the events.

Widespread outages involving several Google services--including search, Google Docs, and Gmail--were caused by an upgrade gone awry inside of Google, according to engineers.

Dmitri Alperovitch, vice president of threat research for McAfee, said that Google this morning attempted to make changes to key Internet routing numbers--known as autonomous system numbers--as part of its ongoing transition from an older networking standard to a newer one called IPv6. An unknown "bug" inside Google's network involving some sort of hardware failure or glitch prevented Internet service providers from finding Google's new ASNs on the Internet--effectively sealing it off from many customers, he said.

Not all Internet users were affected, but some who use larger providers--such as AT&T or Verizon--appeared to be disproportionately hurt because large ISPs "peer" with Google, or interconnect their networks with Google's networks in order to improve speed and reduce bandwidth costs, Alperovitch said. Not all customers at those providers were affected, and smaller ISPs that didn't interconnect their networks were able to route around the problem. But just like when a bad car accident shuts down a key highway, the ripple effects were felt elsewhere.

The outage began at 8:13 a.m. PDT, according to McAfee's data, and was fixed by 9:14 a.m. PDT. The issue was discussed in forums dedicated to ISPs and their engineers, such as the North American Network Operators Group. McAfee's customers reported the issue to the security company, which monitors network traffic for some customers.

Google is a major fan of IPv6 and makes many of its services available through the new network technology. However, IPv6 has been slow to arrive overall, in part because it's a very difficult transition from the current IPv4 network.

Google spokesman Eitan Bencuya wouldn't confirm what caused the problem but said the company plans to detail what happened in a company blog post to be published "shortly."

Update at 12:25 p.m. PDT: Google has confirmed that "an error in one of our systems caused us to direct some of our Web traffic through Asia, which created a traffic jam." In a blog post, the company did not elaborate on what caused the error, but claimed just 14 percent of users were affected.

"We've been working hard to make our services ultrafast and 'always on,' so it's especially embarrassing when a glitch like this one happens. We're very sorry that it happened, and you can be sure that we'll be working even harder to make sure that a similar problem won't happen again," Google wrote.

Update at 3:30 p.m. PDT: Google has denied that work on the transition to IPv6 is to blame for this morning's outage, but would not specify what was to blame. "This issue is unrelated to any work we are doing in transitioning toward supporting IPv6," a spokesman said. McAfee said it obtained its information from Google on a private mailing list for networking professionals of which Google is a member, but declined to provide a copy of the thread in question.