BlackBerry outage: RIM a victim of its own success?

The very nature of RIM's network could have led to the outage that caused millions of BlackBerry users to go without e-mails for hours.

Marguerite Reardon, former senior reporter
Marguerite Reardon started as a CNET News reporter in 2004, covering cellphone services, broadband, citywide Wi-Fi, the Net neutrality debate and the consolidation of the phone companies.
Research In Motion's massive BlackBerry e-mail outage this week highlights how vulnerable the company's network has become as it tries to keep up with demand for its popular service.

Research In Motion did not provide details of what caused the outage, which left millions of BlackBerry subscribers without access to e-mail on Tuesday evening and into Wednesday morning. The company said in a statement released early Wednesday that it was still reviewing the situation.

But analysts say that, judging from the nature of the outage and who was affected, the problem falls squarely on RIM's shoulders. For one, the outage affected only data services, including e-mail and mobile Web browsing. Subscribers were still able to make phone calls and send and receive SMS text messages.

All of this points to a technical problem within one of Research In Motion's Network Operations Centers (NOCs), which act as intermediaries between corporate mail servers and recipients.

The e-mail outage, first reported by WNBC, began around 5 p.m. PDT on Tuesday and lasted until the early hours of Wednesday morning, when e-mail began trickling into users' inboxes across North America and parts of Europe and Asia. The scale of the disruption shows how exposed RIM's network has become as the company's subscriber base grows.

Over the years, RIM has built a strong reputation as a reliable service provider, attracting bankers, lawyers and even members of Congress as subscribers. Lately the company has also been courting more mainstream customers with new handsets such as the BlackBerry Pearl and the BlackBerry 8800, both of which include media players and mobile browsers for surfing the Web.

The result has been a spike in subscriber growth. In its latest quarter, the company reported adding 1.02 million new subscribers, bringing its total to 8 million. That is a huge increase from the 2 million subscribers the company reported a year ago, when it settled its patent infringement case with NTP. The company expects to add between 1.125 million and 1.15 million subscribers during the current quarter.

"With all the recent subscriber growth the company has seen, it's not surprising that they would have network problems," said Dan Taylor, managing director of the Mobile Enterprise Alliance, a nonprofit trade organization that promotes enterprise mobility. "They've just about quadrupled their subscriber base in the last 12 to 16 months. In some ways it was an accident waiting to happen. I'm sure the people running the NOC were aware that something could happen, and I'm sure they are working to get it fixed."

How it works
While it's not known for sure what caused RIM's outage, it's not difficult to see how the very nature of RIM's network could lead to a major service outage. RIM's service is centralized: it routes all BlackBerry e-mail through one of two main NOCs, which are essentially large data centers. One NOC, located in Canada, primarily serves the Western Hemisphere as well as parts of Asia, according to analysts familiar with the company. The other, located in the U.K., handles e-mail traffic for Europe, Africa and the Middle East.

The BlackBerry Enterprise Server, which sits on the corporate network, receives e-mails from the company's Exchange or Lotus e-mail server and forwards those e-mails in an encrypted tunnel to one of the NOCs. The NOC then acts as an efficient delivery system that authenticates users and forwards the messages to the appropriate handheld device.
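To make the data flow described above concrete, here is a minimal Python sketch of a centralized relay. Everything in it is an assumption for illustration: the class names (`EnterpriseServer`, `Noc`), the region table and the device IDs are hypothetical, the "encrypted tunnel" is a placeholder tuple with no real cryptography, and none of it reflects RIM's actual protocol. It models only what the article describes: the enterprise server hands each message to the NOC for its region, the NOC authenticates the device and delivers, and a NOC that is down holds mail until it recovers.

```python
from dataclasses import dataclass, field

# Hypothetical routing table paraphrasing the article: one NOC in Canada,
# one in the U.K. Region names are illustrative, not RIM's configuration.
NOC_FOR_REGION = {
    "north_america": "NOC-Canada",
    "south_america": "NOC-Canada",
    "asia": "NOC-Canada",          # the article says "parts of Asia"
    "europe": "NOC-UK",
    "africa": "NOC-UK",
    "middle_east": "NOC-UK",
}


@dataclass
class Noc:
    """Hypothetical network operations center: authenticates and delivers."""
    name: str
    online: bool = True
    registered_devices: set = field(default_factory=set)
    backlog: list = field(default_factory=list)    # messages held while offline
    delivered: list = field(default_factory=list)  # (device_id, envelope) pairs

    def relay(self, device_id, envelope):
        if not self.online:
            self.backlog.append((device_id, envelope))
            return "queued"
        if device_id not in self.registered_devices:
            return "rejected"  # authentication happens here, outside the corporate network
        self.delivered.append((device_id, envelope))
        return "delivered"

    def restore(self):
        """Bring the NOC back online and replay everything it was holding."""
        self.online = True
        pending, self.backlog = self.backlog, []
        return [self.relay(device_id, env) for device_id, env in pending]


class EnterpriseServer:
    """Hypothetical stand-in for the BlackBerry Enterprise Server."""

    def __init__(self, region, nocs):
        self.noc = nocs[NOC_FOR_REGION[region]]

    def forward(self, device_id, message):
        # Placeholder for the encrypted tunnel: no cryptography is performed.
        envelope = ("encrypted", message)
        return self.noc.relay(device_id, envelope)


if __name__ == "__main__":
    nocs = {
        "NOC-Canada": Noc("NOC-Canada", registered_devices={"pin-123"}),
        "NOC-UK": Noc("NOC-UK"),
    }
    bes = EnterpriseServer("north_america", nocs)
    print(bes.forward("pin-123", "Q3 numbers attached"))  # delivered
    nocs["NOC-Canada"].online = False
    print(bes.forward("pin-123", "Re: Q3 numbers"))        # queued
    print(nocs["NOC-Canada"].restore())                    # ['delivered']
```

Because every message for a region passes through a single `Noc` object, taking that object offline stalls mail for every user in the region at once, and restoring it releases the held messages in a burst.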

Because RIM handles user authentication away from the corporate network, companies are protected from hackers who might try to obtain information through e-mail servers, which sit inside the company's firewall. RIM's approach also means that corporate IT departments don't have to juggle relationships with multiple mobile operators, because RIM handles all of that for them in the NOC.

The flip side of RIM's approach is that with only two NOCs handling e-mail for 8 million subscribers, there are two major potential points of failure. When something goes wrong in one or both of these data centers, the result can be an outage like the one on Tuesday night and Wednesday morning, which left users technologically paralyzed.

"Anytime you have a situation where traffic is flowing through a single data center, there is potential for a catastrophic outage," said Gene Signorini, vice president of enterprise research at the Yankee Group. "But that said, the RIM architecture also provides a lot of benefits to its corporate customers. It's just the nature of the beast."

Some of the most common causes of an outage are power failures, failure of a critical component that takes down a larger system, software bugs, viruses and other outside attacks, or patches that fail. RIM hasn't identified which of these caused this particular outage, but Todd Kort, principal analyst at Gartner, said it may have been caused by a software bug.

"If the RIM outage is affecting other parts of the globe, this fact most likely points to some type of software bug," he said in an e-mail.

While Motorola's Good Technology also uses a NOC architecture to push e-mail to subscribers, competing mobile e-mail products, such as those sold by Microsoft and by Nokia through Intellisync, do not route e-mail through a centralized data center and so are not vulnerable to this kind of outage. With these architectures, a single company or a single mobile operator might experience an outage, but it would be nothing like the magnitude of the BlackBerry disruption.
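The difference in scale can be illustrated with a toy "blast radius" calculation. This Python sketch is not based on real subscriber data: the 1,000 users, the 70/30 split between two hubs and the 50 companies running their own mail servers are invented numbers, and the model ignores any redundancy or failover inside a hub. Each user is mapped to the single component their mobile e-mail depends on, and the code reports what share of users the worst single failure would cut off.

```python
from collections import Counter


def worst_single_failure(assignments):
    """Return (component, users affected) for the failure that hits the most users.

    `assignments` maps each user to the one component (a NOC or a company's
    own mail server) that their mobile e-mail depends on.
    """
    users_per_component = Counter(assignments.values())
    return users_per_component.most_common(1)[0]


def share_affected(assignments):
    """Fraction of all users cut off by the worst single failure."""
    _, affected = worst_single_failure(assignments)
    return affected / len(assignments)


# Invented population for illustration only.
users = [f"user{i}" for i in range(1000)]

# Centralized: everyone depends on one of two hubs (hypothetical 70/30 split).
centralized = {u: "NOC-Canada" if i < 700 else "NOC-UK" for i, u in enumerate(users)}

# Decentralized: 50 companies, each running its own server for 20 users.
decentralized = {u: f"company{i % 50}-mail" for i, u in enumerate(users)}

print(f"{share_affected(centralized):.0%}")    # 70%
print(f"{share_affected(decentralized):.0%}")  # 2%
```

The point is only that concentrating traffic raises the ceiling on how many users one fault can reach; it says nothing about how likely a failure is in either design.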

That said, companies managing their own corporate wireless e-mail do not get the management and security benefits that those using RIM's BlackBerry service get.

Investors seemed to take the BlackBerry outage in stride, with the company's share price changing little on Wednesday as service came back to life. Gartner's Kort said he is confident that RIM will quickly fix whatever infrastructure problems its rapid growth may be causing.

"RIM has already recovered nicely from paying $600 million to NTP in its patent case," he said. "It has about $1.4 billion in cash, so I'm sure they can buy whatever state-of-the-art equipment they need to keep their network solid."