
RIM offers explanation for massive outage

After two days of silence, Research In Motion finally provides some details of what wiped out its BlackBerry e-mail service.

Marguerite Reardon, former senior reporter
Marguerite Reardon started as a CNET News reporter in 2004, covering cellphone services, broadband, citywide Wi-Fi, the Net neutrality debate and the consolidation of the phone companies.
Research In Motion finally offered some details late Thursday about what caused a severe outage of its BlackBerry e-mail service from Tuesday evening until Wednesday morning.

The company said in a statement that it had ruled out security and capacity issues as a cause of the outage that left millions of so-called "CrackBerry" addicts without access to their e-mail for several hours. The company also said the incident was not caused by any hardware failure or core software issue.

Having ruled out those causes, the company said it had "determined that the incident was triggered by the introduction of a new, noncritical system routine that was designed to provide better optimization of the system's cache." In computing terms, a cache is a temporary storage area that allows data to be served up quickly.

RIM said the system routine had not been expected to affect the regular operations of the BlackBerry servers and infrastructure. Despite previous testing, the new system routine produced an unexpected effect that set off a chain reaction, triggering a series of interaction errors between the system's operational database and the cache.

After RIM isolated the database problem and tried unsuccessfully to fix the issue, it began its "failover" process to a backup system. But that also failed.

"Although the backup system and failover process had been repeatedly and successfully tested previously, the failover process did not fully perform to RIM's expectations in this situation and therefore caused further delay in restoring service and processing the resulting message queue," the company said in the statement.

RIM also said it has already identified several aspects of its testing, monitoring and recovery processes that it plans to improve as a result of the incident.

From the start of the outage, around 5 p.m. PDT on Tuesday, the company had been quiet about its cause. But experts said they were convinced the problem lay in RIM's network, because subscribers were still able to make phone calls and send and receive text messages.

RIM's service is centralized and works by routing all BlackBerry e-mails through one of two main network operations centers, which are essentially large data centers. One center is located in Canada and primarily serves the Western Hemisphere as well as parts of Asia. The other, located in the U.K., handles e-mail traffic in Europe, Africa and the Middle East. Because most of the people affected by the outage were based in North America, analysts had speculated that the problem likely occurred in the center located in Waterloo, Ontario.

By Wednesday morning, RIM said, the e-mail had begun trickling into in-boxes across North America. The service was operating normally on Thursday, the company said.

RIM has built a strong reputation as a reliable service provider that has attracted bankers, lawyers and lawmakers as subscribers. The company has recently been trying to broaden its appeal to consumers with new products, such as the BlackBerry Pearl handheld and the BlackBerry 8800.

The new strategy has helped the company rapidly expand its subscribers. In its latest quarter, RIM reported it had added 1.02 million new subscribers, taking its total to 8 million. This is a huge increase from the 2 million subscribers the company reported a year ago, when it settled a patent infringement case with NTP. The company expects to add between 1.12 million and 1.15 million subscribers during the current quarter.