
What's behind AOL outages?

The online giant has suffered two power-related outages in a month, and analysts wonder if the service is fully prepared.

Paul Festa Staff Writer, CNET News.com
Paul Festa covers browser development and Web standards.
When an electrical malfunction this week put America Online (AOL) on the fritz for the second time in a month, some users and analysts began wondering if the online service had fallen victim to an extraordinary coincidence--or if the company is inadequately prepared for unforeseen problems.

AOL has declined to elaborate on the circumstances of the malfunctions, saying the two incidents are under investigation. Spokesperson Wendy Goldberg did say the company doesn't think the incidents are related.

The scope of the outages and AOL's apparent lack of understanding about them are surprising, however, given the online service's size and dominance in the market. And many experts in networking and power supplies, as well as some of AOL's Internet competitors, question why AOL's systems have proven vulnerable.

"It's baffling that they would have two power-related failures with redundancy," said Craig Waterman, an AOL user and president of power reliability supplies provider PowerSource in Campbell, California.

Most "mission-critical" services--such as operations at hospitals, military installations, and more recently Internet service providers--are backed up with what is known as "redundant" power supplies that take over when the main power is knocked out. AOL is backed up with "extensive redundancy," according to Goldberg, but she did not explain why those redundant systems did not prevent the system from going down.

In Tuesday's outage, members throughout the system were unable to log on for about 20 minutes. Subsequently, an unspecified number of users were unable to send or receive email or use other AOL services including chat and newsgroups for a few hours--and, in some cases, into the next day.

"We had never had this happen before, and we don't want it to happen again," Goldberg said of the earlier incident, on February 24. "We don't know the cause, and we will investigate until we're satisfied we've found it."

Although AOL and other ISPs have paid a lot of high-profile attention to backing up servers and increasing bandwidth, ISP managers note that power is an equally critical link in the chain of providing uninterrupted service.

AT&T WorldNet, for example, houses its servers in two data centers in separate states. Each of those data centers is individually redundant, meaning that servers, data networks, and power supplies are backed up. And they are mutually redundant to a certain extent, so that if a disaster were to completely disable one, the other would provide at least partial service.

WorldNet constantly monitors the data centers' power supply both as it enters the buildings and as it courses through them, according to WorldNet product line director Rose Klimovich. "That way, if something goes wrong, we have time to fix it before it affects customers," she said.

The ISP also periodically tests its power redundancy by bringing down its main power supply to make sure the backup system kicks in, a precaution Klimovich said should be a matter of course. "We do periodic disaster-recovery planning to see what would happen if things did go down," she added.

WorldNet has never had a power-related service outage, according to Klimovich.

ISP EarthLink Network started out with its own share of power-related service problems and quickly learned from them, according to Steve Dougherty, the company's director of Internet operations.

"We started off as a small ISP in a noncommercial area, and we only had a small amount of battery backup," Dougherty recalled. "We had two power outages two years ago that caused service problems, and when we built our new data center we made sure it would never happen again. This sort of thing doesn't happen to big ISPs or to people who know what they're doing."

EarthLink's power backup consists of two sets of UPS (uninterruptible power supply) batteries, either one of which can support the data center for at least one hour. A separate power generator will kick in within 8 seconds of a power outage and last for 72 hours on a refillable tank of fuel.
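
To make the timing concrete, here is a minimal sketch of that failover sequence. The three figures (at least one hour per battery set, an 8-second generator start, 72 hours per tank) come from Dougherty's description. The function name, the `generator_ok` flag, and the assumption that the batteries only bridge the gap until the generator starts are illustrative and do not describe EarthLink's actual control logic.

```python
# Figures quoted in the article; everything else is a hypothetical model.
UPS_RUNTIME_S = 60 * 60              # one battery set: at least one hour
GENERATOR_START_S = 8                # generator starts within 8 seconds
GENERATOR_RUNTIME_S = 72 * 60 * 60   # 72 hours on one tank of fuel


def power_source(seconds_since_outage: int, generator_ok: bool = True) -> str:
    """Return which source carries the load at a given moment (assumed model)."""
    if generator_ok:
        if seconds_since_outage < GENERATOR_START_S:
            return "ups"        # batteries bridge the start-up gap
        if seconds_since_outage < GENERATOR_START_S + GENERATOR_RUNTIME_S:
            return "generator"
        return "none"           # tank empty and not refilled
    # Generator failed to start: count on one battery set for one hour.
    return "ups" if seconds_since_outage < UPS_RUNTIME_S else "none"


# How many times over one battery set covers the generator's start-up gap.
margin = UPS_RUNTIME_S // GENERATOR_START_S
print(margin)
```

Under these assumptions a single battery set covers the 8-second gap 450 times over, which is why a design like this can ride through a generator that is slow to start.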

Dougherty speculated that claims of power trouble are sometimes fabricated when the real problem lies in software. But he also said the varied nature of AOL's recent outages, which affected email, chat, and newsgroups, probably pointed to an electrical problem.

Goldberg responded that the online service is scrupulous about reporting the cause of its service problems.

"We have been very direct and honest with our members in terms of what causes system glitches," she said. "When it's software, we say it's software. When it's hardware, we say it's hardware. When it's electrical, we say it's electrical."

AOL has two data centers in separate cities in Virginia, where the company is based. Last year, the company announced the construction of a third data center, which Goldberg said will open "in the near future."

The timing of AOL's recent power lapses could hardly have been worse--the first on the heels of the company's announcement that it would raise its rates 10 percent to $21.95 per month for unlimited access, and the second just days before that price hike was implemented. On top of that unfortunate juxtaposition of events, the outages come as people's expectations of reliability are increasing for ISPs overall.

At last count, AOL has more than 11 million members and has expanded its international presence considerably. But along with the dominance AOL has enjoyed has come the perception among users that the online service is akin to a public utility, such as the phone company.

"People are increasingly expecting AOL and ISPs to act like other utilities, and that means always on," Forrester Research analyst Kate Delhagen said. "As people's expectations continue to rise, the question becomes whether or not the expectations are in synch with what the industry can deliver. Yet any provider unable to deliver that is at risk."

Those expectations are beginning to spill over into legal action as well. In one case, an Illinois state legislator has moved to punish ISPs when service goes down. And AOL itself became the target of dozens of U.S. attorneys general after it experienced serious service problems last year in the wake of introducing its unlimited-access pricing plan.

With a majority of domestic Internet subscriptions, AOL faces a singular challenge in maintaining its network, Delhagen said.

AOL is quick to agree. "We have the biggest dial-up network in the world," Goldberg said. "And our priority is keeping members connected."

AOL network downtime, including both scheduled and unscheduled maintenance, is less than 1 percent, according to Goldberg.
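
For a sense of scale, the short calculation below converts a downtime percentage into hours. The 1 percent figure is Goldberg's upper bound. The measurement period is not given in the article, so the yearly and 30-day periods used here are assumptions chosen only for illustration.

```python
HOURS_PER_YEAR = 365 * 24    # 8,760 hours, ignoring leap years
HOURS_PER_MONTH = 30 * 24    # 720 hours in an assumed 30-day month


def max_downtime_hours(downtime_percent: float, period_hours: float) -> float:
    """Hours of downtime implied by a percentage over a given period."""
    return period_hours * downtime_percent / 100


print(max_downtime_hours(1, HOURS_PER_YEAR))   # ceiling over a year
print(max_downtime_hours(1, HOURS_PER_MONTH))  # ceiling over a 30-day month
```

At the 1 percent ceiling, that works out to about 87.6 hours a year, or roughly 7.2 hours in a 30-day month. Because Goldberg said "less than," AOL's actual figure would be below these numbers.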

While analysts play down outages' potential to drive AOL subscribers into the arms of the online service's competitors, they do suggest that the higher-priced service could lose future subscribers if a reputation for poor connectivity precedes it.

"It will be interesting to see how other ISPs market their $19.95 rates now that AOL has raised its prices, and get some of the churn that AOL is going to realize from these problems," International Data Corporation analyst Jill Frankle said. "People who pay a monthly subscription fee should be able to get on all the time."