Amazon working again, but what went wrong?
Site is back up and running after a two-hour outage. What went wrong at the e-commerce giant?
Update 4:36 p.m. PDT with outside comment about possible causes of the Amazon.com outage.
A two-hour Amazon.com outage is over. Now on to the post-mortem: what triggered the problem?
Amazon declared itself clear of the problem this afternoon. "The Amazon retail site was down for approximately two hours earlier today beginning around 10:25 a.m. The site (is) back up," the company said in a statement.
But as to the explanation, the company only hinted that its complicated computing infrastructure was, unsurprisingly, a culprit.
"Amazon's systems are very complex and on rare occasions, despite our best efforts, they may experience problems. We work to minimize any disruption and to get the site back as quickly as possible," the company said, declining to comment further.
Human error?
The most likely culprit was simple human error, in the estimation of Shawn White, director of operations for Keynote Systems, which monitors Web site availability.
"Some engineer might have made a particular change, not knowing it could cause a trickle-down effect" that eventually brought down the site, White said.
For example, he said, somebody in charge of maintenance might have been directing Internet traffic to a particular group of servers, but selected the wrong group.
But at Amazon? "What I find still so surprising is it happened in the middle of the day. Typically you do that in off-peak hours," White said. "They rank on the top with performance and availability, consistently, time and time again."
Network attack?
Another possible explanation is an attack such as the distributed denial-of-service (DDoS) attack that struck Amazon and other high-profile sites in 2000. White thinks it unlikely, though, that a crushing load of network traffic brought Amazon down.
"These guys are experts at dealing with flash floods of users," including those that routinely arrive during peak shopping days, he said. "Usually, when you see a site going under because of traffic issues or a denial-of-service attack, you see a gradual slowdown in performance and drop in availability. Here we saw at 10:16 a.m. it completely dropped off 100 percent."
Soups Ranjan, a senior member of the technical staff of network protection and management company Narus, hasn't yet found any evidence of an attack.
"It doesn't seem to be the result of a network-initiated attack, at least from my preliminary analysis from our probes," Ranjan said.
Human error may not sound as gripping a tale as a network attack, but there's plenty of drama for the people responsible. And it's the career-limiting variety of drama, said Illuminata analyst Gordon Haff, who hazarded a guess that Amazon's problem involved its front-end Web servers.
The security group of WebSense, a Web site and communications protection company, also saw no evidence that Amazon's problem was security-related.
CNET staff writer Robert Vamosi contributed to this report.