Dropbox outage sparked by buggy server upgrade

A planned server maintenance job went awry because of a "subtle bug" in an upgrade script - but users' files were never at risk, the company says.

Dropbox is pinning the blame for its Friday outage on a glitch in its server upgrade process.

The file storage and sharing site went offline on Friday and continued to suffer problems even after coming back online over the weekend. One issue, for example, prevented Dropbox users from sharing folders. According to Dropbox, the remaining issues were corrected and the core service was restored as of Sunday at 4:40 PM PT, and folder sharing is now working again.

How and why did the outage occur in the first place?

Rebutting earlier reports of a hack or DDoS (distributed denial-of-service) attack, Dropbox said that the outage was caused by a "subtle bug" in a script involved in upgrading the operating system on its database servers. Each database uses one master and two slave machines for redundancy, and that setup was itself caught up in the glitch.

"On Friday at 5:30 PM PT, we had a planned maintenance scheduled to upgrade the OS on some of our machines," Dropbox's head of infrastructure, Akhil Gupta, said in a blog post published on Sunday. "During this process, the upgrade script checks to make sure there is no active data on the machine before installing the new OS. A subtle bug in the script caused the command to reinstall a small number of active machines. Unfortunately, some master-slave pairs were impacted, which resulted in the site going down."

Gupta insisted that the files of Dropbox users were never at risk during the outage since the databases don't contain any actual file data.

To try to avoid further such outages, Gupta said that the Dropbox team has now added checks that require servers to confirm their current state before they can run an incoming command. Dropbox has also developed and implemented a tool that it believes will help speed up the recovery of large databases.
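
Dropbox has not published the code behind these checks, but the general idea of a server confirming its own state before acting on a command can be illustrated with a rough sketch. Everything below is hypothetical: the names ServerState, Role, confirm_safe_to_reinstall and handle_reinstall_command are made up for illustration and are not taken from Dropbox's tooling.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Role(Enum):
    """Hypothetical roles a database machine might hold."""
    MASTER = "master"
    SLAVE = "slave"
    IDLE = "idle"


@dataclass
class ServerState:
    """Hypothetical snapshot of what a server reports about itself."""
    hostname: str
    role: Role
    has_active_data: bool


class UnsafeCommandError(Exception):
    """Raised when a server refuses a command that conflicts with its state."""


def confirm_safe_to_reinstall(state: ServerState) -> None:
    """Refuse a reinstall unless the machine is idle and holds no active data."""
    if state.role is not Role.IDLE:
        raise UnsafeCommandError(
            f"{state.hostname} is serving as {state.role.value}; refusing reinstall"
        )
    if state.has_active_data:
        raise UnsafeCommandError(
            f"{state.hostname} still holds active data; refusing reinstall"
        )


def handle_reinstall_command(
    state: ServerState, reinstall: Callable[[str], None]
) -> None:
    """Check the server's own state first, then run the reinstall action."""
    confirm_safe_to_reinstall(state)
    reinstall(state.hostname)
```

In a sketch like this, the check runs on the receiving machine rather than only in the upgrade script, so a bug in the script that targets an active master or slave would be rejected instead of wiping it.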

About the author

Journalist, software trainer, and Web developer Lance Whitney writes columns and reviews for CNET, Computer Shopper, Microsoft TechNet, and other technology sites. His first book, "Windows 8 Five Minutes at a Time," was published by Wiley & Sons in November 2012.

 
