Facebook said late Monday that the company believes a "faulty configuration" change caused a widespread outage that lasted roughly six hours.
"Our engineering teams have learned that configuration changes on the backbone routers that coordinate network traffic between our data centers caused issues that interrupted this communication," Facebook's vice president of engineering and infrastructure, Santosh Janardhan, said in a blog post. "This disruption to network traffic had a cascading effect on the way our data centers communicate, bringing our services to a halt."
Monday's outage also impacted the tools that Facebook employees use. Facebook said it hasn't found any evidence that user data was compromised during the outage.
In a more detailed post published Tuesday, Janardhan said there was a "bug" in a tool meant to prevent mistakes like the one that triggered the outage. Facebook encountered multiple problems, including difficulty gaining access to its data centers and to its domain name system (DNS) servers, which had become unreachable. Often referred to as the phone book of the internet, DNS translates domain names like Facebook.com into numeric Internet Protocol addresses. "The total loss of DNS broke many of the internal tools we'd normally use to investigate and resolve outages like this," Janardhan said.
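As a rough illustration of the phone-book idea, the Python sketch below models a name lookup with a toy in-memory table. It is not Facebook's tooling or a real DNS client: the `resolve` function, the `TOY_RECORDS` table and the `dns_reachable` flag are hypothetical names invented for this example, and the address comes from a range reserved for documentation.

```python
from typing import Dict, Optional

# Hypothetical "phone book": maps a domain name to a numeric IP address.
# 192.0.2.10 is from the 192.0.2.0/24 block reserved for documentation,
# not an actual Facebook address.
TOY_RECORDS: Dict[str, str] = {"facebook.com": "192.0.2.10"}


def resolve(name: str, records: Dict[str, str],
            dns_reachable: bool = True) -> Optional[str]:
    """Return the address listed for `name`, or None if it cannot be found.

    `dns_reachable` is an illustrative switch: when it is False, the lookup
    fails even though the record (and the server behind it) still exists.
    """
    if not dns_reachable:
        return None
    # Domain names are case-insensitive; a trailing dot is also allowed.
    return records.get(name.lower().rstrip("."))


# Normal operation: the name translates to a numeric address.
address_when_up = resolve("Facebook.com", TOY_RECORDS)

# With the phone book unreachable, the same name resolves to nothing.
address_when_down = resolve("Facebook.com", TOY_RECORDS, dns_reachable=False)
```

In this simplified model, any tool that finds its servers by name stops working once the lookup fails, which is one way to picture why the loss of DNS affected internal tools as well as public services.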
Facebook also had to carefully manage how quickly it brought its services back online because a sudden surge in traffic could cause a new round of crashes. "Every failure like this is an opportunity to learn and get better, and there's plenty for us to learn from this one," Janardhan said. The company is extensively reviewing what happened.
The rare outage, which also impacted other apps owned by Facebook such as Instagram, WhatsApp and Facebook Messenger, showcased how dependent people and businesses are on social media even as the company faces more scrutiny from lawmakers and regulators. The Wall Street Journal recently published a series of stories detailing how Facebook knew about the platform's problems, including its harmful impact on the mental health of teenagers.
Former Facebook product manager Frances Haugen, the whistleblower who gathered the internal documents used by the Journal, testified before Congress on Tuesday.
Monday's outage was reminiscent of other times Facebook's services went offline. For instance, Facebook experienced an outage in 2019 that lasted more than 14 hours, which the social network said was the result of a "server configuration change."