CloudFlare security service goes down after router failure
The hour-long outage occurred when the Web security service detected a DDoS attack against one of its customers and tried to defend against it.
Web security service CloudFlare was offline for about an hour this morning due to a systemwide failure of its edge routers.
The outage, which began around 1:47 a.m. PT, removed the security layer for 785,000 Web sites, including 4chan and Wikileaks, according to TechCrunch. CloudFlare said the outage occurred while it was trying to defend one of its customers from a distributed denial-of-service attack.
The outage affected Juniper routers running the Flowspec protocol, which allows customers to broadcast router rules to a large number of routers efficiently. CloudFlare uses the protocol to update the rules on routers to battle attacks and shift traffic.
CloudFlare co-founder and CEO Matthew Prince said in a company blog post today that the company detected a DNS attack this morning when it identified attack packets between 99,971 and 99,985 bytes long, much larger than the 500-byte average and CloudFlare's 4,470-byte maximum packet size.
A rule aimed at packets of that size was then pushed to the routers, Prince wrote. "Flowspec accepted the rule and relayed it to our edge network. What should have happened is that no packet should have matched that rule because no packet was actually that large. What happened instead is that the routers encountered the rule and then proceeded to consume all their RAM until they crashed."
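To make the mismatch concrete, here is a minimal sketch in Python of a packet-length rule checked against the figures Prince cited. The names `LengthRule`, `can_ever_match` and `MAX_PACKET_BYTES` are hypothetical and used only for illustration. This is not how Flowspec or Juniper's routers evaluate rules, and it does not model the bug itself. It only shows why a rule for that size range should have matched nothing.

```python
from dataclasses import dataclass

# Figure from Prince's post: CloudFlare's stated maximum packet size.
MAX_PACKET_BYTES = 4470


@dataclass(frozen=True)
class LengthRule:
    """Hypothetical drop rule: matches packets with low <= length <= high."""
    low: int
    high: int

    def matches(self, packet_len: int) -> bool:
        # A packet matches when its length falls inside the inclusive range.
        return self.low <= packet_len <= self.high


def can_ever_match(rule: LengthRule, max_len: int = MAX_PACKET_BYTES) -> bool:
    """Return True if any packet length from 1 to max_len falls in the rule's range."""
    return rule.low <= rule.high and rule.low <= max_len and rule.high >= 1


# The size range reported for the attack packets.
rule = LengthRule(99_971, 99_985)

print(rule.matches(500))             # an average-size packet
print(rule.matches(MAX_PACKET_BYTES))  # the largest packet on the network
print(can_ever_match(rule))          # could any valid packet match at all?
```

All three checks come back false: under these assumptions the rule is harmless but useless, which is the behavior Prince said the routers should have shown. A pre-flight check along the lines of `can_ever_match` is one possible safeguard an operator might run before pushing a rule; the article does not say whether CloudFlare uses anything like it.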
While CloudFlare's service was restored about an hour later, Prince said the company is examining the cause of the failure and has contacted Juniper to learn whether this is a known bug. Prince also said customers would receive service credits.
Juniper said it was working with CloudFlare to determine the cause but believed it was related to an issue that was patched last October.
"Juniper Networks is aware of and investigating a reported network outage with one of our customers, CloudFlare," the company said in a statement. "While we have not completed our investigation, we believe this incident was triggered by a product issue that Juniper identified last October, when a patch was also made available."
Prince noted striking similarities between CloudFlare's outage and last year's.
"In CloudFlare's case the cause was not intentional or malicious, but the net effect was the same: a router change caused a network to go offline," Prince wrote.
Updated March 4 at 3 p.m. with statement from Juniper.