The Oklahoman

Facebook says outage caused by own error

Engineers scrambled to fix the problem on site


LONDON – The global outage that knocked Facebook and its other platforms offline for hours was caused by an error during routine maintenance, the company said.

Santosh Janardhan, Facebook’s vice president of infrastructure, said in a blog post that Facebook, Instagram and WhatsApp going dark was “caused not by malicious activity, but an error of our own making.”

The problem occurred as engineers were carrying out day-to-day work on Facebook’s global backbone network: the computers, routers and software in its data centers around the world, along with the fiber-optic cables connecting them.

“During one of these routine maintenance jobs, a command was issued with the intention to assess the availability of global backbone capacity, which unintentionally took down all the connections in our backbone network, effectively disconnecting Facebook data centers globally,” Janardhan said Tuesday.

Facebook’s systems are designed to catch such mistakes, but in this case a bug in an audit tool prevented it from properly stopping the command, Janardhan said.

That change also triggered a second problem that made things worse by making it impossible to reach Facebook’s servers even though they were operational.

Engineers scrambled to fix the problem on site, but this took time because of the extra layers of security, Janardhan said. The data centers are “hard to get into, and once you’re inside, the hardware and routers are designed to be difficult to modify even when you have physical access to them.”

Once connectivity was restored, services were brought back gradually to avoid traffic surges that could cause more crashes.

It was an “unforeseen anomaly” for a faulty maintenance update to take down Facebook’s backbone network, but the company probably could have avoided a scenario in which its servers were completely taken offline, making it impossible to access the tools needed to fix it, said Angelique Medina, of Cisco Systems’ ThousandEyes, a firm that monitors internet outages.

ALFREDO ESTRELLA/AFP VIA GETTY IMAGES Facebook and its Instagram and WhatsApp platforms were hit by a massive outage, impacting millions of people as users flocked to other networks to sound off.
