Arkansas Democrat-Gazette

Facebook outage kicked off by ‘error of our own making’

- KATE CONGER Information for this article was contributed by Kurt Wagner of Bloomberg News (TNS).

A cascade of mistakes made during maintenance on Facebook’s network caused the outage that took its services offline Monday, the company said in a blog post published Tuesday.

Facebook’s family of apps, which includes Instagram, WhatsApp and Messenger, was offline for more than five hours as employees scrambled to repair the damage. More than 3.5 billion people around the world use Facebook’s services.

The company said it found no evidence that user data was compromised during the downtime.

The initial problem occurred in a network Facebook calls its backbone, which connects its data centers around the world, Santosh Janardhan, a vice president of infrastructure at Facebook, wrote in the blog post.

During maintenance of the network, a command was issued to assess how much capacity was available. But the command backfired, disconnecting the network and blocking Facebook’s data centers from communicating, Janardhan said.

But it was just the beginning of the problems. “This change caused a complete disconnection of our server connections between our data centers and the internet,” Janardhan wrote. “And that total loss of connection caused a second issue that made things worse.”

With Facebook’s data centers offline, the company’s servers that manage its internet addresses were also unavailable. “This made it impossible for the rest of the internet to find our servers,” Janardhan said.

As the scope of the outage became clear, Facebook engineers struggled to restore access because its data centers are heavily protected and the employees could not gain immediate entry, the company said.

Facebook’s internal tools and communications systems were also affected by the disruption, adding to the challenge for engineers working to identify and resolve the issue. Workplace, its internal work product, went down as well.

“To every small and large business, family, and individual who depends on us, I’m sorry,” Chief Technology Officer Mike Schroepfer tweeted Monday afternoon. His apology was reiterated by Facebook’s engineering blog.

“We’ve done extensive work hardening our systems to prevent unauthorized access, and it was interesting to see how that hardening slowed us down as we tried to recover from an outage caused not by malicious activity but an error of our own making,” Janardhan wrote.

Once the engineers were inside Facebook’s data centers and began to work, they were able to restore the network.
