The News-Times (Sunday)

Cloud-computing outage triggered by effort to boost system’s capacity

SEATTLE — The addition of new servers to Amazon’s dominant cloud-computing network triggered a cascading set of errors that took down large swaths of the web Wednesday, the company acknowledged.

Amazon said in a lengthy and technical blog post Saturday morning that a massive computing network in Northern Virginia began to fail after “a relatively small addition of capacity” it started to make to the system just before 6 a.m. Eastern on Wednesday. But because of “an operating system configuration,” the new capacity set off a series of errors that overwhelmed Amazon’s network of servers.

Within a few hours, the malfunctions began hitting customers of Amazon Web Services, the company’s cloud-computing unit. Customers of the Amazon-owned Ring security camera service couldn’t log in or watch video. Users struggled to operate their iRobot vacuum cleaners because the outage affected the iRobot Home App. And media companies, including The Washington Post (privately owned by Amazon Chief Executive Jeff Bezos), experienced publishing system outages.

Amazon acknowledged that the system failure was exacerbated by the co-dependencies its various services have on one another. To resolve the issue, Amazon needed to restart a piece of its system it described as “many thousands of servers,” a lengthy process that had to be done gradually. And because other Amazon cloud services, including its Cognito authentication offering, rely on Kinesis, the data-streaming service at the center of the failure, they failed as well.

And because Amazon uses Cognito itself to let customers know about the status of its cloud operations through its Service Health Dashboard website, it couldn’t immediately update that site. The company has a backup method to update the site, but said “it is a more manual and less familiar tool for our support operators.”

An Amazon spokeswoman didn’t respond Saturday to a request for comment about the outage.

The failure underscores the danger of having only a handful of vendors manage global cloud computing. Amazon held 45 percent of the global market in 2019, according to the market research firm Gartner. In addition to Ring and iRobot, Amazon’s customers include Netflix, BP and Capital One, all of which run significant pieces of their computing operations on AWS.
