Amazon cloud storage outage stalls internet
Amazon Web Services’ main storage system goes down for 4 hours
Four-hour hiccup disrupts business
It didn’t quite break the Internet, but a four-hour outage at Amazon’s AWS cloud-computing division caused headaches for hundreds of thousands of websites across the United States on Tuesday.
Little known to consumers familiar with Amazon’s online shopping site, Amazon Web Services is a giant provider of the back end of the Internet. For sites such as Netflix, Spotify, Pinterest and BuzzFeed, as well as tens of thousands of smaller sites, it provides cloud-based storage and Web services so they don’t have to build their own server farms, allowing them to rapidly deploy computing power without investing in infrastructure.
For example, a business might store its videos, images or databases on an AWS server and access them via the Internet.
While not all were affected by the outage at one of AWS’ main storage systems, some experienced slowdowns after a big portion of Amazon Web Services’ Amazon S3 system went offline Tuesday afternoon.
The service is used by 148,213 sites according to market research firm SimilarTech. The outage appeared to have begun around 12:35 p.m. ET, according to Catchpoint Systems, a digital experience monitoring company. Operations were fully recovered by 4:49 p.m. ET, Amazon said. The Seattle-based company did not comment on the cause of the outage.
The system that went down was the first of what now are three AWS regions in the U.S.
It is still the largest and is also where AWS rolls out new features, “so it’s disproportionately big,” said Lydia Leong, a cloud analyst with Gartner.
AWS began as a profitable sideline to Amazon’s main online sales business but has since grown to become the major player in the arena as well as a major money-maker in its own right for Amazon.
In the fourth quarter of 2016, the division accounted for 8% of Amazon’s total revenue.
“This is a pretty big outage,” said Dave Bartoletti, a cloud analyst with Forrester. “AWS had not had a lot of outages, and when they happen, they’re famous. People still talk about the one in September of 2015 that lasted five hours,” he said.
S3 has “north of 3 to 4 trillion pieces of data stored in it,” Bartoletti said.
AWS S3 is used by businesses both large and small. “More than anything else, S3 customers need to be able to get at their data, because often S3 is used to store images. So no S3, no nice picture or fancy logo on your website,” Leong said.
That was exactly the problem faced by Lewis Bamboo, a small, family-owned bamboo nursery in Oakman, Ala. “As our business is in bamboo plants, pictures are a very important part of selling our product online. We use Amazon S3 to store and distribute our website images. When Amazon’s servers went down, so did the majority of our website,” said the company’s chief technology officer Daniel Mullaly.
“Thankfully we also store the images locally, and I was able to serve the images directly from our server instead,” he said.
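The kind of fallback Mullaly describes can be sketched in a few lines of Python. This is only an illustration, not the nursery's actual setup: the function names `fetch_with_fallback`, `cloud_store` and `local_disk` are hypothetical, and the two sources are simple stand-ins rather than real storage calls.

```python
from typing import Callable, Iterable, Optional

# A fetcher takes a file name and returns its bytes, or raises OSError on failure.
Fetcher = Callable[[str], bytes]


def fetch_with_fallback(name: str, fetchers: Iterable[Fetcher]) -> Optional[bytes]:
    """Try each source in order and return the first result that succeeds."""
    for fetch in fetchers:
        try:
            return fetch(name)
        except OSError:
            # This source is unavailable; move on to the next one.
            continue
    return None  # every source failed


def cloud_store(name: str) -> bytes:
    # Hypothetical stand-in for a cloud storage request during an outage.
    raise ConnectionError("cloud storage unavailable")


def local_disk(name: str) -> bytes:
    # Hypothetical stand-in for reading a copy kept on the site's own server.
    return f"local copy of {name}".encode()


image = fetch_with_fallback("bamboo.jpg", [cloud_store, local_disk])
```

In this sketch the cloud source is tried first and the local copy is used only when that request fails, so visitors would still see a picture while the storage service is down.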
The effects of the outage varied depending on the site and how it used AWS. Modern websites usually pull data from multiple databases in the cloud that can be stored all over the world, so a photo might come from one place, a price list from another and a customer database from a third.
For that reason, entire websites rarely go down, but various parts of them may take a long time to load or not load at all, leaving broken links or images.
Companies have been steadily moving storage to the cloud because it is cheaper, easily accessible and more resilient. But the downside is that when there are problems, there’s a cascade effect.
“There are lots of people having a not very good day at the moment,” Leong said on Tuesday afternoon.
It’s possible to contract with multiple companies to avoid potential problems, but it’s pricey, so many companies make peace with the knowledge that on rare occasions they’re going to have that very bad day.
“Only the most paranoid, and very large companies, distribute their files across not just AWS but also Microsoft and Google, and replicate them geographically across regions — but that’s very, very expensive,” she said.
Amazon wasn’t able to update its own service health dashboard for the first two hours of the outage because the dashboard itself was hosted on AWS.
The most common causes of this type of outage are software-related, Leong said. “Either a bug in the code or human error.”