Chicago Sun-Times

Ugh! Typo to blame for server blackout

Wrong coding caused 4-hour outage at AWS

- Elizabeth Weise

SAN FRANCISCO - The major outage that hit tens of thousands of websites using Amazon’s AWS cloud-computing service on Tuesday was the result of a simple typo: just one incorrectly entered command.

The four-hour outage at Amazon Web Services’ S3 system, a giant provider of back-end services for nearly 150,000 websites, caused disruptions, slowdowns and failure-to-load errors across the USA.

Amazon’s Simple Storage Service (S3) lets companies use the cloud to store files, photos, video and other information they serve up on their websites. It contains trillions of these items, known as “objects” to programmers.

When the system was down, websites could not access the photos, logos, lists or data they normally would have pulled from the cloud.

While most of the sites didn’t go down, many had broken links and were only partly functional.

Thursday, Amazon published a public letter outlining what happened.

Tuesday morning, an Amazon team was investigating a problem that was slowing down the S3 billing system.

At 9:37 a.m. PT, a team member executed a command that was meant to take a few of the S3 servers offline.

“Unfortunately,” Amazon said in its posting, one part of that command was entered incorrectly: it had a typo.

That mistake caused a larger number of servers to be taken offline than they’d wanted.

To bring the service back, both affected subsystems required a full restart, which takes far longer than simply rebooting a laptop.

While Amazon says it designed its system to work even if big parts failed, it also acknowledged that it hadn’t actually done a full restart on the main subsystems that went offline “for many years.”

AWS, or Amazon Web Services, is the cloud-computing wing of Amazon. (Photo: Amazon)
