The Amazon Web Services (AWS) pavilion stands at a trade fair in Hanover, Germany. Getty Images / Photo by Sean Gallup

Editor's note: The following text is a transcript of a podcast story.
MYRNA BROWN, HOST: Up next, internet ripples.
Early yesterday, Amazon Web Services—or AWS—reported it had a problem…
MONTAGE: CNN: A massive outage. It's hit hundreds of popular websites and apps. Maybe you noticed this morning.
NDTV: And it's having a global meltdown.
REUTERS: Disrupting businesses and some of the world's most popular apps, including Snapchat and Reddit.
FOX13 SEATTLE: Including remote working devices, Roblox, Robinhood, and Coinbase.
BBC: And some banks including Halifax, Lloyds, and the Bank of Scotland.
CNBC: It's also reflective of how large a share Amazon Web Services has and how many things it does power.
MARY REICHARD, HOST: Amazon said it identified the problem and worked quickly to fix it, but questions remain. Is the outage a blip…or the symptom of a larger problem? WORLD’s Paul Butler has more.
PAUL BUTLER: Millions of people around the world yesterday encountered at least one app or cloud-based service that didn’t work properly, from online shopping to sharing photos on social media. Amazon.com displayed cute pictures of dogs and cats alongside an apology for the interruption. Other sites just wouldn’t load. What they all had in common is what’s under the hood: Amazon Web Services.
AWS provides on-demand and scalable computing power, data storage, and other digital publishing and management services. The company is the largest cloud provider in the world. So what happened?
MONTGOMERY: I think enough reports are coming out that lead me to believe it's not a foreign adversary.
Retired Rear Admiral Mark Montgomery is a cybersecurity expert and Senior Fellow with the Foundation for Defense of Democracies.
It appears that Amazon initiated a technical update that included an incorrect Domain Name System, or DNS, entry. DNS is a directory, sort of like a phonebook for the internet, that translates human-friendly domain names like wng.org into numerical IP addresses. Computers use those numeric addresses to communicate with each other.
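For readers who want to see the phonebook idea in action, here is a minimal sketch in Python. It is an illustration only, not a description of how AWS or any DNS server is built. The names `TOY_DIRECTORY`, `toy_lookup`, and `real_lookup` are hypothetical, and the addresses are taken from a range reserved for documentation, so they do not point at real servers.

```python
import socket

# Hypothetical directory for illustration only. The addresses come from the
# range reserved for documentation (RFC 5737), not from real servers.
TOY_DIRECTORY = {
    "example.com": "192.0.2.10",
    "example.org": "192.0.2.20",
}


def toy_lookup(name, directory=TOY_DIRECTORY):
    """Return the IP address listed for a name, or None if no entry exists."""
    # Domain names are case-insensitive, and a trailing dot is optional.
    return directory.get(name.lower().rstrip("."))


def real_lookup(name):
    """Ask the operating system's resolver, which in turn queries DNS."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        # Resolution failed: no entry, no network, or a resolver error.
        return None


if __name__ == "__main__":
    print(toy_lookup("example.com"))      # entry present in the toy directory
    print(toy_lookup("missing.example"))  # no entry, so the lookup returns None
```

In this sketch, a missing or wrong directory entry makes the lookup fail even though the computer behind the name may be running normally, which is roughly why a directory error can make otherwise healthy services unreachable.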
Monday’s error started a cascade of problems for AWS, particularly for the US-EAST-1 Region in Northern Virginia.
MONTGOMERY: There's a handful of locations around the United States, around the world through which most data flows through. Northern Virginia is a more famous one, probably the most dense data flow area in the world.
The Virginia site previously suffered outages in 2020 and 2021. The problem this time appears to stem from an internal system that helps balance massive computing loads across the AWS network.
MONTGOMERY: We want Amazon to have systems that monitor efficiency, effectiveness, productivity. That's what you're paying for.
Montgomery compares the system to how traffic lights work.
MONTGOMERY: It ensures that if your light’s green, my light’s red, so that I don't proceed into this.
But if the signals glitch and get out of sync, accidents happen.
MONTGOMERY: The good news is no one was in an intersection getting hit by a car. The bad news is some companies lost, probably lost profits today.
Montgomery says this problem is similar to one in July 2024, when cybersecurity company CrowdStrike launched a faulty update.
MONTGOMERY: If you think back to CrowdStrike, CrowdStrike cost into the hundreds of millions of dollars with just one customer: Delta.
Montgomery suspects the costs this time will also mount quickly.
MONTGOMERY: It would not be unrealistic to expect that a company as large as AWS, with this long a period of disrupted services could end up in the billions of dollars.
On a scale of one to ten, Montgomery rates the seriousness of this outage a seven.
MONTGOMERY: Because it's a lot of money, but it's a “1”, if I think long, long term, where this reminds us that we have to be more secure, more careful, more redundant in our development of systems.
Monday’s outage is a reminder that today’s cloud-based, interconnected global internet economy can be vulnerable. But Montgomery says now is not the time to over-react:
MONTGOMERY: This beautiful integration of networks and data and systems that the United States has undergone over the last 40 years has made us money and that, to me, is the driver. And the driver here is, is that you figure out why it went wrong, you try to make sure it doesn't happen again, and you move forward.
Reporting for WORLD, I’m Paul Butler…with additional reporting from Harrison Watters.
WORLD Radio transcripts are created on a rush deadline. This text may not be in its final form and may be updated or revised in the future. Accuracy and availability may vary. The authoritative record of WORLD Radio programming is the audio record.