Think about the last time you had a great digital experience. Chances are you cannot remember it. Not because it was forgettable, but because it worked. The page loaded. You were already logged in. The portal recognized you. The transaction went through. You moved on with your day.
That invisibility is the goal. And it is much harder to achieve than it looks.
Most conversations about digital experience focus on the front end: the interface design, the colors, the navigation, the content. Those things matter. But the experiences that actually build trust and loyalty are built on the infrastructure underneath: uptime, identity, APIs, performance, secure access, and the governance that holds all of it together. When those systems work, nobody notices. When they break, everyone does.
When the Foundation Fails, Everything Fails
In October 2021, Facebook went down for nearly six hours. Not because someone hacked it. Not because of a cyberattack. A routine maintenance command accidentally severed the BGP routing connections that allowed Facebook’s data centers to communicate with the internet. DNS records became unreachable. Facebook, Instagram, and WhatsApp disappeared simultaneously for more than three billion users.
The business impact was immediate. The outage erased an estimated $60 million in advertising revenue in a single day, and the stock dropped roughly 5%. Internal tools that relied on Facebook’s own infrastructure went dark, which meant engineers could not even log in remotely to fix the problem.
The front end was fine. The design team had not changed anything. The content was all there. But without routing, DNS, and identity infrastructure working correctly, none of it was accessible to anyone.
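The DNS half of that failure was observable from anywhere on the internet: a resolution probe run from an independent network would have shown Facebook’s records vanishing in real time. A minimal sketch of such a probe (hostnames are illustrative):

```python
import socket

def dns_resolves(hostname: str, timeout: float = 3.0) -> bool:
    """Return True if the hostname currently resolves to at least one address."""
    socket.setdefaulttimeout(timeout)
    try:
        return len(socket.getaddrinfo(hostname, None)) > 0
    except socket.gaierror:
        # Name did not resolve: the records are unreachable or withdrawn.
        return False

# During the outage, a probe like dns_resolves("facebook.com") run from
# outside Facebook's network would have flipped from True to False.
```

External probes like this matter precisely because internal monitoring can die with the infrastructure it monitors, as Facebook’s did.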
SSO Failure Is Not an Authentication Problem. It Is a Customer Experience Problem.
In November 2022, Okta experienced a worldwide outage affecting customers using Microsoft 365 Single Sign-On. Users across the United States, EMEA, and Japan were unable to log in through federated SSO. The issue sat entirely in the identity layer, a layer most users never think about until it stops working.
From the user’s perspective, the product was broken. Their tools were inaccessible. Their workday stalled. It did not matter that the application itself was healthy. If the door does not open, the room might as well not exist.
This is why identity infrastructure deserves the same operational rigor as the platforms it protects. SSO is not a security checkbox. It is the invisible handshake that makes every other experience possible. When it is well-built, users never encounter it. When it degrades, every system behind it becomes unreachable.
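One practical consequence: the identity provider itself needs synthetic monitoring, probed from outside the systems it protects. A minimal sketch, assuming the IdP publishes an OpenID Connect discovery document (the URL below is hypothetical, but major providers expose similar well-known metadata endpoints):

```python
import json
import urllib.error
import urllib.request

# Hypothetical IdP metadata URL; real OIDC providers publish a
# discovery document at a .well-known path like this one.
IDP_METADATA_URL = "https://idp.example.com/.well-known/openid-configuration"

def idp_reachable(url: str, timeout: float = 5.0) -> bool:
    """Synthetic probe: can we fetch and parse the IdP's discovery document?"""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            doc = json.load(resp)
        # A usable discovery document must at least name its core endpoints.
        return "authorization_endpoint" in doc and "token_endpoint" in doc
    except (urllib.error.URLError, ValueError):
        return False
```

A probe like this, run on a schedule from outside the corporate network, turns “users cannot log in” from a flood of support tickets into an alert that fires before the first ticket arrives.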
One Network, One Country, One Failure Point
On July 8, 2022, Rogers Communications, one of Canada’s three dominant telecom providers, suffered a network outage that lasted more than 15 hours and affected approximately 12 million people, roughly one third of the country’s population.
The cause was a routine maintenance update that went wrong in Rogers’ core network. The consequences were anything but routine. Interac, Canada’s national debit payment network, went offline, leaving businesses unable to accept card payments and consumers unable to access funds. ATMs stopped working. Emergency 911 services were disrupted in major cities including Toronto, Ottawa, and Winnipeg, with at least one reported death linked to the inability to reach emergency services by mobile phone. Government agencies including Service Canada and the Canada Revenue Agency were knocked offline. Courts adjourned. Hospitals asked on-call staff to come in physically because internal systems were unreachable.
Small business owners turned away customers who had no cash. A plant shop owner in Toronto described it simply: his business stopped entirely because most customers no longer carry cash.
Rogers had experienced a similar outage just 15 months earlier. The pattern is the same one that appears across industries: infrastructure that is invisible when working becomes the entire customer experience when it fails. And when a single provider underlies payment systems, emergency services, and government access simultaneously, a maintenance error becomes a national crisis.
The AWS us-east-1 Lesson: Invisible Dependencies Have Visible Consequences
On June 13, 2023, an outage in AWS’s us-east-1 region disrupted services for major organizations including the Boston Globe, the New York MTA, and the Associated Press. Amazon Connect, AWS’s contact center platform, was among the hardest-hit services. Callers could not connect. Chat sessions failed. Agents had login issues.
For any organization running customer-facing contact center operations on that infrastructure, the digital experience did not just degrade. It stopped. Members and customers calling for help could not reach anyone. The service desk was down at the exact moment people needed it most.
The pattern repeats across industries and platforms: organizations build rich customer-facing experiences on top of shared cloud infrastructure, without fully accounting for what happens when a single region goes down and takes SSO, APIs, and contact routing with it.
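One mitigation is to make the regional dependency explicit in client code rather than implicit in configuration. A minimal sketch of ordered-region failover (region names and the request function are illustrative, not tied to any specific AWS SDK):

```python
def call_with_failover(regions, request_fn):
    """Try each region in order; return the first successful response.

    request_fn(region) is assumed to raise ConnectionError when the
    region is unreachable and to return a response otherwise.
    """
    last_error = None
    for region in regions:
        try:
            return request_fn(region)
        except ConnectionError as exc:
            last_error = exc  # region down or degraded: try the next one
    raise last_error  # every region failed: surface the last error
```

Real deployments layer health checks, data replication, and DNS-based routing on top of this, but the principle is the same: no single region is allowed to be the only path to the user.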
A Faulty Update, 8.5 Million Blue Screens, and $10 Billion in Losses
On July 19, 2024, CrowdStrike pushed a content configuration update to its Falcon security sensor. The update contained a logic error: the sensor expected 20 input fields and received 21. The result was an out-of-bounds memory read that crashed Windows systems into an unrecoverable boot loop. Roughly 8.5 million devices went down globally in what has been described as the largest IT outage in history.
The breadth of impact was staggering. More than 5,000 flights were canceled worldwide. Delta alone canceled over 7,000 flights across five days and reported $550 million in combined revenue loss and expenses. Banks including Chase, Bank of America, Wells Fargo, and Capital One reported disruptions. Hospitals were affected. Emergency services in multiple cities were degraded. The estimated total financial damage to Fortune 500 companies exceeded $10 billion.
None of it was caused by a hacker. None of it was a cyberattack. It was a single misconfigured update pushed automatically to systems that had granted a security vendor deep kernel-level access, because that is what security software requires to function.
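The failure mode generalizes well beyond kernel drivers: code that indexes into content it received without first checking that the shape matches what it expects. A minimal sketch of the defensive version (in Python, not the actual sensor code; the field count is illustrative):

```python
def interpret_content(inputs: list, expected: int = 20) -> dict:
    """Validate the input-field count before indexing into the fields.

    Mismatched content is rejected up front, so a bad update fails
    closed on the content itself rather than crashing the host.
    """
    if len(inputs) != expected:
        raise ValueError(
            f"expected {expected} input fields, got {len(inputs)}"
        )
    return {i: field for i, field in enumerate(inputs)}
```

The point is not the check itself, which is trivial, but where it runs: validation belongs in the update pipeline and in the consuming code, so that a malformed payload is a rejected file, not eight and a half million blue screens.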
The users affected did not know who CrowdStrike was. They knew their computer showed a blue screen. They knew their flight was canceled. They knew the hospital system was down. The invisible layer had become the only thing anyone could see.
What This Means for How We Build
The thread running through all five of these examples is the same. The front-end experience is only as good as the infrastructure underneath it.
Reliability is a member experience feature. Uptime is a trust investment. Identity infrastructure is a customer satisfaction lever. API performance is brand perception in disguise. Update governance is a public safety issue.
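Those claims have concrete arithmetic behind them. Each additional nine in an availability target cuts the allowed downtime by roughly a factor of ten, which is what makes uptime an investment rather than a slogan:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def allowed_downtime_minutes(availability: float) -> float:
    """Minutes of downtime per year permitted by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability)

# 99.9%  ("three nines") -> about 526 minutes, roughly 8.8 hours a year
# 99.99% ("four nines")  -> about 53 minutes a year
```

Every one of the outages above blew through years of a four-nines error budget in a single afternoon.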
The organizations that get this right do not just build better-looking portals. They treat performance engineering, observability, access governance, release readiness, and change control as first-class priorities with executive visibility, defined KPIs, and dedicated ownership. They monitor what users cannot see. They measure what users cannot feel until it breaks.
The best digital experiences leave no trace. No loading spinner. No authentication prompt. No error message. No hold music. No blue screen.
Just the thing the user came to do, done.
That invisibility takes an enormous amount of deliberate, unglamorous, technically disciplined work to achieve. And it is the most important work any technology platform team can do.
