Beyond box ticking: why operational resilience is a live risk

Natalie Hamilton

By John Burns, Advisory Board member, My Compliance Centre

From Consumer Duty board reports to confirmation statements, there has long been a view that many aspects of financial services compliance can be taken care of via an annual return. The Financial Conduct Authority (FCA) does not share this view, particularly in the case of operational resilience (OpRes), which should be treated as a rolling obligation.

Unfortunately, there is still a tendency in the industry to take the annual-return approach, viewing OpRes through the lens of REP017. Firms complete this FCA return, mark it as “done” and quickly get back to business as usual.

But this tick-box mentality is incredibly risky because it treats resilience as a static task rather than a live risk.

The “single trench” failure

A key part of mitigating live risks is understanding your dependencies. This goes beyond merely knowing who your vendors are – it requires comprehensive mapping of physical and digital infrastructure to eliminate blind spots.

Take this genuine example of a bank that religiously mapped all its redundant lines back in the days when we relied on landlines.

The business had two separate cable exits leaving the building to ensure continuity. But a mile down the road, both cables were routed into the same trench. When a workman’s digger accidentally cut through that trench, both of the bank’s supposedly redundant lines were severed at once.
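The same check can be run on paper or in a few lines of code: list every component each "redundant" route relies on, then look for anything that appears in more than one route. The sketch below is illustrative only. The route and component names are hypothetical, loosely modelled on the cable example above, and a real dependency map would be far larger.

```python
from collections import defaultdict

# Hypothetical dependency map: every component each "redundant" route relies on.
routes = {
    "primary_line": ["exit_north", "street_trench", "exchange_a"],
    "backup_line": ["exit_south", "street_trench", "exchange_b"],
}


def shared_components(routes):
    """Return components used by more than one route, with the routes that use them."""
    used_by = defaultdict(set)
    for route, components in routes.items():
        for component in components:
            used_by[component].add(route)
    # Anything used by two or more routes is a potential single point of failure.
    return {c: sorted(r) for c, r in used_by.items() if len(r) > 1}


print(shared_components(routes))
# {'street_trench': ['backup_line', 'primary_line']}
```

The value lies less in the code than in the discipline of recording the full path, not just the first hop. The same approach applies when the shared component is a cloud region or a DNS provider rather than a trench.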

In the modern payments landscape, our “trenches” are digital. The concentration risk around providers like AWS or Cloudflare is massive. And while a small payments firm will struggle to renegotiate an AWS contract, it must understand exactly what happens if that service goes dark, which is not unheard of.

An analysis of Censuswide data showed that UK businesses experienced 50 million hours of disruptive internet downtime in 2023 at an estimated cost of £3.7 billion.

The following year, the infamous CrowdStrike incident, which hit Microsoft Windows systems, caused huge challenges for tech-dependent firms, with the UK economy taking a hit in the region of £2 billion.

Then, in October 2025, an AWS outage wreaked havoc for millions of UK consumers, with a number of well-known banking apps affected.

The relative frequency of this disruption explains why the FCA is clear on this issue. Companies in scope of the Payment Services Regulations 2017 or Electronic Money Regulations 2011 “must make sure they can deliver important business services in severe but plausible scenarios, like the CrowdStrike outage, to help minimise the impact on consumers and markets”.

Testing: severe but plausible

Testing this resilience must go beyond assuming everything goes right. You must perform simulations for “severe but plausible” scenarios of when things go wrong.

For some SMEs, it simply may not be feasible to physically pull the plug on a server. However, a rigorous desktop exercise, which involves getting the key decision-makers in a room and walking through a disaster scenario step by step, is mandatory.

This should follow a clear structure:

The trigger: Is it a cyber-attack, a liquidity freeze, or a critical vendor failure?
The reaction: Who makes the decision? What data do we have or need?
The comms: What do we need to tell our stakeholders? (This is often where businesses fall down.)

This must be followed by a “lessons learned” exercise and updates to policies and procedures as necessary. Importantly, both the testing and the “lessons learned” exercise must be recorded and retained. The FCA can demand to see a firm’s testing records for the past six years. While it is unlikely to do so in the normal course of events, if a firm has a failure, this is one of the first things the regulator will ask to see. As always, if it’s not written down, it didn’t happen.
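For firms that keep these records in a system rather than a folder, the structure above maps onto a simple record type. The following is a minimal sketch, not a prescribed format: the class name, field names and completeness check are all hypothetical, and the six-year retention period is simply the figure quoted above.

```python
from dataclasses import dataclass, field
from datetime import date

RETENTION_YEARS = 6  # assumption: the six-year figure quoted in the text


@dataclass
class ExerciseRecord:
    """One desktop exercise: trigger, reaction, comms and lessons learned."""
    held_on: date
    trigger: str                  # e.g. cyber-attack, liquidity freeze, vendor failure
    decision_makers: list[str]    # who makes the decision
    data_needed: list[str]        # what data we have or need
    stakeholder_comms: list[str]  # what we need to tell stakeholders
    lessons_learned: list[str] = field(default_factory=list)

    def retain_until(self) -> date:
        """Earliest date the record could be disposed of under the assumed period."""
        try:
            return self.held_on.replace(year=self.held_on.year + RETENTION_YEARS)
        except ValueError:
            # Exercise held on 29 February and the target year is not a leap year.
            return self.held_on.replace(year=self.held_on.year + RETENTION_YEARS, day=28)

    def is_complete(self) -> bool:
        """True only when every part of the exercise has been written down."""
        return all([self.trigger, self.decision_makers, self.data_needed,
                    self.stakeholder_comms, self.lessons_learned])


record = ExerciseRecord(
    held_on=date(2025, 3, 10),
    trigger="Critical vendor failure",
    decision_makers=["COO", "Head of Compliance"],
    data_needed=["Failed payment count"],
    stakeholder_comms=["Customer status notice"],
)
print(record.is_complete())   # False: no lessons learned recorded yet
print(record.retain_until())  # 2031-03-10
```

A record with no lessons learned is flagged as incomplete, which mirrors the point above: an exercise that has not been written up in full may as well not have happened.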

The two-way communication loop

Communication failures tend to be driven by firms only planning how to broadcast to customers (“We are having issues”) and not thinking about how to receive information.

To determine your unacceptable level of harm – a key regulatory metric – you need a mechanism to understand how an outage is affecting your users in real time.

If people aren’t getting their salaries or settlements, the impact is immediate and, in the world of social media, the story will be everywhere in minutes. Payments are the plumbing of financial services, and when that pipework blocks, the mess spreads immediately.

My advice is simple: don’t wait for the REP017 deadline to think about OpRes. Live and breathe it every day. Because if your environment changes (for instance, you get a new vendor, new tech or new staff), your risk assessment must change with it – and fast.

£3.7bn: the cost of internet failures to UK businesses

CrowdStrike outage: lessons for operational resilience

AWS outage reveals vulnerability in cloud-based online banking platforms, says GlobalData

CrowdStrike IT outage could cost UK economy up to £2.3bn: Kovrr