
Global IT outage – next steps

Cyber Security Strategy

Published by CyberCX on 24 July 2024


Last Friday, 19 July 2024, what began as a ripple of seemingly independent issues soon became a global crisis as organisations across every sector scrambled to come to terms with history’s largest IT outage.

What we now know is that the outage was caused by a faulty update to the Windows version of Falcon Sensor, the agent for CrowdStrike’s Endpoint Detection and Response (EDR) platform. What remains unknown is the total economic and human impact of the outage, in which some 8.5 million Windows devices experienced a critical error and services shut down across banking, logistics and transport, groceries and retail, media, health, education and more.

Many people reading this will have experienced some form of disruption, whether on a personal or professional level. For anyone working in a cyber, IT or other technical role, there is a good chance you lost some, if not all, of your weekend helping your organisation recover.

And while the crisis phase of this outage is now over, we know many recovery efforts are ongoing.

Organisations across the economy – regardless of whether they were directly impacted or not – are this week reflecting on what can be learned from this outage to improve readiness, resilience and recovery before the next tech-related disruption, which is, unfortunately, inevitable.

Since Friday, the team at CyberCX has fielded many questions from customers, partners, media, and stakeholders across government and industry. The most pressing questions have been practical in nature, and have been answered in real time by technical teams solving problems on the fly, as well as by CrowdStrike themselves.

As we transition to recovery mode, slower-burning but longer-lasting questions are beginning to emerge. To assist organisations grappling with them, CyberCX has outlined some signposts to follow along the way.


The following three headlines and talking points may prove useful for briefing non-technical executives and Audit and Risk Committees.


How can our organisation be better prepared for the next outage?

  • Your staff, customers and stakeholders need to be assured you can operate effectively through an outage. Ensure your business continuity plans (BCPs) are up to date and fit for purpose. If they do not account for large-scale IT outages, it’s time for a refresh.
  • Your organisation’s ability to identify, triage and respond to technology-related crises is a crucial capability that should be nurtured, strengthened and tested through regular simulation exercises.
  • The ability to communicate effectively with internal and external stakeholders is critical during a time of disruption and crisis. Make sure your organisation has established processes, plans and playbooks for communications during an outage – including out-of-band options if core systems are offline.


How do we build resilience against future outages?

  • Understand your organisation’s vulnerabilities and where risks reside. Investment in cyber security and digital technology should be made with a clear-eyed view of the decisions being made about redundancy and concentration risk, such as relying on a single vendor or product across many critical systems.
  • Each new piece of technology your organisation adopts may bring benefits, but it can also add complexity to your environment. Ensure you maintain an up-to-date understanding of your architecture and how different systems connect and interact.
  • As this event demonstrated, disruptions caused by human error can quickly be exploited by malicious actors, for example through phishing emails and fake ‘fixes’ impersonating CrowdStrike. In times of crisis, it’s important your organisation stays vigilant against scammers and other threat actors.
  • Use intelligence to create an early-warning mechanism for future cyber crises. Accidents are hard to foresee, but not all major outages or incidents begin with a mistake. Ensure you have access to timely, contextualised intelligence to detect future large-scale incidents, from outages to mass vulnerability exploitation.


How do we bounce back better?

  • One of the best ways to ensure your organisation can recover rapidly from the next outage is to map your third-party supply chain. This should extend to identifying the third parties that provide services to the business, as well as understanding your exposure through the IT and software you depend on.
  • Once you have identified third parties, it’s important to assess how your organisation and each of your third parties will respond to different crisis scenarios. Ensure you consider both contractual expectations and practical realities.
  • Ensure you have clear processes for learning from incidents and ‘near misses’. After the crisis phase has passed, conduct a post-incident review (PIR). Document what worked, what didn’t, and what you would do differently next time to strengthen your organisation’s crisis recovery muscle.
  • Decisions made during an outage – or any crisis – should be subject to robust review processes to avoid knee-jerk reactions that undermine strategic objectives for the business. Rapidly replacing the software or technology at fault could have complex unintended consequences for your environment. Simply switching off anything you rely on for security could make a bad situation far worse.


While the source of this global IT outage was common to all, the road to recovery will be different for every organisation. We hope the above considerations help inform the conversations happening within your business in the wake of the outage.

For anyone working in IT or cyber security, we know this won’t be the last – nor likely the biggest or worst – IT outage we will see. If there is anything our team at CyberCX can assist you with to help improve your readiness, response or recovery capability, we stand ready to do so.


Our passion and mission, as always, is to secure our communities.
