
The 2024 CrowdStrike incident revealed critical flaws in IT resilience, highlighting the need for rigorous update testing, stronger security protocols, and redundancy measures to prevent future disruptions.

The CrowdStrike incident of July 2024 was a failure from a “trusted” partner, and failures like it will happen again. That is one of the primary lessons of last year’s global breakdown: we must take the risk posed by a relied-on provider seriously enough to prepare for the next one. In the CrowdStrike event, a faulty CrowdStrike update crashed Microsoft Windows systems worldwide. When those systems went down, bad actors took advantage of the disaster by offering “fixes” that were really files with hidden, malicious payloads. 

Several major United States-based airlines were forced to suspend most of their operations. In Great Britain, the Royal Surrey Hospital had to suspend radiography treatment. The National Health Service in England reported disruptions in most doctors’ practices. Banks and financial companies all over the globe reported issues. The insurance company Allianz in Germany couldn’t log into its computers. One in four Fortune 500 companies experienced a service disruption, and together they reportedly lost $5.4 billion. The impact was astounding. 

Even remote work presented unique challenges when it came to fixes. Technicians sometimes made house calls to look at impacted laptops, and other remote workers had to be walked through complex repairs over the phone. With our digital interconnectedness, we need to prepare for when this happens again — because it will. 

Ahead, we look at some of the facets of preparedness, the checks and balances, and plans we need to implement so the world’s screens won’t go blue again. 

Incident Assessment: Three Must-Know Lessons

Don’t trust the trusted partner — A flawed update, pushed to Windows systems worldwide, caused the CrowdStrike failure of 2024. The update crashed an estimated 8.5 million Windows devices and sparked a global IT outage. The first lesson, then, is that we can no longer extend blind trust to our trusted partners; we can’t simply push their updates through. We must treat cybersecurity updates like any other software update. That means building greater resilience into the final stage of the update process and requiring more thorough, mandatory testing before deployment. 

The CrowdStrike failure stemmed from a sensor configuration update to Windows systems. Commercial flights, hospital operations, financial services, and media broadcasts were all hit. The affected machines were Windows hosts running CrowdStrike’s Falcon sensor (version 7.11 and above) that received the update, and the result was system crashes and a wave of blue screens. An impact of this scale makes preparing for a recurrence essential. 
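Because the outage came from a configuration update, one form of the “thorough testing before deployment” described above is an automated check that a configuration file matches what the consuming software expects before it is released anywhere. The sketch below is hypothetical: the record layout, `EXPECTED_FIELD_COUNT`, and `validate_config_update` are illustrative assumptions, not CrowdStrike’s format or tooling.

```python
from dataclasses import dataclass, field

# Hypothetical schema: the number of fields the consuming software
# (for example, an endpoint agent) is built to read from each record.
EXPECTED_FIELD_COUNT = 3


@dataclass
class ValidationResult:
    ok: bool
    errors: list[str] = field(default_factory=list)


def validate_config_update(records: list[list[str]],
                           expected_fields: int = EXPECTED_FIELD_COUNT) -> ValidationResult:
    """Check a configuration update in a test stage, before any rollout.

    Rejects the update if it is empty, if any record has a different number
    of fields than the consumer expects, or if any field is blank.
    """
    errors: list[str] = []
    if not records:
        errors.append("update contains no records")
    for i, record in enumerate(records):
        if len(record) != expected_fields:
            errors.append(f"record {i}: expected {expected_fields} fields, got {len(record)}")
        elif any(value.strip() == "" for value in record):
            errors.append(f"record {i}: blank field")
    return ValidationResult(ok=not errors, errors=errors)


# Usage: a well-formed update passes; a record with a missing field is blocked.
good = [["rule-1", "process", "block"], ["rule-2", "registry", "alert"]]
bad = [["rule-1", "process", "block"], ["rule-2", "registry"]]
print(validate_config_update(good).ok)      # True
print(validate_config_update(bad).errors)   # ['record 1: expected 3 fields, got 2']
```

A schema check like this catches only one class of defect; it complements, rather than replaces, installing the update on representative test machines.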

Beware the charlatan — The second lesson is how quickly bad actors moved to exploit people in their time of need. Shortly after the outage began, a malicious file started making the rounds, claiming to be a quick fix. The so-called “CrowdStrike hotfix” was simply malware, and it reached more people than usual as panic replaced sensible action. This danger from opportunistic attackers highlights the need for a clear protocol for incidents like CrowdStrike, including obtaining fixes only through the vendor’s official channels. It also calls for more cybersecurity training for employees. You do not want to be behind the curve. 
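One concrete element of such a protocol is refusing to run any “fix” whose cryptographic hash does not match a value the vendor published through its own channel. The following is a minimal sketch under that assumption; `sha256_of_file` and `is_trusted_fix` are hypothetical helper names, and the check is only as good as the source of the published hash.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_trusted_fix(path: str, published_sha256: str) -> bool:
    """True only if the file matches the hash the vendor published.

    Assumes `published_sha256` was obtained from the vendor's official
    channel, not from the same email, forum post, or site that supplied the file.
    """
    return sha256_of_file(path) == published_sha256.strip().lower()
```

In practice a responder would call `is_trusted_fix` on a downloaded script and stop if it returns False. Note that a matching hash proves only that the file is the one the vendor published; it says nothing about whether that update is itself safe, which is the first lesson.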

CrowdStrike was a global event. Healthcare organizations worldwide suffered, and small to mid-sized businesses and local governments experienced shutdowns. Even in centralized offices, getting machines back up and running was manual and laborious, because the servers that would normally have run a fix and pushed it out were also down. That brings us to the third lesson. 

Back up your backup — CrowdStrike pushed some organizations to break the cardinal rules of basic security hygiene: flash drives containing a fix script were put in envelopes and mailed to those in need. Another lesson, then, is that it’s better to have multiple cybersecurity tools. Additional security insight acts as a “checks and balances” mechanism, confirming an exploit or even pinpointing an entirely different attack surface that other cybersecurity tools did not identify. 

Running vulnerability checks on the same data set or database with different software programs is exactly what is needed, because the results are often different. Cybersecurity tools are imperfect, so using several of them helps mitigate risk. Their findings can be gathered into a single-pane-of-glass dashboard so that a holistic analysis leads to better decision-making. 
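The consolidation step behind such a dashboard can be sketched as merging each tool’s findings and recording which tools reported what. In this illustration the tool names, hosts, and `VULN-` identifiers are hypothetical placeholders, and real scanners would need their output normalized to a shared `(host, vulnerability ID)` key first.

```python
from collections import defaultdict

Finding = tuple[str, str]  # (host, vulnerability ID)


def merge_findings(reports: dict[str, list[Finding]]) -> dict[Finding, set[str]]:
    """Map each (host, vulnerability ID) finding to the set of tools that reported it."""
    merged: dict[Finding, set[str]] = defaultdict(set)
    for tool, findings in reports.items():
        for finding in findings:
            merged[finding].add(tool)
    return dict(merged)


def summarize(merged: dict[Finding, set[str]],
              tool_count: int) -> tuple[list[Finding], list[Finding]]:
    """Split findings into those every tool agrees on and those only one tool saw."""
    confirmed = sorted(f for f, tools in merged.items() if len(tools) == tool_count)
    single_source = sorted(f for f, tools in merged.items() if len(tools) == 1)
    return confirmed, single_source


# Two hypothetical scanners run against the same environment.
reports = {
    "scanner_a": [("web01", "VULN-001"), ("web01", "VULN-002")],
    "scanner_b": [("web01", "VULN-001"), ("db01", "VULN-003")],
}
merged = merge_findings(reports)
confirmed, single_source = summarize(merged, tool_count=len(reports))
print(confirmed)      # [('web01', 'VULN-001')]
print(single_source)  # [('db01', 'VULN-003'), ('web01', 'VULN-002')]
```

Findings confirmed by every tool are strong candidates for immediate action, while single-source findings are exactly the extra insight a second tool provides and deserve manual review rather than dismissal.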

Enable N+1 redundancy in security measures. The concept means provisioning one extra (“+1”) component beyond the N components necessary to keep a system running. Here, that means having a backup cybersecurity solution as effective as the primary one. Note that updates themselves can be a risk, so waiting one update cycle may be prudent: that way, bugs are more likely to be discovered by others rather than in your own company. 
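The N+1 arithmetic is simple, but the CrowdStrike case shows why the spare should not share a failure mode with the primary. The sketch below contrasts the classic check with a stricter, vendor-level one; the function names and the idea of labeling each unit by vendor are illustrative assumptions rather than a standard method.

```python
from collections import Counter


def n_plus_one_ok(required: int, units: list[str]) -> bool:
    """Classic N+1 check: after losing any single unit, are N (`required`) units left?"""
    return len(units) - 1 >= required


def survives_vendor_failure(required: int, unit_vendors: list[str]) -> dict[str, bool]:
    """Stricter check: if one vendor's bad update disables every unit from that
    vendor at once, are N units from other vendors still running?"""
    counts = Counter(unit_vendors)
    total = len(unit_vendors)
    return {vendor: total - lost >= required for vendor, lost in counts.items()}


# Hypothetical estate: N = 2 protective units needed, three deployed.
units = ["vendor_a", "vendor_a", "vendor_b"]
print(n_plus_one_ok(2, units))             # True
print(survives_vendor_failure(2, units))   # {'vendor_a': False, 'vendor_b': True}
```

Here the classic N+1 check passes, yet a single bad update from `vendor_a` would leave only one working unit, which is the “trusted partner” scenario this article warns about.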

Living a little longer with a known vulnerability may be part of a practice that mitigates the potential disruption of a bad update. Decide case by case which patches can wait one update cycle and which are so critical that the risk of updating them immediately must be taken. Each company should set its own policies and decide on the right approach. 
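A case-by-case policy like this can be written down as an explicit rule so that it is applied consistently. The sketch assumes a simple policy, deploy immediately if the vulnerability is actively exploited or its CVSS base score is at or above a threshold, otherwise defer one cycle; `IMMEDIATE_CVSS`, `Patch`, and `deployment_decision` are hypothetical names, and the threshold is a placeholder each company would set for itself.

```python
from dataclasses import dataclass

# Hypothetical policy threshold; each company sets its own.
IMMEDIATE_CVSS = 9.0


@dataclass
class Patch:
    name: str
    cvss_score: float         # CVSS base score, 0.0 to 10.0
    actively_exploited: bool  # e.g. the vulnerability is known to be exploited in the wild


def deployment_decision(patch: Patch) -> str:
    """Return "deploy_now" when waiting is riskier than a possibly bad update,
    otherwise "defer_one_cycle" so others can surface bugs first."""
    if patch.actively_exploited or patch.cvss_score >= IMMEDIATE_CVSS:
        return "deploy_now"
    return "defer_one_cycle"


for p in [Patch("patch-a", 9.8, False), Patch("patch-b", 5.3, True), Patch("patch-c", 6.1, False)]:
    print(p.name, deployment_decision(p))
# patch-a deploy_now
# patch-b deploy_now
# patch-c defer_one_cycle
```

Even a "deploy_now" patch can still pass through a test environment and a staggered rollout; the decision only controls whether it waits a cycle.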

Incident Response Plan 

The three lessons above lead us to reassess the incident response plan. All IT departments should review the effectiveness of their response to the CrowdStrike incident and ask questions such as: 

  • Were clear roles, responsibilities, and policies established?
  • Was the current plan executed as written? If not, why?
  • When was the last time the communication plan was updated?
  • Were the proper individuals notified in the correct order?
  • Who is responsible for triage to assess severity and impact?
  • Who is monitoring the restored systems and documenting the effectiveness of all actions?
  • Has the incident response plan been updated to account for the “trusted provider” scenario discussed here? 

Reassess and keep the plan current by running tabletop exercises that help you discover weaknesses or outdated information. Tabletop exercises test an organization’s preparedness and response against realistic scenarios, so make the scenarios as real as possible. Follow each exercise with a “hotwash,” an after-action review that analyzes strengths and weaknesses. 

The Preparation Imperative 

The CrowdStrike lesson means organizations now know what to guard against. First, beware the trusted partner: a failure from a “trusted” partner will happen again. To prepare, IT needs to treat cybersecurity updates the same way it treats standard software patches. The preparedness plan consists of reviewing the update notes, applying the update in a test environment, getting the necessary approval for deployment, staggering the deployment, and informing all relevant stakeholders. 
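Staggering the deployment is often done in “rings”: a small test group first, then progressively larger groups, with a health check gating each step. The sketch below covers only that staggering step and assumes hypothetical `deploy` and `is_healthy` callbacks supplied by the organization’s own tooling; the ring contents are illustrative.

```python
from typing import Callable, Optional


def staged_rollout(rings: list[list[str]],
                   deploy: Callable[[str], None],
                   is_healthy: Callable[[str], bool]) -> tuple[list[str], Optional[int]]:
    """Deploy an update ring by ring, checking health after each ring.

    Returns the hosts that received the update and the index of the ring
    where the rollout halted (None if every ring was healthy).
    """
    updated: list[str] = []
    for ring_index, ring in enumerate(rings):
        for host in ring:
            deploy(host)
            updated.append(host)
        # Health gate: if any host in this ring is unhealthy, stop here so
        # later (larger) rings never receive the update.
        if not all(is_healthy(host) for host in ring):
            return updated, ring_index
    return updated, None


# Simulated bad update that crashes laptops but not the test VM.
rings = [["test-vm-1"], ["it-laptop-1", "it-laptop-2"], ["staff-laptop-1", "staff-laptop-2"]]
updated, halted_at = staged_rollout(rings,
                                    deploy=lambda host: None,
                                    is_healthy=lambda host: "laptop" not in host)
print(halted_at)  # 1
print(updated)    # ['test-vm-1', 'it-laptop-1', 'it-laptop-2']
```

In this simulation the test VM did not reveal the fault, which is why the test environment should mirror production hardware. Staggering still confined the damage to two IT laptops instead of the whole staff fleet.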

We are now all interconnected. The lessons we take away from CrowdStrike will help executives and cybersecurity/IT professionals reduce the impact of future incidents. Don’t trust software updates blindly. Patch, update, test, train, back up, and create clear response plans. It’s additional work, but it’s worth it.
