What is Your IT Disaster Recovery Plan? Lessons from the CrowdStrike Crash


Written by Jackie Bilodeau

I am the Communications Director for CGNET, having returned to CGNET in 2018 after a 10-year stint in the 1990s. I enjoy hiking, music, dance, photography, writing and travel. Read more about my work at CGNET here.

July 25, 2024

In today’s digital age, organizations rely heavily on their IT infrastructure to run effectively. However, unforeseen catastrophic events such as natural disasters and cyberattacks can severely disrupt operations. If an organization is unprepared, the consequences can be devastating, including significant financial and reputational damage. This is where an IT disaster recovery plan (DRP) becomes crucial. A well-structured plan helps your organization recover quickly from IT disruptions and continue its operations with minimal downtime.

A Wake-Up Call

Last week, we all witnessed just such a disaster when a faulty update from cybersecurity firm CrowdStrike caused a global outage. The update, intended to enhance security, instead led to the infamous “blue screen of death” and cyclic reboots of Windows systems, crippling millions of devices worldwide. The CrowdStrike crash disrupted critical services, including banking and healthcare networks, and even grounded entire fleets of airplanes across the globe. The recovery process has been tedious – including for some of CGNET’s customers – requiring manual intervention to locate and delete the defective file and restore affected systems.

This incident clearly underscores the importance of having a robust IT disaster recovery plan in place.

What is an IT Disaster Recovery Plan?

An IT disaster recovery plan is a documented policy and set of procedures designed to help an organization recover its IT systems and data in the event of a disaster. This plan is a subset of the broader business continuity plan (BCP) and focuses specifically on restoring IT infrastructure and operations. The following guidance can help you set up a recovery plan for your organization.

Steps to Creating an Effective DRP

  1. Obtain management buy-in: Ensure that top management will allocate the resources and time to develop and implement the DRP.
  2. Create a DRP team: Assemble a team responsible for overseeing the development and implementation of the DRP. Each member should have a specific role in the plan’s success.
  3. Conduct a Business Impact Analysis (BIA): Identify critical business functions and the IT resources required to support them.
  4. Develop recovery strategies: Create strategies to restore hardware, software, and data in time to meet the organization’s needs. This includes ensuring that all critical information is backed up and that copies of program software are available for reinstallation.
  5. Document the plan: Clearly document the DRP, including step-by-step procedures for recovering IT systems and data.
  6. Test the plan: Regularly test the DRP to ensure its effectiveness. Conduct drills and simulations to expose any weaknesses and make necessary adjustments.
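
As a rough illustration of steps 3 and 4, the short Python sketch below ranks business functions by how long the organization could tolerate them being down, so the least tolerant are restored first. The class, the field names, and the sample values are hypothetical and chosen only for illustration; they are not figures from any real analysis.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BusinessFunction:
    name: str
    # Hypothetical field: longest outage the organization can tolerate, in hours
    max_downtime_hours: float
    # IT resources this function depends on
    it_resources: List[str]


def recovery_order(functions):
    """Return the functions sorted so the least downtime-tolerant come first."""
    return sorted(functions, key=lambda f: f.max_downtime_hours)


# Illustrative data only
functions = [
    BusinessFunction("Payroll", 72, ["HR database"]),
    BusinessFunction("Email", 4, ["Mail server", "Directory service"]),
    BusinessFunction("Public website", 24, ["Web server"]),
]

for f in recovery_order(functions):
    resources = ", ".join(f.it_resources)
    print(f"{f.name}: restore within {f.max_downtime_hours}h ({resources})")
```

A list like this is only a starting point. The actual downtime tolerances should come from the people who own each business function, not from IT alone.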

Lessons Learned

Beyond underscoring the importance of having a DRP, the CrowdStrike event also highlighted several key lessons for organizations:

  1. Regular testing and validation: Ensure that all updates and patches are thoroughly tested, for example on a small group of non-critical machines, before deployment across the organization.
  2. Backup and recovery: Maintain up-to-date backups of critical systems and data to allow for a quick recovery.
  3. Communication plan: Develop a communication plan to keep stakeholders informed during a disaster. Have a secondary way to communicate, such as by text, for when systems are down. Clear communication can help manage expectations and reduce panic.
  4. Stay updated: Whenever new information is released following an incident, make sure you revise your DRP accordingly.
  5. Encryption keys: Keep track of your hard drives’ encryption keys! Without those keys, data on the drives would be far more difficult – if not impossible – to access after a crash.
  6. Vendor management: Work closely with vendors to ensure they also have robust disaster recovery measures in place. Regularly review and update service level agreements (SLAs) to include disaster recovery provisions.
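
To make the backup lesson above more concrete, here is a minimal Python sketch that flags systems whose most recent backup is older than an allowed age. The function name, the system names, the timestamps, and the 24-hour threshold are all assumptions made for this example; a real check would read backup times from whatever backup tool you use.

```python
from datetime import datetime, timedelta


def stale_backups(last_backup_times, max_age, now):
    """Return the sorted names of systems whose latest backup is older than max_age."""
    return sorted(
        name
        for name, taken_at in last_backup_times.items()
        if now - taken_at > max_age
    )


# Illustrative data only: hypothetical systems and backup timestamps
now = datetime(2024, 7, 25, 9, 0)
last_backup_times = {
    "file-server": datetime(2024, 7, 25, 1, 0),
    "finance-db": datetime(2024, 7, 22, 1, 0),
    "mail-server": datetime(2024, 7, 24, 1, 0),
}

# Assumed policy: every system should have a backup from the last 24 hours
print(stale_backups(last_backup_times, timedelta(hours=24), now))
```

Knowing that a backup is recent is not the same as knowing it can be restored, so a check like this complements the regular restore tests described earlier rather than replacing them.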

By following the steps outlined above and learning from incidents like the CrowdStrike crash, organizations can better prepare for and recover from IT disruptions. Remember, the key to effective disaster recovery is proactive planning, regular testing, and continuous improvement. 

