April 30, 2018
Business Continuity Plan Checklist
In today’s technology-driven world, most businesses run on complex infrastructure such as web frontends, databases, and hybrid networks. That infrastructure is often stretched between on-premises systems and public cloud providers. With all the “moving parts” of the infrastructure driving 24×7 e-commerce and other businesses, there is always the potential for downtime, whether as a result of human error, malicious attacks, or natural disasters. Businesses must have a plan in place that evaluates the impacts of such events and outlines a response. A solid business continuity plan allows organizations to anticipate potential disasters that would affect business operations and determine the best course of action. Let’s take a closer look at business continuity plans and what should be covered in them.
What Is a Business Continuity Plan and Why Do You Need One?
A business continuity plan is often confused with a disaster recovery plan, but the two differ in scope. A business continuity plan is broader: it determines how the business can continue delivering products and services during unplanned disruptions. A disaster recovery plan is more focused on IT and covers the processes involved in bringing all business-critical systems back online as quickly as possible after an unplanned outage. When day-to-day operations are affected, the bottom line is that the disruptions cost the business money. Organizations must therefore have a business continuity plan that details a strategy for minimizing the impact on business operations while keeping production going. A disaster recovery plan is one part of that strategy.
Business continuity plans are extremely important. Organizations that have not planned for the unexpected, in the best of circumstances, can find themselves scrambling to maintain or continue business while struggling with recovery of data or business systems. In the worst case, the impact could be severe enough that the business stands no chance of recovering.
Business Continuity Plan Checklist
How do you begin to formulate a business continuity plan? While the task may seem daunting, a checklist can help you quickly formulate a framework of priorities. Building on such a framework, you can create a business continuity plan that is tailored to your business. This can help you proactively plan for maintaining business operations through a crisis.
- Identify key contacts and team members
- Identify key business services potentially affected by disasters
- Perform a risk assessment and impact analysis
- Develop a recovery and/or contingency plan for specific business services
- Determine recovery time objectives (RTOs) and recovery point objectives (RPOs)
- Ensure business-critical data is protected
- Designate a disaster recovery (DR) site for network/data failover
- Test your business continuity plan and improve weaknesses found
Let’s look at each of these checklist items in a bit more detail and see why they are important aspects to consider in your business continuity plan.
1. Identify Key Contacts and Team Members
Identifying contacts and team members who can play key roles in keeping the business running is an important first step in formulating a business continuity plan. Business continuity planning needs to happen from the top down in an organization. A business continuity team leader should be designated to spearhead the effort of business continuity planning. Generally speaking, the business continuity planning team should include members from each department involved with normal business operations. Senior team members are usually the most knowledgeable in terms of understanding how disaster events could affect business operations.
When identifying key business continuity plan personnel, also consider the types of disaster events that may impact business operations. These events could include natural disasters, severe weather events, building security threats, data loss due to IT system failure, computer viruses, hacking, power outages, facilities damage (from weather, fire, or other events), theft or vandalism, etc. Different staff members might be better suited for responding to different types of disasters. Work with HR to identify the strengths of your employees based on their backgrounds and expertise.
When assigning responsibilities to different parties in your business continuity plan, make sure you have at least one alternate for every role. Both the primary delegate and their alternate should be well-versed in how to execute their jobs.
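The one-alternate-per-role rule is easy to check mechanically once the roster is written down. The following is a minimal Python sketch under stated assumptions: the `Role` structure, the `find_roles_without_alternate` helper, and all role and staff names are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    """One responsibility in the plan (all names here are illustrative)."""
    name: str
    primary: str
    alternates: list[str] = field(default_factory=list)


def find_roles_without_alternate(roles: list[Role]) -> list[str]:
    """Return names of roles lacking an alternate who differs from the primary."""
    gaps = []
    for role in roles:
        # An alternate only counts if it is a different person than the primary.
        distinct = [person for person in role.alternates if person != role.primary]
        if not distinct:
            gaps.append(role.name)
    return gaps


# Hypothetical roster: two of the three roles have no usable alternate.
roster = [
    Role("Team leader", primary="A. Rivera", alternates=["J. Chen"]),
    Role("IT recovery", primary="M. Okafor", alternates=[]),
    Role("Facilities", primary="S. Patel", alternates=["S. Patel"]),
]

print(find_roles_without_alternate(roster))
```

In this made-up roster, the check flags "IT recovery" (no alternate listed) and "Facilities" (the listed alternate is the same person as the primary).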
2. Identify Key Business Services Potentially Affected by Disasters
To formulate an effective business continuity plan, every process, procedure, system, and external resource must be thoroughly understood. Without an in-depth understanding of the services or infrastructure needed for delivering your organization’s products or services, a disaster is sure to interrupt business continuity. The business continuity plan must identify these key elements of infrastructure or services that are absolutely critical before you can develop a contingency plan accordingly.
Key services and infrastructure would most likely include:
- Power systems
- Telecommunications connectivity – WAN, LAN, telephone
- Physical building infrastructure
- IT systems – web servers, applications servers, database servers
Determine the physical, logical, and other resources absolutely necessary to do business. Only then can the business continuity plan properly address backup infrastructure, secondary systems, and levels of service during a disaster.
3. Perform a Risk Assessment and an Impact Analysis
Once key business services that could be affected by disaster are identified, organizations must perform a risk assessment. A risk assessment identifies vulnerabilities associated with business-critical systems, activities, and supporting resources. An impact analysis, often performed alongside, evaluates the effects on the business if those risks materialize into an actual disruption.
The risk assessment must consider both probability and criticality of each risk. The probability represents the likelihood of the disaster event occurring, while the criticality is the severity of the impact on business operations. The impact analysis would then identify business processes affected and determine the tolerance level for each process being degraded, disrupted, or completely unavailable.
For instance, a site located on the coast might be more prone to business continuity interruptions caused by hurricanes or flooding. These could be identified as risks. The impact analysis would then determine the effect on the business if the site were indeed hit by a hurricane or flood, and what degree of impact could be tolerated.
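One common way to combine probability and criticality is to rate each on a small numeric scale and multiply them into a single score. The Python sketch below assumes 1 to 5 scales and a simple product; both are conventions chosen for illustration, and the ratings given to the coastal-site risks are invented.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    event: str
    probability: int  # assumed scale: 1 (rare) to 5 (almost certain)
    criticality: int  # assumed scale: 1 (minor) to 5 (business-stopping)

    @property
    def score(self) -> int:
        # Simple product of the two ratings; one convention among several.
        return self.probability * self.criticality


def rank_risks(risks: list[Risk]) -> list[Risk]:
    """Order risks from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Invented ratings for a hypothetical coastal site.
coastal_site = [
    Risk("Hurricane", probability=3, criticality=5),
    Risk("Flooding", probability=4, criticality=4),
    Risk("Theft or vandalism", probability=2, criticality=2),
]

for risk in rank_risks(coastal_site):
    print(f"{risk.event}: {risk.score}")
```

With these ratings, flooding (16) ranks just above a hurricane (15), and both rank far above theft or vandalism (4). The ranking only suggests where to look first; the tolerance decisions still come from the impact analysis.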
Organizations with multiple sites should perform a risk assessment for each location. The challenges and complexities of disaster events are likely unique to each site, especially if the sites are geographically distant. The relationships and dependencies between locations should also be considered: what happens if an entire site is lost? All of these factors must be considered in a risk assessment and impact analysis to formulate an accurate business continuity plan that accounts for as many scenarios as possible.
When performing the impact analysis, you must have an understanding of the costs to your organization. This includes how business is affected when services and processes are degraded, disrupted, or completely unavailable. What is the fiscal impact for business processes or services being interrupted for minutes, days, or even weeks?
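A first-pass answer to that question can be a back-of-the-envelope estimate. The Python sketch below assumes the cost of an outage is lost revenue plus recovery effort, both proportional to its duration. The `downtime_cost` function and the hourly figures are hypothetical, and a real analysis would also weigh harder-to-quantify effects such as lost customer confidence.

```python
def downtime_cost(hours: float, revenue_per_hour: float,
                  recovery_cost_per_hour: float = 0.0) -> float:
    """Estimate the cost of an outage as lost revenue plus recovery effort."""
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return hours * (revenue_per_hour + recovery_cost_per_hour)


# Hypothetical figures for an online store.
REVENUE_PER_HOUR = 2_000.0
RECOVERY_COST_PER_HOUR = 500.0

for label, hours in [("10 minutes", 10 / 60), ("1 day", 24), ("1 week", 24 * 7)]:
    cost = downtime_cost(hours, REVENUE_PER_HOUR, RECOVERY_COST_PER_HOUR)
    print(f"{label}: {cost:,.0f}")
```

Running the same calculation per business process makes it easier to see which services justify the most investment in redundancy.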
4. Develop a Recovery and/or Contingency Plan for Business Services
After the core elements required for business operations have been identified (step 2), and a risk analysis has determined the most likely risks to those core services (step 3), you should develop a contingency plan. A contingency plan outlines arrangements for recovering and continuing those core services in the event of a disaster. Your contingency plan may leverage the following strategies, among others:
- Alternate business procedures – e.g., manual workarounds for mechanized or automated processes until the systems are back up and running
- A secondary or alternate site to resume business operations
- Site-level network and server failover
- Recovery of off-site backups of business-critical data
- “Hot-spare” or standby resources, which can be put into service immediately when the primary components fail
Recovery and contingency planning are often collectively referred to as a continuity of operations plan (COOP). Your COOP covers the resources, actions, procedures, and information needed in the event of a major disruption of business operations.
5. Determine Recovery Time Objectives (RTOs) and Recovery Point Objectives (RPOs)
When businesses consider IT systems in their business continuity plans, there are two important metrics for restoring service to systems: Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO).
A Recovery Time Objective, or RTO, is the maximum amount of IT system downtime a business can reasonably tolerate before business processes or services must be restored. It is measured in units of time, such as minutes, hours, or days.
A Recovery Point Objective or RPO defines how much data loss a business can tolerate. Like RTOs, RPOs are measured in units of time (minutes, hours, days, or weeks). For example, perhaps your company can afford to lose a day’s worth of research data without incurring too much damage, but only ten minutes’ worth of online transaction logs.
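Because both objectives are expressed as durations, checking them comes down to comparing time spans. The Python sketch below is an illustration under stated assumptions: the function names and the backup schedule are hypothetical, and the worst-case data loss is approximated as the largest gap between consecutive backups (it ignores the time elapsed since the most recent backup).

```python
from datetime import datetime, timedelta


def worst_case_data_loss(backup_times: list[datetime]) -> timedelta:
    """Largest gap between consecutive backups, i.e. the most data (in time)
    that a failure just before the next backup would lose."""
    ordered = sorted(backup_times)
    if len(ordered) < 2:
        raise ValueError("need at least two backups to measure a gap")
    return max(later - earlier for earlier, later in zip(ordered, ordered[1:]))


def meets_rpo(backup_times: list[datetime], rpo: timedelta) -> bool:
    """True if no gap between backups exceeds the recovery point objective."""
    return worst_case_data_loss(backup_times) <= rpo


def meets_rto(outage_start: datetime, service_restored: datetime,
              rto: timedelta) -> bool:
    """True if service came back within the recovery time objective."""
    return service_restored - outage_start <= rto


# Hypothetical schedule for online transaction logs.
start = datetime(2018, 4, 30, 0, 0)
log_backups = [start + timedelta(minutes=m) for m in (0, 10, 25, 35)]

print(worst_case_data_loss(log_backups))
print(meets_rpo(log_backups, timedelta(minutes=10)))
print(meets_rpo(log_backups, timedelta(days=1)))
print(meets_rto(start, start + timedelta(hours=3), rto=timedelta(hours=4)))
```

In this schedule the largest gap is 15 minutes, so a ten-minute RPO is missed even though most backups ran on time, while a one-day RPO is met comfortably.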
For more information on RTO, consult our white paper.
6. Ensure Business-Critical Data Is Protected
Disaster impact can be significantly mitigated by properly protecting your business-critical data. A widely used best practice for backup resilience is the “3-2-1 rule”: keep at least 3 copies of your data, on 2 different kinds of media, with at least 1 copy stored offsite. You would not want the same disaster that affected your production data to take out the backup data as well!
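The 3-2-1 rule lends itself to a simple automated check against an inventory of where copies are kept. The Python sketch below is a minimal illustration: the `DataCopy` structure, the `satisfies_3_2_1` helper, and the media labels are hypothetical and not tied to any particular backup product.

```python
from dataclasses import dataclass


@dataclass
class DataCopy:
    label: str
    media: str      # e.g. "disk", "tape", "cloud" (labels are illustrative)
    offsite: bool


def satisfies_3_2_1(copies: list[DataCopy]) -> bool:
    """True if there are >= 3 copies on >= 2 media types with >= 1 offsite."""
    media_types = {c.media for c in copies}
    return (
        len(copies) >= 3
        and len(media_types) >= 2
        and any(c.offsite for c in copies)
    )


# Hypothetical inventories.
compliant = [
    DataCopy("Local backup repository", media="disk", offsite=False),
    DataCopy("Weekly tape set", media="tape", offsite=False),
    DataCopy("Backup copy in public cloud", media="cloud", offsite=True),
]
non_compliant = [
    DataCopy("Local backup repository", media="disk", offsite=False),
    DataCopy("Second local repository", media="disk", offsite=False),
]

print(satisfies_3_2_1(compliant))
print(satisfies_3_2_1(non_compliant))
```

The second inventory fails on all three counts: only two copies, a single media type, and nothing offsite, so one site-wide incident could destroy every copy.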
Choose a data protection solution that enables you to meet the RTOs and RPOs set in your business continuity plan.
NAKIVO Backup & Replication allows organizations with virtual infrastructure to follow the “3-2-1” backup methodology, as well as helping them achieve even the shortest RPOs and RTOs they set. NAKIVO Backup & Replication provides important data protection functionality:
- Image-based backup
- VM-level replication
- Backup copy jobs (for off-site DR and public cloud targets)
- Application-aware backup for transactional consistency of databases
By relying on a proven, fully-featured solution, organizations can rest assured that business continuity from a data protection standpoint is guaranteed in the event of disruption.
7. Designate a DR Site for Network/Data Failover
With much of today’s business infrastructure relying on IT infrastructure, organizations must consider how a potential disaster would affect communications and data access. If core network and data resources, including virtual machines, are located at a primary production site, what happens if that site goes offline? Designating a disaster recovery (DR) site for network/data failover is crucial to ensuring RPO and RTO are met as set in the business continuity plan.
By designating a DR facility (often located in a different geographic region), businesses can keep a “warm-standby” copy of resources such as virtual machines safely outside of the production site environment. In the event of a site-wide failure bringing down production resources such as the core network and virtual machines, traffic can be failed over to the DR location. The “warm-standby” VMs essentially become production workloads, allowing business operations to be brought back online quickly and efficiently.
With NAKIVO Backup & Replication, businesses can replicate current production VMs to an offsite DR location. The replica VM is an exact copy of the original VM and serves as a warm standby for an automated Failover job. Businesses can set the replication interval to align with the RPO they have set (step 5). This is part of an effective contingency plan for site-wide failure affecting production workloads.
8. Test Your Business Continuity Plan and Improve on Weaknesses Found
Once the business continuity plan has been created, regular, rigorous testing is crucial. This means not only conducting “table top” walkthroughs of the plan, but also staging full simulations. You should carry out all processes, procedures, and secondary systems to mimic the flow of how the business continuity plan would be carried out in real disaster circumstances. These types of tests are best carried out quarterly, for two reasons. First, key team members should be familiar with the process and able to perform their parts under pressure without confusion. Second, changes to your infrastructure, environment, protocols, workloads, and/or workforce can introduce complications in the plan, and these potential hitches are often only discovered in the course of full run-throughs.
The simulations should be watched by an independent observer, who can take notes on inadequacies and weaknesses. There should be debriefings after each run-through, after which reports can be drafted. In the reports, weaknesses and issues should be documented, as well as proposed updates to the plan. The reports, as well as the updated plan, should be circulated to all team members.
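A quarterly cadence is easier to keep when the next due date is tracked explicitly. The short Python sketch below assumes a quarter is at most 92 days; the `QUARTER` constant, the function names, and the dates are hypothetical.

```python
from datetime import date, timedelta

QUARTER = timedelta(days=92)  # assumed upper bound on the length of one quarter


def next_test_due(last_test: date, interval: timedelta = QUARTER) -> date:
    """Date by which the next full simulation should have taken place."""
    return last_test + interval


def is_test_overdue(last_test: date, today: date,
                    interval: timedelta = QUARTER) -> bool:
    """True if more than one interval has passed since the last simulation."""
    return today > next_test_due(last_test, interval)


# Hypothetical dates.
print(next_test_due(date(2018, 1, 15)))
print(is_test_overdue(last_test=date(2018, 1, 15), today=date(2018, 4, 30)))
print(is_test_overdue(last_test=date(2018, 3, 1), today=date(2018, 4, 30)))
```

A check like this could run alongside other scheduled reminders so that a missed simulation is noticed before a real incident exposes it.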
Business continuity planning is essential to ensure business services can continue and/or be recovered if core business functions are degraded, disrupted, or rendered completely unavailable. Organizations that fail to create a business continuity plan (including risk assessment, impact analysis, disaster recovery, and contingency planning) could suffer major downtime, data loss, or other impairments. Those effects could cause businesses to incur losses of income, customer confidence, and reputation. With this checklist, businesses can build the framework for an effective business continuity plan that makes them resilient to any disaster, whether natural or manmade. If you don’t have a comprehensive business continuity plan for your business yet, you should make this a top priority. Disaster can strike at any time – and often does when least expected. By planning, preparing, testing, and reinforcing the business continuity plans, your business can withstand even the worst crises.
NAKIVO Backup & Replication offers a host of features to help you with disaster recovery for virtualized environments, including several of those mentioned in this article. Test the user-friendly product in your own VMware, Hyper-V, or AWS environment for free with the full-featured Free Trial, and see how quickly you can recover your data.