September 7, 2018
Site Recovery with NAKIVO Backup & Replication Part 2: Preparing Your Infrastructure
In the previous blog post of our series on Site Recovery, you saw how to plan disaster recovery for your environment. This article explains how to prepare for site recovery, with a focus on VM replication. A discussion of the role of VM replication is followed by a complete walkthrough of replication job configuration with NAKIVO Backup & Replication.
What Is VM Replication?
Virtual machine replication is the process of creating an identical copy of a source VM (termed a “VM replica”) on a different host (the target host). The VM replica is a regular VM that remains in a powered-off state until it is needed (at which point it can be up and running on its host almost instantly). Like a backup, a powered-off VM replica consumes no CPU or memory resources, only storage space. Unlike a backup, however, a replica is not compressed. A replica can thus be restored in much less time. NAKIVO Backup & Replication creates VM replicas with multiple recovery points that represent regular VM snapshots made during incremental replication jobs.
The Role of VM Replication for Site Recovery
One of the key actions for site recovery with NAKIVO Backup & Replication is VM failover. Failover is the action of switching from a failed production VM to a healthy VM replica that was created beforehand with a replication job. By using failover to replica, you can perform fast (almost immediate) recovery of the VM. If the networks to be used by the VMs at the target site differ from those at the source site (which is likely, in most cases), the Network Mapping and Re-IP features of NAKIVO Backup & Replication jobs can help you automate VM network configuration during VM failover.
VM Replication Best Practices
To produce VM replicas that can be recovered successfully with all of their applications intact, you should understand the details of the replication process. Let’s look at the best way to create VM replicas.
VM Replication at the Host Level
Perform VM replication at the host level rather than the guest level. Legacy backup and replication solutions use agents installed on the guest OS (operating system) of each virtual machine. Agents consume computing resources, which significantly impacts performance. Furthermore, they must be installed on each VM individually, which is inconvenient and time-consuming for the administrator(s).
A guest OS running on a VM doesn’t interact directly with physical devices. The virtualization layer is the intermediate layer between the physical hardware and the guest OS. Replication should be performed at the level of the virtualization layer; this process is called host-level replication. Host-level VM replication is much more efficient than agent-based replication. The guest OS is not aware of the replication process; the virtual machine data, including virtual disks and other VM files, is replicated directly from the datastore attached to the host.
Applications such as database servers and email servers interact with RAM (Random Access Memory) intensively. If a VM snapshot needed for replication were taken while these applications were running, without any additional actions, the effect would be similar to an unexpected power loss and shutdown: some data may be lost or corrupted. This is because pending transactions that are held in RAM at the moment of the snapshot have not yet been written to the disk, so they are missing from the snapshot. Recovering a database from such a state can be a complex and time-consuming process.
Use application-aware replication to avoid this issue. With application-aware methods, the applications are frozen (quiesced) and pending data in memory is flushed to the disk before the snapshot is taken; new writes are held back until the snapshot is complete. Once the application-consistent snapshot is taken, a VM replica can be created. Such a VM replica can be successfully restored with the applications therein running properly.
Determining Retention Settings
NAKIVO Backup & Replication’s retention settings allow you to store multiple recovery points for a defined period of time. For example, suppose you have configured a VM replication job to run once per day, and 10 recovery points are kept. When the replication job runs for the eleventh time, the oldest recovery point (the one from 10 days ago) is deleted and a new recovery point is created for today’s replication.
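The rotation in this example can be illustrated with a few lines of Python. This is only a minimal sketch of count-based retention, not NAKIVO's implementation; the function name run_daily_jobs and the start date are assumptions made for illustration.

```python
from collections import deque
from datetime import date, timedelta


def run_daily_jobs(start, runs, keep=10):
    """Simulate `runs` daily replication jobs, keeping at most `keep` recovery points."""
    # A bounded deque drops its oldest entry automatically once it is full.
    points = deque(maxlen=keep)
    for day in range(runs):
        points.append(start + timedelta(days=day))
    return list(points)


# Hypothetical job that first ran on September 1, 2018 and has now run 11 times.
points = run_daily_jobs(date(2018, 9, 1), runs=11)
print(len(points), "recovery points kept, oldest from", points[0])
```

After the eleventh run, the point created by the first run is gone and exactly 10 points remain, which is the behavior described above.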
However, if you need to recover an older state of the virtual machine, then you need an older recovery point, and the simple retention policy described above would be insufficient. The Grandfather-Father-Son (GFS) retention policy is much more flexible and can help you in this case. The GFS retention policy lets you keep the last few recovery points (e.g. those from the past 5 daily replication jobs) as well as one recovery point from a week ago, a month ago, and a year ago.
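To make the idea concrete, here is a rough Python sketch of the selection just described: keep the newest few points, plus the newest point that is at least a week, a month, and a year old. It is a simplification under stated assumptions (one point per day, a month treated as 30 days, a year as 365 days), and the name gfs_keep is hypothetical. Real GFS implementations, including the product's, typically select points per calendar period and may behave differently.

```python
from datetime import date, timedelta


def gfs_keep(points, today, daily=5, ages_in_days=(7, 30, 365)):
    """Return the recovery-point dates a simplified GFS-style policy would keep."""
    ordered = sorted(points, reverse=True)  # newest first
    keep = set(ordered[:daily])             # the most recent points
    for age in ages_in_days:
        cutoff = today - timedelta(days=age)
        # Newest point that is at least `age` days old, if one exists.
        older = [p for p in ordered if p <= cutoff]
        if older:
            keep.add(older[0])
    return sorted(keep)


today = date(2018, 9, 7)
history = [today - timedelta(days=n) for n in range(400)]  # one point per day
kept = gfs_keep(history, today)
for point in kept:
    print(point)
```

Out of 400 daily points, only 8 are retained here: the 5 most recent plus one each from roughly a week, a month, and a year back.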
Creating and Configuring a VM Replica to Be Ready for Failover
First, make sure that VMware Tools or Hyper-V Integration Services are installed on any VMs you want to fail over. These utility suites are needed to make the application-aware snapshots used for VM replication. NAKIVO Backup & Replication supports application-aware host-level replication for VMware VMs, Hyper-V VMs, and EC2 instances with special functionality for MS SQL Server, MS Exchange Server, and Active Directory Domain Controller. Let’s consider how to create and configure a replication job in NAKIVO Backup & Replication, which is a prerequisite for performing failover during site recovery.
Creating Replication Jobs with NAKIVO Backup & Replication
Open the web interface of NAKIVO Backup & Replication in your browser. On the home screen, click Create > VMware vSphere replication job. (Note: For the purposes of the current walkthrough, replication of a VMware VM is performed. The steps would be much the same for an Amazon EC2 replication job or a Microsoft Hyper-V replication job.)
1. Select one or more VMs that you want to replicate. In this case, DC-VM is going to be replicated. Click Next.
2. Select a destination container and datastore. ESXi host 10.10.10.56 and the local datastore on this host are selected in this example. Click Next to proceed.
3. At this point, you can enable network mapping and define the network mapping rules for the replication job. Network mapping is useful if the networks used by VMs at the target (DR) site differ from those of the source site (which is likely). You can also configure network mapping later when configuring failover or failback actions for the Site Recovery job. Read our VM failover guide for a more detailed walkthrough of network mapping and Re-IP configuration. Click Next when you are ready to proceed.
4. As with network mapping, you can configure Re-IP if the IP addresses used by the VMs at the source site and target site are different. Click Next to continue.
5. Configure the replication job scheduling options. You can set the replication job to be run immediately upon completion of another job, periodically (e.g. every 2 hours or every 5 days), daily/weekly (e.g. at 2:00pm every workday or every Sunday morning), or monthly/yearly. The calendar dashboard can help you to schedule jobs with a user-friendly visual overview of your schedule. When you have configured the schedule to your satisfaction, click Next.
6. When configuring the retention settings for your replication job, you can follow the Grandfather-Father-Son retention policy. Check the appropriate boxes and enter the appropriate values, then click Next when ready.
7. Define the replication job options. At this step, you can enable application-aware mode, job-specific bandwidth throttling, changed block tracking for VMware VMs (if you are replicating a Hyper-V VM, you would use Microsoft’s analogue, Resilient Change Tracking), and other options. Click Finish or the Finish & Run button when you have configured all the options.
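To illustrate what a Re-IP rule (step 4) does conceptually, the following Python sketch translates a source-site address into a target-site address while preserving the host part. It uses only the standard ipaddress module; the function re_ip, the rule format, and the example subnets are hypothetical and do not reflect how NAKIVO Backup & Replication stores or applies its rules.

```python
import ipaddress


def re_ip(address, rules):
    """Translate `address` using the first (source_net, target_net) rule that matches."""
    ip = ipaddress.ip_address(address)
    for source_net, target_net in rules:
        src = ipaddress.ip_network(source_net)
        dst = ipaddress.ip_network(target_net)
        if ip in src:
            if src.prefixlen != dst.prefixlen:
                raise ValueError("source and target networks must be the same size")
            # Keep the host part and swap the network part.
            offset = int(ip) - int(src.network_address)
            return str(dst.network_address + offset)
    return address  # no rule matched; leave the address unchanged


# Hypothetical rule: production subnet -> DR-site subnet.
rules = [("192.168.1.0/24", "10.30.30.0/24")]
print(re_ip("192.168.1.25", rules))
print(re_ip("172.16.0.5", rules))
```

An address inside the source subnet is moved to the same host position in the target subnet, while an address that matches no rule is returned as-is.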
Once your replication job has run successfully, you have a VM replica ready. You can now prepare for site recovery by configuring failover.
Preparing for site recovery is an important process. Having a VM replica is a prerequisite for automated failover, which is an integral part of most site recovery workflows. A VM replica is used for failover when disaster occurs; the workloads are switched over from the failed source VM to its replica at the DR site. This blog post has walked through the use of NAKIVO Backup & Replication to create and maintain VM replicas. With the product’s new Site Recovery functionality (introduced in v8), you can build workflows for fast and simple recovery of your virtual environment in case of disaster. These workflows typically include automated failover as a key step. Now that you have replication jobs configured for your business-critical VMs, you are ready to set up Site Recovery. In the next blog post in our series on Site Recovery, you can learn about site recovery workflows and see how to create a Site Recovery job with the appropriate sequence of actions.
You can explore the new Site Recovery functionality in your own environment with a full-featured free trial of NAKIVO Backup & Replication v8.