VMware vSphere Replication Best Practices

Virtual machine replication, or VM replication, remains one of the best data protection techniques for short recovery times and system availability during failures or downtimes of any origin. A VM replica can be powered on instantly in case of disaster, allowing you to recover a VM within seconds and maintain essential business operations.

To avoid the common pitfalls of running VM replication software, such as application failures and errors, data loss, and excessive resource consumption, follow the VMware vSphere replication best practices covered in this post as part of an efficient and reliable VM data protection strategy.

What Is vSphere Replication and How It Works

VMware vSphere replication is the process of creating an exact copy of a VMware vSphere VM in a custom location, which can be an ESXi host or cluster. This VM copy, called a VM replica, is maintained in a powered-off state and updated periodically or continuously (continuous replication) to reflect the latest changes and state of the source VM.

Note: Replication in VMware vSphere is often confused with VMware backup, but these VM data protection methods pursue different objectives and should complement, not replace, each other (see our blog post VM Backup vs. VM Replication).

10 VMware Data Replication Best Practices

Follow the best practices below and use them for VMware vSphere replication and virtual machine replication in other environments.

1. Define mission-critical workloads and prioritize them.

Determine which virtual machine and application data require the highest protection and the fastest recovery in case of downtime or disaster. You do not need to replicate everything, and not everything needs to be recovered within the same time frame.

For example, suppose your online store operates around the clock and processes orders and purchases directly with customers. If your web server fails, the website, inventory, and CRM must be recovered first and as quickly as possible to minimize the impact of downtime on customers, while back-office applications can wait.
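The prioritization in the online store example can be written down as data, which makes the recovery order explicit and easy to review. The following is a minimal sketch in Python; the Workload class, the tier scale (1 = mission-critical), and the workload names are hypothetical and not part of any VMware or NAKIVO API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    tier: int        # hypothetical scale: 1 = mission-critical, higher = less urgent
    replicate: bool  # not every VM needs a replica


def recovery_order(workloads):
    """Return replicated workloads sorted from most to least critical."""
    replicated = [w for w in workloads if w.replicate]
    return sorted(replicated, key=lambda w: (w.tier, w.name))


# Illustrative inventory based on the online store example.
workloads = [
    Workload("back-office", tier=3, replicate=False),
    Workload("crm", tier=1, replicate=True),
    Workload("web-server", tier=1, replicate=True),
    Workload("inventory", tier=1, replicate=True),
    Workload("reporting", tier=2, replicate=True),
]

for w in recovery_order(workloads):
    print(w.tier, w.name)
```

Workloads that are not replicated, such as the back-office VM here, drop out of the list and would be restored from backup instead.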

2. Outline the data protection plan.

Failing to plan is planning to fail. Plan data protection activities around business continuity demands, mission-critical workloads, priorities, individual data protection techniques, VM backup/replication job frequency and duration, and required resources (data storage capacity, network bandwidth, VM backup/replication windows, etc.).

Make sure that your data protection plan also covers personnel responsibilities, dedicated hardware and software components, and disaster recovery scenarios.

3. Use a proper VM backup and replication ratio.

Although VM backup and VM replication seem to do the same thing, they have different objectives and cannot replace each other. VM backup, whatever the type, is designed for long-term safekeeping of VM data, while VM replication provides fast VM recovery or, in other words, VM high availability.

Virtual machine replication is a constituent element of a data protection plan, and you should find an efficient combination of regular VM backups and VM replication to fit your organization’s needs and budget.

4. Establish measurable criteria for VM data replication sequences.

Establishing measurable criteria can help you decide which VM replication solution to choose. With regard to replication in VMware vSphere for high availability, you mainly need to consider two aspects:

  • Speed. How fast do you need VM data recovery to happen? Although a VM replica is a full copy of the original VM, you may still need some time before it is usable, for example, to remount volumes and bring a DBMS back online.
  • Fault tolerance. This is a system’s ability to maintain an acceptable level of performance when one or more components fail. Consider what degree of fault tolerance your system should achieve to keep VM data available when a disaster strikes. When deciding how often to run virtual machine replication for fault tolerance, also take into account your hardware and network capacity.

Depending on your business continuity policy, establish how close the recovery point must be to the point of failure, that is, the recovery point objective (RPO). The RPO is a tradeoff between minimizing data loss and the cost of additional resources. For some businesses, it can be 24 hours, 7 days, or even 1 month, while others might need an RPO of just 10 seconds.
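To check a replication schedule against an RPO, a rough model is often enough. The sketch below assumes periodic replication and a simplified worst case: a failure happens just before a job finishes, so the newest usable replica reflects the start of the previous job, and the data at risk spans one interval plus one job duration. The function names are illustrative, not part of any product.

```python
from datetime import timedelta


def worst_case_data_loss(interval, job_duration):
    """Simplified worst case: one full interval plus the time a job takes."""
    return interval + job_duration


def meets_rpo(interval, job_duration, rpo):
    """True if the schedule keeps worst-case data loss within the RPO."""
    return worst_case_data_loss(interval, job_duration) <= rpo


# Replicating every 15 minutes with 5-minute jobs against a 30-minute RPO.
print(meets_rpo(timedelta(minutes=15), timedelta(minutes=5), timedelta(minutes=30)))

# Hourly replication with 10-minute jobs against a 1-hour RPO.
print(meets_rpo(timedelta(hours=1), timedelta(minutes=10), timedelta(hours=1)))
```

Under this model, the first schedule meets its RPO and the second does not. The second case shows why the replication interval usually has to be shorter than the RPO itself.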

5. Prepare the failover and failback roadmap.

To keep services and applications running with minimum disruption in case a production VM goes down, you should determine failover and failback procedures in advance. Work out production VM failover and failback action scenarios to be followed under different adverse circumstances.

Depending on the established data protection system, failover and failback procedures may be either manual or automated for both unplanned and planned shutdowns.

6. Enable verification of VM replicas.

It is good practice to verify your VM replicas regularly to make sure that they are usable and to prevent unpleasant surprises, since a VM replica can turn out to be damaged or corrupted. Whatever the reason for the defect, verify VM replicas periodically and include the verification schedule in your plan.

When using third-party VMware vSphere replication solutions, it is recommended that you enable automatic verification of VM replicas, if such a feature is available. That way, all your VM replicas are automatically verified for integrity.
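If your solution does not track verification for you, a small script can flag replicas that are overdue for a check. This is a minimal sketch only: the replica names, the timestamps, and the seven-day threshold are assumptions for illustration, and the verification itself (for example, a test boot of the replica) is outside its scope.

```python
from datetime import datetime, timedelta


def overdue_replicas(last_verified, now, max_age):
    """Return names of replicas never verified or verified longer ago than max_age."""
    overdue = []
    for name, verified_at in last_verified.items():
        if verified_at is None or now - verified_at > max_age:
            overdue.append(name)
    return sorted(overdue)


# Hypothetical verification log: replica name -> time of last successful check.
now = datetime(2024, 1, 15, 12, 0)
last_verified = {
    "web-server-replica": datetime(2024, 1, 14, 2, 0),
    "crm-replica": datetime(2024, 1, 1, 2, 0),
    "inventory-replica": None,  # never verified
}

print(overdue_replicas(last_verified, now, max_age=timedelta(days=7)))
```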

7. Create application-aware replicas.

Modern data protection solutions provide an application-aware replication mode that lets you create application-aware VM replicas for applications such as Microsoft Active Directory, Microsoft SQL Server, Microsoft SharePoint, Microsoft Exchange Server, or Oracle Database by relying on VSS writers.

“Application-aware” or, in other words, application-consistent VM replication mode means that in the process of replication, the in-memory data and pending I/O operations are flushed to disk before a snapshot is made. Consequently, your VM replicas contain consistent data, and applications can be recovered smoothly, without data loss.

8. Build automation around the recovery tools.

Building automation around recovery tools reduces unnecessary delays when you have to restore VMs as soon as possible after a disaster. If you rely on custom scripts, make sure they are properly maintained. Alternatively, modern data protection solutions offer integrated automation features.

9. Determine the retention and rotation policy.

This policy establishes the frequency of creating VM replicas and their retention time. VM replication data changes are continuously added to the replication changes database. Without periodic purging, the database would grow until it consumed all the available disk space. The period between purges is called the replication purge delay. The default replication purge delay is set in your data protection software and typically ranges from one day to several days.

When establishing the retention and rotation policy, consider the replication purge delay. Changes older than the purge delay are cleared from the replication changes database. Make sure that the replication changes database is backed up more often than the replication purge delay period; otherwise, some changes may be lost.
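The relationship between the purge delay and the backup interval can be shown with a short sketch. It assumes a hypothetical in-memory list of (timestamp, description) change records; real products store and purge changes in their own formats.

```python
from datetime import datetime, timedelta


def purge_old_changes(changes, now, purge_delay):
    """Keep only the changes recorded within the purge delay."""
    cutoff = now - purge_delay
    return [(ts, desc) for ts, desc in changes if ts >= cutoff]


def backups_outpace_purge(backup_interval, purge_delay):
    """True if every change is backed up before it can be purged."""
    return backup_interval < purge_delay


# Hypothetical change records.
now = datetime(2024, 1, 10)
changes = [
    (datetime(2024, 1, 6), "block 17 updated"),
    (datetime(2024, 1, 9), "block 42 updated"),
]

kept = purge_old_changes(changes, now, purge_delay=timedelta(days=2))
print(kept)
print(backups_outpace_purge(timedelta(days=1), timedelta(days=2)))
```

With a two-day purge delay, the change from January 6 is gone by January 10. If it had not been backed up within those two days, it would be lost.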

10. Choose the right VM replication and DR solution.

If you need to perform virtual machine replication regularly, select a solution that fits the specific needs of your IT infrastructure and your organization’s budget.

When comparing the VMware vSphere replication products currently available on the market, consider their specific features in terms of both functionality and performance. For instance, some products use synchronous VM replication (continuous replication), while others run semi-synchronous or asynchronous VM replication. While synchronous VMware data replication seems to be a better choice for the failover of transactional applications, full synchronization over a slow network can severely degrade performance. In contrast, asynchronous VMware replication has almost no impact on performance in this situation. On the other hand, if the primary server fails, asynchronous VMware VM replication can result in the loss of committed data that has not yet reached the replica.
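The tradeoff can be made concrete with a deliberately simple model. The sketch below assumes that a synchronous write is acknowledged only after the local write, the network round trip, and the remote write complete one after another, and that an asynchronous write is acknowledged after the local write alone. All figures are illustrative, not measurements.

```python
def sync_write_latency_ms(local_ms, round_trip_ms, remote_ms):
    """Synchronous: the write waits for the replica before it is acknowledged."""
    return local_ms + round_trip_ms + remote_ms


def async_write_latency_ms(local_ms):
    """Asynchronous: the write is acknowledged after the local write only."""
    return local_ms


def async_writes_at_risk(writes_per_second, replication_lag_seconds):
    """Committed writes not yet on the replica if the primary fails now."""
    return writes_per_second * replication_lag_seconds


# Slow link with an 80 ms round trip, 2 ms disk writes on both sides.
print(sync_write_latency_ms(2, 80, 2))   # 84
print(async_write_latency_ms(2))         # 2

# 500 writes per second and a 30-second replication lag.
print(async_writes_at_risk(500, 30))     # 15000
```

In this model, the slow link adds tens of milliseconds to every synchronous write, while the asynchronous path stays fast but leaves thousands of committed writes exposed at any moment.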

Another point to take into consideration is the speed of your VM replication and restore operation, since it might require high-capacity hardware.

Remote replication of virtual machines to an offsite location or a cloud keeps your data safe even if a disaster strikes your whole datacenter, but make sure your network has the bandwidth that replication requires.
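A back-of-the-envelope estimate helps here. The sketch below computes the sustained throughput needed to transfer a given amount of changed data within a replication window, using decimal units (1 GB = 8,000 megabits). The efficiency parameter is an assumed allowance for protocol overhead and competing traffic, not a measured value.

```python
def required_bandwidth_mbps(changed_gb, window_seconds, efficiency=1.0):
    """Sustained Mbit/s needed to move changed_gb within window_seconds."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    megabits = changed_gb * 8 * 1000  # decimal units: 1 GB = 8,000 megabits
    return megabits / (window_seconds * efficiency)


# 45 GB of changed blocks to replicate within a one-hour window.
print(required_bandwidth_mbps(45, 3600))

# The same job, assuming only 80% of the link is usable for replication.
print(required_bandwidth_mbps(45, 3600, efficiency=0.8))
```

If the result exceeds what your link can sustain, the options are a longer window, less frequent replication, or reducing the amount of data transferred, for example through incremental replication.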

VMware vSphere Replication with NAKIVO

NAKIVO Backup & Replication offers many useful features that make VMware replication, as well as Hyper-V replication and even AWS EC2 instance replication, simple and fast. The NAKIVO solution supports VMware changed block tracking for incremental replication, provides flexible retention settings, and works directly with VMware APIs for VMware data replication at the host level. Use the step-by-step wizard in the web interface to configure vSphere replication.

How does replication work in VMware vSphere with NAKIVO data protection?

You can use VMware replicas created with NAKIVO for VM failover and complex full primary site recovery scenarios.
