Synchronous vs. Asynchronous Replication Strategy
The modern business world generates ever-growing amounts of data, and much of it is vulnerable and must be protected. In the event of a disaster, every business must have a set of recovery strategies in place to protect and restore mission-critical processes as soon as possible. This is where remote replication comes in: sending copies of business-critical data offsite for reliable storage and fast recovery.
What Is Remote Replication?
Remote replication is an essential part of data protection and recovery. Replication was once used mainly to copy and store application data in off-site locations, but the technology has expanded significantly since then. Today, replication allows you to create a synchronized copy of a VM on a remote target host. This copy is called a replica, and it functions just like a regular VM on the source host. VM replicas can be transferred to and run on any capable hardware, and they can be powered on in a matter of seconds if the original VM fails. This significantly decreases downtime and mitigates the business risks and losses associated with disasters.
Before running a replication job, the following factors should be considered:
- Distance: the greater the distance between the sites, the higher the network latency.
- Bandwidth: the available bandwidth and network connectivity should be sufficient for fast and secure data transfer.
- Data rate: the rate at which data changes, and therefore must be transferred, should stay below the available bandwidth so as not to overload the network.
- Replication technology: running replication jobs in parallel (simultaneously) can make more efficient use of the network.
These factors help determine which type of replication is preferable for a specific environment and disaster scenario.
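You can turn the distance, bandwidth, and data-rate factors into a quick back-of-the-envelope check. The Python sketch below is for illustration only. The helper names are hypothetical, and the sketch assumes that light travels through optical fiber at roughly 200,000 km/s. It ignores switching, routing, and queuing delays, so real round-trip times will be higher.

```python
# Approximate propagation speed of light in optical fiber: about 200,000 km/s,
# i.e. 200 km per millisecond. Real paths add switching, routing, and queuing
# delays on top of this, so treat the results as lower bounds.
FIBER_KM_PER_MS = 200.0


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on the round-trip time between two sites over fiber."""
    return 2 * distance_km / FIBER_KM_PER_MS


def bandwidth_headroom(change_rate_mbps: float, link_mbps: float) -> float:
    """Fraction of the link left free after carrying replication traffic.

    A negative result means data changes faster than the link can carry it,
    so the replica keeps falling further behind the source.
    """
    return (link_mbps - change_rate_mbps) / link_mbps


if __name__ == "__main__":
    for km in (10, 100, 1000):
        print(f"{km:>5} km: at least {min_round_trip_ms(km):.2f} ms round trip")
    print("Headroom for 40 Mbit/s of changes on a 100 Mbit/s link: "
          f"{bandwidth_headroom(40, 100):.0%}")
```

Under these assumptions, sites 100 km apart can never have a round trip shorter than about 1 ms. A 40 Mbit/s change rate leaves 60% of a 100 Mbit/s link free. Synchronous replication pays that round trip on every write, which is why distance weighs so heavily on it.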
Synchronous vs. Asynchronous Replication Strategies
Two main types of data replication can be distinguished: synchronous and asynchronous.
Synchronous replication
With synchronous replication, data is written to a secondary remote location at the same time as it is created or updated in the primary datacenter. The replica is therefore always identical to the source: the source and target remain fully synchronized, which is crucial for successful disaster recovery (DR).
Every write must be atomically committed in multiple locations, so network latency directly affects application performance, and a network outage can affect availability. An atomic operation either completes entirely or not at all, with no intermediate state visible to other operations. In synchronous replication, a write is considered complete only when both the local and the remote storage acknowledge it. Zero data loss is therefore guaranteed, but every write has to wait for a network round trip, which slows down overall performance.
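The toy Python model below makes the acknowledgement rule concrete. It is a sketch under simplifying assumptions and does not describe any particular storage array:

- Both sites are in-memory dictionaries.
- The `Storage` class and the `synchronous_write` function are hypothetical.
- If the remote site does not acknowledge a write, the local write is rolled back so that the two copies never diverge. Real products may instead suspend replication or handle the failure in other ways.

```python
class Storage:
    """In-memory stand-in for the storage system at one site."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: dict[int, bytes] = {}
        self.online = True

    def write(self, block: int, data: bytes) -> bool:
        """Apply a write; returning True plays the role of an acknowledgement."""
        if not self.online:
            return False
        self.blocks[block] = data
        return True


def synchronous_write(primary: Storage, secondary: Storage,
                      block: int, data: bytes) -> bool:
    """Report success only when BOTH sites have acknowledged the write."""
    previous = primary.blocks.get(block)
    if not primary.write(block, data):
        return False
    # In a real deployment, the application waits here for a network round trip.
    if secondary.write(block, data):
        return True
    # No remote acknowledgement: undo the local write so the copies stay identical.
    if previous is None:
        del primary.blocks[block]
    else:
        primary.blocks[block] = previous
    return False
```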
Asynchronous replication
With asynchronous replication, changes are not copied at the moment they are made in the primary storage. Instead, data is replicated after a delay, typically at predetermined intervals (for example, hourly, daily, or weekly). The replica does not have to stay synchronized with the primary location in real time, so it can be stored at a distant DR site.
Data is not atomically updated in multiple locations, so the application keeps writing data that has not yet been replicated: a write is considered complete as soon as the local storage acknowledges it. Applications never wait for the remote site, so write performance is not affected by network latency. Replication traffic can also be spread out over time, which lowers bandwidth requirements. The downside is that in a disaster scenario, the DR site might not contain the most recent changes, so some critical data could be lost.
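For comparison, the following toy model sketches the asynchronous write path. As before, the class and method names are hypothetical. Writes are acknowledged as soon as the local copy is updated and are queued for shipment to the DR site. `replicate()` stands in for whatever scheduled job actually transfers them.

```python
from collections import deque


class AsyncReplicator:
    """Toy model: writes are acknowledged locally and shipped to the DR site later."""

    def __init__(self) -> None:
        self.primary: dict[int, bytes] = {}
        self.replica: dict[int, bytes] = {}
        self.pending: deque[tuple[int, bytes]] = deque()  # written, not yet replicated

    def write(self, block: int, data: bytes) -> bool:
        self.primary[block] = data
        self.pending.append((block, data))
        return True  # acknowledged as soon as the local copy is updated

    def replicate(self) -> int:
        """Ship queued writes in order (e.g., on a schedule); return how many were sent."""
        sent = 0
        while self.pending:
            block, data = self.pending.popleft()
            self.replica[block] = data
            sent += 1
        return sent

    def lost_on_disaster(self) -> list[int]:
        """Blocks whose latest version would be lost if the primary failed now."""
        return sorted({block for block, _ in self.pending})
```

Anything that `lost_on_disaster()` still returns when the primary fails is exactly the recent data that the DR site would be missing.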
Synchronous vs. Asynchronous Replication: Main Differences
| Criterion | Synchronous replication | Asynchronous replication |
|---|---|---|
| Distance | Works better when locations are in close proximity (performance drops in proportion to distance). | Works over longer distances (as long as a network connection between datacenters is available). |
| Cost | More expensive | More cost-effective |
| Recovery Point Objective (RPO) | Zero | From 15 minutes to a few hours |
| Recovery Time Objective (RTO) | Short | Short |
| Network | Requires more bandwidth and is affected by latency; can be affected by WAN interruptions (as the transfer of replicated data cannot be postponed until later). | Requires less bandwidth and is not affected by latency; is not affected by WAN interruptions (as the copy of data can be saved at the local site until WAN service is restored). |
| Data loss | Zero | Possible loss of the most recent updates to data |
| Resilience | A single failure could cause loss of service; viruses or other malicious components that lead to data corruption might be replicated to the second copy of the data. | Loss of service can occur after two failures. |
| Performance | Low (waits for network acknowledgement from the secondary location). | High (does not wait for network acknowledgement from the secondary location). |
| Management | May require specialized hardware; supported by high-end block-based storage arrays and network-based replication products. | More compatible with other products; supported by array-, network-, and host-based replication products. |
| Use cases | Best solution for immediate disaster recovery and projects that require absolutely no data loss. | Best solution for storing less sensitive data and for immediate disaster recovery of projects that can tolerate partial data loss. |
Which Is Better: Synchronous or Asynchronous Replication?
There is no single answer to this question; your choice depends on your business priorities. Asynchronous replication works best for projects that span long distances or have limited budgets. It also suits businesses that can tolerate some data loss. Synchronous replication, on the other hand, is the right choice when the business cannot afford to lose any critical data. It suits workloads that require a zero RPO and a short RTO.
However, there is a middle ground: you can use both synchronous and asynchronous replication strategies, at different infrastructure levels. For example, synchronous replication may be used to transfer and secure data over a Local Area Network (LAN) while asynchronous replication sends critical data to a remote DR site.
Replication in NAKIVO Backup & Replication
Mode of replication
VMware vSphere VM replication in NAKIVO Backup & Replication is forever-incremental. The first replication job copies the full VM. Subsequent jobs transfer only the changes made since the previous run (increments). After each replication job, a recovery point is created that references all the data blocks required to recover the VM. This mode of replication reduces network load and saves the time that full replication jobs would otherwise take.
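The sketch below illustrates the general idea behind forever-incremental replication with recovery points that share data blocks. It is a simplified Python model, not NAKIVO's actual on-disk format, and it makes these assumptions:

- A disk is a list of blocks.
- Every transferred block version is kept in a store.
- Each recovery point maps block indexes to stored versions.

To stay self-contained, the sketch detects changes by comparing block contents. A real product gets this information from change tracking instead.

```python
class ForeverIncrementalReplica:
    """Toy replica: each recovery point maps block index -> stored block version."""

    def __init__(self) -> None:
        self.store: list[bytes] = []                  # every block version transferred
        self.recovery_points: list[dict[int, int]] = []

    def run_job(self, source: list[bytes]) -> int:
        """Replicate the source disk; return the number of blocks transferred."""
        previous = self.recovery_points[-1] if self.recovery_points else {}
        point: dict[int, int] = {}
        transferred = 0
        for index, data in enumerate(source):
            ref = previous.get(index)
            if ref is not None and self.store[ref] == data:
                point[index] = ref                      # unchanged: share existing block
            else:
                self.store.append(data)                 # new or changed: transfer it
                point[index] = len(self.store) - 1
                transferred += 1
        self.recovery_points.append(point)
        return transferred

    def restore(self, point_index: int) -> list[bytes]:
        """Rebuild the full disk image as of a given recovery point."""
        point = self.recovery_points[point_index]
        return [self.store[point[i]] for i in sorted(point)]
```

After a first full run, a job in which a single block changed transfers only that block. Every recovery point can still be restored in full because unchanged blocks are shared between points.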
NAKIVO Backup & Replication offers fast deployment on various hardware and software platforms:
- VMware VA. The pre-configured VMware Virtual Appliance can be easily downloaded and then imported into VMware vSphere.
- NAS. By installing NAKIVO Backup & Replication directly onto a NAS device, you can create your own VM backup appliance.
- AWS AMI. NAKIVO Backup & Replication can be deployed in the Amazon cloud as a pre-configured Amazon Machine Image (AMI).
- Linux. NAKIVO Backup & Replication can be installed on a physical or virtual machine running Linux with a single command.
- Windows. NAKIVO Backup & Replication can be installed on a physical or virtual machine running Windows with a single click.
A snapshot captures the state of a system at a particular point in time. NAKIVO Backup & Replication creates VM replicas from VM snapshots, which are used to read current VM data. Every time a replication job runs, a temporary VM snapshot is taken, the changed data is identified, and all updates are applied to the replica. After the job completes, the snapshot is deleted.
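The snapshot lifecycle described above can be sketched as a single replication pass. In this Python sketch, the four callables are hypothetical hooks that stand in for hypervisor and product operations; they are not a real NAKIVO or VMware API. The `try`/`finally` block reflects that the snapshot is temporary: it should be removed even if copying fails part-way.

```python
from typing import Callable


def replication_pass(
    take_snapshot: Callable[[], str],
    find_changed_blocks: Callable[[str], list[int]],
    copy_block: Callable[[str, int], None],
    delete_snapshot: Callable[[str], None],
) -> int:
    """Run one snapshot-based replication pass; return the number of blocks copied."""
    snapshot_id = take_snapshot()  # freeze a point-in-time view of the VM's disks
    try:
        changed = find_changed_blocks(snapshot_id)
        for block in changed:
            copy_block(snapshot_id, block)  # read from the snapshot, write to the replica
        return len(changed)
    finally:
        # The snapshot is temporary: remove it whether or not copying succeeded.
        delete_snapshot(snapshot_id)
```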
Changed block tracking
NAKIVO Backup & Replication relies on VMware CBT (Changed Block Tracking) and Hyper-V RCT (Resilient Change Tracking) to identify and copy the changes that have been made in a VM since the last replication. This technology significantly improves the speed of replication jobs. If CBT and RCT are unavailable, NAKIVO Backup & Replication uses a built-in proprietary change tracking method.
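The source does not describe how the built-in proprietary method works. The following therefore only illustrates one common way to detect changes without hypervisor support: hash each fixed-size block and compare the hashes against those saved during the previous run. The function names and the 4 KiB block size are assumptions chosen for the example.

```python
import hashlib


def block_hashes(disk: bytes, block_size: int = 4096) -> list[str]:
    """Hash every fixed-size block of a disk image."""
    return [
        hashlib.sha256(disk[offset:offset + block_size]).hexdigest()
        for offset in range(0, len(disk), block_size)
    ]


def changed_blocks(previous: list[str], current: list[str]) -> list[int]:
    """Indexes of blocks that are new or whose hash differs from the last run."""
    return [
        index
        for index, digest in enumerate(current)
        if index >= len(previous) or previous[index] != digest
    ]
```

This approach must read and hash the entire disk on every run. CBT and RCT instead record which blocks are written as the writes happen, which is a large part of why they speed up replication.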
Live application support
NAKIVO Backup & Replication is an application-aware solution. VMs run all sorts of business-critical applications, including Microsoft Exchange, Active Directory, SQL Server, and SharePoint. These applications perform frequent input and output, so their data must stay consistent, especially while a replication job runs. When a snapshot is created, the applications inside the VM are therefore briefly quiesced: pending transactions are flushed and new writes are held for a moment. This way, the snapshot captures a consistent state without disrupting running operations.
NAKIVO Backup & Replication simplifies the protection of critical VMs by letting you add entire containers, such as resource pools, folders, or clusters, to a replication job. When VMs are added to or removed from a container, those changes are automatically reflected in the relevant replication jobs. You can also exclude specific VMs in a container from a replication job. In that case, the entire container is protected except for the excluded VMs.
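One way to think of container-based protection is as a job whose VM list is resolved at run time. In this Python sketch, the inventory dictionary, container names, and function name are hypothetical. The job looks up membership on every run, so it picks up VMs added to or removed from a container without being edited. Exclusions are always subtracted.

```python
def resolve_job_vms(
    inventory: dict[str, set[str]],
    containers: list[str],
    excluded: set[str],
) -> list[str]:
    """Expand the job's containers into VM names at run time, minus exclusions."""
    selected: set[str] = set()
    for name in containers:
        selected |= inventory.get(name, set())  # current members of each container
    return sorted(selected - excluded)
```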
NAKIVO Backup & Replication can also automatically verify that VM replication has completed successfully. As soon as a replication job finishes, networking in the replica is disabled. The replica is then briefly powered on so that a screenshot of the booted OS can be taken. Afterward, the replica is powered off and reverted to the most recent recovery point, and you receive an email report with the screenshot of the test-booted OS.
NAKIVO Backup & Replication lets you organize replication jobs into groups (folders) to arrange applications, services, and locations in logical structures. You can then run bulk actions for all jobs in a group or for selected ones.
To keep you informed about the status of your replication jobs, NAKIVO Backup & Replication can send automatic email reports, either on schedule or on demand.
NAKIVO Backup & Replication lets you configure replication jobs to be run either on demand or on schedule (daily, weekly, monthly, and yearly). You can even set up jobs to run on a custom schedule that meets your specific business needs, e.g., every 20 minutes, every 5 days, or the first Tuesday of every month. You can also specify time windows within which a job should start and finish.
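Custom schedules such as "every 20 minutes" or "the first Tuesday of every month" boil down to computing the next run time. The helpers below are a hypothetical Python sketch that uses only the standard `datetime` module; they do not represent the product's scheduler. The default start time of 22:00 is an arbitrary assumption, and the sketch ignores time zones and time windows.

```python
from datetime import datetime, timedelta


def next_interval_run(anchor: datetime, every: timedelta, now: datetime) -> datetime:
    """Next run strictly after `now` of an 'every N minutes/days' schedule starting at `anchor`."""
    if now < anchor:
        return anchor
    periods = (now - anchor) // every + 1  # timedelta // timedelta -> int
    return anchor + periods * every


def first_weekday_of_month(year: int, month: int, weekday: int) -> datetime:
    """Date of the first given weekday (Monday=0 ... Sunday=6) in a month."""
    first = datetime(year, month, 1)
    return first + timedelta(days=(weekday - first.weekday()) % 7)


def next_first_tuesday(now: datetime, hour: int = 22) -> datetime:
    """Next 'first Tuesday of the month' run at `hour`:00, strictly after `now`."""
    year, month = now.year, now.month
    while True:
        candidate = first_weekday_of_month(year, month, 1).replace(hour=hour)
        if candidate > now:
            return candidate
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
```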
Staged VM replication (seeding)
The initial (full) replication of large VMs can take a long time because of their size. To speed up the process, NAKIVO Backup & Replication can perform staged VM replication. This feature lets you first transfer ("seed") the initial VM replicas to removable media. The media are then transported to the target site, where a new replication job runs using the transferred replicas as its starting point. From then on, only incremental replication is performed over the network.
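A quick calculation shows why seeding helps. This Python sketch estimates how long an initial full replica would take to cross the WAN. The function name is hypothetical. The default efficiency of 80% of nominal link speed is an assumed allowance for protocol overhead and competing traffic.

```python
def wan_transfer_hours(size_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Hours needed to push `size_gb` gigabytes over a `link_mbps` megabit/s link.

    `efficiency` is the assumed fraction of the nominal link speed that is
    actually usable for replication traffic.
    """
    bits = size_gb * 1e9 * 8
    usable_bits_per_second = link_mbps * 1e6 * efficiency
    return bits / usable_bits_per_second / 3600
```

For example, 2,000 GB over a fully used 100 Mbit/s link takes about 44 hours (roughly 56 hours at 80% efficiency). Seeding is meant to avoid initial transfers of this kind.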
A recovery point represents a VM at a particular point in time and can be used for VM recovery. With NAKIVO Backup & Replication, you can store up to 30 recovery points per VM replica. Recovery points can be retained according to a Grandfather-Father-Son (GFS) retention policy, as described below. This method ensures that VM replica recovery points are saved at the DR site at designated frequencies (e.g., daily, weekly, monthly, and yearly).
- Keep one recovery point per week for X weeks: the last recovery point of every week is stored for the specified number of weeks.
- Keep one recovery point per month for X months: the last recovery point of every month is stored for the specified number of months.
- Keep one recovery point per year for X years: the last recovery point of every year is stored for the specified number of years.
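The GFS rules above can be sketched as a selection over recovery point timestamps. This Python sketch is a simplified illustration, not the product's retention engine:

- The function names are hypothetical.
- Weeks are ISO calendar weeks.
- "X periods" means the X most recent periods that actually contain recovery points.

The union of the three tiers is kept, so one recovery point can satisfy several rules at once.

```python
from datetime import datetime
from typing import Callable


def last_per_period(points: list[datetime],
                    key: Callable[[datetime], tuple],
                    count: int) -> set[datetime]:
    """Latest point in each period, for the `count` most recent periods."""
    latest: dict[tuple, datetime] = {}
    for point in points:
        period = key(point)
        if period not in latest or point > latest[period]:
            latest[period] = point
    recent_periods = sorted(latest, reverse=True)[:count]
    return {latest[period] for period in recent_periods}


def gfs_keep(points: list[datetime], weeks: int, months: int, years: int) -> list[datetime]:
    """Recovery points retained under a simple weekly/monthly/yearly GFS policy."""
    keep: set[datetime] = set()
    keep |= last_per_period(points, lambda p: tuple(p.isocalendar())[:2], weeks)
    keep |= last_per_period(points, lambda p: (p.year, p.month), months)
    keep |= last_per_period(points, lambda p: (p.year,), years)
    return sorted(keep)
```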
RTO and RPO
A recovery point objective (RPO) is the maximum acceptable age of the data you recover. It sets the earliest point in time to which your VM may have to be reverted during DR, and thus defines how much data can be lost without causing unreasonable damage to your business. Replication can help you meet shorter RPOs, since you can run replication jobs as frequently as your custom schedules require.
VM replication can also help you meet short recovery time objectives (RTOs). The RTO is the stipulated period of time within which your business operations must be recovered after a disaster. With replication, the VM can be instantly restored simply by powering on the replica.
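For scheduled replication, simple arithmetic gives an estimate of the achievable RPO. Assume that the snapshot is taken when a job starts and that jobs do not overlap. If a failure occurs just before a job finishes, the newest usable recovery point comes from the previous job's snapshot. Up to one full interval plus one job duration of changes can then be lost. The Python helpers below (hypothetical names) encode that rule.

```python
from datetime import timedelta


def worst_case_rpo(interval: timedelta, job_duration: timedelta) -> timedelta:
    """Worst-case data loss for scheduled replication.

    A failure just before a job finishes loses everything written since the
    previous job's snapshot: one full interval plus the running job's duration.
    """
    return interval + job_duration


def meets_rpo(interval: timedelta, job_duration: timedelta, target: timedelta) -> bool:
    """True if the schedule keeps worst-case data loss within the RPO target."""
    return worst_case_rpo(interval, job_duration) <= target
```

For example, jobs that run every 20 minutes and take 5 minutes to complete give a worst-case RPO of about 25 minutes.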
VM replication can protect your business-critical services from a number of problems, including critical VM loss or failure, host or datastore failure, and natural disasters. It is generally used for projects that work with sensitive data and can tolerate little or no data loss. Replication is appropriate in these cases because, if disaster strikes, VM recovery can be performed easily and almost instantly.
Replication functionality is used in the following cases:
- Disaster Recovery with Replica
NAKIVO Backup & Replication helps you largely mitigate the negative effects of system failure, such as downtime and loss of revenue. With VM replication, you can near-instantly recover an entire VM from its replica, ensuring high availability of your business services.
- Failover and Failback
When a disaster takes out your primary site, your business can be gravely affected unless you have an effective DR plan in place. This is where failover is useful. Failover is the process of switching from a source VM to its replica in order to move business-critical workloads from an affected site to a DR site.
Once your primary site has been restored, you can switch business operations back to the original VM. This process is called failback. During failback, the changes made to the replica while it ran at the DR site are synchronized back to the source VM at the primary site.
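The failover and failback sequence can be modeled as a small state machine. This Python sketch is a hypothetical illustration, not the product's implementation. The VM's data is a dictionary. For simplicity, failback copies the replica's full state back, whereas a real failback would transfer only the changes made at the DR site.

```python
from enum import Enum


class Site(Enum):
    PRIMARY = "primary"
    DR = "dr"


class ProtectedVM:
    """Toy model of which copy of a VM is currently serving production."""

    def __init__(self) -> None:
        self.active = Site.PRIMARY
        self.source_data: dict[str, str] = {}
        self.replica_data: dict[str, str] = {}

    def write(self, key: str, value: str) -> None:
        """Production writes go to whichever copy is active."""
        target = self.source_data if self.active is Site.PRIMARY else self.replica_data
        target[key] = value

    def replicate(self) -> None:
        """Bring the replica up to date with the source (a normal replication job)."""
        self.replica_data = dict(self.source_data)

    def failover(self) -> None:
        """Switch production from the source VM to its replica at the DR site."""
        self.active = Site.DR

    def failback(self) -> None:
        """Copy the replica's state back to the source VM, then switch back."""
        self.source_data = dict(self.replica_data)
        self.active = Site.PRIMARY
```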
- Site Recovery
With NAKIVO Backup & Replication, you can build site recovery workflows (jobs): custom sequences of actions that automate and orchestrate the DR process. Implementing a disaster recovery plan manually can be time-consuming and resource-intensive. NAKIVO Backup & Replication lets you arrange actions into site recovery jobs that can be run in just a few clicks. You can create dedicated site recovery jobs for any type of DR event.
The following actions and conditions can be included in your site recovery workflows:
- Failover VMs. Fail over to an already-created VM replica.
- Failback VMs. Transfer workloads back from a VM replica at a DR site to a source VM at a production site.
- Start VMs. Start one or multiple VMs.
- Stop VMs. Stop one or multiple VMs.
- Run jobs. Run data protection jobs (backup, replication, etc.) that you have already created for VMs.
- Stop jobs. Stop VM data protection jobs that are running.
- Run script. Run your own pre- or post-job script on a Windows or Linux machine.
- Attach repository. Attach a backup repository.
- Detach repository. Detach a backup repository that is attached.
- Send emails. Send email notifications with the results after a specific action is completed.
- Wait. Wait for a defined period of time before starting the next action.
- Check condition. Check whether a resource exists, whether a resource is running, or whether an IP/hostname is reachable before proceeding to the next action.
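Here is a minimal sequential runner in Python that illustrates how such a workflow behaves. It assumes that each action reports success or failure and that the workflow stops at the first failed step; the product's own handling of failures may differ. The `is_reachable` helper shows one way a "check condition" step could test whether a host accepts TCP connections, using the standard `socket.create_connection` call. All other names are hypothetical.

```python
import socket
import time
from typing import Callable

Step = tuple[str, Callable[[], bool]]


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """'Check condition' example: can a TCP connection be opened to host:port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait(seconds: float) -> Callable[[], bool]:
    """'Wait' action: pause for a defined period before the next step."""
    def step() -> bool:
        time.sleep(seconds)
        return True
    return step


def run_workflow(steps: list[Step]) -> list[str]:
    """Run steps in order, stopping at the first failure; return completed step names."""
    completed: list[str] = []
    for name, action in steps:
        if not action():
            break
        completed.append(name)
    return completed
```

In a real site recovery job, you configure these steps in the product rather than writing them as Python callables.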
Conclusion
Any business can fall victim to an unexpected disaster or system failure that compromises the integrity of business-critical data. This makes an effective DR plan absolutely essential in the modern business world, where high availability and business continuity are paramount.
Replication can be an invaluable tool for DR. Choose between synchronous and asynchronous replication strategies carefully, based on your business priorities and needs. Asynchronous replication is a cost-effective strategy that requires less bandwidth and no specialized hardware. It can be used for storing less sensitive data and for transferring data over long distances. Synchronous replication is highly dependent on network connectivity and latency, but it guarantees zero data loss and allows you to restore mission-critical operations instantly.
NAKIVO Backup & Replication is a fast and flexible solution that can replicate your VMs to one or more remote locations for reliable storage. With the solution in place, you can simply power on your replicas when disaster strikes, minimizing lost revenue and avoiding a long-term shutdown.