Performing Disaster Recovery Replication: Complete Walkthrough

This blog post is about disaster recovery replication, its usage today, and how NAKIVO Backup & Replication can help your organization with it. The blog post also includes a step-by-step guide on how to create a VMware replication job in NAKIVO Backup & Replication and how this replication job can be included in a site recovery job.

In this day and age, clients have little patience for any pause in the rendering of services, irrespective of the reason. For example, if you are visiting Company A’s website in search of a particular service, and this service is not available, you are likely to visit a website of a competitor to Company A which can deliver required services in its stead. In today’s feverish world, significant business downtime is highly likely to damage customer loyalty.

In other words, downtime may result in the following:

  • Loss of profit
  • Damage to your brand
  • Problematic relationship with customers and partners
  • Issues with supply chain
  • Legal problems, etc.

These results may be due to a lack of availability of business-critical services and data, which is what replication is for — to help you avoid downtime altogether or at least minimize its impact. Disaster recovery (DR) is far from being only about disaster recovery replication, and replication, likewise, is performed not only for disaster recovery, but also for data synchronization, integration, consolidation, and migration.

To recover your IT infrastructure, or at least its most critical parts, after a disaster, your organization needs an alternative location which stores your replicated data and can be used as a failover site. An alternative site is needed for recovery from disasters that threaten your main site’s operative condition or even physical existence. Disaster recovery replication refers to creating exact copies of data, either within a single location or between a main location and alternative one(s). Disaster recovery replication should be constant and ongoing: if a disaster happens, you need to fail over your latest business-critical IT processes to the DR software and hardware within an acceptable period.

Although cloud replication is becoming increasingly popular, the use of alternative physical sites remains highly prevalent. The two main types of physical sites are the hot site and the cold site. A hot site is a copy of your primary datacenter, containing the same hardware and software, so if your primary location is not operable, the hot site can instantly become a failover point. As you can probably imagine, its cost is correspondingly high. A cold site, on the other hand, is just a space with no hardware or software installed, although it does contain the necessary power and communication lines.

Factors Threatening Your Business Continuity and Requiring Disaster Recovery

A myriad of factors threaten your organization’s IT-infrastructure and its continuity. Some of them are mild and relatively frequent (unplanned downtimes for segments of the infrastructure), and others are catastrophic, but it’s fair to consider them all as disasters of different levels of severity. Let’s try to broadly categorize them:

  1. Natural disasters. These are acts of God over which no one has control. Predictable or unpredictable, they are overwhelming, wreaking havoc and destruction to all in their path — a path which may happen to contain the physical location of your organization. Floods, hurricanes, volcano eruptions, tornadoes, and earthquakes may not be a risk factor in your area, but extreme weather conditions are a threat everywhere. Civilization-related threats appear and disappear; we must always take precautions to avoid the very worst outcomes caused by natural disasters.
  2. Manmade disasters include acts of sabotage, terrorism, industrial espionage, vandalism, etc. Negligence and honest mistakes are among these threatening factors, too.
  3. National and international events such as wars, strikes and other manifestations of unstable political situations can stamp your organization’s physical site out of existence.
  4. Technology and software-related failures and threats include power outages, hardware malfunctions and data loss as well as malicious-intention factors like viruses, ransomware and cyber-attacks.

In today’s world, IT infrastructures are increasingly virtualized, which has contributed to the overall effectiveness of DR. That is why contemporary disaster recovery replication software is more effective and affordable than ever, allowing you to create and orchestrate fully automated DR workflows and attain acceptable RTOs and RPOs.

Among DR metrics, the following should be taken into account when you configure replication:

  • Recovery Time Objective (RTO) measures how long a recovery process is allowed to take or, in other words, how much time you can afford to lose before your organization resumes rendering its services.
  • Recovery Point Objective (RPO) refers to how up-to-date the files which you need to recover must be, that is, how much recent data you can afford to lose. If your mission-critical applications are very dynamic and many transactions occur within them, then you need to replicate these applications very frequently. Otherwise, you risk losing many transactions and, consequently, the money they were going to generate.
  • Work Recovery Time (WRT) indicates how much time it should take for the company to verify the integrity of the recovered data.
  • Maximum Tolerable Downtime (MTD) measures how much time the company can allow for disaster recovery without suffering serious losses and adverse consequences.
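How these metrics relate to each other can be sketched in a few lines of Python. The example below is only an illustration: it assumes the common convention that recovery (RTO) and verification (WRT) happen one after the other and together must fit within MTD. The class and field names are hypothetical and not part of any product.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class DrObjectives:
    """Hypothetical container for the DR metrics described above."""
    rto: timedelta  # time allowed to bring services back online
    rpo: timedelta  # maximum age of the data you can afford to recover
    wrt: timedelta  # time allowed to verify the recovered data
    mtd: timedelta  # total downtime the business can tolerate

    def fits_within_mtd(self) -> bool:
        # Assumption: recovery and verification run sequentially,
        # so their sum must not exceed the tolerable downtime.
        return self.rto + self.wrt <= self.mtd


objectives = DrObjectives(
    rto=timedelta(hours=2),
    rpo=timedelta(minutes=15),
    wrt=timedelta(hours=1),
    mtd=timedelta(hours=4),
)
print(objectives.fits_within_mtd())  # True
```

If the check fails, either the recovery process has to become faster or the stated MTD is not realistic for the current setup.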

Below, you can find a set of replication characteristics which should be taken into account when choosing a disaster recovery replication solution for your organization.

Synchronous and asynchronous replication

During synchronous replication, data is written to a target data object while simultaneously being written to the corresponding source, allowing you to attain the lowest possible RTO and RPO. This type of disaster recovery replication is preferred for high-end transactional applications and high-availability clusters requiring instant failover. The software client that writes the data receives the confirmation of the writing only after the data is committed to both the primary and secondary storage.

Although an object and its replica are kept synchronized, this adds latency to the application being replicated and slows it down, takes up bandwidth, and creates general overhead. If an alternative storage location is used, there is also the possibility that it could be disconnected. Yet synchronous replication allows you to fail over to the secondary site almost instantly and without data loss.
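The "acknowledge only after both commits" behavior can be pictured with a toy model in Python. This is a minimal sketch using in-memory dictionaries in place of real storage; the class names are made up for illustration and do not describe how any particular product implements synchronous replication.

```python
class InMemoryStore:
    """Stand-in for a storage device; a real one would write to disk."""

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}

    def commit(self, block_id: int, data: bytes) -> None:
        self.blocks[block_id] = data


class SynchronousReplicator:
    """Acknowledges a write only after both stores have committed it."""

    def __init__(self, primary: InMemoryStore, secondary: InMemoryStore) -> None:
        self.primary = primary
        self.secondary = secondary

    def write(self, block_id: int, data: bytes) -> bool:
        self.primary.commit(block_id, data)
        # The caller waits here for the secondary commit; this wait is
        # where the extra write latency of synchronous replication comes from.
        self.secondary.commit(block_id, data)
        return True  # acknowledgement returned to the writing client


primary, secondary = InMemoryStore(), InMemoryStore()
replicator = SynchronousReplicator(primary, secondary)
acknowledged = replicator.write(7, b"order #1001")
print(acknowledged, primary.blocks == secondary.blocks)  # True True
```

Because the acknowledgement is sent only after the second commit, the replica is never behind the source at the moment the client is told that the write succeeded.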

During asynchronous replication, data is written to a target data object only some time after it has been written to the corresponding source. The disaster recovery replication of the data occurs in set intervals (once a minute, ten minutes, an hour, etc.), according to a set schedule. This is a good choice if your network bandwidth cannot support the pressure of synchronous replication, that is, if the change rate of your mission-critical data constantly exceeds its rate of transfer to the failover site.
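Whether an asynchronous schedule is sustainable on a given link can be estimated with simple arithmetic. The Python sketch below is a rough illustration under simplifying assumptions (a constant change rate, the whole link available to replication, no compression); the function names and numbers are hypothetical.

```python
def transfer_seconds(changed_mb: float, link_mbit_per_s: float) -> float:
    """Time to ship one interval's changes over the link (1 MB = 8 Mbit)."""
    return changed_mb * 8 / link_mbit_per_s


def schedule_keeps_up(change_mb_per_min: float, interval_min: float,
                      link_mbit_per_s: float) -> bool:
    """True if each run finishes before the next one is due to start."""
    changed_mb = change_mb_per_min * interval_min
    return transfer_seconds(changed_mb, link_mbit_per_s) <= interval_min * 60


# 60 MB of changes per minute, replicated every 10 minutes over 100 Mbit/s
print(transfer_seconds(600, 100))      # 48.0
print(schedule_keeps_up(60, 10, 100))  # True
# The same workload over a 5 Mbit/s link cannot keep up
print(schedule_keeps_up(60, 10, 5))    # False
```

If the check returns False, the backlog grows with every run, and you need either more bandwidth or less data to transfer per run.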

File-based and block-based replication

A file system stores files on certain disk blocks. One file may be stored on blocks scattered all across the disk. That is why when a file-based replication process reads a file, it has to “run” about the disk to find the file’s scattered pieces. This “running about” takes considerable time. This time loss can be avoided via block-based replication, which transfers changed blocks rather than changed files to the target location, reading blocks in the order in which they are situated on the disk. Therefore, other conditions being equal, it is preferable to opt for a DR solution performing block-based replication.
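The idea of shipping changed blocks in on-disk order can be shown with a small Python sketch. Note the assumption: it finds changed blocks by comparing hashes of the source and the replica, which is just one simple way to illustrate the concept and not necessarily how a DR product detects changes. The block size and sample data are invented for readability.

```python
import hashlib

BLOCK_SIZE = 4  # tiny block size so the example is easy to follow


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    return [data[i:i + size] for i in range(0, len(data), size)]


def changed_block_indexes(source: bytes, replica: bytes) -> list[int]:
    """Return indexes of blocks that differ, in on-disk order."""
    src, dst = split_blocks(source), split_blocks(replica)
    changed = []
    for index, block in enumerate(src):
        old = dst[index] if index < len(dst) else None
        if old is None or hashlib.sha256(block).digest() != hashlib.sha256(old).digest():
            changed.append(index)
    return changed


source = b"AAAABBBBCCCCDDDD"
replica = b"AAAAXXXXCCCCYYYY"
print(changed_block_indexes(source, replica))  # [1, 3]
```

Only blocks 1 and 3 would be transferred, and they are read sequentially, regardless of which files they belong to.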

Full-size replication and incremental replication

To continuously replicate the whole volume of your data is unreasonable and impractical. However, one full-size replication needs to be done at first. As a result of this full-sized replication, an exact replica of the source object is created. Then, incremental replication can start, which means that only the data changes are copied over to the failover site (changes on the block level, if block-based replication is used). At present, all advanced DR solutions, like NAKIVO Backup & Replication, allow you to perform incremental disaster recovery replication.
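The difference between the initial full run and later incremental runs can be sketched as follows. This toy Python model assumes the source keeps a set of "dirty" block indexes that were written since the last run, which is a simplified stand-in for change tracking; all names are hypothetical.

```python
class TrackedDisk:
    """Toy disk that remembers which blocks were written since the last sync."""

    def __init__(self, block_count: int) -> None:
        self.blocks = [b"\x00"] * block_count
        self.dirty: set[int] = set()

    def write(self, index: int, data: bytes) -> None:
        self.blocks[index] = data
        self.dirty.add(index)


def replicate(source: TrackedDisk, replica: list[bytes], full: bool) -> int:
    """Copy blocks to the replica and return how many were transferred."""
    indexes = range(len(source.blocks)) if full else sorted(source.dirty)
    copied = 0
    for index in indexes:
        replica[index] = source.blocks[index]
        copied += 1
    source.dirty.clear()  # everything is in sync after a run
    return copied


disk = TrackedDisk(block_count=8)
replica = [b""] * 8
print(replicate(disk, replica, full=True))   # 8
disk.write(2, b"new")
disk.write(5, b"data")
print(replicate(disk, replica, full=False))  # 2
```

The first run copies all eight blocks; the second copies only the two that changed, yet the replica still ends up identical to the source.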

Application-aware replication

If disaster recovery replication is application-aware, it captures the state of in-memory application data and I/O operations. This allows you to avoid losing application data. Replicated applications remain transactionally consistent, meaning they won’t crash when run on the DR site.

Disaster Recovery Replication with NAKIVO Backup & Replication

If you are looking for the best VMware backup solution for your environment, try NAKIVO Backup & Replication. Our agentless data protection solution provides image-based, application-aware, incremental backup and replication for VMware VMs, as well as Hyper-V VMs and AWS EC2 instances. As mentioned above, we are going to demonstrate how to perform VM replication using a VMware VM as an example. For VMware virtual environments, replication in NAKIVO Backup & Replication has the following features (many of which are available for Microsoft Hyper-V and AWS EC2):

  • Application-Aware Mode ensures that Microsoft Exchange, Microsoft Active Directory, Microsoft SQL and some other applications flush in-memory data and I/O transactions to disk before disaster recovery replication begins. Applications replicated in this mode are transactionally consistent, which means that they can run error-free if a disaster occurs and replicas must be powered on.
  • Convenient replication automation via policies. Through policies, you can completely automate VM replication. A policy consists of rules based on VM size, tag, name, location, etc. A policy-based job works with all VMs corresponding to the set rules, finds these VMs automatically as they appear in your infrastructure, and adds them to replication jobs.
  • Recovery point retention is flexible, allowing you to keep up to 30 recovery points (VM snapshots). Using the Grandfather-Father-Son rotation scheme, you can create daily, weekly, monthly, and yearly recovery points.
  • The Screenshot Verification feature allows you to verify whether replicas are in the operational state. Thanks to this feature, if a disaster occurs, you won’t be met with “pleasant” surprises like corrupted error-ridden VM replicas.
  • For data with undemanding RTOs, that is, for data replicated asynchronously, you can replicate, not production VMs, but their backups. This helps offload your main IT-resources.
  • For VM replicas, you can choose to create thin-provisioned disks, irrespective of what disks are used by production VMs. A thin disk takes up only as much storage as the data actually written to it, so no space is reserved for unused capacity.
  • Our product’s replication feature can be used in the context of the Site Recovery feature, which allows you to orchestrate and automate complex DR workflows. Via the Site Recovery feature, you can integrate replication, planned or emergency failover, failback and other operations into a single process which can be launched in a click!
  • Swap data — swap files (on Windows OS) and swap partitions (Linux OS) — can be excluded from VM replicas, which increases replication speed and saves storage space.
  • LAN-Free Data Transfer mode speeds up replication considerably via Hot Add and Direct SAN Access. If NAKIVO Backup & Replication runs on a server with access to VM datastores, it can, thanks to the Hot Add feature, read VM data from these datastores via the storage I/O stack and, in the process, bypass the host’s TCP/IP stack. The Direct SAN Access feature, in turn, allows you to read data directly from SAN storage via Fibre Channel or iSCSI, which increases replication speed and removes a portion of the load from your production network.
  • If you use Microsoft Exchange or Microsoft SQL Server, NAKIVO Backup & Replication can truncate the server’s transaction logs, so that they do not occupy too much space on the server.
  • Via the Network Acceleration feature, you can increase replication speed by up to 2 times. To use this feature, you only need to install an additional Transporter onsite or offsite.
  • Installing an additional Transporter can also allow you to encrypt replicated data when it is transferred and when it has reached the target repository.
  • Via the Multi-Tenancy feature, which allows you to deliver Replication-as-a-Service, you can create up to 1000 isolated tenants, and customers can use them to perform replication and other tasks on their own.
  • The Advanced Bandwidth Throttling feature allows you to limit bandwidth for replication processes, so that they do not overload the network.
  • Should the need arise to save time and offload your network, you can first transfer (seed) a VM replica to removable media and then move it to a new location. After that, only incremental replication is required. NAKIVO Backup & Replication can use its built-in proprietary change tracking or VMware’s Changed Block Tracking for performing incremental VM replication and backup.
  • You can install NAKIVO Backup & Replication on NAS devices and replicate data between them, enjoying increased performance and speed.
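To make the policy-based automation from the feature list more concrete, here is a minimal Python sketch of rule matching. It is purely illustrative: the `Vm` class, the rule format, and the choice that a VM must satisfy every rule are assumptions made for this example, not the product’s actual policy engine or API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Vm:
    name: str
    size_gb: int
    tags: frozenset[str] = field(default_factory=frozenset)


Rule = Callable[[Vm], bool]


def matches_policy(vm: Vm, rules: list[Rule]) -> bool:
    """Assumption: a VM is covered when it satisfies every rule."""
    return all(rule(vm) for rule in rules)


policy: list[Rule] = [
    lambda vm: vm.name.startswith("prod-"),  # rule based on VM name
    lambda vm: "critical" in vm.tags,        # rule based on VM tag
    lambda vm: vm.size_gb <= 500,            # rule based on VM size
]

inventory = [
    Vm("prod-db01", 200, frozenset({"critical"})),
    Vm("test-web01", 40, frozenset({"critical"})),
    Vm("prod-files", 900, frozenset({"critical"})),
]
print([vm.name for vm in inventory if matches_policy(vm, policy)])  # ['prod-db01']
```

Re-evaluating the rules whenever the inventory changes is what lets new VMs be picked up automatically, without anyone editing the job.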

How to create a VMware Replication Job with NAKIVO Backup & Replication

Below, we’ll demonstrate how to create a VM replication job for a VMware environment in NAKIVO Backup & Replication. The process is simple and intuitive, as you will be able to see for yourself.

On the main UI of NAKIVO Backup & Replication, click Create, then choose VMware vSphere replication job (depending on your environment, you can also choose Amazon EC2 replication job or Microsoft Hyper-V replication job).

[Screenshot: Create menu on the main UI of NAKIVO Backup & Replication]

After that, please follow the steps described below.

1. On the Source step of the New Replication Job Wizard for VMware vSphere, choose a virtual machine or a whole container of VMs to replicate and click Next.

[Screenshot: Source tab of the New Replication Job Wizard for VMware vSphere]

2. On the Destination step, for the replica, choose a target container, a target datastore and a VM folder. After that, click Next.

[Screenshot: Destination tab of the New Replication Job Wizard for VMware vSphere]

3. On the Networks step, enable and configure network mapping if VMs on the target (DR) site use different networks than on the main site. Having done this, click Next. Alternatively, you can skip this step and click Next at once.

[Screenshot: Networks tab of the New Replication Job Wizard for VMware vSphere]

4. On the Re-IP step, you can configure IP address changes if VMs use different IP addresses on the target (DR) site than on the main site. You can create a Re-IP rule or use an existing one. After this, click Next. You can also skip this step by clicking Next at once.

[Screenshot: Re-IP tab of the New Replication Job Wizard for VMware vSphere]
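Conceptually, a Re-IP rule maps addresses from the main site’s subnet to the DR site’s subnet. The Python sketch below shows one plausible interpretation using the standard `ipaddress` module: the host part of the address is kept and only the network part changes. The function name, the rule format, and the sample subnets are assumptions for illustration, not the wizard’s actual behavior.

```python
import ipaddress


def re_ip(address: str, source_net: str, target_net: str) -> str:
    """Move an address into the target subnet, keeping its host part."""
    src = ipaddress.ip_network(source_net)
    dst = ipaddress.ip_network(target_net)
    ip = ipaddress.ip_address(address)
    if ip not in src:
        return address  # rule does not apply; leave the address unchanged
    if src.prefixlen != dst.prefixlen:
        raise ValueError("source and target subnets must be the same size")
    offset = int(ip) - int(src.network_address)
    return str(dst.network_address + offset)


print(re_ip("192.168.1.25", "192.168.1.0/24", "10.30.30.0/24"))  # 10.30.30.25
print(re_ip("172.16.0.5", "192.168.1.0/24", "10.30.30.0/24"))    # 172.16.0.5
```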

5. On the Schedule step, you can schedule your replication job. Check Do not schedule, run on demand if this is a one-time replication job or you are not yet sure about the specific details of your schedule. Under Schedule #1, you can choose Run daily/weekly (that is, certain days per week), Run monthly/yearly (that is, certain months per year), Run periodically or Run after another job. If you choose Run after another job, select the job (for example, job Z) and configure whether you want your current job to run immediately after Z or not, and whether it should launch After successful runs, After failed runs, or After stopped runs. You can also Add another schedule (#2, #3, etc.) and Show calendar for your convenience. Another option is Effective from, which determines the day when the replication job schedule becomes active.

Ensure that your replication interval does not exceed the maximum RPO for the VM you replicate.
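The reasoning behind this check is that, in the worst case, a failure hits just before the next scheduled run, so roughly one full interval of changes is lost. A minimal Python sketch of the comparison follows; the function name is hypothetical, and transfer time is ignored for simplicity.

```python
from datetime import timedelta


def interval_meets_rpo(interval: timedelta, rpo: timedelta) -> bool:
    """Worst-case data loss is about one interval, so it must fit in the RPO."""
    return interval <= rpo


print(interval_meets_rpo(timedelta(minutes=30), timedelta(hours=1)))  # True
print(interval_meets_rpo(timedelta(hours=4), timedelta(hours=1)))     # False
```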

[Screenshot: Schedule tab of the New Replication Job Wizard for VMware vSphere]

6. On the Retention step, you can set up to 30 recovery points to keep (after finishing a replication job, NAKIVO Backup & Replication creates a recovery point of the replica VM). With NAKIVO Backup & Replication, you can use a traditional Grandfather-Father-Son retention scheme, which is ideal, DR-wise, for storing replicas and backups.

[Screenshot: Retention tab of the New Replication Job Wizard for VMware vSphere]
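For readers unfamiliar with Grandfather-Father-Son rotation, the Python sketch below shows the general idea: keep the newest recovery point from each of the most recent days, weeks, months, and years. It is a generic illustration of the scheme with made-up parameter names, not a description of how the product selects recovery points.

```python
from datetime import date, timedelta


def gfs_keep(points: list[date], daily: int, weekly: int,
             monthly: int, yearly: int) -> set[date]:
    """Keep the newest point in each of the most recent N days, ISO weeks,
    months, and years."""
    tiers = [
        (daily, lambda d: d),
        (weekly, lambda d: d.isocalendar()[:2]),  # (ISO year, ISO week)
        (monthly, lambda d: (d.year, d.month)),
        (yearly, lambda d: d.year),
    ]
    keep: set[date] = set()
    for limit, bucket_of in tiers:
        seen = set()
        for point in sorted(points, reverse=True):  # newest first
            bucket = bucket_of(point)
            if bucket in seen:
                continue  # a newer point already represents this period
            if len(seen) == limit:
                break     # enough periods covered for this tier
            seen.add(bucket)
            keep.add(point)
    return keep


# One recovery point per day from 1 February to 31 March 2024
points = [date(2024, 3, 31) - timedelta(days=n) for n in range(60)]
kept = gfs_keep(points, daily=7, weekly=4, monthly=3, yearly=1)
print(sorted(kept))
```

With these settings, the last seven daily points survive together with one point for each of the three preceding weeks and the last point of February; everything else can be rotated out.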

7. On the Options step, you can set all the remaining options to automate and fine-tune your replication job. You can name the replication job and configure app-aware mode, change tracking, network acceleration, encryption, VM verification, the type of disk used for the replica (thin disk or the one used by the replicated VM), log truncation, script usage, transport mode, bandwidth throttling, etc.

[Screenshot: Options tab of the New Replication Job Wizard for VMware vSphere]

8. Having configured all the options, click Finish or Finish & Run (if you want the job to run immediately). After the replica has been created, it is ready for your DR process.

How to create a Site Recovery job with NAKIVO Backup & Replication

The replication job you’ve just created can be a part of a complex automated DR workflow made possible via the Site Recovery feature. With the help of this feature, you can organize actions and conditions into comprehensive DR algorithms tailored for certain situations and purposes (say, power failure, disaster avoidance, etc.).

This is how you can integrate your replication job in a DR workflow via the Site Recovery feature:

1. In the NAKIVO Backup & Replication main UI, click Create, then choose Site recovery job.

[Screenshot: Create menu on the main UI of NAKIVO Backup & Replication]

2. The New Site Recovery Job Wizard opens. On the wizard’s Actions step, choose Run jobs.

[Screenshot: Actions tab of the New Site Recovery Job Wizard]

3. The Run Jobs window opens, where you can choose replication jobs, including the job you’ve just created. After choosing a job and configuring it, click Save.

[Screenshot: Run Jobs tab of the New Site Recovery Job Wizard]

4. The Actions step will open again. On this step, you can now either choose additional actions to include in the complex disaster recovery workflow or click Next. After that, just follow the New Site Recovery Job Wizard’s instructions until you’ve created the site recovery job.
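A site recovery job of this kind is essentially an ordered list of actions. As a rough mental model, the Python sketch below runs actions one after another and stops at the first failure. The action names and the stop-on-failure behavior are assumptions chosen for illustration; the real feature has its own action types and settings.

```python
from typing import Callable

Action = tuple[str, Callable[[], bool]]


def run_workflow(actions: list[Action]) -> list[str]:
    """Run actions in order; stop at the first one that reports failure."""
    log = []
    for name, action in actions:
        ok = action()
        log.append(f"{name}: {'ok' if ok else 'failed'}")
        if not ok:
            break
    return log


# Hypothetical actions; each callable stands in for real work
workflow: list[Action] = [
    ("Run replication job", lambda: True),
    ("Fail over VMs", lambda: True),
    ("Send notification", lambda: True),
]
print(run_workflow(workflow))
```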

Concluding Remarks

Our product can secure your virtual environment against unplanned downtimes and catastrophes through disaster recovery replication and backup options which are both block-based and application-aware. Via rule-based policies, you can automate and orchestrate disaster recovery replication processes, integrating them into complex comprehensive workflows. You can speed up replication jobs through Network Acceleration and Change Tracking and ensure, through VM Verification, that replicas are in the operational state. Functionality and price of NAKIVO Backup & Replication are among the best on the market.

To see for yourself and test NAKIVO Backup & Replication in your physical, virtual, or cloud environment, download the fully functional free trial today or request a live demonstration of the product.