May 24, 2017
VMware Backup Best Practices
Backups are a critical part of any VMware virtual infrastructure. With mission-critical, always-on infrastructure running 24 hours a day, 7 days a week behind many web-facing businesses, business continuity is essential. When backing up VMware virtual environments, which best practices ensure that we get effective, efficient, and usable backups that can be restored after an infrastructure failure, natural disaster, or malware infection? Let’s take a look at VMware backup best practices.
Which best practices deserve attention when planning VMware backups? Several of them make for a solid, reliable, secure, and resilient VMware backup environment: first and foremost backing up your VMs, not relying on snapshots as backups, using changed block tracking, copying backups to a secondary location, replicating virtual machines, encrypting backups, and testing backups.
Listing "actually back up your virtual machines" as the first VMware backup best practice may seem like a joke. However, virtual machines are excluded from backups, intentionally or unintentionally, more often than you might expect. Some administrators have made the mistake of thinking that virtual machines are more resilient than physical machines because they usually reside on higher-end compute, memory, and storage equipment. While the risk of equipment failure may be slightly lower, a virtual machine is still exposed to natural disasters and to data loss caused by users or malware. So, back up your virtual machines!
Don’t view snapshots as backups
We have already covered the topic of snapshots vs backups, and to reiterate the point, snapshots are not backups. Many VMware administrators have made the mistake of thinking that a snapshot on a virtual machine gives them a backup they can go back to. Then, when corruption or loss happens for any number of reasons, the snapshot turns out to be no backup at all. A snapshot is part of the virtual machine's own files and depends on the virtual machine disks beneath it, which is why a set of snapshots is referred to as a snapshot "chain". Do not view your snapshots as backups! For our VMware backups to be valid, they need to be able to recreate the virtual machine without any of the source virtual machine files or infrastructure being available.
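The dependency described above can be sketched with a toy model. This is a simplified illustration, not how VMDK files are actually implemented: the `Disk` class, its fields, and the file names are hypothetical. It only shows that a delta disk holds the blocks written after the snapshot and must fall back to its parent for everything else.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Disk:
    """Toy base disk or snapshot delta (hypothetical model, not the VMDK format)."""
    name: str
    blocks: Dict[int, bytes] = field(default_factory=dict)
    parent: Optional["Disk"] = None
    available: bool = True  # False simulates a lost or corrupted file

    def read(self, block: int) -> bytes:
        """Read a block through the whole chain, newest disk first."""
        # Collect every disk from this one back to the base disk.
        chain = []
        disk: Optional[Disk] = self
        while disk is not None:
            chain.append(disk)
            disk = disk.parent
        # The chain is only usable if every link is present.
        missing = [d.name for d in chain if not d.available]
        if missing:
            raise IOError(f"snapshot chain broken, missing: {missing}")
        for d in chain:
            if block in d.blocks:
                return d.blocks[block]
        return b"\x00"  # a block that was never written


# Illustrative file names only.
base = Disk("vm.vmdk", {0: b"boot", 1: b"data-v1"})
delta = Disk("vm-000001.vmdk", {1: b"data-v2"}, parent=base)

current_block1 = delta.read(1)  # served from the delta
current_block0 = delta.read(0)  # falls through to the base disk

base.available = False          # the base disk is lost
try:
    delta.read(1)
    survived = True
except IOError:
    survived = False            # the snapshot cannot stand in for a backup
```

In this model, losing the base disk makes even the blocks stored in the delta unreadable, which is the reason a snapshot cannot recreate the virtual machine on its own.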
Use Changed Block Tracking
Changed Block Tracking (CBT) is a feature of the VMkernel storage stack that is exposed through the VMware vSphere Storage APIs for Data Protection. Third-party backup applications can hook into these APIs to take advantage of CBT when performing backups. CBT-enabled VM backups allow for much more efficient incremental backups of virtual machines, because the change tracking lets backup software know which “blocks” have changed since the last backup of the virtual machine. VMware CBT benefits not only virtual machine backups but also other processes such as replication, which we discuss below.
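The idea behind block-level change tracking can be shown with a small simulation. This is a sketch under stated assumptions, not the vSphere API: `TrackedDisk`, `full_backup`, `incremental_backup`, and `restore` are hypothetical names, the block size is tiny for readability, and the "clear the set after each backup" model is a simplification of how a real backup application asks the hypervisor for changed areas.

```python
BLOCK_SIZE = 4  # tiny block size for illustration only


class TrackedDisk:
    """Toy virtual disk that remembers which blocks were written."""

    def __init__(self, size_blocks: int) -> None:
        self.size_blocks = size_blocks
        self.data = bytearray(size_blocks * BLOCK_SIZE)
        self.changed = set()  # block indexes written since the last backup

    def write(self, block: int, payload: bytes) -> None:
        if len(payload) != BLOCK_SIZE:
            raise ValueError("payload must be exactly one block")
        start = block * BLOCK_SIZE
        self.data[start:start + BLOCK_SIZE] = payload
        self.changed.add(block)

    def read(self, block: int) -> bytes:
        start = block * BLOCK_SIZE
        return bytes(self.data[start:start + BLOCK_SIZE])


def full_backup(disk: TrackedDisk) -> dict:
    """Copy every block, then reset change tracking."""
    copy = {i: disk.read(i) for i in range(disk.size_blocks)}
    disk.changed.clear()
    return copy


def incremental_backup(disk: TrackedDisk) -> dict:
    """Copy only the blocks written since the previous backup."""
    copy = {i: disk.read(i) for i in sorted(disk.changed)}
    disk.changed.clear()
    return copy


def restore(full: dict, *incrementals: dict) -> bytes:
    """Rebuild the disk image by applying incrementals on top of the full backup."""
    blocks = dict(full)
    for inc in incrementals:
        blocks.update(inc)
    return b"".join(blocks[i] for i in sorted(blocks))


disk = TrackedDisk(8)
disk.write(0, b"boot")
disk.write(1, b"data")
full = full_backup(disk)          # reads all 8 blocks

disk.write(1, b"DATA")
disk.write(5, b"logs")
inc = incremental_backup(disk)    # reads only blocks 1 and 5

restored = restore(full, inc)
```

Here the incremental pass touches two blocks instead of eight; on a real multi-terabyte disk with a small daily change rate, the same principle is what makes incremental backups so much cheaper than full ones.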
Virtual machines that have changed block tracking enabled have an additional file ending in “-ctk.vmdk” in the virtual machine directory. This file stores the mapping of virtual disk blocks that is used to tell whether blocks have changed since the last backup. The “ctk” file stays the same size as long as the VMDK disk size is not increased.
CBT can dramatically improve the speed, performance, and efficiency of virtual machine backups, because incremental jobs read and transfer only the blocks that changed rather than the entire disk. Unless you are using a raw device mapping (RDM) in physical compatibility mode or a legacy virtual machine hardware version (earlier than version 7), CBT is a definite recommendation for VMware virtual machine backups.
Copy your backups to a secondary location
Having secondary backup copies is a must in today’s world of backing up your VMware infrastructure. Enterprise IT environments simply can’t afford not to have multiple copies of their backup data. Having at least one other copy of your backup data offsite ensures that if disaster strikes at the physical production location affecting even your backup infrastructure, your backup copy will be the failsafe.
Also, these days, many organizations have been hit with ransomware infections that have corrupted and encrypted not only production resources but also backup resources. In a perfect storm, a user with administrator permissions who is hit with ransomware is likely to have access to both production and backup systems, including backup repositories. We can imagine the damage that can be done in this case.
With backup copies, you can kick off an additional copy of the backup pulled from production and send it to a secondary backup repository. Ideally, this backup copy repository would exist in a different physical location, either nearby or in another geographic region. As long as you have network connectivity to that repository, you can push the data across to it.
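As a rough illustration of what a backup copy step involves, the sketch below copies backup files from a primary repository directory to a secondary one and verifies each copy by checksum. It is a generic file-level example, not how any particular backup product implements backup copy jobs; the function names and the directory layout are assumptions.

```python
import hashlib
import shutil
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large backup files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_backups(primary: Path, secondary: Path) -> List[str]:
    """Copy new or changed files to the secondary repository.

    Returns the relative paths that were copied. Raises IOError if a copy
    does not match its source after the transfer.
    """
    copied = []
    for src in sorted(p for p in primary.rglob("*") if p.is_file()):
        rel = src.relative_to(primary)
        dst = secondary / rel
        if dst.exists() and sha256_of(dst) == sha256_of(src):
            continue  # already present and identical
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        if sha256_of(dst) != sha256_of(src):
            raise IOError(f"checksum mismatch after copying {rel}")
        copied.append(rel.as_posix())
    return copied
```

Calling `copy_backups(Path("/backups/primary"), Path("/mnt/offsite"))` (both paths are placeholders) would copy whatever is missing or different at the secondary location; a second run with no changes copies nothing.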
Replicate your VMs to another location
Replication is different from having a backup copy. While backup copies are copies of backup data, a replica is an actual copy of the production virtual machine that has been replicated to another VMware environment. Ideally, the replicas would be located in a secondary DR site so that if the primary location fails, we can fail over to the secondary location. Replication provides a much easier and quicker path to recovery from a more catastrophic disaster where multiple VMs or possibly a whole site is affected. With replication, the production VMs have effectively already been copied and restored at the secondary site, so recovery is a matter of failing over rather than restoring from backup data. The replicas are kept up to date as each VMware backup and replication cycle is completed.
If we do not have a secondary DR facility, we can also replicate within the same environment, for example to another host, datastore, or cluster. This allows a quick recovery from data loss resulting from a failed onsite datastore that might take out multiple VMs.
Encrypting Your Backups
These days, when we hear about encryption it is usually in a negative context such as ransomware. However, the encryption we are talking about here helps to make our backup strategy secure. There are two types of encryption to mention: encryption at rest and in-flight encryption. Encrypting our VMware backup data is a best practice in today’s security-minded world.
A less than desirable byproduct of copying our data to multiple places using backups, backup copies, and replication is that our potentially sensitive data now lives in multiple places. This increases our exposure to data leakage. If someone were to get their hands on unencrypted backup data, they could restore it to an unauthorized environment and gain access to the data it contains.
This is where encryption at rest comes into play. If we have encrypted the backup repository where the backed-up data lives, the data is useless to anyone who obtains it without the encryption key. NAKIVO Backup & Replication, for example, can encrypt the backup repository, so all data backed up to the encrypted repository is unreadable without the key.
The second part of securing our backups with encryption is encrypting data in flight. In a Backup Copy Job in NAKIVO Backup & Replication, for example, the job options include an Encryption setting that can be set to Enabled. As the tooltip notes, job data is then encrypted during the transfer, which protects the data sent over the network.
Testing VMware backups
Perhaps one of the most overlooked areas when thinking about backups is actually testing our backups. Many administrators have been caught in the nightmare of believing they have good backups, only to discover when disaster strikes that the backups are corrupted or critical data is missing. Testing our backups is arguably as important as taking them.
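One basic, automatable check is to record checksums when a backup is written and compare them later. The sketch below assumes backups are stored as plain files in a directory, which may not match a given product's repository format; `write_manifest`, `verify_backup`, and the `manifest.json` name are hypothetical. A checksum check catches missing or silently corrupted files, but it is not a substitute for a real test restore.

```python
import hashlib
import json
from pathlib import Path
from typing import List

MANIFEST_NAME = "manifest.json"  # hypothetical file name


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(backup_dir: Path) -> Path:
    """Record a checksum for every file in the backup directory."""
    entries = {
        p.relative_to(backup_dir).as_posix(): file_digest(p)
        for p in sorted(backup_dir.rglob("*"))
        if p.is_file() and p.name != MANIFEST_NAME
    }
    manifest = backup_dir / MANIFEST_NAME
    manifest.write_text(json.dumps(entries, indent=2))
    return manifest


def verify_backup(backup_dir: Path) -> List[str]:
    """Compare the files on disk with the manifest; return a list of problems."""
    entries = json.loads((backup_dir / MANIFEST_NAME).read_text())
    problems = []
    for rel, expected in sorted(entries.items()):
        path = backup_dir / rel
        if not path.is_file():
            problems.append(f"missing: {rel}")
        elif file_digest(path) != expected:
            problems.append(f"corrupted: {rel}")
    return problems
```

An empty list from `verify_backup` means every file recorded at backup time is still present and unchanged; anything else is worth investigating before the backup is needed.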
Testing backups is often overlooked because it is time consuming and tedious to actually carry out. One might think it is simply not feasible to test every single backup that is configured to make sure everything is good. One powerful feature for verifying backups in NAKIVO Backup & Replication is screenshot verification.
Screenshot verification uses flash boot technology to present VM disks stored in the backup repository directly to the VMware environment. NAKIVO Backup & Replication then boots the VM, takes a screenshot of it in its booted state, and emails the screenshot to you. Screenshot verification can be turned on in the job’s properties.
There are definite steps we want to take when creating and running VMware backups. These include, but are not limited to, backing up your VMs, not relying on snapshots as backups, using changed block tracking, copying backups to a secondary location, replicating virtual machines, encrypting backups, and testing backups. VMware backup is a necessary part of meeting the RTO and RPO targets of any organization running on VMware vSphere infrastructure. Following these and other guidelines will help to ensure backup effectiveness, validity, and security, as well as the resiliency of the vSphere environment.