VM Backup: How VSS Makes It Consistent
Dmitriy Yegarmin, posted on May 17, 2017
One of the important parameters of a VM backup is its consistency. Consistency guarantees that the virtual machine is fully functional after it is restored from a backup, rather than being just a collection of files. An inconsistent backup simply copies files in their last-saved state, without capturing changes that are still in progress. It is like writing a blog post when the power goes out: when the power is back, you open Word and continue from the point where you last pressed Ctrl+S.
To avoid such a situation, VM backups are snapshot-based: when the backup process starts, a snapshot, i.e. a point-in-time copy of the VM, is made. During this process, all VM disks (which are represented as .vmdk files) become read-only. VMware ESXi then creates a delta file linked to the main .vmdk file. This file stores all changes to the master disk made while the snapshot exists. After the delta file is created, the backup software starts reading the read-only .vmdk file. Later on, the delta file is merged back into the .vmdk file.
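The base-plus-delta idea can be illustrated with a small toy model. This is only a sketch: the class and method names (SnapshotDisk, read_for_backup, consolidate) are made up for illustration and have nothing to do with the real VMDK format or any VMware API.

```python
class SnapshotDisk:
    """Toy model of a base disk plus a delta file (not the real VMDK format)."""

    def __init__(self, blocks):
        self.base = dict(blocks)   # block number -> data
        self.delta = None          # None means no snapshot is open

    def create_snapshot(self):
        # From now on the base is treated as read-only; writes go to the delta.
        self.delta = {}

    def write(self, block, data):
        target = self.base if self.delta is None else self.delta
        target[block] = data

    def read(self, block):
        # The running VM sees the delta first, then falls back to the base.
        if self.delta is not None and block in self.delta:
            return self.delta[block]
        return self.base[block]

    def read_for_backup(self):
        # The backup software reads only the frozen base.
        return dict(self.base)

    def consolidate(self):
        # Merge the delta back into the base and close the snapshot.
        if self.delta is not None:
            self.base.update(self.delta)
            self.delta = None


disk = SnapshotDisk({0: "boot", 1: "data-v1"})
disk.create_snapshot()
disk.write(1, "data-v2")            # the VM keeps running during the backup
backup = disk.read_for_backup()     # the backup sees the pre-snapshot state
live_view = disk.read(1)            # the VM sees its own latest write
disk.consolidate()
```

In this model the backup never sees a half-applied change to the base disk, while the VM never notices that a backup is running.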
This technique works well for workloads that do not put a heavy load on the file system, such as internal file storage or a web server. However, if the backup starts while transactions are running and I/O (input/output) operations are in progress, data might be lost. If you are running an application or database such as SQL, Microsoft Active Directory, or Exchange, you cannot just close the file and append data to it later. This is where the Volume Shadow Copy Service (VSS) can help. This technology makes it possible to copy files that are currently open and in use. It has been available since Windows XP and Windows Server 2003.
Basic VSS Concepts
To understand how it works, we first need to clarify the terms. The high-level components of the Volume Shadow Copy Service are the provider, the writers, and the requestor.
The VSS provider is the core component of VSS: it is the one that actually creates snapshots of the volumes.
The writers (surprise!) write data to files and databases. Each application with VSS support adds its own writer to the operating system during installation.
The requestor is the component that tells the VSS provider to start or stop working. The requestor can be Windows itself or a third-party application such as backup software.
How Volume Shadow Copy Service Works
In a nutshell, the Volume Shadow Copy Service works as follows: the requestor initiates the VSS provider. The provider redirects the writers to write data into a log file and starts creating the volume snapshot. After the requestor sends the stop signal (usually once the snapshot is ready), the provider begins moving data from the log file to the volume.
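The sequence described above can be sketched as a small simulation. It follows this article's simplified description only; the Writer and Provider classes are hypothetical and do not correspond to the actual Windows VSS API.

```python
class Writer:
    """Stands in for an application's VSS writer (hypothetical model)."""

    def __init__(self, volume):
        self.volume = volume       # list standing in for on-disk data
        self.log = []              # stand-in for the log file
        self.redirected = False

    def write(self, record):
        # While redirected, data goes to the log instead of the volume.
        (self.log if self.redirected else self.volume).append(record)


class Provider:
    """Stands in for the VSS provider (hypothetical model)."""

    def __init__(self, volume, writers):
        self.volume = volume
        self.writers = writers

    def start(self):
        # Redirect all writers to their logs, then copy the quiet volume.
        for writer in self.writers:
            writer.redirected = True
        return list(self.volume)   # the "snapshot": a point-in-time copy

    def stop(self):
        # Move logged data to the volume and resume direct writes.
        for writer in self.writers:
            writer.redirected = False
            self.volume.extend(writer.log)
            writer.log.clear()


# The code below plays the role of the requestor.
volume = ["tx1"]
writer = Writer(volume)
provider = Provider(volume, [writer])

writer.write("tx2")            # goes straight to the volume
snapshot = provider.start()    # requestor: start
writer.write("tx3")            # redirected to the log
provider.stop()                # requestor: stop
```

The snapshot contains only the writes completed before the start signal, and nothing written during the snapshot is lost: it reaches the volume after the stop signal.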
Let’s see how this works for VM backups, using NAKIVO Backup & Replication as an example. In this case, NAKIVO Backup & Replication acts as the VSS requestor.
- Before the VM backup starts, an application writes data via its VSS writer.
- As the backup starts, NAKIVO Backup & Replication asks the VSS provider to start working. The writer redirects data into the log file while the volume ‘freezes’.
- NAKIVO Backup & Replication starts creating a snapshot on the VM layer. This can take anywhere from a few seconds to several minutes, depending on the VMFS storage load. During this time, the writer continues writing data into the log file.
- Once the VM snapshot has been created successfully, NAKIVO Backup & Replication, as the requestor, signals the provider to stop working.
- The VSS provider moves the changes from the log file to the volume. NAKIVO Backup & Replication copies the data blocks of the VM snapshot to the backup repository.
As we have seen, the Volume Shadow Copy Service (VSS) is a great technology for keeping VM backups consistent, but it works only on Windows-based virtual machines. For Linux-based ones, special freeze-thaw scripts are needed to keep backups consistent.
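As a rough idea of what a freeze-thaw script might do on Linux, here is a minimal sketch. It assumes the util-linux fsfreeze utility is installed and that the script runs as root; the mount point is a made-up example, and how a backup product actually invokes such scripts is product-specific and not covered here. A real script would typically also ask the application (for example, a database) to flush its data before freezing.

```python
import subprocess


def fsfreeze_command(action, mount_point):
    """Build the fsfreeze command line for 'freeze' or 'thaw'."""
    flags = {"freeze": "--freeze", "thaw": "--unfreeze"}
    if action not in flags:
        raise ValueError(f"unknown action: {action!r}")
    return ["fsfreeze", flags[action], mount_point]


def run(action, mount_point, dry_run=True):
    """Run (or, with dry_run=True, only return) the fsfreeze command."""
    cmd = fsfreeze_command(action, mount_point)
    if not dry_run:
        subprocess.run(cmd, check=True)   # requires root privileges
    return cmd


# Hypothetical mount point, used here only for illustration.
MOUNT_POINT = "/var/lib/appdata"

pre_snapshot = run("freeze", MOUNT_POINT)   # before the VM snapshot
post_snapshot = run("thaw", MOUNT_POINT)    # after the VM snapshot
```

The dry_run default means the example only builds the commands; pass dry_run=False to actually freeze and thaw the file system, and always make sure the thaw step runs, since a frozen file system blocks all writes.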