Using Microsoft’s Volume Shadow Copy Technology for Consistent Backups
By: NAKIVO Team
When creating a backup of a virtual or physical machine with running applications, especially applications that intensively write data to files and databases, ensuring backup data consistency is essential. Having consistent backup data facilitates recoveries without data corruption or delays. For this purpose, Microsoft developed the VSS technology.
This blog post explains what Volume Shadow Copy Service (VSS) is, how it works, and why this technology is important for backing up Windows-based virtual and physical machines.
What Is VSS?
Volume Shadow Copy Service (VSS), a native feature of Windows operating systems, facilitates the creation of consistent application-aware backups. To achieve backup data consistency, the following happens:
- VSS temporarily pauses write operations performed by applications.
- In-memory buffers are flushed to disk, and the file system is frozen.
- A volume snapshot, also called a shadow copy, is created in the Windows operating system.
Volume Shadow Copy, also referred to as Volume Snapshot Service, is a set of COM (Component Object Model) interfaces in Windows, which provide the framework to create consistent backups for different applications. VSS was first released with Windows XP and Windows Server 2003.
Why You Need the VSS Technology
Application data consistency is important for VM and physical server backups. Consistency guarantees that the applications running on virtual or physical machines are fully functional after they are restored from a backup.
Without VSS, you get an inconsistent backup. Such a backup is created by just copying each block in the state that it is in at the time of being copied. Suppose that a file is continuously used by an application such as a database or a system application. This application is continuously changing data in the opened file by writing/deleting blocks. While the first part of the file is being copied, blocks used by other parts of the file are changing. By the time the other parts of the file are copied to the backup repository, the blocks in the first part have changed. What you get is data blocks that are not consistent and do not represent a single specific point in time.
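The effect described above can be reproduced with a toy model. The following Python sketch is only an illustration, not how any backup product works: the function names and the "v1"/"v2" block contents are made up for the example. It copies a four-block file while an application rewrites it mid-copy, first directly and then from a frozen point-in-time view.

```python
def rewrite_all(blocks, value):
    """Simulate an application committing a new version of every block."""
    for j in range(len(blocks)):
        blocks[j] = value


def copy_live(blocks, mutate_after):
    """Copy block by block while the source changes in the middle of the copy."""
    backup = []
    for i in range(len(blocks)):
        backup.append(blocks[i])
        if i == mutate_after:
            rewrite_all(blocks, "v2")  # the application keeps writing
    return backup


def copy_from_snapshot(blocks, mutate_after):
    """Copy from a frozen point-in-time view; later writes do not affect it."""
    frozen = list(blocks)  # stands in for the shadow copy
    backup = []
    for i in range(len(frozen)):
        backup.append(frozen[i])
        if i == mutate_after:
            rewrite_all(blocks, "v2")  # live data changes, the frozen view does not
    return backup


live = copy_live(["v1"] * 4, mutate_after=1)
snap = copy_from_snapshot(["v1"] * 4, mutate_after=1)
print(live)  # ['v1', 'v1', 'v2', 'v2'] -> mixes two points in time
print(snap)  # ['v1', 'v1', 'v1', 'v1'] -> one consistent point in time
```

The live copy ends up with blocks from two different versions of the file, which is exactly the inconsistency a snapshot is meant to prevent.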
One way to avoid this situation is snapshot-based VM backup. When the backup process starts, a snapshot, that is, an exact copy of the VM, is taken. During this process, all VM disks (which are represented as .vmdk files in VMware environments) become read-only. To hold the changes made while the backup is running, VMware ESXi creates a delta file linked to the main .vmdk file. After the delta file is created, the backup software starts copying data from the read-only .vmdk file. Once the backup process is completed, the delta file is merged with the .vmdk file.
This approach may work well for systems under light load, such as internal file storage or web servers. However, if the backup process starts while transactions are running and I/O (input/output) operations are in flight, data may be lost. Applications and databases like Microsoft SQL Server, Active Directory, or Exchange Server keep their files open and write to them continuously, so you cannot just close a file and then append data to it later. Here, Volume Shadow Copy Service (VSS) can help. This technology allows creating point-in-time copies of files that are open and in use without compromising the integrity and usability of these copies.
How Volume Shadow Copy Service Works
To understand how VSS works, we first need to go over the components that make up this feature. The high-level components of VSS are:
- The VSS provider is the core component of VSS, which creates shadow copies (snapshots) of volumes. The VSS provider can have a software or hardware implementation. The copy-on-write software VSS provider is included in the Windows OS. Hardware providers are usually used with SAN storage (storage area network). Hardware providers offload the host OS when creating a shadow copy.
- The writers are software components that write data to files and databases to ensure consistency. Each application with VSS support adds its writer to the operating system during installation. Examples of applications providing a VSS writer are Microsoft SQL Server and Exchange Server.
- The requestor is a software component that commands the VSS provider to start or stop working (creating, deleting, or importing shadow copies). The requestor could be native Windows components (like NTBackup or Snapshot Manager for Hyper-V) or a third-party application like backup software.
- The VSS service is an OS component that ensures that all the other components can communicate and work with each other.
Snapshots are taken at the volume level, and VSS operates with blocks (and blocks are used by files). That is why you cannot take snapshots of files or folders. A shadow copy can be stored on the same volume or on another volume. The place on a volume allocated to storing shadow copies is called a diff area.
The System Volume Information folder is used to store VSS shadow copy files. The files have identifiers such as 3517271a-d214-3a47-c5ea-01137a4fe675.
The VSS technology can be used as a Windows native tool to save point-in-time copies (snapshots) of disk volumes. The copies allow you to revert changes and go back to the saved state of the entire volume or particular files. Nevertheless, creating a true backup to be saved on an external medium is recommended for more reliable data protection.
VSS is an incremental procedure – Windows can create multiple volume snapshots one after another. After creating the first shadow copy, VSS tracks changes on disks by dividing data into 16-KB blocks. If there are changes on the disk, the service writes this entire block to a shadow copy. As a result, there is no need to copy the full data set with every new snapshot, and only changed blocks are copied.
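To get a feel for how block-level tracking affects the size of a shadow copy, here is a rough Python sketch. It assumes the 16-KB block size mentioned above and a simplified rule that each block is preserved at most once per snapshot; the function names are hypothetical and this is not the actual VSS implementation.

```python
BLOCK_SIZE = 16 * 1024  # 16-KB tracking blocks, as described above


def blocks_touched(offset, length):
    """Return the indexes of the 16-KB blocks covered by a write."""
    if length <= 0:
        return range(0)
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    return range(first, last + 1)


def diff_area_growth(writes):
    """Estimate how many bytes a series of (offset, length) writes adds
    to the shadow copy, assuming each block is preserved only once."""
    preserved = set()
    added = 0
    for offset, length in writes:
        for block in blocks_touched(offset, length):
            if block not in preserved:
                preserved.add(block)
                added += BLOCK_SIZE  # the whole block is copied, not just the changed bytes
    return added


# Three small writes: two inside block 0, one straddling blocks 0 and 1.
writes = [(0, 100), (50, 10), (16383, 2)]
print(diff_area_growth(writes))  # 32768 (two whole 16-KB blocks)
```

Even though only 112 bytes were written in this example, two full blocks are preserved, which is why many small scattered writes can grow a shadow copy faster than the raw amount of changed data suggests.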
Main VSS requirements and limitations:
- At least 300 MB of disk space is required to create VSS snapshots.
- The maximum number of volume snapshots is 64 by default.
- 10% of volume space is allocated for shadow copies by the Windows system.
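As a quick sanity check, the limits listed above can be turned into a small calculation. The Python sketch below simply encodes the three figures from this list (300 MB, 64 snapshots, 10%); the function names are invented for the example, and real systems may be configured with different values.

```python
MIN_FREE_BYTES = 300 * 1024**2   # at least 300 MB of disk space
MAX_SNAPSHOTS = 64               # default maximum number of volume snapshots
DEFAULT_SHARE_PERCENT = 10       # default share of the volume for shadow copies


def default_diff_area_bytes(volume_bytes):
    """Space allocated to shadow copies by default (10% of the volume)."""
    return volume_bytes * DEFAULT_SHARE_PERCENT // 100


def can_create_snapshot(free_bytes, existing_snapshots):
    """Check the free-space and snapshot-count limits listed above."""
    return free_bytes >= MIN_FREE_BYTES and existing_snapshots < MAX_SNAPSHOTS


volume = 500 * 1024**3  # a 500-GiB volume
print(default_diff_area_bytes(volume) // 1024**3)                        # 50 (GiB)
print(can_create_snapshot(free_bytes=200 * 1024**2, existing_snapshots=3))  # False
```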
Note that VSS snapshots that are triggered by a backup application are usually deleted after the backup job is completed.
Native tools to manage VSS shadow copies
You can use Windows native tools to manage volume shadow copies. Knowing how they work with snapshots can help you understand the VSS technology better and troubleshoot possible issues (for example, a temporary VSS snapshot that was not deleted after a backup completed).
Some VSS options are available in the Windows GUI, but all options are accessible in the command line, which is more interesting for us. There are two VSS tools you can use in Windows PowerShell or CMD:
- vssadmin – available on all Windows versions starting from Windows XP (including Windows 10) and except Windows 8.
- diskshadow – available only on Windows Server versions. This is an advanced implementation of vssadmin, which allows you not only to work in the interactive mode but also to create scripts.
Note that these two utilities are different and work in different contexts. A shadow copy created in vssadmin cannot be managed in diskshadow and vice versa. You can see a snapshot created with the other tool, but you cannot perform any actions on it.
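If you want to drive vssadmin from a script, a thin wrapper is usually enough. The Python sketch below builds a few common vssadmin command lines (list shadows, list writers, delete the oldest shadow copy of a volume). The helper names are made up for this example, and the run() helper is only expected to work in an elevated session on a Windows machine.

```python
import subprocess


def list_shadows_cmd(volume=None):
    """Command to list shadow copies, optionally for a single volume."""
    cmd = ["vssadmin", "list", "shadows"]
    if volume:
        cmd.append(f"/for={volume}")
    return cmd


def list_writers_cmd():
    """Command to list registered VSS writers and their state."""
    return ["vssadmin", "list", "writers"]


def delete_oldest_shadow_cmd(volume):
    """Command to delete the oldest shadow copy of a volume without prompting."""
    return ["vssadmin", "delete", "shadows", f"/for={volume}", "/oldest", "/quiet"]


def run(cmd):
    """Run a command and return its text output (Windows, elevated prompt)."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


print(" ".join(list_shadows_cmd("C:")))          # vssadmin list shadows /for=C:
print(" ".join(delete_oldest_shadow_cmd("C:")))  # vssadmin delete shadows /for=C: /oldest /quiet
```

Building the command as a list and passing it to subprocess.run without a shell avoids quoting problems with volume names and GUIDs.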
Volume Shadow Copy Service API is available to allow backup applications to use VSS and create application-aware backups for backup data consistency.
VMware VMs running Windows use the VMware VSS component as a VSS driver in the guest Windows operating system for quiescing. The driver is installed when you install VMware Tools. On old Windows versions, a SYNC driver is used instead. Check the list of supported guest operating systems for application-aware quiescing in VMware environments before relying on this feature.
How a Shadow Copy Is Created
Once a shadow copy (volume snapshot) is created, its size is 0 bytes. When new data is written, the data is written on the disk (where it usually should be), but the old data is written to a shadow copy so you can restore that old data later. The size of the shadow copy grows as a result.
There are two main approaches to writing data when creating a snapshot:
- Redirect on write (RoW): New blocks are written to the snapshot (shadow copy), together with metadata recording where on the disk they belong. Writes are fast, but reads are slow. Rolling back to the initial state captured when the snapshot was created takes only a few seconds, because it is done by deleting the snapshot. The RoW approach is used to create snapshots of VM virtual disks (VMDK) in VMware ESXi and VMware Workstation.
- Copy on write (CoW): New blocks are written to their intended place on the disk, and the previous contents of the overwritten blocks are sent to the snapshot. Writes are slow, but reads are fast. Previous snapshots (previous data states) can be deleted in a few seconds. The CoW approach is used for VSS snapshots.
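The difference between the two approaches is easier to see in code. The following Python sketch models a volume as a list of blocks and is purely illustrative: the class and method names are invented, and real implementations work on disk blocks and metadata rather than Python dictionaries.

```python
class CowVolume:
    """Copy on write: new data goes to the live disk, old data to the snapshot."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshot = None  # block index -> original content

    def take_snapshot(self):
        self.snapshot = {}  # an empty snapshot takes no space at first

    def write(self, index, data):
        if self.snapshot is not None and index not in self.snapshot:
            self.snapshot[index] = self.blocks[index]  # extra copy makes writes slower
        self.blocks[index] = data

    def read(self, index):
        return self.blocks[index]  # live reads go straight to the disk

    def read_snapshot(self, index):
        return self.snapshot.get(index, self.blocks[index])

    def delete_snapshot(self):
        self.snapshot = None  # old state is simply dropped


class RowVolume:
    """Redirect on write: the base stays untouched, new data goes to a delta."""

    def __init__(self, blocks):
        self.base = list(blocks)
        self.delta = None  # block index -> new content

    def take_snapshot(self):
        self.delta = {}

    def write(self, index, data):
        if self.delta is not None:
            self.delta[index] = data  # single write, no copying of old data
        else:
            self.base[index] = data

    def read(self, index):
        if self.delta is not None and index in self.delta:
            return self.delta[index]  # every read has to consult the delta first
        return self.base[index]

    def rollback(self):
        self.delta = {}  # discarding redirected blocks restores the old state


cow = CowVolume(["a", "b", "c"])
cow.take_snapshot()
cow.write(1, "B")
print(cow.read(1), cow.read_snapshot(1))  # B b

row = RowVolume(["a", "b", "c"])
row.take_snapshot()
row.write(1, "B")
print(row.read(1), row.base[1])  # B b
row.rollback()
print(row.read(1))  # b
```

In the CoW model the snapshot only grows when a block is overwritten for the first time, while in the RoW model the original data never moves and rolling back is just a matter of throwing the delta away.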
With so many different applications able to write data to disks, Microsoft created VSS as a unified interface that notifies all applications when snapshot creation has been initiated. The notification essentially says: the system is about to create a snapshot, so stop your write activity and flush your write buffers to disk to bring the data into a consistent state.
The workflow to create a VSS snapshot is as follows:
- The VSS requestor checks the available services with which it can communicate, enumerates the writers, and gathers the metadata.
- After collecting the list of writers, the requestor communicates with the VSS provider and tells it which data to include in the snapshot and where the snapshot should be located. In most cases, the snapshot is located on the same volume as the original data. Alternatively, SAN hardware providers can create the snapshot on a separate volume, which is called a storage snapshot.
- Preparing for backup. This step involves requesting the current status of the VSS writers (after getting the metadata) and preparing for the most important operation, in which the writers must act one after another. Writers are notified that they must prepare for snapshot creation. The system buffer is flushed, and the application is frozen to ensure that a consistent data copy can be made. Each writer must finish within its allotted time, which is 60 seconds by default. Microsoft Exchange Server has only 20 seconds for this operation (this time is set by Microsoft).
Note: 20 seconds is a short period. If an application cannot finish within this time, its writer returns an error, and the snapshot is not taken. If storage performance is too low to meet this time limit, you can try upgrading your hardware to fix this issue. For example, you can change HDD to SSD storage devices. Alternatively, you can try to migrate other workloads to another storage and use dedicated storage only for the machine running Microsoft Exchange Server. Check logs to detect which writer failed the snapshot creation task.
- If everything is OK and system activity is frozen, VSS tells the provider to create the shadow copy. The provider has only 10 seconds to create the snapshot. File system I/O requests are held during this period. This is also the moment when a VM snapshot can be created. Once the 10 seconds pass, all writers are unfrozen, and input/output (I/O) operations resume.
Note: If the VSS provider takes longer than 10 seconds to commit the shadow copy, the operation fails.
- VSS tells the writers that applications can unfreeze I/O requests and continue writing data to disks. Once a backup application has finished creating a backup with the Windows VSS service, the VSS snapshot can be deleted. The backup application can perform this operation itself. Alternatively, you can delete a volume snapshot with diskshadow or vssadmin.
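The sequence above can be summarized as a small simulation. The Python sketch below is a simplified model and not the real VSS COM API: the Writer class, the SnapshotError exception, and the idea of passing durations in as numbers are all assumptions made for illustration. It uses the time limits quoted in the workflow (60 seconds per writer by default, 20 seconds for Exchange, 10 seconds for the commit).

```python
from dataclasses import dataclass

FREEZE_TIMEOUT_S = 60  # default time allotted to each writer
COMMIT_TIMEOUT_S = 10  # time the provider has to commit the shadow copy


class SnapshotError(Exception):
    """Raised when a writer or the provider misses its time limit."""


@dataclass
class Writer:
    name: str
    freeze_seconds: float            # how long this writer needs to flush and freeze
    timeout: float = FREEZE_TIMEOUT_S


def create_shadow_copy(writers, commit_seconds):
    """Walk through the workflow and return the ordered list of steps taken."""
    log = ["gather writer metadata"]
    frozen = []
    try:
        for writer in writers:  # writers act one after another
            if writer.freeze_seconds > writer.timeout:
                raise SnapshotError(
                    f"writer {writer.name!r} exceeded its {writer.timeout}s limit"
                )
            frozen.append(writer.name)
            log.append(f"freeze {writer.name}")
        if commit_seconds > COMMIT_TIMEOUT_S:
            raise SnapshotError("provider exceeded the 10s commit window")
        log.append("commit shadow copy")
    finally:
        for name in reversed(frozen):  # writers are always thawed, even on failure
            log.append(f"thaw {name}")
    return log


writers = [Writer("SQL Server", 5), Writer("Exchange", 12, timeout=20)]
for step in create_shadow_copy(writers, commit_seconds=3):
    print(step)
```

If any writer in this model needs longer than its limit, or the commit takes more than 10 seconds, the function raises SnapshotError and no shadow copy is committed, which mirrors the failure cases described in the notes above.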
If you have VSS issues, a machine reboot can fix the issues in many cases when other methods don’t work. Some issues related to time limitations (for example, 10 seconds to create a snapshot) can be fixed by upgrading hardware, including disk devices, or by reducing loads.
Try to create a snapshot manually and check whether you fit the 20 seconds allotted (using Exchange Server as an example). In case of failure, reboot, reduce the loads, and try again.
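When troubleshooting, a common first step is to check the writer states reported by vssadmin list writers. The Python sketch below scans such output for writers that are not stable. The sample text is a trimmed, illustrative example (real output also contains writer IDs and may differ between Windows versions), and the function name is hypothetical.

```python
import re

# Trimmed, illustrative sample of "vssadmin list writers" output.
SAMPLE_OUTPUT = """\
Writer name: 'System Writer'
   State: [1] Stable
   Last error: No error

Writer name: 'Microsoft Exchange Writer'
   State: [9] Failed
   Last error: Timed out
"""


def unhealthy_writers(output):
    """Return (name, state, last_error) for every writer that is not healthy."""
    problems = []
    name = state = None
    for raw_line in output.splitlines():
        line = raw_line.strip()
        match = re.match(r"Writer name: '(.+)'", line)
        if match:
            name, state = match.group(1), None  # a new writer section starts
        elif line.startswith("State:"):
            state = line.split(":", 1)[1].strip()
        elif line.startswith("Last error:") and name:
            error = line.split(":", 1)[1].strip()
            if "Stable" not in (state or "") or error != "No error":
                problems.append((name, state, error))
    return problems


for name, state, error in unhealthy_writers(SAMPLE_OUTPUT):
    print(f"{name}: {state} ({error})")
# Microsoft Exchange Writer: [9] Failed (Timed out)
```

On a real machine, the output of the command would be passed to the function instead of the sample string.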
Starting from Windows Server 2012, VSS supports SMB file shares in Windows. The VSS technology is very useful with file shares as files may be continuously written by users and applications, making share backups a challenge without VSS. NAKIVO Backup & Replication 10.7 supports file share backup.
How Volume Shadow Copy Service Works in NAKIVO’s Solution
We have explored how VSS works in general. Now let’s look at how the VSS technology works when creating VMware vSphere VM backups with dedicated backup software like NAKIVO Backup & Replication.
In a nutshell, the Volume Shadow Copy Service works as follows: The requestor initiates the VSS provider. The provider redirects the writers to write data into a log file and starts creating the volume snapshot. After the requestor sends the stop signal to the provider (usually after the snapshot is ready), it begins moving data from the log file to the volume.
In this case, NAKIVO Backup & Replication becomes a VSS requestor when backing up a VMware vSphere VM using VSS.
- Before the VM backup starts, the NAKIVO solution writes data via its VSS writer.
- As the backup starts, NAKIVO Backup & Replication requests the VSS provider to start working. The writer redirects data into the log file while the volume “freezes”.
- NAKIVO Backup & Replication starts creating a snapshot on the VM layer. It might take anywhere from a few seconds to several minutes, depending on the VMFS storage load. During this time, the writer continues writing data to the log file.
- Once the snapshot of the VM has been created successfully, NAKIVO Backup & Replication, as the requestor, signals the provider to stop working.
- The VSS provider moves changes from the log file to the volume. NAKIVO Backup & Replication copies the data blocks of the VM snapshot to the backup repository.
Volume Shadow Copy Service (VSS) is a great technology to keep VM backups consistent, but it works only on Windows-based machines. To create consistent backups for Linux-based machines, you should implement special pre-freeze and post-thaw scripts. The NAKIVO solution also provides automated consistent backups for Linux servers and workstations.