VM backup deduplication is a method of reducing the amount of storage space needed to save VM backups. In most organizations, VMs contain many duplicate copies of data: for example, VMs deployed from the same template, VMs running the same OS, or VMs that hold identical or nearly identical files (database entries, etc.). With block-level data deduplication, only unique data blocks are saved to a backup repository. A duplicate data block (one that is already available in the backup repository) is discarded and replaced with a reference to the existing block.
NAKIVO Backup & Replication provides built-in data deduplication for VMware VM backups. This feature is available out of the box, enabled by default, and does not require any setup or configuration.
The product automatically deduplicates VM backups at the block level and ensures that only unique data is saved in the backup repository. All VM backups are deduplicated across the entire backup repository: regardless of how many backup jobs you have, the product checks each new data block for duplicates against every block already stored in the repository.
Although this feature is turned on by default, it can be disabled if, for example, you want to use a hardware-based data deduplication device such as an EMC Data Domain.
VM backup deduplication can provide a 10X to 30X reduction in storage capacity requirements. For example, suppose you have 10 VMs running Windows Server 2008, and each VM's OS occupies 10 GB. While the total amount of OS data is 100 GB, only one copy of it (10 GB) will be written to a backup repository with data deduplication, which provides 10:1 storage space savings.
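The arithmetic in this example can be sketched in a few lines of Python. This is an illustration only, not part of the product: the function name `dedup_ratio` and the parameter `unique_gb_per_vm` are hypothetical, and the sketch assumes the shared OS data is identical on every VM. The second call shows how the ratio falls once each VM also carries data that is not shared.

```python
def dedup_ratio(vm_count, shared_gb, unique_gb_per_vm=0.0):
    """Return (logical GB, stored GB, ratio) for VMs that share identical data.

    Hypothetical helper: shared_gb is stored once, while
    unique_gb_per_vm is stored once per VM.
    """
    logical = vm_count * (shared_gb + unique_gb_per_vm)   # data before deduplication
    stored = shared_gb + vm_count * unique_gb_per_vm      # data after deduplication
    return logical, stored, logical / stored


# The example from the text: 10 VMs, 10 GB of identical OS data each.
print(dedup_ratio(10, 10))        # (100.0, 10.0, 10.0)

# Same VMs, but each also holds 2 GB of data found nowhere else.
print(dedup_ratio(10, 10, 2))     # (120, 30, 4.0)
```

The actual savings therefore depend on how much of the data is really shared between VMs.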
This feature provides other benefits too. More efficient disk space utilization allows storing more recovery points per VM backup. In addition, lower storage space requirements save money on direct storage costs (as fewer disks are needed) and on related costs such as cooling, electricity, and maintenance.
NAKIVO Backup & Replication reads data from source VMs in 4 MB blocks. A Transporter compresses the data blocks (if compression is enabled) and writes them to the backup repository. Then, a quick hash is calculated for each new data block to determine whether the same block is already available in the backup repository. If the hash of the new data block matches that of an existing data block, the Transporter compares the two blocks byte by byte to ensure that they are 100% identical. If the blocks are identical, the new data block is discarded and a reference to the existing one is made. This way, VM backups are deduplicated across the entire backup repository.
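The flow described above (hash lookup, then byte-by-byte verification, then a reference instead of a second copy) can be illustrated with a minimal Python sketch. It is not the product's implementation. The `BlockStore` class, the `backup` and `restore` helpers, the in-memory storage, the choice of BLAKE2 as the "quick hash", and the choice of zlib for compression are all assumptions made for illustration. For simplicity, the sketch also checks for a duplicate before storing a block rather than after writing it.

```python
import hashlib
import zlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB read size, as stated in the text


class BlockStore:
    """Hypothetical in-memory repository that keeps only unique blocks."""

    def __init__(self, compress=True):
        self.compress = compress
        self.blocks = []   # unique blocks, as stored (compressed if enabled)
        self.index = {}    # hash digest -> ids of stored blocks with that digest

    def put(self, block):
        """Store a block if it is new; return the id of the stored copy."""
        data = zlib.compress(block) if self.compress else block
        # Assumed quick hash; the source does not name the algorithm.
        digest = hashlib.blake2b(data, digest_size=16).digest()
        for block_id in self.index.get(digest, []):
            # Hash match: confirm with a full byte-by-byte comparison.
            if self.blocks[block_id] == data:
                return block_id          # duplicate: reference existing block
        self.blocks.append(data)         # unique: store the new block
        block_id = len(self.blocks) - 1
        self.index.setdefault(digest, []).append(block_id)
        return block_id

    def get(self, block_id):
        data = self.blocks[block_id]
        return zlib.decompress(data) if self.compress else data


def backup(store, disk_image, block_size=BLOCK_SIZE):
    """Split an image into blocks and return the list of block references."""
    return [store.put(disk_image[i:i + block_size])
            for i in range(0, len(disk_image), block_size)]


def restore(store, refs):
    """Rebuild an image from its block references."""
    return b"".join(store.get(ref) for ref in refs)
```

A small usage example, with a 1 KB block size so that it runs instantly: two images that share the same "OS" data end up referencing the same stored blocks, and both can still be restored in full.

```python
store = BlockStore()
os_data = bytes(range(256)) * 16          # 4 KB of data shared by both VMs
vm1 = os_data + b"a" * 1024
vm2 = os_data + b"b" * 1024
refs1 = backup(store, vm1, block_size=1024)
refs2 = backup(store, vm2, block_size=1024)
assert restore(store, refs1) == vm1 and restore(store, refs2) == vm2
assert refs1[:4] == refs2[:4]             # shared data is stored only once
```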