June 28, 2017
Differential Backup vs. Incremental Backup
In earlier posts, we covered how incremental backup works. Now we will focus on how differential backup works and compare it with the incremental approach.
A differential backup copies the changes made in a VM since the last full backup. In terms of backup speed, recovery speed, and required storage space, this method sits between a full backup and a traditional incremental (not forever-incremental) backup.
How Does Differential Backup Work?
Let’s walk through how it works. We will use the same test setup as in the incremental backup overview: 3 files on a VM, each containing blocks 1 through 4, and a full backup of the VM already made on Sunday.
On Monday, we changed block 1 to block 5 in File 1. With a differential backup, the backup application copies the changed block of File 1 and tells the backup repository where it should be placed, just as an incremental backup would.
On Tuesday, we added blocks 6 and 7 to File 2. Along with this change, the block of File 1 that was changed on Monday is copied again.
On Wednesday, we deleted File 3. During the backup, all changes made since Sunday are copied: the change in File 1, the two additional blocks in File 2, and the information that File 3 was deleted.
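The three-day example can be sketched in a few lines of Python. This is only an illustrative model, not how any particular backup product is implemented: the `daily_changes` structure and both function names are hypothetical, and each change is represented as a plain text label rather than real block data.

```python
# Changes made on each day after Sunday's full backup (taken from the example).
# The labels are illustrative placeholders, not real block data.
daily_changes = [
    ("Monday", ["File 1: block 1 changed to 5"]),
    ("Tuesday", ["File 2: blocks 6 and 7 added"]),
    ("Wednesday", ["File 3: deleted"]),
]


def differential_backup(day):
    """Return everything changed since the full backup, up to and including `day`."""
    copied = []
    for name, changes in daily_changes:
        copied.extend(changes)
        if name == day:
            return copied
    raise ValueError(f"unknown day: {day}")


def incremental_backup(day):
    """Return only the changes made on `day`, i.e. since the previous backup job."""
    for name, changes in daily_changes:
        if name == day:
            return list(changes)
    raise ValueError(f"unknown day: {day}")


if __name__ == "__main__":
    for name, _ in daily_changes:
        print(name)
        print("  differential:", differential_backup(name))
        print("  incremental: ", incremental_backup(name))
```

In this model, Wednesday's differential backup carries all three changes, while Wednesday's incremental backup carries only the deletion of File 3.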
Differential Backup vs. Incremental Backup: Comparison
Let’s compare differential and incremental backup on three parameters: backup speed, recovery speed, and the space required in the backup repository.
For the first backup after the full one, the time needed to complete the job is similar, as the first differential backup has only one change to copy. Over time, however, the differences accumulate, so each differential job takes longer, while an incremental backup copies only the changes made since the previous job run.
When the time for recovery comes, the differential backup may seem to be the winner because it requires only two operations: restoring the full backup and applying the latest differential to it. A legacy incremental backup, by contrast, has to replay every increment in order. Even with the same amount of data, it takes more resources to put all the pieces in the right places. However, if the incremental backup is combined with synthetic data storage, the backup application knows which blocks of data should be used to restore a VM. Thus, the recovery time is close to what it would be if the VM were restored from a full backup.
However, the biggest drawback of the differential backup is the space it requires. Each differential repeats all changes made since the last full backup, so every run is larger than the one before it, and the total stored grows faster and faster. Very soon it becomes more practical just to perform one more full backup than to continue making differential backups.
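To make the difference in restore effort concrete, here is a minimal sketch of which backup pieces each method has to read. The function name `restore_chain` and the labels such as `diff-3` are hypothetical, and the incremental case models only the legacy approach, without synthetic data storage.

```python
def restore_chain(kind, days_since_full):
    """List the backup pieces needed to restore a VM `days_since_full` days
    after the last full backup (hypothetical labels, legacy incremental only)."""
    if days_since_full < 0:
        raise ValueError("days_since_full must be non-negative")
    if days_since_full == 0:
        return ["full"]
    if kind == "differential":
        # The full backup plus the single most recent differential.
        return ["full", f"diff-{days_since_full}"]
    if kind == "incremental":
        # The full backup plus every increment, applied in order.
        return ["full"] + [f"inc-{d}" for d in range(1, days_since_full + 1)]
    raise ValueError(f"unknown backup kind: {kind}")


if __name__ == "__main__":
    print(restore_chain("differential", 3))
    print(restore_chain("incremental", 3))
```

For a restore three days after the full backup, the differential chain has two pieces, while the legacy incremental chain has four.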
Here is the graph illustrating how quickly differential backups consume space. The model is a 2 TB VM with daily changes of 5% of its size (approximately 100 GB per day). In just a week, the backup size will be twice as big as the source VM. The forever-incremental backup, by comparison, will reach this point only after three weeks.
As a result, differential backup requires a periodic full backup, because otherwise it can fill the whole backup repository in a matter of days. On a busy day, for example when a major OS or software update arrives, a differential backup may simply fail because there is not enough space. So, incremental backup is the winner in all three categories.
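The numbers in this model can be checked with a short calculation. The sketch below rests on simplifying assumptions that are not stated in the model itself: every day changes a fresh 100 GB (no block is changed twice), every backup is kept, "backup size" means the full backup plus everything stored after it, and there is no compression or deduplication. All names are hypothetical.

```python
FULL_GB = 2000          # 2 TB source VM, so the full backup is about 2000 GB
DAILY_CHANGE_GB = 100   # 5% of the VM changes per day


def differential_total(days):
    """Repository size after `days` differentials: the differential made on
    day n holds n days' worth of changes, and all differentials are kept."""
    return FULL_GB + sum(DAILY_CHANGE_GB * n for n in range(1, days + 1))


def incremental_total(days):
    """Repository size after `days` increments: each increment holds only
    one day's worth of changes."""
    return FULL_GB + DAILY_CHANGE_GB * days


def days_to_reach(total_fn, target_gb):
    """First day on which the repository size reaches `target_gb`."""
    day = 0
    while total_fn(day) < target_gb:
        day += 1
    return day


if __name__ == "__main__":
    target = 2 * FULL_GB  # twice the size of the source VM
    print("differential:", days_to_reach(differential_total, target), "days")
    print("incremental: ", days_to_reach(incremental_total, target), "days")
```

Under these assumptions, the differential repository passes 4 TB on day 6 (4100 GB) and the incremental one reaches it on day 20, which matches the "about a week" and "about three weeks" figures above.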