What Is a Snapshot in AWS? AWS EBS Snapshot Explained
Amazon Elastic Block Store (Amazon EBS) is a service that provides persistent block-level storage for Amazon Elastic Compute Cloud (Amazon EC2) instances. Simply put, this AWS service allocates reliable hard drives (that is, volumes) to cloud-based servers. One of the most useful features of Amazon EBS is volume snapshots.
According to the Amazon Knowledge Base, AWS EBS snapshots are point-in-time copies of EBS volumes. But are these copies equivalent to backups? Let’s find out how EBS snapshots work and what they can and can’t do.
What Is an EBS Snapshot?
An EBS snapshot is a point-in-time copy of an Amazon EBS volume, which is lazily copied to Amazon Simple Storage Service (Amazon S3). EBS snapshots are incremental copies of data, meaning that each new snapshot stores only the blocks of EBS volume data that have changed since the previous snapshot was taken.
EBS snapshots are chained together, and you can use these snapshots to restore EBS volumes when needed. Each AWS snapshot contains the information needed to restore the volume to its state at the moment the snapshot was created.
When you delete an EBS snapshot in a chain, you remove only the data unique to that specific snapshot. Blocks that later snapshots still rely on, because they had not changed when those snapshots were taken, are carried forward to the next snapshot, while blocks that no later snapshot references are discarded. Therefore, you can safely delete any snapshot with no impact on previous or subsequent snapshots and their validity for restoration.
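The incremental storage and deletion behavior described above can be illustrated with a toy model. This is only a sketch of the idea, not how AWS implements snapshots internally: the SnapshotChain class, its method names, and the one-byte "blocks" are all hypothetical, and the model ignores blocks that are removed from a volume.

```python
from typing import Dict, List

Blocks = Dict[int, bytes]  # block index -> block contents


class SnapshotChain:
    """Toy model: each snapshot stores only blocks changed since the previous one."""

    def __init__(self) -> None:
        self.names: List[str] = []           # snapshot names, oldest first
        self.stored: Dict[str, Blocks] = {}  # blocks physically kept per snapshot

    def restore(self, name: str) -> Blocks:
        """Rebuild the full volume image as of snapshot `name`."""
        image: Blocks = {}
        for snap in self.names[: self.names.index(name) + 1]:
            image.update(self.stored[snap])  # newer blocks override older ones
        return image

    def take(self, name: str, volume: Blocks) -> None:
        """Store only the blocks that differ from the latest snapshot."""
        previous = self.restore(self.names[-1]) if self.names else {}
        self.stored[name] = {i: b for i, b in volume.items() if previous.get(i) != b}
        self.names.append(name)

    def delete(self, name: str) -> None:
        """Drop a snapshot, carrying forward blocks the next snapshot still needs."""
        pos = self.names.index(name)
        removed = self.stored.pop(name)
        self.names.pop(pos)
        if pos < len(self.names):
            successor = self.stored[self.names[pos]]
            for index, block in removed.items():
                # Keep the block only if the successor did not overwrite it.
                successor.setdefault(index, block)


chain = SnapshotChain()
volume: Blocks = {0: b"A", 1: b"B", 2: b"C"}
chain.take("snap-1", volume)   # first snapshot stores all three blocks
volume[1] = b"X"
chain.take("snap-2", volume)   # stores only block 1
volume[2] = b"Y"
chain.take("snap-3", volume)   # stores only block 2

before = chain.restore("snap-3")
chain.delete("snap-2")         # block 1 is carried forward into snap-3
after = chain.restore("snap-3")
```

In this model, restoring snap-3 yields the same image before and after snap-2 is deleted, and snap-1 is untouched, which mirrors the claim that deleting a snapshot does not invalidate its neighbors.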
According to Amazon, EBS snapshots are block-level copies of EBS volumes. Thus, EBS snapshots don’t “know” what the volume actually contains: filesystems, partitions, or software. AWS reads each block, determines if there is data on this block, and if so, includes it in the snapshot. The snapshot captures the volume as it was at the moment snapshot creation started, so you can safely keep writing data to the EBS volume afterward without affecting the EBS snapshot. Data written to the volume after snapshot creation starts is not included in the EBS snapshot, even while the snapshot remains in a “pending” state.
Limitations of Amazon EBS Snapshots
As incremental copies of data, EBS snapshots have a number of limitations, especially when compared with backups. Backups are also point-in-time copies of EC2 instances, but they differ in major aspects:
- Backups are independent of source workloads and provide greater flexibility for access and recovery.
- Application and database data in backups is consistent, speeding up recovery.
- You can automate backup copy creation to different targets, including onsite, offsite, a public cloud, tape, etc.
- Backup solutions offer several performance and resource optimization features like data deduplication, compression, network acceleration, etc.
- When using a dedicated backup solution, you can automate workflows with scheduling and chaining options.
- Granular recovery of files and application objects is straightforward and fast.
Let’s now look in more detail at the limitations of AWS snapshots compared to backups.
1. Snapshots are not application-aware
Applications and databases write their data to the local filesystem. However, they cache their most frequently used data in RAM to achieve the best possible performance, and they “decide” when to write data from the RAM cache to the local disk (in AWS, this is usually an EBS volume). As a result, data that is still in your application/database cache when the snapshot is taken may not have been written to the EBS volume at all. That is why you should flush your application/database cache to the EBS volume before creating an EBS snapshot. This pushes the data from the RAM cache to the actual EBS volume.
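According to AWS recommendations, a consistent snapshot requires that nothing is writing to the volume when the snapshot is started. If you cannot pause writes from within the application, you should:
- Stop the EC2 instance before creating an EBS snapshot of its root volume; or
- Unmount the EBS volume from within the EC2 instance (and detach it if needed) before creating the snapshot, then remount it afterward.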
The EBS snapshot process does not perform flushing or locking operations automatically, and you should do this manually prior to creating the snapshot.
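As an alternative to unmounting, a Linux filesystem on a data volume can be frozen briefly while the snapshot is started. The sketch below assumes it runs as root on the instance itself, that `ec2` is a boto3 EC2 client (for example `boto3.client("ec2")`), and that the `fsfreeze` utility is available; the function names are hypothetical. It flushes filesystem buffers only, so application or database caches still need their own flush first, and it should not be used on the root filesystem.

```python
import subprocess
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def frozen_filesystem(mount_point: str,
                      run: Callable[..., Any] = subprocess.run) -> Iterator[None]:
    """Flush pending writes and block new ones while the body runs (Linux fsfreeze)."""
    run(["fsfreeze", "--freeze", mount_point], check=True)
    try:
        yield
    finally:
        # Always unfreeze, even if the snapshot call fails.
        run(["fsfreeze", "--unfreeze", mount_point], check=True)


def consistent_snapshot(ec2: Any, volume_id: str, mount_point: str,
                        run: Callable[..., Any] = subprocess.run) -> str:
    """Start a snapshot of `volume_id` while its filesystem is frozen.

    Assumption: `ec2` is a boto3 EC2 client and `mount_point` is where
    the volume's filesystem is mounted on this instance.
    """
    with frozen_filesystem(mount_point, run):
        response = ec2.create_snapshot(
            VolumeId=volume_id,
            Description=f"Frozen snapshot of {mount_point}",
        )
    # The snapshot may still be "pending" here; its contents are already fixed.
    return response["SnapshotId"]
```

Because the snapshot is point-in-time from the moment the API call is accepted, the filesystem only needs to stay frozen for the duration of the `create_snapshot` call, not until the snapshot completes.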
2. No on-premises data copy
Copying data to create backups to a different type of medium helps you avoid a single point of failure and ensure successful recovery in case of a disruption or an incident. Creating several copies of data on different media and in different locations helps prevent complete data loss and minimize downtime.
However, as mentioned before, EBS snapshots are stored in Amazon S3, and you cannot access them directly as objects in your own S3 buckets. Consequently, you are unable to simply copy EBS snapshots outside of AWS.
3. Lack of integrated EBS snapshot data deduplication
EBS snapshots can consume a large amount of cloud storage space, leading to very high costs for this space. Deduplication could resolve this issue, but there is no deduplication solution integrated with AWS.
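To make the idea of deduplication concrete, the following sketch estimates how much space block-level deduplication could save on a given piece of data by hashing fixed-size blocks and counting the unique ones. It is an illustration of the general technique only, not a feature of AWS or of any particular backup product; the function name and the 4 KiB block size are assumptions.

```python
import hashlib


def dedup_ratio(data: bytes, block_size: int = 4096) -> float:
    """Return total blocks divided by unique blocks (1.0 means no savings)."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if not blocks:
        return 1.0
    # Identical blocks produce identical digests, so a set counts unique content.
    unique = {hashlib.sha256(block).digest() for block in blocks}
    return len(blocks) / len(unique)


# Hypothetical data: three identical blocks followed by one different block.
sample = b"A" * 4096 * 3 + b"B" * 4096
ratio = dedup_ratio(sample)  # 4 blocks, 2 unique
```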
4. Scheduling and retention issues
Scheduling is an essential part of creating backups. To keep your snapshots up to date, you need to create a specific schedule for snapshotting. However, you have to keep in mind that the more snapshots you make, the more space will be consumed. You need a retention plan that keeps only the current set of snapshots for a specific period by rotating and discarding expired snapshots.
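A retention plan like this can be sketched as a small selection function. The example assumes snapshot records shaped like the entries boto3 returns from `describe_snapshots` (dictionaries with `SnapshotId` and a timezone-aware `StartTime`); the function name, the `keep_min` safeguard, and the sample data are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List


def expired_snapshot_ids(snapshots: List[Dict[str, Any]], now: datetime,
                         keep_days: int, keep_min: int = 1) -> List[str]:
    """Return IDs of snapshots older than `keep_days`, newest first.

    The `keep_min` newest snapshots are never returned, so a volume is
    not left without any snapshot if the schedule stops running.
    """
    cutoff = now - timedelta(days=keep_days)
    newest_first = sorted(snapshots, key=lambda s: s["StartTime"], reverse=True)
    return [s["SnapshotId"] for s in newest_first[keep_min:] if s["StartTime"] < cutoff]


# Hypothetical snapshot records for one volume.
snapshots = [
    {"SnapshotId": "snap-a", "StartTime": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"SnapshotId": "snap-b", "StartTime": datetime(2024, 1, 20, tzinfo=timezone.utc)},
    {"SnapshotId": "snap-c", "StartTime": datetime(2024, 1, 30, tzinfo=timezone.utc)},
]
now = datetime(2024, 1, 31, tzinfo=timezone.utc)
expired = expired_snapshot_ids(snapshots, now, keep_days=14)  # only snap-a is older than 14 days
```

With a real boto3 EC2 client, each returned ID could then be passed to `ec2.delete_snapshot(SnapshotId=...)`.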
You can create a simple schedule with an Amazon CloudWatch Events rule that runs at a fixed rate and triggers EBS snapshot creation. For a more complex and flexible schedule, you can give the rule a Cron expression instead, which ensures that the snapshot is created at a specific time.
Note: Cron is a time-based scheduler. In AWS, you can use Cron expressions in scheduled rules to run actions at specific times.
According to AWS documentation, Amazon CloudWatch is a monitoring service for cloud resources and the applications running in the AWS cloud. You can use CloudWatch events to trigger the creation of the EBS snapshot based on a schedule. However, this scheduling method is a workaround rather than a proper feature present in third-party backup software that allows you to schedule backups with a few clicks.
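As a rough sketch of this workaround, the code below builds an AWS-style Cron expression for a daily run and registers a scheduled rule. It assumes `events` is a boto3 CloudWatch Events client (for example `boto3.client("events")`) and that `target_arn` points to something you have already set up to create the snapshot, such as a Lambda function that is allowed to be invoked by the rule; the function names, rule name, and target ID are hypothetical.

```python
from typing import Any


def daily_cron(hour_utc: int, minute: int = 0) -> str:
    """Build an AWS schedule expression that fires once a day (UTC).

    AWS Cron expressions have six fields:
    minutes, hours, day-of-month, month, day-of-week, year.
    """
    if not (0 <= hour_utc <= 23 and 0 <= minute <= 59):
        raise ValueError("hour_utc must be 0-23 and minute must be 0-59")
    return f"cron({minute} {hour_utc} * * ? *)"


def schedule_snapshot_rule(events: Any, rule_name: str, target_arn: str,
                           hour_utc: int, minute: int = 0) -> str:
    """Create or update a scheduled rule and point it at `target_arn`."""
    expression = daily_cron(hour_utc, minute)
    events.put_rule(Name=rule_name, ScheduleExpression=expression, State="ENABLED")
    events.put_targets(
        Rule=rule_name,
        # "snapshot-target" is an arbitrary identifier chosen for this sketch.
        Targets=[{"Id": "snapshot-target", "Arn": target_arn}],
    )
    return expression
```

For example, `daily_cron(3)` produces `cron(0 3 * * ? *)`, which runs every day at 03:00 UTC.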
Restoring AWS Instance from EBS Snapshot and AWS Granular Recovery
As was mentioned above, EBS snapshots are block-level snapshots and don’t “care” about the type of data they store. This is why accessing individual files directly from an EBS snapshot is not possible. When you restore data from an EBS snapshot, a new EBS volume is created. That new EBS volume is an exact copy of the original EBS volume at the time when the snapshot was created.
According to Amazon documentation, EBS snapshots are stored in Amazon S3, but you will not find your snapshots in any S3 storage available to you. Consequently, you cannot perform granular recovery directly from EBS snapshots.
With the new EBS volume restored from an EBS snapshot, you can do everything that Amazon allows you to do with EBS volumes. For example, you can attach the new EBS volume to an existing EC2 instance, mount its filesystem, and then sign in to the instance to access the recovered data. Also, if the EBS snapshot was created from a root EBS volume, you can attach the new EBS volume as the root volume of a compatible EC2 instance and then start the EC2 instance with the new root volume.
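The restore-and-attach flow described above might look like the following sketch. It assumes `ec2` is a boto3 EC2 client (for example `boto3.client("ec2")`), that the chosen Availability Zone matches the target instance (a volume can only be attached to an instance in the same zone), and that the device name is free on that instance; the function name and default device are hypothetical.

```python
from typing import Any


def restore_volume(ec2: Any, snapshot_id: str, availability_zone: str,
                   instance_id: str, device: str = "/dev/sdf") -> str:
    """Create a volume from a snapshot and attach it to a running instance."""
    volume = ec2.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=availability_zone,  # must be the instance's zone
    )
    volume_id = volume["VolumeId"]

    # Wait until the new volume is ready before attaching it.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=device)
    return volume_id
```

After the volume is attached, you would still need to mount its filesystem from inside the instance to reach individual files. The device name seen by the operating system may differ from the one requested here.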
As we found out, an EBS snapshot is a kind of incremental copy of data. AWS snapshots have limitations: they are not flexible to use and can be costly as a means of data protection. You can use a dedicated AWS EC2 backup solution such as NAKIVO Backup & Replication, which provides flexible support for the AWS platform with application-aware backup, compression and deduplication, backup data tiering, and instant granular recovery of files and application objects.