October 6, 2017
AWS EBS Snapshot Explained
Amazon Elastic Block Store (Amazon EBS) is a service that provides persistent block-level storage for Amazon Elastic Compute Cloud (Amazon EC2) instances. Simply put, the service provides reliable virtual hard drives (volumes) for cloud servers. One of the most useful features of Amazon EBS is the ability to create snapshots of EBS volumes. According to Amazon’s documentation, EBS snapshots are backups of EBS volumes. But are they really backups? Let’s find out how EBS snapshots work and what they can do.
What Is an EBS Snapshot?
An EBS snapshot is a point-in-time copy of your Amazon EBS volume, which is copied asynchronously to Amazon Simple Storage Service (Amazon S3). EBS snapshots are incremental: the first snapshot of a volume captures all the data written to it, and each subsequent snapshot stores only the blocks that have changed since the previous snapshot.
Each EBS snapshot contains all the information needed to restore your data, as of the moment the snapshot was taken, to a new EBS volume. EBS snapshots are chained together: a snapshot can reference unchanged blocks stored by earlier snapshots, so you can still restore your EBS volumes properly when needed.
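To make the incremental idea concrete, here is a minimal conceptual model in Python, assuming a volume can be represented as a mapping from block index to block contents. It does not describe how AWS stores snapshots internally; take_incremental_snapshot and restore are hypothetical helpers that show why each snapshot stores only changed blocks and why a restore walks the chain.

```python
from typing import Dict, List

# Conceptual model (assumption): a volume is a mapping of block index -> block contents.
Blocks = Dict[int, bytes]


def restore(chain: List[Blocks]) -> Blocks:
    """Rebuild the volume state by applying snapshots from oldest to newest."""
    state: Blocks = {}
    for delta in chain:
        state.update(delta)  # newer blocks replace older versions of the same block
    return state


def take_incremental_snapshot(volume: Blocks, chain: List[Blocks]) -> Blocks:
    """Store only the blocks that differ from what the existing chain already holds."""
    previous = restore(chain)
    return {idx: data for idx, data in volume.items() if previous.get(idx) != data}


volume = {0: b"boot", 1: b"data-v1", 2: b"logs-v1"}
chain: List[Blocks] = []
chain.append(take_incremental_snapshot(volume, chain))  # first snapshot: all 3 blocks
volume[1] = b"data-v2"                                  # only block 1 changes
chain.append(take_incremental_snapshot(volume, chain))  # second snapshot: just block 1
```

Here the second snapshot holds a single block, yet restoring the whole chain still reproduces the full current volume.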
Deleting an EBS snapshot removes only the data unique to that snapshot. Therefore, you can delete old snapshots without losing the ability to restore from newer ones. When you delete an old snapshot, AWS consolidates the snapshot data: blocks that newer snapshots still need are kept (effectively moved forward to the next snapshot), and blocks that are no longer needed are discarded.
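The consolidation described above can be modeled with a small sketch. It assumes each snapshot is a dict of the blocks it stores, ordered oldest first; the hypothetical delete_snapshot hands forward any block the next snapshot still depends on and drops blocks that the next snapshot has already overwritten. AWS does not document its internal mechanism at this level, so treat this only as an illustration of why the remaining snapshots stay restorable.

```python
from typing import Dict, List

Blocks = Dict[int, bytes]  # blocks stored by one incremental snapshot (conceptual model)


def restore(chain: List[Blocks]) -> Blocks:
    """Apply snapshots oldest to newest to rebuild the volume state."""
    state: Blocks = {}
    for delta in chain:
        state.update(delta)
    return state


def delete_snapshot(chain: List[Blocks], index: int) -> List[Blocks]:
    """Remove chain[index] while keeping every other snapshot restorable."""
    doomed = chain[index]
    remaining = [dict(snap) for i, snap in enumerate(chain) if i != index]
    if index < len(remaining):  # a newer snapshot exists and may depend on these blocks
        successor = remaining[index]
        for idx, data in doomed.items():
            # Keep blocks the successor still needs; drop ones it already overwrote.
            successor.setdefault(idx, data)
    return remaining


chain = [{0: b"a1", 1: b"b1"}, {1: b"b2", 2: b"c2"}, {2: b"c3"}]
latest_before = restore(chain)
second_before = restore(chain[:2])
pruned = delete_snapshot(chain, 0)  # delete the oldest snapshot
```

Block b"b1" disappears after the deletion because the next snapshot had already replaced it, which mirrors the "invalid data is discarded" part of the description.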
Amazon states that EBS snapshots are block-level copies of EBS volumes. Therefore, EBS snapshots don’t “know” what the volume actually contains: filesystems, partitions, or software. AWS reads each block, determines whether it contains any data, and if so, includes it in the snapshot. You can safely keep writing data to your EBS volume after snapshot creation has started: the snapshot captures only the data that had been written to the volume when the snapshot was requested. Data written after that moment is not included in the EBS snapshot, even while the snapshot is still in the “pending” state.
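One way to picture this point-in-time behavior is a copy-on-write model: the snapshot's contents are fixed when it is requested, and while it is still pending, any block about to be overwritten is preserved first. This is an assumed conceptual model, not AWS's documented implementation; PendingSnapshot and write_block are hypothetical names.

```python
from typing import Dict, Set

Blocks = Dict[int, bytes]  # block index -> block contents (conceptual model)


class PendingSnapshot:
    """Snapshot whose point in time is fixed at creation but whose blocks are copied later."""

    def __init__(self, volume: Blocks) -> None:
        self._volume = volume
        self._not_yet_copied: Set[int] = set(volume)  # blocks that existed at request time
        self._copied: Blocks = {}

    def before_write(self, idx: int) -> None:
        """Preserve the original contents of block idx before the volume overwrites it."""
        if idx in self._not_yet_copied:
            self._copied[idx] = self._volume[idx]
            self._not_yet_copied.discard(idx)

    def finish(self) -> Blocks:
        """Copy the blocks nobody touched and return the completed snapshot."""
        for idx in self._not_yet_copied:
            self._copied[idx] = self._volume[idx]
        self._not_yet_copied.clear()
        return dict(self._copied)


def write_block(volume: Blocks, snapshot: PendingSnapshot, idx: int, data: bytes) -> None:
    snapshot.before_write(idx)  # copy-on-write: save the old block first
    volume[idx] = data


volume = {0: b"old-0", 1: b"old-1"}
snap = PendingSnapshot(volume)          # snapshot requested; status is "pending"
write_block(volume, snap, 0, b"new-0")  # overwrite while the snapshot is pending
write_block(volume, snap, 2, b"new-2")  # new block written after the request
completed = snap.finish()
```

The completed snapshot contains only the original two blocks, while the live volume keeps all later writes.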
As an incremental, block-level copy of data, an EBS snapshot has a number of limitations. Let’s consider them in detail.
Limitations of Amazon EBS Snapshots
1. No application-aware snapshots
Many applications and databases write their data to the filesystem on local storage, but keep their most frequently used data cached in RAM for the best possible performance. The application or database itself decides when to write data from its RAM cache to the disk (in AWS, this is usually an EBS volume). At the moment a snapshot is taken, some data may still be in the application/database cache and not yet written to the EBS volume, so the snapshot will not contain it. That is why you should flush your application/database cache before creating an EBS snapshot: flushing pushes the cached data from RAM to the EBS volume.
To create a consistent snapshot, AWS recommends the following:
- Stop the EC2 instance before creating a snapshot of its root volume;
- For other volumes, unmount (or detach) the EBS volume from the EC2 instance before creating the snapshot.
An EBS snapshot does not flush caches or lock data by itself, so you have to do this manually before creating the snapshot. Stopping an instance and unmounting volumes every time you need a consistent backup is like starting a car with a hand crank each time you stop at a traffic light.
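The order of operations can be sketched as a small Python helper. The four callables are assumptions supplied by the caller: flush might ask the database to write its cache to disk, freeze and unfreeze might run fsfreeze -f and fsfreeze -u on a data volume's mount point (Linux), and create_snapshot might call boto3's ec2.create_snapshot(VolumeId=...). Pausing writes like this is an alternative to unmounting that AWS documentation describes for non-root volumes; writes can resume once the snapshot request has been issued, because the point in time is fixed at that moment.

```python
from typing import Callable, List


def consistent_snapshot(
    flush: Callable[[], None],
    freeze: Callable[[], None],
    create_snapshot: Callable[[], str],
    unfreeze: Callable[[], None],
) -> str:
    """Flush caches, pause writes, request the snapshot, then resume writes."""
    flush()   # e.g. tell the application/database to write its RAM cache to disk
    freeze()  # e.g. `fsfreeze -f /mnt/data` (assumed mount point)
    try:
        # e.g. boto3: ec2.create_snapshot(VolumeId=...)["SnapshotId"]
        return create_snapshot()
    finally:
        unfreeze()  # e.g. `fsfreeze -u /mnt/data`; runs even if the request fails


# Dry run with stand-in callables that only record the order of the steps.
steps: List[str] = []


def fake_create_snapshot() -> str:
    steps.append("snapshot")
    return "snap-example"  # placeholder ID, not a real snapshot


snapshot_id = consistent_snapshot(
    flush=lambda: steps.append("flush"),
    freeze=lambda: steps.append("freeze"),
    create_snapshot=fake_create_snapshot,
    unfreeze=lambda: steps.append("unfreeze"),
)
```

The try/finally makes sure the filesystem is thawed even if the snapshot request raises an error, so a failed API call does not leave the volume frozen.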
2. No on-prem data copy
Copying backups to a different type of media, for example, to your local backup storage, is one of the most important techniques for preventing complete data loss and avoiding long infrastructure downtime.
However, as mentioned before, EBS snapshots are stored in Amazon S3, and you cannot access them directly. Consequently, you are unable to copy EBS backups outside AWS.
3. Lack of integrated EBS snapshot deduplication
EBS snapshots can consume a large amount of cloud storage space, and you have to pay for all of it. Deduplication, which stores identical blocks only once, could reduce this cost. However, AWS offers no integrated deduplication for EBS snapshots.
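To show what deduplication would save, the sketch below compares the storage needed for a set of snapshots with and without content-based block deduplication (hashing each block with SHA-256 and counting identical blocks once). The block model and the stored_bytes helper are hypothetical; this is not an AWS feature, and real deduplicating backup tools differ in block size and indexing.

```python
import hashlib
from typing import Dict, List

Blocks = Dict[int, bytes]  # block index -> block contents (conceptual model)


def stored_bytes(snapshots: List[Blocks], deduplicate: bool) -> int:
    """Total bytes needed to keep the given snapshots, with or without block deduplication."""
    if not deduplicate:
        return sum(len(block) for snap in snapshots for block in snap.values())
    unique: Dict[str, int] = {}
    for snap in snapshots:
        for block in snap.values():
            # Identical contents hash to the same key, so they are counted only once.
            unique[hashlib.sha256(block).hexdigest()] = len(block)
    return sum(unique.values())


os_block = b"\x00" * 4096  # hypothetical block shared by two servers built from the same image
snapshots = [
    {0: os_block, 1: b"A" * 4096},  # snapshot of server A's volume
    {0: os_block, 1: b"B" * 4096},  # snapshot of server B's volume
]
without_dedup = stored_bytes(snapshots, deduplicate=False)  # 16384
with_dedup = stored_bytes(snapshots, deduplicate=True)      # 12288
```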
4. Scheduling and Retention
Scheduling is one of the most important aspects of creating backups. To keep your snapshots up to date, you need to create them on a regular schedule. Keep in mind, however, that the more snapshots you keep, the more storage space they consume, so you also need a retention plan that keeps only the relevant snapshots for a defined period. You can create a simple schedule with an Amazon CloudWatch Events rule that automates EBS snapshot creation. To make your schedule more complex and flexible, you can define the CloudWatch Events rule with a cron expression so that snapshots are created at specific times.
NOTE: Cron is a time-based job scheduler. In CloudWatch Events, cron expressions let you trigger actions at specific times rather than in response to AWS events.
According to the AWS documentation, Amazon CloudWatch is a monitoring service for cloud resources and the applications running in the AWS cloud. You can use CloudWatch Events rules to trigger actions, such as creating an EBS snapshot on a schedule.
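A CloudWatch Events cron expression has six fields (minutes, hours, day of month, month, day of week, year) and is evaluated in UTC; for example, cron(0 2 * * ? *) fires every day at 02:00 UTC. The hypothetical helper below computes the next run of such a daily rule, which can help when planning when snapshots will appear. It only handles the fixed daily case, not general cron syntax.

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(now: datetime, hour: int, minute: int) -> datetime:
    """Next UTC time strictly after `now` matching a daily rule like cron(minute hour * * ? *).

    `now` must be timezone-aware; CloudWatch Events cron expressions use UTC.
    """
    now_utc = now.astimezone(timezone.utc)
    candidate = now_utc.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now_utc:
        candidate += timedelta(days=1)  # today's slot has passed; use tomorrow's
    return candidate


# cron(0 2 * * ? *) -> every day at 02:00 UTC
before = next_daily_run(datetime(2017, 10, 6, 1, 30, tzinfo=timezone.utc), hour=2, minute=0)
after = next_daily_run(datetime(2017, 10, 6, 2, 30, tzinfo=timezone.utc), hour=2, minute=0)
```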
However, this snapshot scheduling scheme is a workaround rather than a built-in feature, unlike third-party backup software, which lets you set up scheduling in a few clicks.
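A retention plan boils down to deciding which snapshots to delete. The sketch below assumes a simple hypothetical policy: always keep the newest keep_last snapshots and delete anything older than max_age. In practice the snapshot IDs and timestamps would come from your own inventory (for example, the SnapshotId and StartTime fields returned by boto3's describe_snapshots); here they are made up.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple


def snapshots_to_delete(
    snapshots: List[Tuple[str, datetime]],
    keep_last: int,
    max_age: timedelta,
    now: datetime,
) -> List[str]:
    """IDs that are older than max_age and not among the keep_last newest snapshots."""
    newest_first = sorted(snapshots, key=lambda s: s[1], reverse=True)
    protected = {snap_id for snap_id, _ in newest_first[:keep_last]}
    return [
        snap_id
        for snap_id, created in newest_first
        if snap_id not in protected and now - created > max_age
    ]


# Ten daily snapshots; IDs are placeholders named after their age in days.
now = datetime(2017, 10, 6, tzinfo=timezone.utc)
daily = [(f"snap-{age}", now - timedelta(days=age)) for age in range(10)]
expired = snapshots_to_delete(daily, keep_last=3, max_age=timedelta(days=7), now=now)
```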
Restoring an AWS Instance from an EBS Snapshot and AWS Granular Recovery
As mentioned above, EBS snapshots are block-level snapshots and don’t “care” about the type of data they store. This makes it impossible to access individual files directly from an EBS snapshot. Restoring data from an EBS snapshot creates a new EBS volume, which is an exact copy of the original EBS volume as of the moment the snapshot was taken.
According to Amazon documentation, EBS snapshots are stored in Amazon S3, but you will not find them in any S3 bucket available to you: AWS keeps them in S3 storage that you cannot access. Consequently, you cannot perform granular recovery directly from EBS snapshots.
With the new EBS volume restored from an EBS snapshot, you can do everything that Amazon allows you to do with EBS volumes.
For example, you can attach the new EBS volume to an existing EC2 instance and mount its filesystem to copy individual files back. Also, if you created the EBS snapshot from a root EBS volume, you can attach the new volume as the root volume of a compatible, stopped EC2 instance, then start the instance and sign in to it.
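The restore steps can be scripted. The sketch below assumes an EC2 client object with boto3's create_volume, get_waiter("volume_available"), and attach_volume methods, so it should work with boto3.client("ec2"); the snapshot, instance, Availability Zone, and device values are placeholders. The new volume must be created in the same Availability Zone as the instance, and mounting the filesystem inside the operating system is a separate step not shown here. A small recording fake is included so the flow can be tried without an AWS account.

```python
from typing import Any, Dict, List, Tuple


def restore_volume_and_attach(
    ec2: Any, snapshot_id: str, availability_zone: str, instance_id: str, device: str
) -> str:
    """Create a new EBS volume from a snapshot and attach it to an EC2 instance."""
    volume_id = ec2.create_volume(
        SnapshotId=snapshot_id, AvailabilityZone=availability_zone
    )["VolumeId"]
    # The new volume starts in the "creating" state; wait until it is "available".
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=device)
    return volume_id


class FakeWaiter:
    def __init__(self, log: List[Tuple[str, Dict[str, Any]]], name: str) -> None:
        self.log, self.name = log, name

    def wait(self, **kwargs: Any) -> None:
        self.log.append(("wait:" + self.name, kwargs))


class FakeEC2:
    """Records calls instead of talking to AWS; use boto3.client("ec2") for real work."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[str, Any]]] = []

    def create_volume(self, **kwargs: Any) -> Dict[str, Any]:
        self.calls.append(("create_volume", kwargs))
        return {"VolumeId": "vol-example"}

    def get_waiter(self, name: str) -> FakeWaiter:
        return FakeWaiter(self.calls, name)

    def attach_volume(self, **kwargs: Any) -> Dict[str, Any]:
        self.calls.append(("attach_volume", kwargs))
        return {}


fake = FakeEC2()
new_volume_id = restore_volume_and_attach(
    fake, "snap-example", "us-east-1a", "i-example", "/dev/sdf"
)
```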
As we found out, an EBS snapshot is an incremental, block-level copy of data with a number of limitations for data protection. Amazon EBS snapshots are difficult to use and rather costly as a data protection tool.
In one of our next articles, we will describe another method to perform backups – Amazon EC2 Instance Backup with NAKIVO Backup & Replication. So, stay tuned for our Blog updates!