Using Amazon S3 and AWS EC2 as Cloud Backup Storage
Amazon provides compute and storage services in the cloud. Several AWS services, including Amazon EC2 and Amazon S3, can be used to store backups in the cloud and keep data protected. However, these services differ in how they work, what backup data they can store, and how they store it.
This blog post explains how EBS volumes attached to Amazon EC2 instances and Amazon S3 can be used to store backup data.
Note: This post is not a full comparison of EC2 and S3, since EC2 is a cloud computing platform used for running workloads in the cloud. Instead, it focuses on Amazon Elastic Block Store (EBS), the storage used by EC2 instances, and compares it with S3.
Why Use AWS for Data Backup
According to the 3-2-1 backup rule, you should maintain at least three copies of critical data, stored on at least two different types of media, with at least one copy kept offsite. Cloud storage, including Amazon S3 and EBS volumes attached to Amazon EC2 instances, can be used as offsite storage for backup data. But how and when should you use each of them? Let’s start by going over what the two AWS services offer.
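As an illustration, the 3-2-1 rule can be checked mechanically. Below is a minimal Python sketch; the plan structure and names are hypothetical and not part of any AWS API:

```python
# Check a backup plan against the 3-2-1 rule: at least 3 copies of the data,
# on at least 2 different media types, with at least 1 copy stored offsite.

def satisfies_3_2_1(copies):
    """Each copy is a dict like {"medium": "disk", "offsite": False}."""
    media = {c["medium"] for c in copies}
    offsite = sum(1 for c in copies if c["offsite"])
    return len(copies) >= 3 and len(media) >= 2 and offsite >= 1

plan = [
    {"medium": "disk", "offsite": False},  # primary data on-premises
    {"medium": "nas",  "offsite": False},  # local backup copy
    {"medium": "s3",   "offsite": True},   # offsite copy in Amazon S3
]
print(satisfies_3_2_1(plan))  # True
```

Dropping the S3 copy from this plan would fail the check twice over: only two copies would remain, and none of them would be offsite.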
- Amazon EC2 (Elastic Compute Cloud) instances are virtual machines running on AWS servers in AWS data centers. The container that stores instance data is called an EBS (Elastic Block Store) volume, which is the equivalent of a virtual disk.
EBS volumes are classified as block storage because the basic unit of data storage is a block. EBS volumes are attached to instances and store the primary data of these instances. You can choose between hard disk drive (HDD) and solid-state drive (SSD) volume types for EBS.
- Amazon S3 (Simple Storage Service) is the AWS object storage service. The container used to store data is called a bucket. Data is stored in buckets as objects and not as blocks. Object-based storage allows the use of versioning, which is a useful feature for data recovery.
Using Amazon EC2 for Recovery of Workloads
Amazon EC2 instances can be used for cloud recovery of on-premises virtual machines when a local production data center becomes unavailable. To be prepared for such a scenario, you should migrate local workloads to the cloud using backups or replicas as part of a disaster recovery plan.
In a disaster scenario, replication to EC2 is preferable to backups. Resuming the operation of applications and servers from replicas takes less time than restoring from backups, that is, it provides a lower recovery time objective (RTO). For example, you can replicate databases from on-premises servers or VMs to EC2 instances and perform almost instant failover when you need to recover from a disaster.
EBS volumes used by EC2 instances provide high-speed block storage. Below are some advantages and disadvantages of using EBS volumes for backup and restore.
- Advantages: EBS can be used for high-performance workloads. Data can be copied at high speed from one EBS volume to another. You can also achieve better database performance with EBS volumes attached to EC2 instances than with Amazon S3 storage.
If you need to transfer data from your local (on-premises) servers, the internet connection speed is a bottleneck, and using Amazon S3 may be more rational in this case.
- Disadvantages: With the exception of using EBS volumes for backing up other EBS volumes, EBS is too expensive as a go-to backup destination, especially for workloads other than EC2 instances.
EC2 Instance Data Protection in AWS
Amazon EC2 instances and the EBS volumes attached to them are not backed up automatically in AWS. Data is only replicated across servers within the same availability zone in Amazon data centers, which provides redundancy in case of disk or server (hardware) failure.
Although backup functionality is not available, AWS provides three native methods to protect data in EC2 instances:
- Taking EBS snapshots. This is an easy way to create recovery points for EBS volumes (including encrypted EBS volumes). Snapshots can be used to restore data to new EBS volumes. When using logical volume managers such as LVM or mdadm, consider performing backups at the volume manager layer instead of using EBS snapshots to preserve data consistency and the coherency of subcomponent volumes. Note that when you restore EBS volumes from snapshots, these EBS volumes must be attached to a prepared EC2 instance.
- Creating an Amazon Machine Image (AMI). An AMI is an image that contains the operating system, all configuration settings, and the data needed to run the EC2 instance. Using an AMI, you can create a new EC2 instance based on it. This approach is used to recover EC2 instances and to clone them. Note that you should first stop a running EC2 instance and then create the AMI to ensure data consistency.
The advantage of this method over EBS volume snapshots is that the entire EC2 instance is restored, not just the EBS volumes (which otherwise have to be manually attached to a newly created EC2 instance after the restore process). Preparing AMI images for recovery takes more time but works well in terms of recovery and scalability.
- Copying EC2 instance data to S3. Copying data stored on EBS volumes to Amazon S3 buckets is an alternative method to protect the data in EC2 instances. Because Amazon S3 is object-based storage and Amazon EBS is block storage, FUSE (filesystem in userspace) is required to read/write files from/to S3 buckets as if they were on a regular file system. FUSE-based tools can be installed on the operating system running on an EC2 instance, a virtual machine, or a physical computer to access Amazon S3 buckets.
However, when it comes to data consistency, copying files that are in use by applications (such as databases) to S3 may cause data corruption.
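The first two native methods above (EBS snapshots and AMIs) can be sketched with boto3, the AWS SDK for Python. This is an illustrative sketch, not a complete solution: the volume and instance IDs are placeholders, and AWS credentials are assumed to be configured in the environment.

```python
def snapshot_params(volume_id, description):
    """Build the request for ec2.create_snapshot (a pure helper,
    testable without AWS access)."""
    return {
        "VolumeId": volume_id,
        "Description": description,
        "TagSpecifications": [
            {"ResourceType": "snapshot",
             "Tags": [{"Key": "purpose", "Value": "backup"}]}
        ],
    }

def create_recovery_points(volume_id, instance_id):
    """Take an EBS snapshot and create an AMI from an instance.
    Stop the instance first to ensure data consistency."""
    import boto3  # imported lazily so the helper above works offline
    ec2 = boto3.client("ec2")
    snapshot = ec2.create_snapshot(**snapshot_params(volume_id, "daily backup"))
    image = ec2.create_image(InstanceId=instance_id, Name="recovery-image")
    return snapshot["SnapshotId"], image["ImageId"]
```

A call such as `create_recovery_points("vol-0123456789abcdef0", "i-0123456789abcdef0")` would return the new snapshot and AMI IDs, which you can later use to restore the volume or launch a replacement instance.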
Using Amazon S3 for Data Recovery
Amazon S3 provides a versioning feature for objects stored in buckets. Versioning is disabled by default, but you can easily enable it. When versioning is enabled, overwriting an object preserves the previous versions and saves the changes as a new version, and deleted objects are not permanently removed.
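Assuming AWS credentials are configured, enabling versioning and rolling back to an earlier object version might look like the following boto3 sketch (the bucket name, key, and version ID are placeholders):

```python
# The versioning configuration S3 expects; "Suspended" stops creating
# new versions without deleting the existing ones.
VERSIONING_ENABLED = {"Status": "Enabled"}

def enable_versioning(bucket):
    """Enable object versioning on an existing S3 bucket."""
    import boto3  # lazy import keeps the module importable offline
    boto3.client("s3").put_bucket_versioning(
        Bucket=bucket, VersioningConfiguration=VERSIONING_ENABLED
    )

def restore_previous_version(bucket, key, version_id):
    """Recover an older object version by copying it over the current
    one; the old version becomes the newest version of the object."""
    import boto3
    boto3.client("s3").copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={"Bucket": bucket, "Key": key, "VersionId": version_id},
    )
```

Copying an old version on top of the current object (rather than deleting newer versions) keeps the full version history intact, which is useful if the restore itself turns out to be a mistake.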
Using Amazon S3 for storing versions has its advantages and disadvantages.
- Advantages: Amazon S3 provides different storage classes at different price points, depending on how frequently data needs to be accessed and how quickly it must be retrieved. Amazon also provides a flexible pay-as-you-go pricing policy for S3 storage, which makes S3 affordable for many users.
Amazon S3 also supports object lock to provide storage immutability and protect objects against undesired changes or deletion. This storage configuration is also called write once, read many (WORM).
- Disadvantages: Special tools are needed unless you copy files manually through the AWS web console. In addition, AWS charges egress fees for data transfers out of Amazon S3 (these fees depend on the amount of data transferred and the storage tier used).
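As a hedged sketch, object lock with a default retention rule can be applied with boto3 as follows. Note that object lock can only be configured on buckets created with object lock enabled; the bucket name and retention period below are placeholders:

```python
def default_retention(days, mode="COMPLIANCE"):
    """Build an object-lock configuration with a default retention rule.
    COMPLIANCE mode prevents deletion by any user until retention expires;
    GOVERNANCE mode lets privileged users override the lock."""
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": mode, "Days": days}},
    }

def apply_object_lock(bucket, days):
    """Apply WORM (write once, read many) retention to a bucket that was
    created with object lock enabled. Requires AWS credentials."""
    import boto3  # lazy import keeps default_retention testable offline
    boto3.client("s3").put_object_lock_configuration(
        Bucket=bucket, ObjectLockConfiguration=default_retention(days)
    )
```

With this configuration in place, every new object written to the bucket is protected from modification and deletion for the retention period, which is what makes immutable backups resistant to ransomware.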
Amazon S3 data protection use cases
There are a lot of use cases for using Amazon S3 as a backup destination. Consider choosing this storage type for:
- Copies of data stored on EBS volumes. As mentioned above, you can copy EBS volumes used by EC2 instances to Amazon S3 storage.
- Copying data between S3 buckets.
- Backups of data stored on physical computers and virtual machines running on-premises.
How to Protect Data in Amazon S3
Different AWS tools can be used to protect data in Amazon S3 against loss. Enable and configure object versioning so that multiple versions of objects are kept in S3 buckets for recovery, and review the available tools below:
- Command line tools. You can copy objects from one S3 bucket to another using the AWS CLI or the AWS SDKs. Other popular command line tools for Linux and Windows, such as s3cmd and s4cmd, can also be used. Install one of these tools to transfer data to and from S3 buckets or to copy data between buckets for recovery purposes.
Using scripts for backup to S3 is a common approach, but it requires significant effort. This approach is applicable for backing up data from S3 buckets, EC2 instances, and physical and virtual machines.
- Advantages: CLI tools and scripts are available for free.
- Disadvantages: Configuring data copying with scripts is complicated. In addition, before any copy process, you should stop running applications or use features such as volume snapshots inside the operating system to preserve data consistency.
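As an illustration of the scripted approach described above, a minimal backup script using boto3 might look like this. The bucket names, paths, and date-stamped key-naming scheme are assumptions for the example, not requirements:

```python
from datetime import datetime, timezone

def backup_key(prefix, filename, when=None):
    """Build a date-stamped object key such as 'daily/2024-01-31/db.dump'."""
    when = when or datetime.now(timezone.utc)
    return f"{prefix}/{when:%Y-%m-%d}/{filename}"

def upload_backup(path, bucket, prefix):
    """Upload a local file to S3 under a date-stamped key.
    Requires AWS credentials to be configured."""
    import boto3  # lazy import keeps backup_key usable without AWS access
    key = backup_key(prefix, path.rsplit("/", 1)[-1])
    boto3.client("s3").upload_file(path, bucket, key)
    return key

def copy_between_buckets(src_bucket, dst_bucket, key):
    """Server-side copy of an object between buckets (no local download)."""
    import boto3
    boto3.client("s3").copy_object(
        Bucket=dst_bucket,
        Key=key,
        CopySource={"Bucket": src_bucket, "Key": key},
    )
```

Scheduling `upload_backup` with cron or Task Scheduler yields a basic backup routine, but note the consistency caveat above: files that applications are actively writing to should be quiesced or snapshotted first.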
- AWS Storage Gateway. You can use AWS Storage Gateway to transfer data from on-premises physical and virtual machines to Amazon S3 buckets.
AWS Storage Gateway is a hybrid storage service that is deployed as a VM and provides caching options for faster access to files. There are three types of AWS Storage Gateway: a file gateway, volume gateway, and tape gateway.
After deploying the storage gateway, standard sharing protocols, such as SMB, NFS, and iSCSI, can be used to access Amazon S3 storage. AWS Storage Gateway is provided as a virtual appliance for VMware vSphere and Hyper-V platforms and can be downloaded for free if you have a subscription plan to use Amazon S3.
Solutions for Direct Data Backup to AWS
A more efficient and reliable way of protecting your data in AWS is to deploy a third-party data protection solution that integrates with AWS, such as NAKIVO Backup & Replication. The NAKIVO solution is a universal data protection solution that supports:
- Amazon EC2 backup. Consistent backup and recovery of EC2 instances (to EBS and S3). You don’t need to create and configure new EC2 instances and mount recovered EBS volumes manually. You can start the recovery of files and application objects as soon as you need them.
- Amazon EC2 replication. Replicate important EC2 instances in AWS and use the EC2 replicas in your data recovery scenarios and disaster recovery plans to achieve a low RTO.
- Backup to Amazon S3. Back up Microsoft Hyper-V and VMware vSphere VMs, physical Windows and Linux machines, and EC2 instances to Amazon S3 buckets. Direct backup to Amazon S3 buckets is supported without deploying AWS Storage Gateway. A special Amazon S3 backup repository is created in an S3 bucket.
- Backup to Amazon EC2. You can create a backup repository on an EC2 instance, configure a network connection between your data center and the network used by your EC2 instances, and back up data to the EC2 instance.
NAKIVO Backup & Replication provides a set of useful features that make configuring backups to AWS faster, more convenient, and more reliable. These features include:
- Amazon S3 storage with immutability support can be used as a backup destination to protect against unwanted data changes, whether accidental or caused by ransomware.
- Site Recovery allows you to automate and orchestrate disaster recovery workflows for EC2 instances and other virtual environments depending on defined conditions and actions. Complex disaster recovery scenarios can be easily implemented with the Site Recovery feature.
- Job scheduling. Backup jobs can be scheduled to run automatically. Flexible retention settings allow you to preserve multiple recovery points for different recovery cases.
- Application-aware backup is important for data consistency. NAKIVO Backup & Replication uses features such as VSS (Volume Shadow Copy Service) on Windows-based machines to preserve data consistency when applications (for example, a database server or Active Directory) perform write operations on files.