Infrastructure Automation: Automating EBS Snapshot Deletion with AWS Lambda


Managing cloud costs effectively is crucial for any organization using the AWS cloud platform. One effective way to cut costs is to identify and remove unused resources.

In this project, I’ll show you how to save on storage expenses by detecting and deleting stale EBS (Elastic Block Store) snapshots.

What are EBS Snapshots?

EBS snapshots are backups of your EBS volumes and can also be used to create new EBS volumes or Amazon Machine Images (AMIs). However, they can become orphaned when instances are terminated or volumes are deleted. These unused snapshots take up space and incur unnecessary costs.

We’ll create a Lambda function that automatically finds and deletes these stale snapshots, helping you manage your AWS expenses more efficiently.

AWS Lambda

Creating an AWS Lambda function allows you to execute code in response to events without provisioning or managing servers.

In our project, we aim to automate the identification and deletion of stale EBS snapshots using an AWS Lambda function.

AWS Lambda Creation

Here’s how you can create this Lambda function using the AWS Management Console:

Log in to the AWS Management Console and open the AWS Lambda dashboard by typing lambda in the search bar, then selecting Lambda under Services.

In the Lambda dashboard, click on Create Function.

For the creation option, select the radio button for Author from scratch, which will create a new Lambda function from scratch.

Next, configure the basic information by giving your Lambda function a meaningful name; let’s call it cost-optimization. Then, select the runtime environment. Since we are using Python, choose Python 3.12.

These are the only settings required to create your Lambda function. Click on Create function.

Our function has been successfully created.

The Lambda timeout is the maximum amount of time the function can run before being terminated. By default it is set to 3 seconds. We will raise it to 10 seconds.

To make this adjustment, navigate to the Configuration tab, then click on General configuration. From there, locate and click the Edit button.

On the Edit basic settings page, optionally enter a description, then scroll down.

Under the Timeout section, adjust the value to 10 seconds, then click Save. Note: It’s good practice to keep the timeout as short as the job allows, since AWS bills Lambda by execution duration and a runaway invocation is charged until it times out.
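The same timeout change can also be scripted with Boto3 instead of the console. This is a minimal sketch, assuming the function is named cost-optimization as above. The helper name set_timeout is hypothetical, and the client is passed in as an argument so the logic can be exercised without AWS credentials.

```python
def set_timeout(lambda_client, function_name, seconds):
    """Set a Lambda function's timeout (in seconds) and return the new value."""
    if not 1 <= seconds <= 900:  # Lambda accepts timeouts from 1 to 900 seconds
        raise ValueError("timeout must be between 1 and 900 seconds")
    response = lambda_client.update_function_configuration(
        FunctionName=function_name,
        Timeout=seconds,
    )
    return response["Timeout"]


def main():
    import boto3  # imported here so the helper above has no hard dependency

    print(set_timeout(boto3.client("lambda"), "cost-optimization", 10))
```

Calling main() from a shell with configured AWS credentials would apply the change to the real function.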

Writing the Lambda Function

import boto3


def lambda_handler(event, context):
    ec2 = boto3.client('ec2')

    # Get all EBS snapshots owned by this account
    response = ec2.describe_snapshots(OwnerIds=['self'])

    # Get the IDs of all running EC2 instances
    instances_response = ec2.describe_instances(
        Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
    )
    active_instance_ids = set()

    for reservation in instances_response['Reservations']:
        for instance in reservation['Instances']:
            active_instance_ids.add(instance['InstanceId'])

    # Delete each snapshot whose source volume is missing or is not
    # attached to a running instance
    for snapshot in response['Snapshots']:
        snapshot_id = snapshot['SnapshotId']
        volume_id = snapshot.get('VolumeId')

        if not volume_id:
            # Delete the snapshot if it's not attached to any volume
            ec2.delete_snapshot(SnapshotId=snapshot_id)
            print(f"Deleted EBS snapshot {snapshot_id} as it was not attached to any volume.")
        else:
            # Check if the volume still exists
            try:
                volume_response = ec2.describe_volumes(VolumeIds=[volume_id])
                attachments = volume_response['Volumes'][0]['Attachments']
                # Keep the snapshot only if the volume is attached to a running instance
                if not any(a['InstanceId'] in active_instance_ids for a in attachments):
                    ec2.delete_snapshot(SnapshotId=snapshot_id)
                    print(f"Deleted EBS snapshot {snapshot_id} as it was taken from a volume not attached to any running instance.")
            except ec2.exceptions.ClientError as e:
                if e.response['Error']['Code'] == 'InvalidVolume.NotFound':
                    # The volume associated with the snapshot is not found (it might have been deleted)
                    ec2.delete_snapshot(SnapshotId=snapshot_id)
                    print(f"Deleted EBS snapshot {snapshot_id} as its associated volume was not found.")
                else:
                    # Report any other API error instead of silently ignoring it
                    print(f"Skipped EBS snapshot {snapshot_id}: {e}")

Our Lambda function, powered by Boto3, automates the identification and deletion of stale EBS snapshots.

Summary of the Workflow:

  1. The script retrieves all EBS snapshots owned by the AWS account.
  2. It fetches details of all running EC2 instances.
  3. For each snapshot:
    • If the snapshot isn’t associated with a volume, it deletes the snapshot.
    • If the snapshot is tied to a volume, it checks if the volume exists and is attached to a running instance.
    • If the volume is either non-existent or not attached to any running instance, the snapshot is deleted.

This code is useful for cleaning up unused or “orphaned” EBS snapshots that are no longer needed.
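Because the function deletes data, it can help to separate the decision from the deletion so the rule can be checked on its own first. The sketch below restates the rules from the workflow summary as a pure function. The name should_delete and its arguments are hypothetical and are not part of the Lambda code above.

```python
def should_delete(volume_id, volume_attachments, running_instance_ids):
    """Return (delete?, reason) for one snapshot.

    volume_id            -- the snapshot's source volume ID, or None
    volume_attachments   -- the volume's 'Attachments' list, or None if the
                            volume no longer exists
    running_instance_ids -- set of IDs of running instances
    """
    if not volume_id:
        return True, "no source volume recorded"
    if volume_attachments is None:
        return True, "source volume no longer exists"
    attached = {a["InstanceId"] for a in volume_attachments}
    if not attached & running_instance_ids:
        return True, "volume not attached to a running instance"
    return False, "volume in use by a running instance"
```

A dry run could log the returned reason for every snapshot and only call delete_snapshot once the output looks right.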

It is pivotal in our AWS cost optimization strategy, showcasing the effectiveness of serverless computing for streamlining operations.

Navigate to the Code section, then paste in this code.

After pasting the code, click on Test.

In the test dashboard, fill in an event name. You can save the event or just click on Test.

Our test execution is successful.

If you expand the view to check the execution details, you should see a status code of 200, indicating that the function executed successfully.

You can also view the log streams to debug any errors that may arise, allowing you to troubleshoot effectively if needed.

IAM Role

In our project, the Lambda function is central to optimizing AWS costs by identifying and deleting stale EBS snapshots. To accomplish this, it requires specific permissions, including the ability to describe and delete snapshots, as well as to describe volumes and instances.

Roles are used to securely delegate access to AWS resources without the need to share long-term credentials like access keys. To ensure our Lambda function has the necessary permissions to interact with other AWS resources, follow these steps:

In the Lambda function details page, click on the Configuration tab, scroll down to the Permissions section and expand it, then click on the execution role link to open the IAM role configuration in a new tab.

In the new tab that opens, you’ll be directed to the IAM Console with the details of the IAM role associated with your Lambda function. Scroll down to the Permissions section of the IAM role details page, and then click on the Add inline policy button to create a new inline policy.

Choose EC2 as the service to filter permissions. Then, search for “Snapshot” and add the following options: DescribeSnapshots and DeleteSnapshot.

Also add the DescribeInstances and DescribeVolumes permissions.

Under the Resources section, select “All” to apply the permissions broadly. Then, click the “Next” button to proceed.

Give the policy a name, then click the “Create policy” button.

Our policy has been successfully created.
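For reference, the inline policy built in the visual editor corresponds to a JSON document along these lines. This is a sketch, assuming the four EC2 actions selected above and all resources. Note that the IAM action for deleting a snapshot is spelled ec2:DeleteSnapshot. The variable names are illustrative.

```python
import json

# Permissions the cleanup function needs, matching the console selections.
SNAPSHOT_CLEANUP_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeSnapshots",
                "ec2:DeleteSnapshot",
                "ec2:DescribeVolumes",
                "ec2:DescribeInstances",
            ],
            "Resource": "*",
        }
    ],
}

policy_json = json.dumps(SNAPSHOT_CLEANUP_POLICY, indent=2)
```

The Describe actions have to use "*" as the resource. If you want a tighter policy, ec2:DeleteSnapshot is the statement worth narrowing, for example to specific snapshot ARNs.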

Under Permissions, we can see the new policy listed.

If you navigate to the Trust relationships tab, you can see that the principal is the AWS Lambda service and the action is the Security Token Service AssumeRole action (sts:AssumeRole).
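The trust relationship shown in the console is itself a small policy document. A sketch of what it typically looks like for a Lambda execution role is given below. The variable name is illustrative.

```python
import json

# Trust policy: lets the Lambda service assume the execution role.
LAMBDA_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

print(json.dumps(LAMBDA_TRUST_POLICY, indent=2))
```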


Once we update our Lambda function’s permissions, click ‘Deploy.’ After the deployment is complete, the Lambda function will be ready for invocation. You can invoke it directly via the AWS CLI using an API call, or indirectly through other AWS services.
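As an illustration of direct invocation, the function can be called through the Lambda Invoke API from Boto3. This is a minimal sketch, assuming the function name cost-optimization used earlier. The helper invoke_cleanup is hypothetical, and the client is injected so the helper can be tried against a stub.

```python
import json


def invoke_cleanup(lambda_client, function_name="cost-optimization"):
    """Invoke the function synchronously and return (status_code, error)."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the handler to finish
        Payload=json.dumps({}).encode("utf-8"),  # the handler ignores the event
    )
    # 'FunctionError' is present only when the handler raised an exception.
    return response["StatusCode"], response.get("FunctionError")


def main():
    import boto3

    print(invoke_cleanup(boto3.client("lambda")))
```

A status code of 200 with no function error corresponds to the successful test run seen in the console.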

Our function has been successfully updated.

Create an EC2 Instance

This instance will serve as a resource to interact with during the setup of our Lambda function.

Note that this step can also be automated using Terraform, but for the sake of simplicity, we will do it manually here.

Here is a detailed step-by-step guide to creating an EC2 instance:

Log in to the AWS Management Console.

In the search bar, type EC2, then select EC2 under Services.

In the EC2 dashboard, click on Instances, then click Launch instances.

In the launch instance dashboard, under name, provide a suitable name for your instance.

Under Application and OS Images, select the Quick Start tab, then select an appropriate AMI for your needs. For this example, you can choose Ubuntu.

Select an instance type. For most use cases, the t2.micro instance type is sufficient and is free tier eligible.

Select an existing key pair or create a new key pair to securely access your instance. For this case, I already have a key pair.

Configure Instance Details:

Optionally, configure instance details such as network settings and subnets. This can be done under the Network settings section below, but we will leave these as default for now.

Configure Security Group

Create a new security group or select an existing one. For this project, I will go with a new security group, so click the radio button for Create security group.

I will add rules for SSH on port 22, HTTP on port 80, and HTTPS on port 443. Remember that, as a security best practice, port 22 should only be open to your specific IP address.
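The three rules can also be written down as the IpPermissions structure that the EC2 API expects. This is a sketch only. The helper name ingress_rules is hypothetical, and the address 203.0.113.10/32 comes from a range reserved for documentation and stands in for your own IP.

```python
def ingress_rules(my_ip_cidr):
    """Build IpPermissions: SSH from one address, HTTP/HTTPS from anywhere."""
    def tcp(port, cidr):
        return {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr}],
        }

    return [
        tcp(22, my_ip_cidr),    # SSH restricted to your own address
        tcp(80, "0.0.0.0/0"),   # HTTP
        tcp(443, "0.0.0.0/0"),  # HTTPS
    ]


rules = ingress_rules("203.0.113.10/32")  # placeholder address
# With Boto3, the list would be passed to the EC2 client as:
# ec2.authorize_security_group_ingress(GroupId=your_group_id, IpPermissions=rules)
```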

Add Storage:

The default storage settings should be sufficient, so we will keep them. You can adjust the size if needed.

Review and Launch:

Review your instance configuration under Summary, then click Launch instance to proceed.
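For readers who prefer scripting the launch, the console choices above map roughly onto a Boto3 run_instances call. This is a sketch under assumptions. The helper launch_params is hypothetical, and the AMI ID, key pair name, and instance name in main() are placeholders that must be replaced with real values from your account and region.

```python
def launch_params(ami_id, key_name, name):
    """Build keyword arguments for ec2.run_instances (one t2.micro instance)."""
    return {
        "ImageId": ami_id,
        "InstanceType": "t2.micro",
        "KeyName": key_name,
        "MinCount": 1,
        "MaxCount": 1,
        "TagSpecifications": [
            {"ResourceType": "instance", "Tags": [{"Key": "Name", "Value": name}]}
        ],
    }


def main():
    import boto3

    ec2 = boto3.client("ec2")
    # Placeholder values: substitute a real Ubuntu AMI ID and key pair name.
    params = launch_params("ami-EXAMPLE", "my-key-pair", "snapshot-demo")
    result = ec2.run_instances(**params)
    print(result["Instances"][0]["InstanceId"])
```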

Verify the Volume

Open the newly created EC2 instance, go to the Storage section, and click the volume ID.

This will open the details page for the volume, where you can see information like size, state, and type.

Notice that this volume was automatically created during the instance setup process and is used as the root volume for your EC2 instance.

Creating Snapshots of the root Volume

Currently we don’t have any snapshots available for our instance.

To confirm this, navigate to the EC2 Dashboard. For Snapshots, we can see that we have zero.

To create a snapshot of the instance, click on the Create Snapshot button.

For resource type, select volume. Choose the EBS volume for which you want to create a snapshot from the dropdown menu.

Optionally, add a description for the snapshot to provide more context.

Double-check the details you’ve entered to ensure accuracy.

Once you’re satisfied, click on the Create Snapshot button to initiate the snapshot creation process.

Our snapshot has been successfully created.
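The console steps for creating the snapshot have a direct Boto3 equivalent. This is a minimal sketch. The helper name snapshot_volume and the default description are illustrative, and the volume ID has to be supplied by you.

```python
def snapshot_volume(ec2_client, volume_id, description="demo snapshot"):
    """Start a snapshot of one EBS volume and return its snapshot ID."""
    response = ec2_client.create_snapshot(
        VolumeId=volume_id,
        Description=description,
    )
    # The snapshot starts in the 'pending' state and completes asynchronously.
    return response["SnapshotId"]
```

With credentials configured, it would be called as snapshot_volume(boto3.client("ec2"), your_volume_id).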

Taking a look at the EC2 dashboard, we can see we have one volume and one snapshot.

Testing the Lambda Function

To simulate a real-world scenario, terminate the existing EC2 instance.

When an EC2 instance is terminated, AWS by default also deletes the root EBS volume that was created with it.

However, any EBS snapshots associated with that volume remain in storage, even though they are no longer needed.

These snapshots, termed orphaned or stale, incur additional storage costs without serving any purpose.

Therefore, it’s crucial to regularly identify and remove such stale snapshots to optimize AWS storage costs effectively.
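Whether a volume disappears with its instance is controlled by the DeleteOnTermination flag on the volume attachment, which is enabled by default for a root volume created at launch. The sketch below lists the volumes that would survive termination. The helper name is hypothetical, and its input is one instance entry as returned by describe_instances.

```python
def volumes_kept_after_termination(instance):
    """Return IDs of attached EBS volumes that survive instance termination.

    `instance` is one item from
    describe_instances()['Reservations'][i]['Instances'].
    """
    return [
        mapping["Ebs"]["VolumeId"]
        for mapping in instance.get("BlockDeviceMappings", [])
        if "Ebs" in mapping and not mapping["Ebs"].get("DeleteOnTermination", False)
    ]
```

Volumes returned by this helper stay behind as unattached volumes, and under the cleanup rule described earlier their snapshots would then be treated as stale.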

Once the instance is terminated, the EC2 dashboard shows that the volume is gone but the snapshot still exists. We could delete the snapshot manually since there is only one, but in a large enterprise, manually deleting snapshots would be a daunting task. Additionally, due to human error, some snapshots would likely be forgotten, leading to higher costs. This is where the power of AWS Lambda comes in.

We will use our previously created Lambda function, which we want to run on a schedule, either daily or weekly. To achieve this, we will use Amazon EventBridge Scheduler to create a schedule that triggers our Lambda function. This will automate the process of identifying and deleting stale snapshots on a regular basis, helping to optimize storage costs and reduce manual effort.

To proceed, type EventBridge in the search bar, then select Amazon EventBridge Scheduler.

Scroll down and click Create schedule.

Enter your schedule name and an optional description. Each schedule must be placed in a schedule group. By default, schedules are placed in the “default” group. While you have the option to create a custom schedule group, for now, let’s proceed with the default group.

For the schedule pattern, in a real production environment with recurring tasks, we would normally choose a recurring schedule. However, for this demo, we’ll select a one-time schedule. Set the date and time for the target invocation, then choose a setting for the flexible time window. Once done, click Next to proceed.
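Behind the console form, EventBridge Scheduler stores the pattern as a schedule expression: at(...) for a one-time schedule, and rate(...) or cron(...) for recurring ones. The helpers below are a sketch with hypothetical names. The 02:00 daily run time is an arbitrary example, not something this demo prescribes.

```python
from datetime import datetime


def one_time_expression(when):
    """Format a one-time expression: at(yyyy-mm-ddThh:mm:ss)."""
    return f"at({when.strftime('%Y-%m-%dT%H:%M:%S')})"


def daily_expression():
    """Recurring cron expression that fires every day at 02:00."""
    return "cron(0 2 * * ? *)"


def weekly_expression():
    """Recurring rate expression that fires every seven days."""
    return "rate(7 days)"


demo_expression = one_time_expression(datetime(2030, 1, 1, 9, 0, 0))
```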

Under Target API, look for the Lambda function target, then select it.

Under the Invoke AWS Lambda section, click the drop-down menu and select the Lambda function you previously created. Then, scroll down and click Next to continue.

For the Action After Schedule Completion section, if you choose DELETE, EventBridge Scheduler will automatically delete the schedule once it completes its invocation and no future invocations are planned. For this demo, open the drop-down menu and select None.

Keep the retry policy section set to its default settings.

Leave encryption as default then scroll down.

In the Permissions section, under Execution role, select Create a new role for this schedule. You can either provide a custom role name or proceed with the default name. Once done, click Next to continue.

Review all your settings carefully, and once everything looks correct, click Create Schedule to finalize the process.
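The settings chosen in the wizard can be summarized as the arguments of a single create_schedule call on the Boto3 scheduler client. This is a sketch under assumptions. The helper schedule_params is hypothetical, the flexible time window is assumed to be off, and both ARNs and the schedule name in main() are placeholders rather than real resources.

```python
def schedule_params(name, expression, lambda_arn, role_arn):
    """Build keyword arguments for the scheduler client's create_schedule."""
    return {
        "Name": name,
        "GroupName": "default",
        "ScheduleExpression": expression,
        "FlexibleTimeWindow": {"Mode": "OFF"},  # assumption: no flexible window
        "ActionAfterCompletion": "NONE",  # keep the schedule after it fires
        "Target": {
            "Arn": lambda_arn,    # the Lambda function to invoke
            "RoleArn": role_arn,  # role that EventBridge Scheduler assumes
            "Input": "{}",        # event payload passed to the handler
        },
    }


def main():
    import boto3

    scheduler = boto3.client("scheduler")
    # Placeholder ARNs: replace with your own function and role.
    scheduler.create_schedule(**schedule_params(
        "snapshot-cleanup-demo",
        "at(2030-01-01T09:00:00)",
        "arn:aws:lambda:us-east-1:123456789012:function:cost-optimization",
        "arn:aws:iam::123456789012:role/scheduler-invoke-role",
    ))
```

The role passed as RoleArn needs permission to invoke the function, which is what the console sets up when you let it create a new role for the schedule.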

While the schedule is being created and set to invoke our Lambda function after the specified time elapses, navigate to the EC2 dashboard. Once again, check the number of snapshots before the Lambda function is triggered by the schedule. As we can see, there is currently one snapshot.

Our main objective was to demonstrate how to remove stale snapshots using a Lambda function and EventBridge Scheduler. After the scheduled time has elapsed, revisit the EC2 dashboard to check the number of snapshots. This time, you’ll notice there are zero snapshots, indicating that the Lambda function was successfully invoked and deleted the stale snapshot. This confirms that our objective was achieved: snapshot cleanup now runs automatically.
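The before-and-after check on the dashboard can also be done from code. The sketch below counts the snapshots owned by the account using a Boto3 paginator, so accounts with many snapshots are counted fully. The helper name count_own_snapshots is hypothetical.

```python
def count_own_snapshots(ec2_client):
    """Count snapshots owned by this account, following pagination."""
    paginator = ec2_client.get_paginator("describe_snapshots")
    return sum(
        len(page["Snapshots"])
        for page in paginator.paginate(OwnerIds=["self"])
    )
```

Running count_own_snapshots(boto3.client("ec2")) before and after the scheduled time should show the count dropping from one to zero in this demo.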

In conclusion, this demo effectively showcased how to automate infrastructure management by using a Lambda function and EventBridge Scheduler to delete stale snapshots, thereby maximizing cost efficiency.