https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/TakeScheduledSnapshot.html
The problem I found, however, was that the snapshots created this way had no description, and the approach provided no means of cleanup.
A better approach is to use a Lambda function to make API calls that take EBS snapshots of these volumes, and then another Lambda function to clean up older snapshots. This is what I ended up doing.
In summary, this is what we are trying to achieve:
- Filter for the volumes we want to take snapshots of
- Take snapshots of these volumes
- Label and tag each snapshot with the name of its parent volume
- Delete snapshots that are past the specified retention period
- Write logs to CloudWatch
IAM Role Assignment
First, we need to create an IAM role with permission to do the following:
- Retrieve information about EC2 resources (instances and volumes)
- Take snapshots of EBS volumes
- Create tags for the snapshots
- Delete snapshots
- Write log entries to CloudWatch
In your AWS Management console, select IAM > Roles > Create Role
Under "AWS service", select Lambda. Click "Next: Permissions".
Skip the permissions page; we will create a policy later and attach it to this role.
Name your role and add a quick description.
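For reference, selecting Lambda as the trusted service gives the role a trust policy like the one below. You do not need to type it in; the console generates it when you pick Lambda as the service.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "lambda.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```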

In your AWS Management console, select IAM > Policies > Create policy
This policy covers the tasks listed in the IAM role section; paste the JSON below into the JSON editor.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["logs:*"],
      "Resource": "arn:aws:logs:*:*:*"
    },
    {
      "Effect": "Allow",
      "Action": "ec2:Describe*",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:CreateSnapshot",
        "ec2:DeleteSnapshot",
        "ec2:CreateTags",
        "ec2:ModifySnapshotAttribute",
        "ec2:ResetSnapshotAttribute"
      ],
      "Resource": ["*"]
    }
  ]
}
Select the policy you just created and attach it to the role you created earlier.

Now that we have our permissions in place, we can create our Lambda function.
Select Lambda in the list of AWS services, then click "Create a function".
Type in the name of your function; for the runtime, select Python 2.7.
For the role, select "Choose an existing role" and pick the IAM role you created earlier from the list.
Click "Create function"; the next screen takes you to the function's main page.
The function only needs the boto3 library; paste the following into the function editor.
import boto3

def lambda_handler(event, context):
    ec2 = boto3.client('ec2')

    # Get list of regions
    regions = ec2.describe_regions().get('Regions', [])

    # Iterate over regions
    for region in regions:
        print "Checking region %s" % region['RegionName']
        reg = region['RegionName']

        # Connect to region
        ec2 = boto3.client('ec2', region_name=reg)

        # Get all volumes tagged service=keypads in this region
        result = ec2.describe_volumes(
            Filters=[{'Name': 'tag:service', 'Values': ['keypads']}])

        for volume in result['Volumes']:
            print "Backing up %s in %s" % (volume['VolumeId'], volume['AvailabilityZone'])

            # Create snapshot
            result = ec2.create_snapshot(
                VolumeId=volume['VolumeId'],
                Description='Created by Lambda backup function ebs_snapshot_consul')

            # Get snapshot resource
            ec2resource = boto3.resource('ec2', region_name=reg)
            snapshot = ec2resource.Snapshot(result['SnapshotId'])

            volumename = 'N/A'
            servicename = 'N/A'

            # Copy the Name and service tags from the volume, if present
            if 'Tags' in volume:
                for tags in volume['Tags']:
                    if tags["Key"] == 'Name':
                        volumename = tags["Value"]
                    if tags["Key"] == 'service':
                        servicename = tags["Value"]

            # Add the volume name to the snapshot for easier identification
            snapshot.create_tags(Tags=[
                {'Key': 'Name', 'Value': volumename},
                {'Key': 'service', 'Value': servicename}])
The code above scans through all regions and filters for volumes tagged with "service" == "keypads", as seen below:
result = ec2.describe_volumes(
    Filters=[{'Name': 'tag:service', 'Values': ['keypads']}])
It also tags the resulting snapshot with the name and service tag of the parent volume, as seen below:
# Find the tags for volume if it exists if 'Tags' in volume: for tags in volume['Tags']: if tags["Key"] == 'Name': volumename = tags["Value"] if tags["Key"] == 'service': servicename = tags["Value"]
For your function, adjust some basic settings:
- Set the timeout to 2 minutes (enough time for the code to run across all regions)
- Set the memory to 128 MB
Go ahead and save, then test your function.
If the run was a success, the log output will look like this:
Checking region ap-south-1
Checking region eu-west-3
Checking region eu-west-2
Checking region eu-west-1
Checking region ap-northeast-2
Checking region ap-northeast-1
Checking region sa-east-1
Checking region ca-central-1
Checking region ap-southeast-1
Checking region ap-southeast-2
Checking region eu-central-1
Checking region us-east-1
Checking region us-east-2
Checking region us-west-1
Checking region us-west-2
END RequestId: d610f18d-05c9-11e8-a062-d72177d9ae3e
REPORT RequestId: d610f18d-05c9-11e8-a062-d72177d9ae3e Duration: 13258.99 ms Billed Duration: 13300 ms Memory Size: 128 MB Max Memory Used: 63 MB
Now our snapshot creation function is complete. However, we need to put a schedule around it; this is where triggers come into play. In the list of triggers on the left-hand side, click "CloudWatch Events".
Create a new rule: enter a rule name and description, and choose a cron expression for the schedule.
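For example, assuming you want a nightly backup, a CloudWatch Events schedule expression that fires every day at 01:00 UTC looks like this (these cron expressions take six fields, with `?` in the day-of-month or day-of-week position that is left unspecified):

```
cron(0 1 * * ? *)
```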
Click Add and save your function. That's it for snapshot creation. Now we need another function to clean up older snapshots.
Follow the same process as above for creating the Lambda function; here is what lambda_function.py will look like:
import boto3
from botocore.exceptions import ClientError
from datetime import datetime, timedelta

def delete_snapshot(snapshot_id, reg):
    print "Deleting snapshot %s" % (snapshot_id)
    try:
        ec2resource = boto3.resource('ec2', region_name=reg)
        snapshot = ec2resource.Snapshot(snapshot_id)
        snapshot.delete()
    except ClientError as e:
        print "Caught exception: %s" % e
    return

def lambda_handler(event, context):
    # Current timestamp in UTC (snapshot StartTime is also UTC)
    now = datetime.utcnow()

    # Define retention period in days
    retention_days = 3

    # Create EC2 client
    ec2 = boto3.client('ec2')

    # Get list of regions
    regions = ec2.describe_regions().get('Regions', [])

    # Iterate over regions
    for region in regions:
        print "Checking region %s" % region['RegionName']
        reg = region['RegionName']

        # Connect to region
        ec2 = boto3.client('ec2', region_name=reg)

        # Grab all snapshot IDs by the tag our creation function applied
        result = ec2.describe_snapshots(
            Filters=[{'Name': 'tag:service', 'Values': ['keypads']}])

        for snapshot in result['Snapshots']:
            print "Checking snapshot %s which was created on %s" % (snapshot['SnapshotId'], snapshot['StartTime'])

            # Remove timezone info from the snapshot time so the comparison below works
            snapshot_time = snapshot['StartTime'].replace(tzinfo=None)

            # Subtracting the snapshot time from now returns a timedelta;
            # check whether it exceeds the retention period
            if (now - snapshot_time) > timedelta(days=retention_days):
                print "Snapshot is older than configured retention of %d days" % (retention_days)
                delete_snapshot(snapshot['SnapshotId'], reg)
            else:
                print "Snapshot is newer than configured retention of %d days so we keep it" % (retention_days)
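The retention check is plain datetime arithmetic. Here is a minimal, standalone sketch of that comparison, using hypothetical fixed timestamps so the results are deterministic:

```python
from datetime import datetime, timedelta

def is_expired(snapshot_time, now, retention_days=3):
    # True when the snapshot is older than the retention window
    return (now - snapshot_time) > timedelta(days=retention_days)

# Hypothetical timestamps for illustration
now = datetime(2018, 2, 1, 12, 0, 0)

print(is_expired(datetime(2018, 1, 25, 12, 0, 0), now))  # 7 days old -> True (delete)
print(is_expired(datetime(2018, 1, 31, 12, 0, 0), now))  # 1 day old  -> False (keep)
```

Note that a snapshot exactly at the retention boundary is kept, since the comparison is strictly greater-than.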
When you test your Lambda function, the logs will look like this:
Deleting snapshot snap-010444196f3597b6b
Checking region us-east-2
Checking region us-west-1
Checking region us-west-2
END RequestId: 5630a0fb-05e4-11e8-8aa1-3d6c72fb82d4
REPORT RequestId: 5630a0fb-05e4-11e8-8aa1-3d6c72fb82d4 Duration: 17820.37 ms Billed Duration: 17900 ms Memory Size: 128 MB Max Memory Used: 67 MB
P.S.
As you will notice, I am terrible at editing. Let me know how else I could make this easier to read and understand.
References
http://boto3.readthedocs.io/en/latest/reference/services/ec2.html#snapshot
https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/TakeScheduledSnapshot.html
https://www.codebyamir.com/blog/automated-ebs-snapshots-using-aws-lambda-cloudwatch