Tuesday, January 30, 2018

Automating backups of your AWS EBS volumes using Lambda and CloudWatch

I found the write-up below very insightful when trying to create scheduled snapshots of EBS volumes in AWS.

https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/TakeScheduledSnapshot.html

The problem I found, however, was that the snapshots it created had no description, and it provided no means of cleaning up old snapshots.

A better approach is to use a Lambda function to make the API calls that take snapshots of these volumes, and another Lambda function to clean up older snapshots. This is what I ended up doing.

In summary, this is what we are trying to achieve:

- Filter information about the volumes we want to take snapshots of
- Take the snapshots of these volumes
- Label and tag these snapshots with names of their parent volume
- Delete snapshots that are past the specified retention period
- Write logs to CloudWatch

IAM Role Assignment
First, we need to create an IAM role that has permission to do the following:
- Retrieve information about EC2 instances and volumes
- Take snapshots of EBS volumes
- Create tags for the snapshots
- Delete snapshots
- Make log entries in CloudWatch

In your AWS Management Console, select IAM > Roles > Create Role.
Under "AWS service", select Lambda. Click on "Next: Permissions".
Skip the next page; we will create a policy later and attach it to this role.
Name your role and add a quick description.



In your AWS Management Console, select IAM > Policies > Create policy.

This policy grants the permissions listed in the IAM role assignment; paste the JSON below into the JSON editor.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:*"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        },
        {
            "Effect": "Allow",
            "Action": "ec2:Describe*",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateSnapshot",
                "ec2:DeleteSnapshot",
                "ec2:CreateTags",
                "ec2:ModifySnapshotAttribute",
                "ec2:ResetSnapshotAttribute"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}
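Before pasting, you can sanity-check the policy document locally; a stray comma or bracket will make the console reject it. This quick check with Python's standard `json` module is optional and just confirms the document parses and contains the expected actions (the flattening of `Action` into a list is my own convenience, not anything AWS requires):

```python
import json

# The same policy document shown above, condensed
policy = '''{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow", "Action": ["logs:*"], "Resource": "arn:aws:logs:*:*:*"},
    {"Effect": "Allow", "Action": "ec2:Describe*", "Resource": "*"},
    {"Effect": "Allow",
     "Action": ["ec2:CreateSnapshot", "ec2:DeleteSnapshot", "ec2:CreateTags",
                "ec2:ModifySnapshotAttribute", "ec2:ResetSnapshotAttribute"],
     "Resource": ["*"]}
  ]
}'''

doc = json.loads(policy)  # raises ValueError if the JSON is malformed

actions = []
for statement in doc['Statement']:
    action = statement['Action']
    # "Action" may be a single string or a list of strings
    actions.extend(action if isinstance(action, list) else [action])

print(actions)
```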

Select the policy you just created and attach it to the role you created earlier.








Now that we have our permissions in place, we will create our Lambda function.

Select Lambda in the list of AWS services and click on "Create a function".
Type in the name of your function; for Runtime, select Python 2.7.
For Role, select "Choose an existing role" and pick the IAM role you created earlier from the list.









Click on "Create function" and the next screen takes you to the function's main page.
We will use the boto3 library for the AWS API calls; paste the following into the function code section.

import boto3

def lambda_handler(event, context):
    ec2 = boto3.client('ec2')

    # Get list of regions
    regions = ec2.describe_regions().get('Regions', [])

    # Iterate over regions
    for region in regions:
        print "Checking region %s " % region['RegionName']
        reg = region['RegionName']

        # Connect to region
        ec2 = boto3.client('ec2', region_name=reg)

        # Get all volumes in this region tagged service=keypads
        result = ec2.describe_volumes(Filters=[{'Name': 'tag:service', 'Values': ['keypads']}])

        for volume in result['Volumes']:
            print "Backing up %s in %s" % (volume['VolumeId'], volume['AvailabilityZone'])

            # Create snapshot
            result = ec2.create_snapshot(VolumeId=volume['VolumeId'], Description='Created by Lambda backup function ebs_snapshot_consul')

            # Get snapshot resource
            ec2resource = boto3.resource('ec2', region_name=reg)
            snapshot = ec2resource.Snapshot(result['SnapshotId'])

            # Defaults in case the volume is missing either tag
            volumename = 'N/A'
            servicename = 'N/A'

            # Find the following tags for the volume if they exist
            if 'Tags' in volume:
                for tags in volume['Tags']:
                    if tags["Key"] == 'Name':
                        volumename = tags["Value"]
                    if tags["Key"] == 'service':
                        servicename = tags["Value"]

            # Add volume name and service to the snapshot for easier identification
            snapshot.create_tags(Tags=[{'Key': 'Name', 'Value': volumename}, {'Key': 'service', 'Value': servicename}])





The code above scans through all regions and filters for volumes that are tagged with "service" == "keypads", as seen below:

result = ec2.describe_volumes( Filters=[{'Name': 'tag:service', 'Values': ['keypads']}])

It also tags the resulting snapshot with the name and service tag of the parent volume, as seen below:

# Find the following tags for the volume if they exist
if 'Tags' in volume:
    for tags in volume['Tags']:
        if tags["Key"] == 'Name':
            volumename = tags["Value"]
        if tags["Key"] == 'service':
            servicename = tags["Value"]

Note that you can use other tags to filter your volumes, and you can leave out the filters entirely if you want to take snapshots of all volumes.
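If you find yourself repeating that tag-lookup loop, it can be factored into a small helper. This is just a sketch; `get_tag`, and the sample volume below, are mine rather than part of the post's code, but the `Tags` layout matches what boto3's `describe_volumes` returns:

```python
def get_tag(tags, key, default='N/A'):
    # Return the value of the tag named `key`, or `default` if it is absent
    for tag in tags or []:
        if tag['Key'] == key:
            return tag['Value']
    return default

# Hypothetical volume with the same Tags structure boto3 returns
volume = {'Tags': [{'Key': 'Name', 'Value': 'consul-data'},
                   {'Key': 'service', 'Value': 'keypads'}]}

volumename = get_tag(volume.get('Tags'), 'Name')
servicename = get_tag(volume.get('Tags'), 'service')
print(volumename, servicename)
```

Passing `volume.get('Tags')` means untagged volumes (where the `Tags` key is missing) fall back to the default instead of raising a KeyError.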

For your function, adjust some basic settings:

- Set the timeout value to 2 minutes (enough time for the code to run across all regions)
- Set memory to 128 MB

Go ahead and save, then test your function; an empty test event ({}) is fine, since the handler ignores the event payload.
        
The log output will look like this if the run was a success:


Checking region ap-south-1 
Checking region eu-west-3 
Checking region eu-west-2 
Checking region eu-west-1 
Checking region ap-northeast-2 
Checking region ap-northeast-1 
Checking region sa-east-1 
Checking region ca-central-1 
Checking region ap-southeast-1 
Checking region ap-southeast-2 
Checking region eu-central-1 
Checking region us-east-1 
Checking region us-east-2 
Checking region us-west-1 
Checking region us-west-2 
END RequestId: d610f18d-05c9-11e8-a062-d72177d9ae3e
REPORT RequestId: d610f18d-05c9-11e8-a062-d72177d9ae3e Duration: 13258.99 ms Billed Duration: 13300 ms Memory Size: 128 MB Max Memory Used: 63 MB
Now our snapshot creation function is complete. However, we need to put a schedule around it; this is where triggers come into play. In the list of triggers on the left-hand side, click on "CloudWatch Events".
Create a new rule: enter the rule name and description, then choose a cron expression for the schedule. For example, cron(0 2 * * ? *) runs the function daily at 02:00 UTC.







Click Add, and save your function.
You are now done with snapshot creation.
Next, we need another function to clean up older snapshots.

Follow the same process as above for creating the Lambda function; here is what lambda_function.py will look like:

import boto3
from botocore.exceptions import ClientError

from datetime import datetime, timedelta

def delete_snapshot(snapshot_id, reg):
    print "Deleting snapshot %s " % (snapshot_id)
    try:
        ec2resource = boto3.resource('ec2', region_name=reg)
        snapshot = ec2resource.Snapshot(snapshot_id)
        snapshot.delete()
    except ClientError as e:
        print "Caught exception: %s" % e

    return

def lambda_handler(event, context):

    # Get current timestamp in UTC (snapshot StartTime is also UTC)
    now = datetime.utcnow()

    # AWS Account ID
    account_id = '1234567890'

    # Define retention period in days
    retention_days = 3

    # Create EC2 client
    ec2 = boto3.client('ec2')

    # Get list of regions
    regions = ec2.describe_regions().get('Regions', [])

    # Iterate over regions
    for region in regions:
        print "Checking region %s " % region['RegionName']
        reg = region['RegionName']

        # Connect to region
        ec2 = boto3.client('ec2', region_name=reg)

        # Grab all snapshots owned by this account that carry the service tag
        # used by the creation function
        result = ec2.describe_snapshots(OwnerIds=[account_id], Filters=[{'Name': 'tag:service', 'Values': ['keypads']}])

        for snapshot in result['Snapshots']:
            print "Checking snapshot %s which was created on %s" % (snapshot['SnapshotId'], snapshot['StartTime'])

            # Remove timezone info from snapshot in order for comparison to work below
            snapshot_time = snapshot['StartTime'].replace(tzinfo=None)

            # Subtracting snapshot time from now returns a timedelta;
            # check whether it exceeds the retention period
            if (now - snapshot_time) > timedelta(days=retention_days):
                print "Snapshot is older than configured retention of %d days" % (retention_days)
                delete_snapshot(snapshot['SnapshotId'], reg)
            else:
                print "Snapshot is newer than configured retention of %d days so we keep it" % (retention_days)
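The retention logic itself is plain datetime arithmetic, so you can convince yourself it behaves as expected without touching AWS. A minimal sketch, with made-up timestamps standing in for the snapshot StartTime values:

```python
from datetime import datetime, timedelta

retention_days = 3
now = datetime(2018, 1, 30, 12, 0, 0)  # hypothetical "current" time

# A snapshot started five days ago falls outside the retention window...
old_snapshot_time = datetime(2018, 1, 25, 12, 0, 0)
expired = (now - old_snapshot_time) > timedelta(days=retention_days)

# ...while one started yesterday is still inside it and is kept
new_snapshot_time = datetime(2018, 1, 29, 12, 0, 0)
kept = (now - new_snapshot_time) > timedelta(days=retention_days)

print(expired, kept)
```

This is also why the function strips the timezone from StartTime first: subtracting an offset-aware datetime from a naive one raises a TypeError.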

When you test your Lambda function, the logs will look like this:
      
Deleting snapshot snap-010444196f3597b6b
Checking region us-east-2
Checking region us-west-1
Checking region us-west-2
END RequestId: 5630a0fb-05e4-11e8-8aa1-3d6c72fb82d4
REPORT RequestId: 5630a0fb-05e4-11e8-8aa1-3d6c72fb82d4 Duration: 17820.37 ms Billed Duration: 17900 ms Memory Size: 128 MB Max Memory Used: 67 MB




