Tuesday, January 30, 2018

Automating backups of your AWS EBS volumes using Lambda and Cloudwatch

I found the write-up below very insightful if you are trying to create scheduled snapshots of EBS volumes in AWS.

https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/TakeScheduledSnapshot.html

The problem I found, however, was that the snapshots it creates have no description, and it provides no means of cleaning up old snapshots.

A better approach is to use a Lambda function to make the API calls that take EBS snapshots of these volumes, and another Lambda function to clean up older snapshots. This is what I ended up doing.

In summary, this is what we are trying to achieve:

- Filter information about the volumes we want to take snapshots of
- Take the snapshots of these volumes
- Label and tag these snapshots with the names of their parent volumes
- Delete snapshots that are past the specified retention period
- Write logs to CloudWatch

IAM Role Assignment
First we need to create an IAM role that has permission to do the following:
- Retrieve information about EC2 instances and volumes
- Take snapshots of these volumes
- Create tags for the snapshots
- Delete snapshots
- Write log entries to CloudWatch

In your AWS Management Console, select IAM > Roles > Create Role.
Under "AWS service", select Lambda, then click "Next: Permissions".
Skip the next page; we will create a policy later and attach it to this role.
Name your role and add a quick description.
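If you would rather script this step, a rough boto3 sketch of the same role creation is below; the role name "ebs-backup-worker" is just an example, and the trust policy simply lets the Lambda service assume the role.

import json
import boto3

iam = boto3.client('iam')

# Trust policy that allows Lambda to assume this role
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole"
    }]
}

# "ebs-backup-worker" is an example name - use whatever you named your role
iam.create_role(
    RoleName='ebs-backup-worker',
    AssumeRolePolicyDocument=json.dumps(trust_policy),
    Description='Role assumed by the EBS snapshot Lambda functions'
)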



In your AWS Management Console, select IAM > Policies > Create policy

This policy covers the tasks listed in the IAM role assignment; paste the JSON below into the JSON editor.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:*"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        },
        {
            "Effect": "Allow",
            "Action": "ec2:Describe*",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateSnapshot",
                "ec2:DeleteSnapshot",
                "ec2:CreateTags",
                "ec2:ModifySnapshotAttribute",
                "ec2:ResetSnapshotAttribute"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}

Select the policy you just created and attach it to the role you created earlier.
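This step can also be scripted; a minimal boto3 sketch follows. The policy file name, policy name, and role name are all assumptions, and the file is just the JSON shown above saved to disk.

import boto3

iam = boto3.client('iam')

# Load the policy JSON shown above (assumed to be saved as ebs-backup-policy.json)
with open('ebs-backup-policy.json') as f:
    policy_document = f.read()

# Create the customer-managed policy (example name)
response = iam.create_policy(
    PolicyName='ebs-backup-policy',
    PolicyDocument=policy_document
)

# Attach it to the role created earlier (example role name)
iam.attach_role_policy(
    RoleName='ebs-backup-worker',
    PolicyArn=response['Policy']['Arn']
)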








Now that we have our permissions in place, we can go ahead and create our Lambda function.

Select Lambda from the list of AWS services and click "Create a function".
Type in the name of your function and, for the runtime, select Python 2.7.
For the role, select an existing role and choose the IAM role you created earlier from the list.









Click "Create function" and the next screen takes you to the function's main page.
We will be using the boto3 library in this Lambda function; paste the following into the function code editor.

import boto3
import collections
import datetime

def lambda_handler(event, context):
    ec2 = boto3.client('ec2')
    
    # Get list of regions
    regions = ec2.describe_regions().get('Regions',[] )

    # Iterate over regions
    for region in regions:
        print "Checking region %s " % region['RegionName']
        reg=region['RegionName']

        # Connect to region
        ec2 = boto3.client('ec2', region_name=reg)
    
        # Get all in-use volumes in all regions  
        result = ec2.describe_volumes( Filters=[{'Name': 'tag:service', 'Values': ['keypads']}])
        
        for volume in result['Volumes']:
            print "Backing up %s in %s" % (volume['VolumeId'], volume['AvailabilityZone'])
        
            # Create snapshot
            result = ec2.create_snapshot(VolumeId=volume['VolumeId'],Description='Created by Lambda backup function ebs_snapshot_consul')
        
            # Get snapshot resource 
            ec2resource = boto3.resource('ec2', region_name=reg)
            snapshot = ec2resource.Snapshot(result['SnapshotId'])
        
            # Default tag values in case the volume is missing a Name or service tag
            volumename = 'N/A'
            servicename = 'N/A'
        
            # Find the following tags for volume if it exists
            if 'Tags' in volume:
                for tags in volume['Tags']:
                    if tags["Key"] == 'Name':
                        volumename = tags["Value"]
                    if tags["Key"] == 'service':
                        servicename = tags["Value"]
        
            # Add volume name to snapshot for easier identification
            snapshot.create_tags(Tags=[{'Key': 'Name','Value': volumename},{'Key': 'service','Value': servicename}])





The code above scans through all the regions and filters for volumes that are tagged with "service" == "keypads", as seen below:

result = ec2.describe_volumes( Filters=[{'Name': 'tag:service', 'Values': ['keypads']}])

It also tags the resulting snapshot with the name of the parent volume and the parent volume's service tag, as seen below:

# Find the tags for volume if it exists
if 'Tags' in volume:
    for tags in volume['Tags']:
        if tags["Key"] == 'Name':
            volumename = tags["Value"]
        if tags["Key"] == 'service':
            servicename = tags["Value"]

Note that you can use other tags to filter your volumes, and you can leave the filters out entirely if you want to take a snapshot of every volume.
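For example, a few filter variations (the tag key and values here are made up):

# Filter on a different tag, e.g. an "environment" tag
result = ec2.describe_volumes( Filters=[{'Name': 'tag:environment', 'Values': ['production']}])

# Only volumes that are currently attached to an instance
result = ec2.describe_volumes( Filters=[{'Name': 'status', 'Values': ['in-use']}])

# No filters at all - every volume in the region
result = ec2.describe_volumes()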

For your function, adjust some basic settings (a scripted alternative is sketched below the list):

- Set the timeout value to 2 minutes (enough time for the code to run)
- Set the memory to 128 MB
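If you prefer to apply these settings from code instead of the console, a minimal boto3 sketch looks like this; the function name is just a placeholder.

import boto3

lambda_client = boto3.client('lambda')

# 2-minute timeout and 128 MB of memory, matching the console settings above
lambda_client.update_function_configuration(
    FunctionName='ebs-snapshot-backup',   # example function name
    Timeout=120,
    MemorySize=128
)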

Go ahead and save, then test your function.
        
Log output will look like this if the run was a success


Checking region ap-south-1 
Checking region eu-west-3 
Checking region eu-west-2 
Checking region eu-west-1 
Checking region ap-northeast-2 
Checking region ap-northeast-1 
Checking region sa-east-1 
Checking region ca-central-1 
Checking region ap-southeast-1 
Checking region ap-southeast-2 
Checking region eu-central-1 
Checking region us-east-1 
Checking region us-east-2 
Checking region us-west-1 
Checking region us-west-2 
END RequestId: d610f18d-05c9-11e8-a062-d72177d9ae3e
REPORT RequestId: d610f18d-05c9-11e8-a062-d72177d9ae3e Duration: 13258.99 ms Billed Duration: 13300 ms Memory Size: 128 MB Max Memory Used: 63 MB

Now our snapshot creation function is complete. However, we need to create a schedule around it; this is where triggers come into play. In the list of triggers on the left-hand side, click on "CloudWatch Events".
Create a new rule, enter a rule name and description, and choose a cron expression for the schedule.
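For example, to run the backup once a day at 01:00 UTC, you could use a schedule expression like this:

cron(0 1 * * ? *)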







Click Add, and save your function.
And you are all done with snapshot creation. 
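If you would rather script the schedule than click through the console trigger, a rough boto3 sketch is below; the rule name, region, account ID, and function name/ARN are all placeholders you would need to replace.

import boto3

events = boto3.client('events')
lambda_client = boto3.client('lambda')

# Rule that fires once a day at 01:00 UTC (example rule name)
events.put_rule(
    Name='daily-ebs-snapshot',
    ScheduleExpression='cron(0 1 * * ? *)'
)

# Point the rule at the backup function (example function ARN)
events.put_targets(
    Rule='daily-ebs-snapshot',
    Targets=[{'Id': 'ebs-snapshot-backup',
              'Arn': 'arn:aws:lambda:us-east-1:123456789012:function:ebs-snapshot-backup'}]
)

# Allow CloudWatch Events to invoke the function (example ARNs)
lambda_client.add_permission(
    FunctionName='ebs-snapshot-backup',
    StatementId='daily-ebs-snapshot-event',
    Action='lambda:InvokeFunction',
    Principal='events.amazonaws.com',
    SourceArn='arn:aws:events:us-east-1:123456789012:rule/daily-ebs-snapshot'
)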
Now we need to create another function to clean up older snapshots.

Follow the same process for creating a Lambda function as above; here is what lambda_function.py will look like:

import boto3
from botocore.exceptions import ClientError

from datetime import datetime,timedelta

def delete_snapshot(snapshot_id, reg):
    print "Deleting snapshot %s " % (snapshot_id)
    try:  
        ec2resource = boto3.resource('ec2', region_name=reg)
        snapshot = ec2resource.Snapshot(snapshot_id)
        snapshot.delete()
    except ClientError as e:
        print "Caught exception: %s" % e
        
    return
    
def lambda_handler(event, context):
    
    # Get current timestamp in UTC
    now = datetime.utcnow()

    # AWS Account ID    
    account_id = '1234567890'
    
    # Define retention period in days
    retention_days = 3
    
    # Create EC2 client
    ec2 = boto3.client('ec2')
    
    # Get list of regions
    regions = ec2.describe_regions().get('Regions',[] )

    # Iterate over regions
    for region in regions:
        print "Checking region %s " % region['RegionName']
        reg=region['RegionName']
        
        # Connect to region
        ec2 = boto3.client('ec2', region_name=reg)
        
        # Grab all snapshots owned by this account that carry the service tag
        result = ec2.describe_snapshots( OwnerIds=[account_id], Filters=[{'Name': 'tag:service', 'Values': ['keypads']}] )
    
        for snapshot in result['Snapshots']:
            print "Checking snapshot %s which was created on %s" % (snapshot['SnapshotId'],snapshot['StartTime'])
       
            # Remove timezone info from snapshot in order for comparison to work below
            snapshot_time = snapshot['StartTime'].replace(tzinfo=None)
        
            # Subtract snapshot time from now returns a timedelta 
            # Check if the timedelta is greater than retention days
            if (now - snapshot_time) > timedelta(days=retention_days):
                print "Snapshot is older than configured retention of %d days" % (retention_days)
                delete_snapshot(snapshot['SnapshotId'], reg)
            else:
                print "Snapshot is newer than configured retention of %d days so we keep it" % (retention_days)

When you test your Lambda function, the logs will look like this:
      
    Deleting snapshot snap-010444196f3597b6b 
    Checking region us-east-2 
    Checking region us-west-1 
    Checking region us-west-2 
    END RequestId: 5630a0fb-05e4-11e8-8aa1-3d6c72fb82d4
    REPORT RequestId: 5630a0fb-05e4-11e8-8aa1-3d6c72fb82d4 Duration: 17820.37 ms Billed Duration: 17900 ms Memory Size: 128 MB Max Memory Used: 67 MB





Monday, January 29, 2018

New Season, New Me

So my old posts were basically papers I wrote while studying for my Master's degree in Information Systems Management.

I finally completed my studies in 2013, after which I immediately changed fields and started working in IT full time. My previous role was more like a hybrid business systems analyst: I acted as an intermediary between the core system/software developers and the front-end users of those products.

I had always wanted a full-blown IT position, which is why I took IT certifications (self-paced learning) while in my old role. I bagged a CCNA certification without ever touching a physical switch or router.

I got a chance to jump ship in 2013; it was a fresh start for me. Six years after my first degree, I was starting a job as a "Storage Engineer". It was a fertile learning ground, and I put everything I had into it. I bagged 8 different storage certifications in the space of 2 years and soon became a SNIA Certified Storage Networking Expert.

With all that storage/server/network knowledge, I moved again, this time into cloud engineering, working with AWS automation tools. I also hold a CISSP certification.

I love what I do.

I remembered I had this blog from a long time ago, so I decided it might be a good platform to share some of the interesting things I have been working on.

Watch out for more cool stuff.