Amazon Elastic Beanstalk

Elastic Beanstalk (EB) is a service that helps you easily manage and deploy your application code into an automated application environment. It handles provisioning AWS resources like EC2 instances, ELB instances, and RDS databases, and lets you focus on writing your code, deploying with a git push style workflow when you’re ready to release to development, staging, or production environments.

What does Elastic Beanstalk offer me? FAQ

Getting Started Walkthrough

What Is AWS Elastic Beanstalk and Why Do I Need It?


AWS Elastic Beanstalk is free, but the AWS resources that AWS Elastic Beanstalk provides are live (and not running in a sandbox). You will incur the standard usage fees for any resources your environment uses, until you terminate them.

The total charges for the activity we’ll do during this blog post will be minimal (typically less than a dollar). It is possible to do some testing of EB within the Free Tier by following this guide.

Further Reading on Pricing

Key Concepts

The key concepts when trying to understand and use Elastic Beanstalk are

  • Application
  • Environment
  • Version
  • Environment Configuration
  • Configuration Template

The primary AWS services that Elastic Beanstalk can/will use are

  • Amazon Elastic Compute Cloud (Amazon EC2)
  • Amazon Relational Database Service (Amazon RDS)
  • Amazon Simple Storage Service (Amazon S3)
  • Amazon Simple Notification Service (Amazon SNS)
  • Amazon CloudWatch
  • Elastic Load Balancing
  • Auto Scaling

It’s important to understand what each of the main components in Elastic Beanstalk does, so let’s explore them in a little more depth.


Application

An AWS Elastic Beanstalk application is a logical collection of AWS Elastic Beanstalk components, including environments, versions, and environment configurations. In AWS Elastic Beanstalk an application is conceptually similar to a folder.


Version

In AWS Elastic Beanstalk, a version refers to a specific, labeled iteration of deployable code. A version points to an Amazon Simple Storage Service (Amazon S3) object that contains the deployable code (e.g., a Java WAR file). A version is part of an application. Applications can have many versions.


Environment

An environment is a version that is deployed onto AWS resources. Each environment runs only a single version at a time; however, you can run the same version or different versions in many environments simultaneously. When you create an environment, AWS Elastic Beanstalk provisions the resources needed to run the application version you specified. For more information about the environment and the resources that are created, see Architectural Overview.

Environment Configuration

An environment configuration identifies a collection of parameters and settings that define how an environment and its associated resources behave. When you update an environment’s configuration settings, AWS Elastic Beanstalk automatically applies the changes to existing resources or deletes and deploys new resources (depending on the type of change).

Configuration Template

A configuration template is a starting point for creating unique environment configurations. Configuration templates can be created or modified only by using the AWS Elastic Beanstalk command line utilities or APIs.

Further Reading


The typical workflow for using Elastic Beanstalk is that you’ll create one or more environments for a given application. Commonly development, staging, and production environments are created.

As you’re ready to deploy new versions of your application to a given environment, you’ll upload a new version and deploy it to that environment via the AWS console, the CLI tools, an IDE, or an EB API library.

Supported Languages

Elastic Beanstalk currently supports the following languages:

Getting Started

To get started with Elastic Beanstalk, we’ll be using the AWS console.

  1. Login to the console and choose the Elastic Beanstalk service.
  2. Select your application platform (we’ll use Python for this example), then click Start
  3. AWS will begin provisioning you a new application environment. This can take a few minutes, since it involves provisioning at least one new EC2 instance. While you wait, EB performs a number of steps, including:
    1. Creating an AWS Elastic Beanstalk application named “My First Elastic Beanstalk Application.”
    2. Creating a new application version labeled “Initial Version” that refers to a default sample application file.
    3. Launching an environment named “Default-Environment” that provisions the AWS resources to host the application.
    4. Deploying the “Initial Version” application into the newly created “Default-Environment.”
  4. Once the provisioning is finished, you can view the default application by expanding the Environment Details and clicking on the URL

At this point we have a deployed EB managed application environment.

Further Reading

Deploying an application

There are two ways to deploy applications to your EB environments

  1. Manually through the AWS console
  2. Using the AWS DevTools, in conjunction with Git or an IDE like Visual Studio or Eclipse.

Manual Deploy

Let’s do a manual update of the application through the console

  1. Since we’re using Python as our example framework, I am using the Python sample from the Getting Started walkthrough
  2. Login to the EB console
  3. Click on the Versions tab
  4. Click Upload New Version
  5. Enter Second Version for the Version Label
  6. Choose the sample application file and upload it
  7. Under Deployment choose Upload, leave the environment set to Default-Environment
  8. Click Upload New Version
  9. You should now see Second Version available on the Versions tab

Now we can deploy the new version of the application to our environment

  1. Check the box next to Second Version
  2. Click the Deploy button
  3. Set Deploy to: to Default-Environment
  4. Click Deploy Version
  5. Below your list of Versions it will now display the Default-Environment.
  6. You can click on the Events tab to watch the progress of this deploy action.
  7. Wait for the Environment update completed successfully. event to be logged.

Once the deployment is finished, you can check it by

  1. Clicking on the Overview tab
  2. Expanding your application environment, e.g. Default-Environment
  3. Reviewing the Running Version field.
  4. It should now say Second Version

CLI Deploy

Being able to deploy from the command line and with revision control is ideal, so Amazon has written a set of tools, the AWS DevTools, that integrate with Git to help get this workflow up and running.

Let’s walk through doing a CLI deploy. I am going to assume you already have Git installed.

  1. Download and install the EB command line tools
  2. Unzip the Python sample into a directory and initialize that directory as a Git repository with git init
  3. Add everything in the directory to the repo with git add * and commit it with git commit -a -m "all the things"
  4. From your Git repository directory, run the repository setup script, which you can find in the AWS DevTools/Linux directory. You need to run this script for each Git repository.
  5. Follow the Git setup steps to set up the DevTools with your AWS credentials, application, and environment names.
  6. Edit a file in your Git repo and add a comment like # I was here
  7. Commit this change with git commit -a -m "I was here"
  8. Push your change to your EB application environment with git aws.push; you can see what this should look like in the example on deploying a PHP app with Git and DevTools
  9. If your push succeeds, you should see the Running Version of your application show the Git SHA1 of your commit.

You can now continue to work and deploy by committing to Git and using git aws.push.

Further Reading

Application Environment Configurations

Once you’re familiar and comfortable with deploying applications, you’ll likely want to customize your application environments. Since EB uses EC2 instances running Linux or Windows, you have a certain amount of flexibility in what customizations you can make to the environment.

Customizing Instance Options

You’re able to tweak many options for your instances, ELB, Auto Scaling, and databases, including

  • Instance type
  • EC2 security group
  • Key pairs
  • Port and HTTPS options for ELB
  • Auto-Scaling instance settings and AZ preference
  • Setting up your environment to use RDS resources.

To customize these things, you

  1. Login to the AWS console
  2. Locate your environment and click on Actions
  3. Make your desired changes to the settings
  4. Click Apply Changes

As mentioned, some changes can be done on the fly, like ELB changes, while others, like changing the instance size or migrating to RDS, require a longer period of time and some application downtime.

Further reading on Environment Customization

Customizing Application Containers

At this time, the following container types support customization

  • Tomcat 6 and 7 (non-legacy container types)
  • Python
  • Ruby 1.8.7 and 1.9.3

Currently, AWS Elastic Beanstalk does not support configuration files for PHP, .NET, and legacy Tomcat 6 and 7 containers.

You’re able to customize a number of things, including

  • Packages
  • Sources
  • Files
  • Users
  • Groups
  • Commands
  • Container_commands
  • Services
  • Option_settings
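These customizations are expressed as configuration files placed in an .ebextensions directory at the root of your application source. As a rough illustration (the file name, package, and command below are made-up examples, not required values), a config file that installs a package and runs a command might look like:

```yaml
# .ebextensions/01-custom.config -- hypothetical example file name
# Install git via yum on the instance.
packages:
  yum:
    git: []

# Run an arbitrary shell command during deployment.
commands:
  01_create_marker:
    command: echo "customized by .ebextensions" > /tmp/eb-customized
```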

Further Reading on Application Container Customization

Where to go from here

Now that you’ve used Elastic Beanstalk, seen how to deploy code, and learned how to customize your instances, you may be considering running your own application with Elastic Beanstalk.

Some of the things you’ll want to look into further if you want to deploy your application to EB are:

Bootstrapping Config Management on AWS

When using a cloud computing provider like AWS’s EC2 service, being able to ensure that all of your instances are running the same configuration and being able to know that new instances you create can be quickly configured to meet your needs is critical.

Configuration Management tools are the key to achieving this. In my experience so far, the two most popular open source configuration management tools are PuppetLabs’ Puppet and Opscode’s Chef products. Both are open source, written in Ruby, and you’re able to run your own server and clients without needing to purchase any licensing or support. Both also have vibrant and passionate communities surrounding them. These are the two we will focus on for the purposes of this post.

Getting started with using Puppet or Chef itself and/or building the Puppet or Chef server will not be the focus of this post, but I will provide some good jumping off points to learn more about this. I am going to focus specifically on some techniques for bootstrapping the Puppet and Chef clients onto Linux EC2 instances.

user-data and cloud-init

Before getting into the specifics of bootstrapping each client, let’s take a look at two important concepts/tools for Linux AMIs


user-data

user-data is a piece of instance metadata that is available to your EC2 instances at boot time and during the lifetime of your instance.

At boot time for Ubuntu AMIs and the Amazon Linux AMI, this user-data is passed to cloud-init during the first bootup of the EC2 instance, and cloud-init will read the data and can execute it.

So a common technique for bootstrapping instances is to pass the contents of a shell script to the EC2 API as the user-data; the shell code is executed during boot, as the root user, and your EC2 instance is modified accordingly.

This is the technique we will use to help bootstrap our config management clients.
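As a sketch of this technique (the script name, bucket URL, and AMI ID are hypothetical placeholders, and the boto call is shown only as a comment since it needs live credentials), the user-data can be composed as a small shell script string and handed to the EC2 API at launch:

```python
# Sketch: composing bootstrap user-data for an EC2 instance.
# "" and the bucket URL are made-up examples.

def build_user_data(script_url):
    """Return a shell script (as a string) that curls a bootstrap
    script down from S3 and runs it as root during first boot."""
    return "\n".join([
        "#!/bin/bash",
        # Fetch the bootstrap script from S3...
        "curl -s -o /tmp/ %s" % script_url,
        # ...then make it executable and run it.
        "chmod +x /tmp/",
        "/tmp/",
    ])

user_data = build_user_data("")

# With a library like boto you would then pass this string at launch time,
# e.g. conn.run_instances('ami-xxxxxxxx', user_data=user_data)
print(user_data)
```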


cloud-init

cloud-init is the Ubuntu package that handles early initialization of a cloud instance. It is installed in the official Ubuntu images available on EC2, and Amazon also includes it in their Amazon Linux AMI.

It provides a wide variety of built-in functions you can use to customize and configure your instances during bootup, which you send to cloud-init via user-data. It also supports the ability to run arbitrary shell commands.

It’s definitely possible to use cloud-init as a lightweight way to do config management style actions at bootup, but you’re left to build your own tools to make additional modifications to your EC2 instances during their lifecycle.

In our case we’re going to take advantage of user-data and cloud-init to use curl to download a shell script from S3 that takes care of our client bootstrapping, as this technique translates well to any Linux distribution, not just those which include cloud-init. It is also easily reusable in other cloud provider environments, your own data center, or home lab/laptop/local dev environment(s).

Bootstrapping Puppet

To bootstrap Puppet, you’ll need two things

  1. A Puppetmaster where you can sign the certificate the client generates
  2. A shell script, passed in as user-data, that installs the Puppet agent and connects it to the Puppetmaster

The process of bootstrapping works as follows

  1. You provision an EC2 instance, passing it user-data with the shell script
  2. The EC2 instance runs the script on the instance
  3. The shell script installs the Puppet client, sets the server in puppet.conf, and starts the Puppet service.

Mitchell Hashimoto of Vagrant fame has recently started an excellent puppet-bootstrap repository on GitHub. Grab the script for your distribution type (RHEL, Debian/Ubuntu, etc.) and save it locally.

Then add the following two lines to the script

echo "" >> /etc/puppet/puppet.conf echo "listen=true" >> /etc/puppet/puppet.conf

Save the script and pass it in as your user-data.

Client certificate signing

The final step is to sign the client’s certificate on your Puppetmaster.

You can do this on the Puppetmaster with the puppet cert command, e.g. puppet cert sign <client-hostname>

At this point you can give the instance a node definition and begin applying your classes and modules.

Bootstrapping Chef

To bootstrap Chef onto an EC2 instance, you’re going to need five things

  1. A Chef server or Hosted Chef account
  2. A client.rb, in an S3 bucket, with your desired instance default settings
  3. Your validation.pem (ORGNAME-validator.pem if you’re using Hosted Chef), in an S3 bucket
  4. A shell script, passed in as user-data, that installs Chef via the Omnibus installer and drops your files in place
  5. An IAM role that includes read access to the above S3 bucket

The process of bootstrapping works as follows

  1. You provision an EC2 instance, passing it the IAM role and user-data with the shell script
  2. The EC2 instance runs the script on the instance
  3. The shell script installs the Omnibus Chef client, drops the .pem and client.rb in place, and kicks off the first chef-client run

Creating the IAM Role

To create the IAM role you do the following

  1. Login to the AWS console
  2. Click on the IAM service
  3. Click on Roles
  4. Click on Create New Role and set the Role Name
  5. Select AWS Service Roles, click Select
  6. Select Policy Generator, click Select
  7. In the Edit Permissions options
    1. Set Effect to Allow
    2. Set AWS Service to Amazon S3
    3. For Actions, select ListAllMyBuckets and GetObject
    4. For the ARN, use arn:aws:s3:::BUCKETNAME, e.g. arn:aws:s3:::meowmix
    5. Click Add Statement
  8. Click Continue
  9. You’ll see a JSON Policy Document; review it for correctness, then click Continue
  10. Click Create Role
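For reference, the JSON policy document generated by the steps above should look roughly like the following (using the example meowmix bucket from above; this is a sketch, not wizard output):

```python
import json

# Sketch of the IAM policy the wizard generates for this role.
# The "meowmix" bucket name comes from the example ARN above. Note that in
# practice s3:GetObject applies to objects, so you may also want to grant
# it on "arn:aws:s3:::meowmix/*" in addition to the bucket ARN.
policy = {
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListAllMyBuckets", "s3:GetObject"],
            "Resource": ["arn:aws:s3:::meowmix"],
        }
    ]
}

print(json.dumps(policy, indent=2))
```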

Files on S3

There are many tools for uploading the files mentioned to S3, including the AWS console. I’ll leave the choice of tool up to the user.

If you’re not familiar with uploading to S3, see the Getting Started Guide.

The shell script is very simple; an example is included in the GitHub repo. The .pem and client.rb are geared towards Hosted Chef.


The client.rb is also very simple; an example is included in the GitHub repo and is geared towards Hosted Chef.
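As a sketch of what such a client.rb contains (the organization name and paths below are placeholders, not the repo’s actual file):

```ruby
# /etc/chef/client.rb -- hypothetical minimal Hosted Chef configuration
# Replace ORGNAME with your Hosted Chef organization name.
log_level              :info
log_location           STDOUT
chef_server_url        ""
validation_client_name "ORGNAME-validator"
validation_key         "/etc/chef/validation.pem"
```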

At this point you’ll have a new EC2 instance that’s bootstrapped with the latest Omnibus Chef client and is connected to your Chef server. You can begin applying roles, cookbooks, etc. to your new instance(s) with knife.

In conclusion

You’ve now seen some ideas and two practical applications of automating as much of the configuration management bootstrapping process as is easily possible with Puppet and Chef. These can be easily adapted for other distributions and tools, and customized to suit your organization’s needs and constraints.

Options for Automating AWS

As we’ve seen in previous posts, boto and CloudFormation are both options for helping automate your AWS resources, and they can even complement each other.

But not everyone will want to use Amazon’s CFN (which we covered in depth in the day 6 post) or a Python library, so I thought we’d explore some of the options for automating your usage of AWS in various programming languages.

Python – boto, libcloud

Python has a few options for libraries. The most actively developed and used ones I’ve seen are boto and libcloud.


boto

Boto is a Python library for interacting with AWS services. It mirrors the AWS APIs in a Pythonic fashion and gives you the ability to build tools in Python on top of it, to manipulate and manage your AWS resources.

The project is led by Mitch Garnaat, who is currently a Sr. Engineer at Amazon.

Boto has a number of tutorials to get you started, including

Currently the following AWS services are supported:


Compute

  • Amazon Elastic Compute Cloud (EC2)
  • Amazon Elastic Map Reduce (EMR)
  • AutoScaling
  • Elastic Load Balancing (ELB)

Content Delivery

  • Amazon CloudFront


Database

  • Amazon Relational Database Service (RDS)
  • Amazon DynamoDB
  • Amazon SimpleDB

Deployment and Management

  • AWS Identity and Access Management (IAM)
  • Amazon CloudWatch
  • AWS Elastic Beanstalk
  • AWS CloudFormation

Application Services

  • Amazon CloudSearch
  • Amazon Simple Workflow Service (SWF)
  • Amazon Simple Queue Service (SQS)
  • Amazon Simple Notification Server (SNS)
  • Amazon Simple Email Service (SES)


Networking

  • Amazon Route53
  • Amazon Virtual Private Cloud (VPC)

Payments and Billing

  • Amazon Flexible Payment Service (FPS)


Storage

  • Amazon Simple Storage Service (S3)
  • Amazon Glacier
  • Amazon Elastic Block Store (EBS)
  • Google Cloud Storage


  • Amazon Mechanical Turk


  • Marketplace Web Services


libcloud

libcloud is a mature cloud provider library that is an Apache project. It’s meant to provide a Python interface to multiple cloud providers, with AWS being one of the first it supported and among the most mature of those that libcloud supports.

libcloud is organized around four components

  • Compute – libcloud.compute.*
  • Storage –*
  • Load balancers – libcloud.loadbalancer.*
  • DNS – libcloud.dns.*

Given the above components and my review of the API docs, libcloud effectively supports the following AWS services

  • EC2
  • S3
  • Route53

If you’re interested in learning more about libcloud, take a look at the Getting Started guide and the API documentation

Ruby – fog, aws-sdk gem

The main Ruby options seem to be Fog and the aws-sdk gem.


Fog

Similar to libcloud, Fog’s goal is to be a mature cloud provider library with support for many providers. It provides a Ruby interface to them, with AWS being one of the first it supported and among the most mature of those that Fog supports. It’s also used to provide EC2 support for Opscode Chef’s knife.

Fog is organized around four components

  • Compute
  • Storage
  • CDN
  • DNS

Based on a review of the supported services list and the aws library code, Fog currently has support for all the major AWS services.

If you’re interested in learning more about Fog, take a look at the Getting Started tutorial and the source code


aws-sdk gem

The aws-sdk gem is the official gem from Amazon that’s meant to help Ruby developers integrate AWS services into their applications, with special support for Rails applications in particular.

It currently supports the following AWS services:

  • Amazon Elastic Compute Cloud (EC2)
  • Amazon SimpleDB (SDB)
  • Amazon Simple Storage Service (S3)
  • Amazon Simple Queue Service (SQS)
  • Amazon Simple Notifications Service (SNS)

If you’re interested in learning more about the ruby sdk, see the Getting Started guide and the FAQ

Java – jclouds, AWS SDK for Java

The Java world has a number of options, including jclouds and the official SDK for Java


jclouds

jclouds is a Java and Clojure library whose goal is to be a mature cloud provider library with support for many providers. It provides a Java interface to them, with AWS being one of the first it supported and among the most mature of those that jclouds supports.

jclouds is organized into two main components

  • Compute API
  • Blobstore API

jclouds currently has support for the following AWS services

  • EC2
  • SQS
  • EBS
  • S3
  • CloudWatch

SDK for Java

The SDK for Java is the official Java library from Amazon that’s meant to help Java developers integrate AWS services into their applications.

It currently supports all the AWS services.

If you’re interested in learning more about the Java sdk, see the Getting Started guide and the API documentation


PHP – AWS SDK for PHP

The only full-featured PHP library I could find was the official SDK for PHP.

The SDK for PHP is the official PHP library from Amazon that’s meant to help PHP developers integrate AWS services into their applications.

It currently supports all the AWS services.

If you’re interested in learning more about the PHP sdk, see the Getting Started guide and the API documentation

JavaScript – AWS SDK for Node.js, aws-lib

There seem to be two JavaScript options, the AWS SDK for Node.js and aws-lib

SDK for Node.js

The SDK for Node.js is the official JavaScript library from Amazon that’s meant to help Javascript and Node.js developers integrate AWS services into their applications. This SDK is currently considered a developer preview

It currently supports the following AWS services

  • EC2
  • S3
  • DynamoDB
  • Simple Workflow

If you’re interested in learning more about the Node.js sdk, see the Getting Started guide and the API documentation


aws-lib

aws-lib is a simple Node.js library to communicate with the Amazon Web Services API.

It currently supports the following services

  • EC2
  • Product Advertising API
  • SimpleDB
  • SQS (Simple Queue Service)
  • SNS (Simple Notification Service)
  • SES (Simple Email Service)
  • ELB (Elastic Load Balancing Service)
  • CW (CloudWatch)
  • IAM (Identity and Access Management)
  • CFN (CloudFormation)
  • STS (Security Token Service)
  • Elastic MapReduce

If you’re interested in learning more about aws-lib, see the Getting started page and read the source code.

AWS Backup Strategies

Inspired by today’s SysAdvent post on Backups for Startups, I wanted to discuss some backup strategies for various AWS services.

As the Backups for Startups post describes

a backup is an off-line point-in-time snapshot – nothing more and nothing less. A backup is not created for fault tolerance. It is created for disaster recovery.

There are three common backup methods for achieving these point-in-time snapshots

  • Incremental
  • Differential
  • Full

The post explains each of these better than I could, so I’m just going to share how Joseph Kern describes them

Incremental backups are all data that has changed since the last incremental or full backup. This has benefits of smaller backup sizes, but you must have every incremental backup created since the last full. Think of this like a chain, if one link is broken, you will probably not have a working backup.

Differential backups are all data that has changed since the last full backup. This still benefits by being smaller than a full, while removing the dependency chain needed for pure incremental backups. You will still need the last full backup to completely restore your data.

Full backups are all of your data. This benefits from being a single source restore for your data. These are often quite large.

A traditional scheme uses daily incremental backups with weekly full backups, holding the fulls for two weeks. In this way your longest restore chain is seven media (one weekly full plus six daily incrementals), while your shortest restore chain is a single medium (one weekly full).

Another similar method uses daily differentials with weekly fulls. Your longest chain is just two media (one differential and one full), while your shortest is still just a single full backup.
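The media arithmetic behind these two schemes can be sketched as a tiny function (the scheme names here are just labels for this illustration):

```python
# Sketch: how many backup media a restore needs under each scheme,
# assuming one weekly full followed by daily incrementals/differentials.

def restore_chain(scheme, days_since_full):
    """Number of backup media needed to restore `days_since_full`
    days after the most recent full backup."""
    if days_since_full == 0:
        return 1                    # the full backup alone
    if scheme == "incremental":
        return 1 + days_since_full  # the full plus every incremental since it
    if scheme == "differential":
        return 2                    # the full plus only the latest differential
    raise ValueError("unknown scheme: %s" % scheme)

# Six days after the weekly full (the worst case):
print(restore_chain("incremental", 6))   # 7 media
print(restore_chain("differential", 6))  # 2 media
```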

The article also has some good suggestions on capacity planning and cost estimation, which I suggest you review before implementing the AWS backup strategies we’ll learn in this post.

Let’s explore, at a high level, how we can apply these backup methods to some of the most commonly used AWS services. A future post will provide some hands-on examples of using specific tools and code to do some of these kinds of backups.

Backing up Data with S3 and Glacier

Amazon S3 has been a staple of backups for many organizations for years. Often people use S3 even when they don’t use any other AWS services, because S3 provides a simple and cost-effective solution to redundantly store your data off-site. A couple of months ago Amazon introduced their Glacier service, which provides very low-cost archival storage at the expense of slow (multi-hour) retrieval times. Amazon recently integrated S3 and Glacier to provide the best of both worlds through one API interface.

S3 is composed of two things: buckets and objects. A bucket is a container for objects stored in Amazon S3. Every object is contained in a bucket, and each object is available via a unique HTTP URL. You’re able to manage access to your buckets and objects through a variety of tools, including IAM policies, bucket policies, and ACLs.

As described above, you’re going to want your backup strategy to include full backups, at least weekly, and either incremental or differential backups on at least a daily basis. This will provide you with a number of point-in-time recovery options in the event of a disaster.

Getting data into S3

There are a number of options for getting data into S3 for backup purposes. If you want to roll your own scripts, you can use one of the many AWS libraries to develop code for storing your data in S3, performing full, incremental, and differential backups, and handling purging data older than the retention period you want to use, e.g. older than 30, 60, 90 days.
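Two pieces of such a roll-your-own script, date-stamped key naming and the retention filter, don’t require talking to AWS at all and can be sketched in plain Python (the key layout below is a made-up convention, and the actual upload/delete calls would go through your AWS library of choice):

```python
from datetime import date, timedelta

# Hypothetical key layout: backups/<kind>/<YYYY-MM-DD>.tar.gz

def backup_key(kind, day):
    """Build a date-stamped S3 key name for a backup of the given kind."""
    return "backups/%s/%s.tar.gz" % (kind, day.isoformat())

def keys_to_purge(keys, today, retention_days):
    """Return keys whose embedded date is older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    old = []
    for key in keys:
        # Recover the date from the file name portion of the key.
        day = date.fromisoformat(key.rsplit("/", 1)[1][:10])
        if day < cutoff:
            old.append(key)
    return old

keys = [backup_key("full", date(2023, 1, 1)),
        backup_key("full", date(2023, 3, 1))]
print(keys_to_purge(keys, today=date(2023, 3, 15), retention_days=30))
```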

If you’re using a Unix-like operating systems, tools like s3cmd, duplicity (using it with S3), or Amanda Backup (using it with S3) provide a variety of options for using S3 as your backup storage, and these tools take care of a lot of the heavy lifting around doing the full, incremental, differential dance, as well as handling purging data beyond your retention period. Each tool has pros and cons in terms of implementation details and complexity vs ease of use.

If you’re using Windows, tools like Cloudberry S3 Backup Server Edition (a commercial tool), S3Sync (a free tool), or S3.exe (an open source cli tool) provide a variety of options for using S3 as your backup storage, and these tools take care of a lot of the heavy lifting around doing the full, incremental, differential dance, as well as handling purging data beyond your retention period. Each tool has pros and cons in terms of implementation details and complexity vs ease of use.

Managing the amount of data in S3

To implement a cost-effective backup strategy with S3, I recommend that you take advantage of the Glacier integration when creating the lifecycle policies for each of your buckets. This effectively automates the moving of older data into Glacier and handles the purging of data beyond your retention period automatically.
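A lifecycle policy is just a small piece of rule data attached to the bucket. As a sketch of one rule’s shape (the rule ID, prefix, and day counts are made-up examples; with an AWS library such as boto you would build this with its lifecycle helpers and apply it to the bucket):

```python
import json

# Sketch of one S3 lifecycle rule: move objects under backups/ to Glacier
# after 30 days, then expire (purge) them after 90 days. All values here
# are illustrative, not recommendations.
rule = {
    "ID": "archive-then-purge",
    "Prefix": "backups/",
    "Status": "Enabled",
    "Transition": {"Days": 30, "StorageClass": "GLACIER"},
    "Expiration": {"Days": 90},
}

print(json.dumps(rule, indent=2))
```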

Backing up EC2 instance data

When considering how to backup your EC2 instance data, there are a number of considerations, including the amount of data that needs to be backed up. Ideally things like your application source code, service configurations (e.g. Apache, Postfix, MySQL), and your configuration management code will all be stored in a version control system, such as Git (and on GitHub or Bitbucket), so that these are already backed up. But this can leave a lot of application data on file systems and in databases that still needs to be backed up.

For this I’d suggest a two-pronged approach, using EBS and S3. Since EBS has built-in support for snapshots, I suggest using EBS volumes as a place to store a near-real-time copy of your application data and properly quiesced dumps of your database data, and using snapshotting to provide a sensible number of recovery points for quickly restoring data. Getting the data from ephemeral disks or your primary EBS volumes can easily be done with some scripting and tools like rsync or robocopy.

Secondly, using one of the previously discussed tools, you should be doing more long term archives from the secondary EBS volumes to S3, and optionally you can use lifecycle policies on your S3 buckets to move data into Glacier for your longest term archives.

This approach involves some cost and complexity, but will provide you with multiple copies of your data and multiple options for recovery with different speed trade-offs. Specific implementation details are left as an exercise for the reader and some pragmatic examples will be part of a future post.

Backing up RDS data and configs

RDS provides built-in backup and snapshotting features to help protect your data. As discussed in the RDS post, I recommend you deploy RDS instances in a multi-AZ scenario whenever possible, as this reduces the uptime impact of performing backups.

RDS has a built-in automated backup feature that will automatically perform daily backups at a scheduled time, retaining them for up to 35 days, with the caveat that it will cause an I/O pause of your RDS instance during the snapshot. These backups are stored on S3 storage for additional protection against data loss.

RDS also supports making user initiated snapshots at any point in time, with the caveat that it will cause an I/O pause of your RDS instance during the snapshot, which can be mitigated with multi-AZ deployments. These snapshots are stored on S3 storage for additional protection against data-loss.

Additionally, because of how RDS instances do transaction logging, you’re able to do point-in-time restores to any point within the automated backup recovery window.

The only potential downside to these backup and snapshot features is that they’re isolated to the region your RDS instances run in. To provide DR copies of your database data to another region you’ll need to create a solution for this. One approach that is relatively low cost is to run a t1.micro in another region with a scheduled job that connects to your main RDS instance, performs a native SQL backup to local storage, then uploads the native SQL backup to S3 storage in your DR region. This kind of solution can have performance and cost considerations for large amounts of database data and so must be considered carefully before implementing.

Backing up AWS service configurations

While Amazon has built their services to be highly available and protect your data, it’s always important to ensure you have your own backups of any critical data.

Services like Route53 or Elastic Load Balancing (ELB) don’t store application data, but they do store data critical to rebuilding your application infrastructure in the event of a major failure, or if you’re trying to do disaster recovery in another region.

Since these services are all accessible through HTTP APIs, there are opportunities to roll your own backups of your AWS configuration data.

Route 53

With Route 53, you could get a list of your Hosted Zones, then get the details of each Zone. Finally, you could get the details of each DNS record. Once you have all this data, you can save it into a text format of your choice and upload it to S3 in another region. A ruby implementation of this idea is already available.
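The flattening step of that idea is straightforward; as a sketch (the zone and records below are hand-written placeholders, and in practice you would fetch them from the Route 53 API with your AWS library of choice before uploading the result to S3):

```python
import json

# Sketch: serialize zone record data to JSON for off-region backup.
# The zone and records are illustrative placeholders, not fetched data.
records = [
    {"name": "", "type": "A", "ttl": 300,
     "values": [""]},
    {"name": "", "type": "CNAME", "ttl": 300,
     "values": [""]},
]

backup_doc = json.dumps({"zone": "", "records": records}, indent=2)

# This string is what you would then upload to an S3 bucket in another region.
print(backup_doc)
```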


Elastic Load Balancing

With ELB, you could get a list of all your Load Balancer instances, store the data in a text format of your choice, and finally upload it to S3 in another region. I did not find any existing implementations of this with some quick searching, but one could quickly be built using the AWS library of your choosing.
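One quick way to build such a thing with boto 2.x is sketched below; the region and bucket name are placeholders, and the JSON helper stands alone so it can be tested offline.

```python
# Sketch: serialize ELB configuration to JSON for DR purposes
# (hypothetical bucket name; assumes boto 2.x).
import json


def elb_to_json(lb_descriptions):
    """lb_descriptions is a list of dicts describing each load balancer."""
    return json.dumps(lb_descriptions, indent=2)


def backup_elbs(region, bucket_name):
    import boto
    import boto.ec2.elb
    elb = boto.ec2.elb.connect_to_region(region)
    descriptions = [{'name': lb.name,
                     'dns_name': lb.dns_name,
                     'availability_zones': lb.availability_zones,
                     'instances': [i.id for i in lb.instances],
                     'listeners': [list(l.get_tuple()) for l in lb.listeners]}
                    for lb in elb.get_all_load_balancers()]
    bucket = boto.connect_s3().get_bucket(bucket_name)
    key = bucket.new_key('elb/%s.json' % region)
    key.set_contents_from_string(elb_to_json(descriptions))
```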


In summary, there are a number of great options for building a backup strategy and implementation that will meet your organization’s retention, disaster recovery, and cost needs. Most of these options are free and/or open source, and they can be built in a highly automated fashion.

In a future post we’ll get hands-on about implementing some of these ideas in an automated fashion with the Boto Python library for AWS.

Amazon Simple Notification Service

Amazon Simple Notification Service (SNS) is a web service which helps you easily publish and deliver notifications to a variety of endpoints in an automated and low cost fashion. SNS currently supports sending messages to Email, SMS, HTTP/S, and SQS Queue endpoints.

You’re able to use SNS through the AWS console, the SNS CLI tools or through the SNS API.

The moving parts

SNS is composed of three main parts:

  1. A topic
  2. A subscription
  3. Published messages

A topic is a communication channel to send messages and subscribe to notifications. Once you create a topic, you’re provided with a topic ARN (Amazon Resource Name), which you’ll use for subscriptions and publishing messages.

A subscription attaches a specific endpoint to a topic. The endpoint can be a web service, an email address, or an SQS queue.

Published messages are generated by publishers, which can be scripts calling the SNS API, users in the AWS console, or the CLI tools. Once a new message is published, Amazon SNS attempts to deliver that message to every endpoint that is subscribed to the topic.


Costs

SNS has a number of cost factors, including API requests, notifications to HTTP/S, notifications to Email, notifications to SMS, and data transferred out of a region.

You can get started using SNS with AWS’s Free Usage tier, though, so you won’t have to pay to play right away.

Using SNS

To get started with using SNS, we’ll walk through making a topic, creating an email subscription, and publishing a message, all through the AWS console.

Making a topic

  1. Login to the AWS Console
  2. Click Create New Topic.
  3. Enter a topic name in the Topic Name field.
  4. Click Create Topic.
  5. Copy the Topic ARN for the next task.

You’re now ready to make a subscription.

Creating an email subscription

  1. In the AWS Console click on My Subscriptions
  2. Click the Create New Subscription button.
  3. In the Topic ARN field, paste the topic ARN you created in the previous task, for example: arn:aws:sns:us-east-1:054794666397:MyTopic.
  4. Select Email in the Protocol drop-down box.
  5. Enter your email address for the notification in the Endpoint field.
  6. Click Subscribe.
  7. Go to your email and open the message from AWS Notifications, and then click the link to confirm your subscription.
  8. You should see a confirmation response from SNS

You’re now ready to publish a message.

Publishing a message

  1. In the AWS Console click the topic you want to publish to, under My Topics in the Navigation pane.
  2. Click the Publish to Topic button.
  3. Enter a subject line for your message in the Subject field.
  4. Enter a brief message in the Message field.
  5. Click Publish Message.
  6. A confirmation dialog box will appear, click Close to close the confirmation dialog box.
  7. You should get the email shortly.

The SNS documentation has more details on each of these tasks.

Automating SNS

You’ve learned how to manually work with SNS, but as with all AWS services, things are best when automated.

Building on Day 4’s post, Getting Started with Boto, we’ll walk through automating SNS with some boto scripts.

Making a topic

This script will connect to us-west-2 and create a topic named adventtopic.

If the topic is successfully created, it will return the topic ARN. Otherwise, it will log any errors to sns-topic.log.
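A sketch of such a script with boto 2.x is shown below; the response-parsing helper is split out so it can be used without AWS credentials.

```python
# Sketch of the topic-creation script (boto 2.x; errors go to sns-topic.log).
import logging


def extract_topic_arn(response):
    """Pull the TopicArn out of boto's create_topic response dict."""
    return response['CreateTopicResponse']['CreateTopicResult']['TopicArn']


def create_topic(region='us-west-2', name='adventtopic'):
    import boto.sns  # local import keeps the helper above usable offline
    logging.basicConfig(filename='sns-topic.log', level=logging.ERROR)
    try:
        conn = boto.sns.connect_to_region(region)
        return extract_topic_arn(conn.create_topic(name))
    except Exception:
        logging.exception('Failed to create topic %s', name)
        raise
```

Calling `create_topic()` makes a real API request and requires working AWS credentials.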

Making an email subscription

This script will connect to us-west-2 and create an email subscription to the topic named adventtopic for the email address you specify.

If the subscription is successfully created, it will return the topic ARN. Otherwise, it will log any errors to sns-topic.log.

  • Note: You’ll need to manually confirm the subscription in your email client before you can move on to publishing a message.
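A sketch of the subscription script along those lines with boto 2.x (for email endpoints, the subscription ARN stays “pending confirmation” until you confirm):

```python
# Sketch of the email-subscription script (boto 2.x). The topic ARN comes
# from the topic-creation step; errors are logged to sns-topic.log.
import logging


def extract_subscription_arn(response):
    """For email endpoints this is 'pending confirmation' until confirmed."""
    return response['SubscribeResponse']['SubscribeResult']['SubscriptionArn']


def subscribe_email(topic_arn, address, region='us-west-2'):
    import boto.sns  # local import keeps the helper above usable offline
    logging.basicConfig(filename='sns-topic.log', level=logging.ERROR)
    try:
        conn = boto.sns.connect_to_region(region)
        return extract_subscription_arn(
            conn.subscribe(topic_arn, 'email', address))
    except Exception:
        logging.exception('Failed to subscribe %s to %s', address, topic_arn)
        raise
```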

Publishing a message

This script will connect to us-west-2 and publish a message to the topic named adventtopic with the subject and message body you specify.

If the publication is successfully performed, it will return the topic ARN. Otherwise, it will log any errors to sns-publish.log.
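A sketch of the publish script with boto 2.x is below; note that a successful publish actually returns a message id, which this sketch passes back.

```python
# Sketch of the publish script (boto 2.x; errors go to sns-publish.log).
import logging


def extract_message_id(response):
    """Pull the MessageId out of boto's publish response dict."""
    return response['PublishResponse']['PublishResult']['MessageId']


def publish_message(topic_arn, subject, body, region='us-west-2'):
    import boto.sns  # local import keeps the helper above usable offline
    logging.basicConfig(filename='sns-publish.log', level=logging.ERROR)
    try:
        conn = boto.sns.connect_to_region(region)
        return extract_message_id(
            conn.publish(topic=topic_arn, message=body, subject=subject))
    except Exception:
        logging.exception('Failed to publish to %s', topic_arn)
        raise
```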

Final thoughts

At this point you’ve successfully automated the main use cases for SNS.

As you can see, SNS can be very useful for sending notifications and with a little automation, can quickly become a part of your application infrastructure toolkit.

All the code samples are available in the Github repository

AWS re: Invent Recap

Amazon recently held their first AWS-specific conference, on Nov. 27th through 29th, featuring a series of sessions with technical content on cloud use cases, new AWS services, cloud migration best practices, architecting for scale, operating at high availability, and making your cloud apps secure.

While I was unable to attend, I’ve been watching some of the session videos and talking with folks who were able to go.

I wanted to highlight a few of the talks I found informative and interesting.

Day 1 Keynote

The day 1 keynote contains a lot of great information on where AWS is headed and what they’re planning for the future.

Failures at Scale and How to Ignore Them

While this talk didn’t necessarily cover anything groundbreaking, it covered a lot of good ideas on failure and scaling.

Highly Available Architecture at Netflix

Netflix is definitely leading the way in building distributed systems on AWS and building tools, many open source, for running systems on AWS. This talk gives a nice overview of how they’re doing this.

Building Web-Scale Applications With AWS

Lots of good information, the Top 5 list is particularly good.

Zero to Millions of Requests

This talk explains how NASA software engineers were able to architect and deploy a full solution to stream the Curiosity landing to millions of users, with a timeline of one week from start to finish.

Big Data and the US Presidential Campaign

There was recently a great write-up on the Obama campaign’s technology, and this presentation goes into a lot of the same details.

Pinterest Pins AWS! Running Lean on AWS Once You’ve Made It

A great talk on how Pinterest has used AWS to quickly and efficiently scale their site.

All the videos

There is an excellent playlist with all the videos as well.

Amazon CloudFormation Primer

Amazon CloudFormation (CFN) is an AWS service which provides users with a way to describe the AWS resources they want created, in a declarative and re-usable fashion, through the use of simple JSON formatted text files. It supports a wide variety of AWS services, includes the ability to pass in user-supplied parameters, has a nice set of CLI tools, and offers a few handy functions you are able to use in the JSON files.


CloudFormation starts with the idea of a stack: a collection of AWS resources created from a JSON formatted template file. A stack is defined by the following attributes

  • The template the stack will be based on
  • A list of template parameters, user-supplied inputs such as an EC2 instance or VPC id
  • An optional list of mappings, which are used to look up values, such as AMI ids for different regions
  • An optional list of data tables used to look up static configuration values (e.g. AMI names)
  • The list of AWS resources and their configuration values
  • A list of outputs, such as the id of an ELB instance that was created

A stack can be created using the AWS Console, the CLI tools, or an API library. Stacks can be as re-usable or as monolithic as you choose to make them. A future post will cover some ideas on CFN stack re-usability, design goals, and driving CFN stacks with Boto, but this post is going to focus on getting you up and running with a simple stack and giving you some jumping off points for further research on your own.

You’re able to use templates to create new stacks or update existing stacks.


Costs

CloudFormation itself does not cost anything. You are charged the normal AWS costs for any resources created as part of creating a new stack or updating an existing stack.

It’s important to note you’re charged a full hour for any resource costs, even if your stack gets rolled back due to an error during stack creation.

*This can mean it could become costly if you’re not careful while testing and building your templates and stacks.*

Getting Started

We’re going to assume you already have an AWS account and are familiar with editing JSON files.

To get started you’ll need to install the CFN CLI tools.

Writing a basic template

A template is a JSON formatted text file. Amazon ends theirs with .template, while I prefer to name mine .json for naming and syntax highlighting reasons, but ultimately this is arbitrary.

A template begins with the AWSTemplateFormatVersion and a Description, and must contain a Resources block with at least one Resource.

A most basic template only needs what is shown below
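A sketch of such a minimal template (the AMI id is a placeholder you’d replace with one valid in your region):

```json
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "A minimal template that launches a single EC2 instance.",
  "Resources": {
    "Ec2Instance": {
      "Type": "AWS::EC2::Instance",
      "Properties": {
        "ImageId": "ami-xxxxxxxx",
        "InstanceType": "t1.micro"
      }
    }
  }
}
```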


A template can contain Parameters for user input. An example of this would be a parameter for the instance type.

As you’ll see in the example below, you refer to parameters or other values using a special function called Ref.
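A sketch of a template with an instance type parameter, using the InstanceTypeInput name that appears in the cfn-validate-template output later in the post (AMI id is a placeholder):

```json
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "Single EC2 instance with a parameterized instance type.",
  "Parameters": {
    "InstanceTypeInput": {
      "Description": "EC2 instance type",
      "Type": "String",
      "Default": "t1.micro"
    }
  },
  "Resources": {
    "Ec2Instance": {
      "Type": "AWS::EC2::Instance",
      "Properties": {
        "ImageId": "ami-xxxxxxxx",
        "InstanceType": { "Ref": "InstanceTypeInput" }
      }
    }
  }
}
```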


Sometimes Mappings are a better option than Parameters. A common pattern you’ll see in CFN templates is using a Mapping for the AMI ids in various AWS regions, as shown below
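A sketch of that pattern, pairing a region-to-AMI Mapping with the Fn::FindInMap function and the AWS::Region pseudo parameter (AMI ids are placeholders):

```json
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "Look up the AMI id for the current region from a Mapping.",
  "Mappings": {
    "RegionMap": {
      "us-east-1": { "AMI": "ami-xxxxxxxx" },
      "us-west-2": { "AMI": "ami-yyyyyyyy" }
    }
  },
  "Resources": {
    "Ec2Instance": {
      "Type": "AWS::EC2::Instance",
      "Properties": {
        "ImageId": { "Fn::FindInMap": [ "RegionMap", { "Ref": "AWS::Region" }, "AMI" ] },
        "InstanceType": "t1.micro"
      }
    }
  }
}
```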


Finally, you’re usually going to want to use one or more Outputs in your template to provide you with information about the resources the stack created.
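A sketch of an Outputs block attached to the minimal instance template, returning the instance id via Ref and the AZ via Fn::GetAtt (AMI id is a placeholder):

```json
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "Single EC2 instance with Outputs describing what was made.",
  "Resources": {
    "Ec2Instance": {
      "Type": "AWS::EC2::Instance",
      "Properties": {
        "ImageId": "ami-xxxxxxxx",
        "InstanceType": "t1.micro"
      }
    }
  },
  "Outputs": {
    "InstanceId": {
      "Description": "Id of the EC2 instance that was created",
      "Value": { "Ref": "Ec2Instance" }
    },
    "AZ": {
      "Description": "Availability Zone the instance launched in",
      "Value": { "Fn::GetAtt": [ "Ec2Instance", "AvailabilityZone" ] }
    }
  }
}
```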


Once you’ve created a template, you’ll want to validate that it works with the cfn-validate-template command from the CLI tools.

An example of using it with a local file is shown below

cfn-validate-template --template-file basic-output.template

PARAMETERS InstanceTypeInput false EC2 instance type

After you’ve verified the template is valid, you can try creating a stack with the cfn-create-stack command. You give the command a stack name and a file or URL for the template you want to use. The command will return some info, including the new stack id

Note: *Running this command with a template will create AWS resources that you will be billed for if they exceed your free tier.*

An example of creating a stack is shown below

cfn-create-stack basic-test-1 --template-file basic.template


You can check the progress of the stack creation with the cfn-describe-stack-events command, which you give the stack name.

An example of a stack creation in progress

cfn-describe-stack-events basic-test-1

STACK_EVENT basic-test-1 Ec2Instance AWS::EC2::Instance 2012-12-07T06:35:42Z CREATE_IN_PROGRESS

STACK_EVENT basic-test-1 basic-test-1 AWS::CloudFormation::Stack 2012-12-07T06:35:37Z CREATE_IN_PROGRESS User Initiated

An example of the stack creation finished

cfn-describe-stack-events basic-test-1

STACK_EVENT basic-test-1 basic-test-1 AWS::CloudFormation::Stack 2012-12-07T06:36:24Z CREATE_COMPLETE

STACK_EVENT basic-test-1 Ec2Instance AWS::EC2::Instance 2012-12-07T06:36:24Z CREATE_COMPLETE

STACK_EVENT basic-test-1 Ec2Instance AWS::EC2::Instance 2012-12-07T06:35:42Z CREATE_IN_PROGRESS

STACK_EVENT basic-test-1 basic-test-1 AWS::CloudFormation::Stack 2012-12-07T06:35:37Z CREATE_IN_PROGRESS User Initiated

To delete the stack, you use the cfn-delete-stack command and give it the stack name. An example run is shown below.

cfn-delete-stack basic-test-1

Warning: Deleting a stack will lead to deallocation of all of the stack's resources. Are you sure you want to delete this stack? [Ny]y

At this point we’ve covered writing some basic templates and how to get started using a template with the CLI tools.

Where to go from here

To start you should read the Learn Template Basics and Working with Templates documentation.

While writing and exploring templates, I highly recommend getting familiar with the Template Reference which has detailed docs on the various Template types, their properties, return values, etc.

Finally, Amazon has provided a wide variety of templates in the Sample Templates library, ranging from single EC2 instances, to Drupal or Redmine application stacks, and even a full blown multi-tier application in a VPC, which you’re able to download and run.

I’ve put the samples from this article in the Github repository as well.

I hope you’ve found this post helpful in getting started with CloudFormation.

Amazon Relational Database Service


Amazon’s Relational Database Service (RDS) allows you to create and run MySQL, Oracle, or SQL Server database servers without the need to manually create EC2 instances, manage the instance operating system, and install, then manage, the database software itself. Amazon has also done the work of automating synchronous replication and failover, so you can run a pair of database instances in a Multi-AZ configuration (for MySQL and Oracle) with a couple of clicks/API calls. And through CloudWatch integration, you’re able to get metrics and alerting for your RDS database instances. As with all AWS services, you pay for your RDS instances by the hour, with some options for paying ahead and saving some cost. The cost of an RDS instance depends on the instance size, whether you use Multi-AZ, the database type, whether you use Provisioned IOPS, and any data transferred to the Internet or other AWS regions.

This post will take you through getting started with RDS, some of the specifics of each database engine, and some suggestions on using RDS in your application’s infrastructure.

RDS instance options

RDS instances come in two flavors, On Demand and Reserved. On Demand instances are paid for by the hour, based on the size of the instance, while Reserved instances are paid for on a one or three year basis.

RDS instance classes mirror those of normal EC2 instances and are described in detail on Amazon’s site.

A couple compelling features of RDS instance types are that

  1. You’re able to scale your RDS instances up in memory and compute resources on the fly, and with MySQL and Oracle instances, you’re also able to grow your storage size on the fly, from 100GB to 1TB of space.
  2. You’re able to use Provisioned IOPS to provide guaranteed performance for your database storage. You can provision from 1,000 IOPS to 10,000 IOPS, with corresponding storage from 100GB to 1TB, for the MySQL and Oracle engines; if you are using SQL Server, the maximum you can provision is 7,000 IOPS.

RDS instances are automatically managed, including OS and database server/engine updates, which occur during your weekly scheduled maintenance window.

Further Reading

Creating RDS instances

We’re going to assume you’ve already set up an AWS IAM account and API key to manage your resources.

You can get started with creating RDS instances through one of three methods:

  1. The AWS Console
  2. AWS’s command line tools
  3. The RDS API or one of the API’s many libraries

To create an RDS instance through the console, you do the following:

  1. Select your region, then select the RDS service
  2. Click on database instances
  3. Select Launch a database instance
  4. Select the database engine you need
  5. Select the instance details; this may include the DB engine version
  6. Select the instance class desired. If you’re just experimenting, a db.t1.micro is a low-cost option for this.
  7. Select if you want this to be a Multi-AZ deployment
  8. Choose the amount of allocated storage in GB
  9. Select if you wish to use Provisioned IOPS (this costs extra)
  10. Fill in your DB identifier, username, and desired password.
  11. Choose your database name
  12. You can modify the port your database service listens on, customize if you want to use a VPC, or choose your AZ. I would consider these advanced topics; details on some will be covered in future AWS Advent posts.
  13. You can choose to disable backups (you really shouldn’t) and then set the details of how many backups you want and how often they should be made.
  14. At this point you are ready to launch the database instance and start using it (and paying for it).

To create a database instance with AWS’s cli tools, you do the following:

  1. Download and Install the CLI tools

  2. Once you have the tools installed and working, you’ll use the rds-create-db-instance tool to create your instance

  3. An example usage of the command can be found below

    rds-create-db-instance SimCoProd01 -s 10 -c db.m1.large -e mysql -u master -p Kew2401Sd
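The same instance could also be created from Python with boto 2.x; a rough sketch mirroring the CLI example above (same identifier, size, class, and credentials):

```python
# Sketch: create the same RDS instance with boto 2.x instead of the CLI
# tools. Calling this performs a real (billable) API request.
def create_rds_instance():
    import boto.rds
    conn = boto.rds.connect_to_region('us-east-1')
    # Mirrors: rds-create-db-instance SimCoProd01 -s 10 -c db.m1.large
    #          -e mysql -u master -p Kew2401Sd
    return conn.create_dbinstance(
        id='SimCoProd01',
        allocated_storage=10,
        instance_class='db.m1.large',
        engine='MySQL5.1',
        master_username='master',
        master_password='Kew2401Sd')
```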

To create a database instance using the API, you do the following:

  1. Review the API docs to familiarize yourself with the API, or obtain the library for the programming language of your choice and review its documentation.

  2. If you want to try creating an instance directly from the API, you can do so with the CreateDBInstance API call.

An example of calling the API directly can be found below

curl -v "https://rds.amazonaws.com/?Action=CreateDBInstance&...&AWSAccessKeyId=<AWS Access Key ID>&Signature=<Signature>"

Modifying existing instances

There are a number of modifications you can make to existing instances. Including:

  • Changing the engine version of a specific database type, e.g. going from MySQL 5.1 to MySQL 5.5
  • Converting a single instance to a Multi-AZ deployment.
  • Increasing the allocated storage
  • Changing your backup options
  • Changing the scheduled maintenance window

All these kinds of changes can be made through the console, via the cli tools, or through the API/libraries.

Further Reading

Things to consider when using RDS instances

There are a number of things to consider when using RDS instances, both in terms of sizing your instances, and AWS actions that can affect your instances.


Since RDS instances are easily resizable and include CloudWatch metrics, it is relatively simple to start with a smaller instance class and amount of storage and grow as needed. If possible, I recommend doing some benchmarking with what you think would be a good starting point, to verify that the class and storage you’ve chosen meet your needs.

I would also recommend that you start with Provisioned IOPS and a Multi-AZ setup. While this is more expensive, you’re guaranteeing a level of performance and reliability from the get-go, which will help mitigate some of the issues below that can affect your RDS instances.

Further Reading


Backups

Backup storage up to the amount of your instance’s allocated storage is included at no additional cost, so you should at least leave the default of 1 day of backups, but you should consider using a longer window of at least 7 days.

Per the RDS FAQ on Backups:

The automated backup feature of Amazon RDS enables point-in-time recovery of your DB Instance. When automated backups are turned on for your DB Instance, Amazon RDS automatically performs a full daily snapshot of your data (during your preferred backup window) and captures transaction logs (as updates to your DB Instance are made). When you initiate a point-in-time recovery, transaction logs are applied to the most appropriate daily backup in order to restore your DB Instance to the specific time you requested. Amazon RDS retains backups of a DB Instance for a limited, user-specified period of time called the retention period, which by default is one day but can be set to up to thirty five days. You can initiate a point-in-time restore and specify any second during your retention period, up to the Latest Restorable Time. You can use the DescribeDBInstances API to return the latest restorable time for your DB Instance(s), which is typically within the last five minutes.

So a good window of point-in-time and daily backups will ensure you have sufficient recovery options in the case of disaster or any kind of data loss.

The point-in-time recovery does not affect running instances, but the daily snapshots do cause a pause in all I/O to single RDS instances. If you’re using a Multi-AZ deployment, the snapshot is taken from the hidden secondary, causing the secondary to fall slightly behind your primary, but without causing I/O pauses on the primary. This is an additional reason I recommend accepting the cost and using Multi-AZ as your default.

Further Reading


Snapshots

You can initiate additional snapshots of the database at any time, via the console/CLI/API, which will cause a pause in all I/O to single RDS instances and a pause on the hidden secondary of Multi-AZ instances.

All snapshots are stored in S3, and so are insulated from RDS instance failure. However, these snapshots are not accessible to other services, so if you want backups for offsite DR, you’ll need to orchestrate your own SQL level dumps via another method. A t1.micro EC2 instance that makes dumps and stores them to S3 in another region is a relatively straightforward strategy for accomplishing this.

Further Reading

Upgrades and Maintenance Windows

Because RDS instances are meant to be automatically managed, each RDS instance has a weekly scheduled maintenance window. During this window the instance becomes unavailable while OS and database server/engine updates are applied. If you’re using a Multi-AZ deployment, your secondary will be updated and failed over to, then your previous primary is upgraded as the new secondary. This is another reason I recommend accepting the cost and using Multi-AZ as your default.

Further Reading



MySQL

MySQL RDS instances support a Multi-AZ deployment. A Multi-AZ deployment is comprised of a primary server, which accepts reads and writes, and a hidden secondary in another AZ within the region, which synchronously replicates from the primary. You send your client traffic to a CNAME, which is automatically failed over to the secondary in the event of a primary failure.

Backups and snapshots are performed against the hidden secondary, and automatic failover to the secondary occurs during maintenance window activities.

Further Reading

Read Replicas

MySQL RDS instances also support a unique feature called Read Replicas. These are additional replicas you create, within any AZ in a region, which asynchronously replicate from a source RDS instance (the primary, in the case of Multi-AZ deployments).

Further Reading



Oracle

Oracle RDS instances support a Multi-AZ deployment. Similar in setup to the MySQL Multi-AZ setup, there is a primary server which accepts reads and writes, and a hidden secondary in another AZ within the region, which synchronously replicates from the primary. You send your client traffic to a CNAME, which is automatically failed over to the secondary in the event of a primary failure.

Further Reading

SQL Server


Unfortunately, SQL Server RDS instances do not have a Multi-AZ option at this time.

Further Reading

Welcome to AWS Advent 2012

Welcome to the AWS Advent calendar for 2012.

We’ll be exploring a variety of things from the AWS ecosystem, including RDS, using VPCs, CloudFormation, strategies for bootstrapping Puppet/Chef onto new EC2 instances, automating your AWS usage with Boto, and some of the exciting announcements/talks that just came out of AWS re:Invent.

The goal of this advent calendar is to help folks new to AWS services and concepts learn more about them in a practical way, as well as expose and enlighten seasoned AWS users to some things they have missed.

You can follow along here or on Twitter