AWS Advent 2014 is a wrap!

AWS Advent 2014: Repeatable Infrastructure with CloudFormation and YAML

Ted Timmons is a long-term devops nerd and works for Stanson Health, a healthcare startup with a fully remote engineering team.

One key goal of a successful devops process – and successful usage of AWS – is to create automated, repeatable processes. It may be acceptable to spin up EC2 instances by hand in the early stage of a project, but it’s important to convert this from a manual experiment to a fully described system before the project reaches production.

There are several great tools to describe the configuration of a single instance (Ansible, Chef, Puppet, Salt), but these tools aren’t well suited for describing the configuration of an entire system. This is where Amazon’s CloudFormation comes in.

CloudFormation was launched in 2011. It’s fairly daunting to get started with: errors in CloudFormation templates are typically not caught until late in the process, and since it is fed by JSON files it’s easy to make mistakes. Hand-written JSON is unwieldy (stray commas, unmatched closing blocks), but it’s fairly easy to write YAML and convert it to JSON.
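Because stray commas are only reported late when a template is submitted, it can help to parse the JSON locally first. The following is a minimal sketch using only Python’s standard library; the function name `check_template` and the sample fragment are made up for illustration and are not part of the article’s repository.

```python
import json


def check_template(text):
    """Return None if text parses as JSON, else a 'line N, column M: message' string."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return "line {}, column {}: {}".format(err.lineno, err.colno, err.msg)
    return None


# A template fragment with a stray trailing comma after the last resource.
broken = '{\n  "Resources": {\n    "VPC": {"Type": "AWS::EC2::VPC"},\n  }\n}'
fixed = broken.replace('},\n  }', '}\n  }')

print(check_template(broken))  # a description of where parsing failed
print(check_template(fixed))   # None, the fragment is valid JSON
```

This only checks JSON syntax; it says nothing about whether CloudFormation will accept the resources themselves.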

EC2-VPC launch template

Let’s start with a simple CloudFormation template to create an EC2 instance. In this example many things are hardcoded, like the instance type and AMI. This cuts down on the complexity of the example. Still, it’s a nontrivial example that creates a VPC and other resources. The only prerequisite for this example is to create a keypair in the us-west-2 region called “advent2014”.

As you look at this template, notice both the quirks of CloudFormation (especially “Ref” and “Fn::GetAtt”) and the quirks of JSON. Even with some indentation the brackets are complex, and correct comma placement is difficult while editing a template.


Next, let’s convert this JSON example to YAML. There’s a quick converter in this article’s repository. With Python and pip installed, the only other dependency is PyYAML, which can be installed with pip.

Since JSON objects are unordered and are loaded into hashes/dicts, which don’t keep the order of their keys, the output order may vary. Here’s what it looks like immediately after conversion:

Only a small amount of reformatting is needed to make this file pleasant: I removed unnecessary quotes, combined some lines, and moved the ‘Type’ line to the top of each resource.
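The “Type first” reordering described above could also be done mechanically before conversion. This is a small sketch under stated assumptions, not the converter from the article’s repository: the function name `type_first` and the sample template are invented, and it relies only on the standard library’s `json` and `OrderedDict`.

```python
import json
from collections import OrderedDict


def type_first(template):
    """Return a copy of the template whose resources list 'Type' before other keys."""
    result = OrderedDict(template)
    if "Resources" not in template:
        return result
    resources = OrderedDict()
    for name, body in template["Resources"].items():
        ordered = OrderedDict()
        if "Type" in body:
            ordered["Type"] = body["Type"]
        for key, value in body.items():
            if key != "Type":
                ordered[key] = value
        resources[name] = ordered
    result["Resources"] = resources
    return result


raw = '{"Resources": {"VPC": {"Properties": {"CidrBlock": "10.0.0.0/16"}, "Type": "AWS::EC2::VPC"}}}'
# object_pairs_hook keeps the key order of the source text while loading.
template = json.loads(raw, object_pairs_hook=OrderedDict)
print(json.dumps(type_first(template), indent=2))
```

Loading with `object_pairs_hook=OrderedDict` also addresses the ordering issue mentioned earlier, since keys then keep the order they had in the source file.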

YAML to JSON to CloudFormation

It’s fairly easy to see the advantages of YAML in this case: far fewer brackets and quotes, and no need for commas. However, we need to convert this back to JSON for CloudFormation to use. Again, the converter is in this article’s repository.

That’s it!

Ansible assembly

If you would like to use Ansible to prepare and publish to CloudFormation, my company shared an Ansible module to compile YAML into a single JSON template. The shared version of the script is entirely undocumented, but it compiles a full directory structure of YAML template snippets into a template. This significantly increases readability. Just place cloudformation_assemble in your library/ folder and use it like any other module.

If there’s interest, I’ll help to document and polish this module so it can be submitted to Ansible. Just fork and send a pull request.


AWS Advent 2014: CloudFormation woes: Keep calm and use Ansible

Today’s post on using Ansible to help you get the most out of CloudFormation comes to us from Soenke Ruempler, who’s helping keep things running smoothly at Jimdo.

No more outdated information, a single source of truth. Describing almost everything as code, isn’t this one of the DevOps dreams? Recent developments have brought this dream even closer. In the era of APIs, tools like Terraform and Ansible have evolved which are able to codify the creation and maintenance of entire “organizational ecosystems”.

This blog post is a brief description of the steps we have taken to come closer to this goal at my employer Jimdo. Before we begin looking at particular implementations, let’s take the helicopter view and have a look at the current state and the problems with it.

Current state

We began to move to AWS in 2011 and have been using CloudFormation from the beginning. While we currently describe almost everything in CloudFormation, there are some legacy pieces which were just “clicked” through the AWS console. In order to have some primitive auditing and documentation for those, we usually document all “clicked” settings with a Jenkins job, which runs Cucumber scenarios that do a live inspection of the settings (by querying the AWS APIs with a read-only user).

While this setup might not look that bad and has a basic level of codification, there are several drawbacks, especially with CloudFormation itself, which we are going to have a look at now.

Problems with the current state

Existing AWS resources cannot be managed by CloudFormation

Maybe you have experienced this same issue: You start off with some new technology or provider and initially use the UI to play around. And suddenly, those clicked spikes are in production. At least this is the story of how we came to AWS at Jimdo 😉

So you might say: “OK, then let’s rebuild the clicked resources into a CloudFormation stack.” Well, the problem is that we didn’t describe basic components like VPC and Subnets as CloudFormation stacks in the first place, and as other production setups rely on those resources, we cannot change this as easily anymore.

Not all AWS features are immediately available in CloudFormation

Here is another issue: The usual AWS feature release process is that a component team releases a new feature (e.g. ElastiCache replica groups), but the CloudFormation part is missing (the CloudFormation team at AWS is a separate team with its own roadmap). And since CloudFormation isn’t open source, we cannot add the missing functionality by ourselves.

So, in order to use those “Non-CloudFormation” features, we used to click the setup as a workaround, and then again document the settings with Cucumber.

But the click-and-document-with-cucumber approach seems to have some drawbacks:

  • It’s not an enforced policy to document, so colleagues might miss the documentation step or see no value in it
  • It might be incomplete as not all clicked settings are documented
  • It encourages a “clicking culture”, which is the exact opposite of what we want to achieve

So we need something which could extend a CloudFormation stack with resources that we couldn’t (yet) express in CloudFormation. And we need them to be grouped together semantically, as code.

Post processors for CloudFormation stacks

Some resources require post-processing in order to be fully ready. Imagine the creation of an RDS MySQL database with CloudFormation. The physical database was created by CloudFormation, but what about databases, users, and passwords? This cannot be done with CloudFormation, so we need to work around this as well.

Our current approaches vary from manual steps documented in a wiki to a combination of Puppet and hiera-aws: Puppet, running on some admin node, retrieves RDS instance endpoints by tags and then iterates over them and executes shell scripts. This is a form of post-processing entirely decoupled from the CloudFormation stack, both in terms of time (hourly Puppet run) and in terms of “location” (it’s in another repository). A very complicated way just for the sake of automation.

Inconvenient toolset

Currently we use the AWS CLI tools in a plain way. Some coworkers use the old tools, some use the new ones. And I guess there are even folks with their own wrappers / bash aliases.

A “good” example is the missing feature of changing tags of CloudFormation stacks after creation. So if you forgot to do this in the first place, you’d need to recreate the entire stack! The CLI tools do not automatically add tags to stacks, so this is easily forgotten and should be automated. As a result we need to think of a wrapper around CloudFormation which automates those situations.
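A wrapper of this kind could be as small as a function that always appends tags to the command line. The following is a rough sketch, not our actual tooling: it assumes the newer AWS CLI (`aws cloudformation create-stack` with `--stack-name`, `--template-body`, `--parameters` and `--tags`), and the function name, the default `template` tag, and all sample values are hypothetical.

```python
def create_stack_command(stack_name, template_path, parameters=None, tags=None):
    """Build the argument list for `aws cloudformation create-stack`."""
    # Hypothetical house default: every stack gets a tag naming its template.
    all_tags = {"template": template_path}
    all_tags.update(tags or {})

    command = [
        "aws", "cloudformation", "create-stack",
        "--stack-name", stack_name,
        "--template-body", "file://" + template_path,
    ]
    if parameters:
        command.append("--parameters")
        for key in sorted(parameters):
            command.append(
                "ParameterKey={},ParameterValue={}".format(key, parameters[key])
            )
    # Tags are always present, so they cannot be forgotten at creation time.
    command.append("--tags")
    for key in sorted(all_tags):
        command.append("Key={},Value={}".format(key, all_tags[key]))
    return command


command = create_stack_command(
    "example-stack", "stack.json",
    parameters={"KeyName": "example-key"},
    tags={"team": "infra"},
)
print(" ".join(command))
```

The resulting list could be passed to `subprocess.check_call`; building it as a list rather than a string avoids shell quoting problems.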

Hardcoded / copy and pasted data

The idea of “single source information” or “single source of truth” is to never have a representation of data saved in more than one location. In the database world, it’s called “database normalization”. This is a very common pattern which should be followed unless you have an excellent excuse.

But if you don’t know better, are under time pressure, or your tooling is still immature, it’s hard to keep the data single-sourced. This usually leads to copying and pasting or hardcoding data.

Examples regarding AWS are usually resource IDs like subnet IDs, security groups or, in our case, our main VPC ID.

While this may not be an issue at first, it will come back to you in the future, e.g. if you want to roll out your stacks in another AWS region, perform disaster recovery, or have to grep for hardcoded data in several codebases when refactoring.

So we needed something to access information of other CloudFormation stacks and/or otherwise created resources (from the so called “clicked infrastructure”) without ever referencing IDs, Security Groups, etc. directly.

Possible solutions

Now we have a good picture of what our current problems are and we can actually look for solutions!

My research resulted in 3 possible tools: Ansible, Terraform and Salt.

As of this writing, Ansible seems to be the only currently available tool which can deal with existing CloudFormation stacks out of the box, and it also seems to meet the other criteria at first glance, so I decided to move on with it.

Spiking the solution with Ansible

Describing an existing CloudFormation stack as Ansible Playbook

One of the mentioned problems is the inconvenient CloudFormation CLI tooling: To create/update a stack, you would have to synthesize at least the stack name, template file name, and parameters, which is no fun and error-prone. For example:

With Ansible, we can describe a new or existing CloudFormation stack with a few lines as an Ansible Playbook. Here is one example:

Creating and updating (converging) the CloudFormation stack becomes as straightforward as:

Awesome! We finally have great tooling! The YAML syntax is machine and human readable and our single source of truth from now on.

Extending an existing CloudFormation stack with Ansible

As for added power, it should be easier to implement AWS functionality that’s currently missing from CloudFormation as an Ansible module than a CloudFormation external resource […] and performing other out of band tasks, letting your ticketing system know about a new stack for example, is a lot easier to integrate into Ansible than trying to wrap the cli tools manually.

— Dean Wilson

The above example stack uses the AWS ElastiCache feature of Redis replica groups, which unfortunately isn’t currently supported by CloudFormation. We could only describe the main ElastiCache cluster in CloudFormation. As a workaround, we used to click this missing piece and documented it with Cucumber as explained above.

A short look at the Ansible documentation reveals there is currently no support for ElastiCache replica groups in Ansible either. But a little research shows we have the possibility to extend Ansible with custom modules.

So I started spiking my own Ansible module to handle ElastiCache replica groups, inspired by the existing “elasticache” module. This involved the following steps:

  1. Put the module under “library/”, e.g. (I published the unfinished skeleton as a Gist for reference)
  2. Add an output to the existing CloudFormation stack which is creating the ElastiCache cluster, in order to return the ID(s) of the cache cluster(s): We need them to create the read replica group(s).
  3. Register the output of the cloudformation Ansible task:
  4. Extend the playbook to create the ElastiCache replica group by reusing the output of the cloudformation task:

Pretty awesome: Ansible works as a glue language while staying very readable. Actually it’s possible to read through the playbook and have an idea what’s going on.

Another great thing is that we can even extend core functionality of Ansible without any friction (such as waiting for upstream to accept a commit, building/deploying new packages, etc.), which should increase the acceptance of the tool among coworkers even more.

This topic touches another use-case: The possibility to “chain” CloudFormation stacks with Ansible: Reusing Outputs from Stacks as parameters for other stacks. This is especially useful to split big monolithic stacks into smaller ones which as a result can be managed and reused independently (separation of concerns).
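The chaining idea boils down to a small data transformation. As an illustrative sketch in Python (not the playbook we use), assume stack outputs arrive in the shape the CloudFormation API reports them, a list of `OutputKey`/`OutputValue` pairs, and that the second stack takes a list of `ParameterKey`/`ParameterValue` pairs. The function name, the mapping, and the sample IDs are hypothetical.

```python
def outputs_to_parameters(outputs, mapping):
    """Map outputs of one stack onto the parameter list of another stack.

    outputs: list of {"OutputKey": ..., "OutputValue": ...} dicts
    mapping: {output key of the first stack: parameter key of the second stack}
    """
    by_key = {o["OutputKey"]: o["OutputValue"] for o in outputs}
    missing = sorted(set(mapping) - set(by_key))
    if missing:
        # Fail early instead of creating a stack with a silently missing value.
        raise KeyError("outputs not found: " + ", ".join(missing))
    return [
        {"ParameterKey": mapping[key], "ParameterValue": by_key[key]}
        for key in sorted(mapping)
    ]


cache_stack_outputs = [
    {"OutputKey": "CacheClusterId", "OutputValue": "example-cache-001"},
    {"OutputKey": "SecurityGroupId", "OutputValue": "sg-00000000"},
]
print(outputs_to_parameters(cache_stack_outputs, {"CacheClusterId": "PrimaryClusterId"}))
```

Keeping the mapping explicit documents which stack depends on which output, which is the point of splitting a monolithic stack in the first place.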

Last but not least, it’s now easy to extend the Ansible playbook with post processing tasks (remember the RDS/Database example above).

Describing existing AWS resources as a “Stack”

As mentioned above, one issue with CloudFormation is the lack of a way to import existing infrastructure into a stack. Luckily, Ansible supports most of the AWS functionality so we can create a playbook to express existing infrastructure as code.

To discover the possibilities, I converted a fraction of our current production VPC/subnet setup into an Ansible playbook:

As you can see, there is not even a hardcoded VPC ID! Ansible identifies the VPC by a Tag-CIDR tuple, which meets our initial requirement of “no hardcoded data”.

To stress this, I changed the aws_region variable to another AWS region, and it was possible to create the basic VPC setup in another region, which is another sign of a successful single source of truth.

Single source information

Now we want to reuse the information of the VPC which we just brought “under control” in the last example. Why should we do this? Well, in order to be fully automated (which is our goal), we cannot afford any hardcoded information.

Let’s start with the VPC ID, which should be one of the most requested IDs. Getting it is relatively easy because we can just extract it from the ec2_vpc module output and assign it as a variable with the set_fact Ansible module:

OK, but we also need to reuse the subnet information – and to avoid hardcoding, we need to address them without using subnet IDs. As we tagged the subnets above, we could use the tuple (name-tag, Availability zone) to identify and group them.

With the awesome help from the #ansible IRC channel folks, I could make it work to extract one subnet by ID and Tag from the output:

While this satisfies the single source requirement, it doesn’t seem to scale very well with a bunch of subnets. Imagine you’d have to do this for each subnet (we already have more than 50 at Jimdo).

After some research I found out that it’s possible to add custom filters to Ansible that allow you to manipulate data with Python code:

We can now assign the subnets for later usage like this in Ansible:

This is a great way to prepare the subnets for later usage, e.g. in iterations, to create RDS or ElastiCache subnet groups. Actually, almost everything in a VPC needs subnet information.
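For readers who have not written one, a custom filter is a plain Python function exposed through a class named `FilterModule` in a `filter_plugins/` directory next to the playbook. The following is a minimal sketch rather than the filter used at Jimdo: the filter name is invented, and the subnet fields (`id`, `az`, `resource_tags`) are an assumption about the shape of the ec2_vpc module output.

```python
def subnets_by_name_and_az(subnets):
    """Index subnet IDs by Name tag and then by availability zone."""
    grouped = {}
    for subnet in subnets:
        name = subnet.get("resource_tags", {}).get("Name")
        if name is None:
            continue  # untagged subnets cannot be addressed without their ID
        grouped.setdefault(name, {})[subnet["az"]] = subnet["id"]
    return grouped


class FilterModule(object):
    """Ansible looks for a class with this name and calls filters()."""

    def filters(self):
        return {"subnets_by_name_and_az": subnets_by_name_and_az}


# Made-up sample data in the assumed shape of the module output.
sample = [
    {"id": "subnet-aaaa1111", "az": "eu-west-1a", "resource_tags": {"Name": "private"}},
    {"id": "subnet-bbbb2222", "az": "eu-west-1b", "resource_tags": {"Name": "private"}},
    {"id": "subnet-cccc3333", "az": "eu-west-1a", "resource_tags": {"Name": "public"}},
]
print(subnets_by_name_and_az(sample)["private"]["eu-west-1b"])
```

A nested dictionary is used instead of tuple keys because it is easier to address from a Jinja2 expression in a playbook.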

Those examples should be enough for now to give us confidence that Ansible is a great tool which fits our needs.

Takeaways

As of this writing, Ansible and CloudFormation seem to be a perfect fit for me. The combination turns out to be a solid solution to the following problems:

  • Single source of information / no hardcoded data
  • Combining documentation and “Infrastructure as Code”
  • Powerful wrapper around basic AWS CLI tooling
  • Inception point for other orchestration software (e. g. CloudFormation)
  • Works with existing AWS resources
  • Easy to extend (Modules, Filters, etc: DSL weaknesses can be worked around by hooking in python code)

Next steps / Vision

After spiking the solution, I could imagine the following next steps for us:

  • Write playbooks for all existing stacks and generalize by extracting common concepts (e.g. common tags)
  • Transform all the tests in Cucumber to Ansible playbooks in order to have a single source
  • Remove hardcoded IDs from existing CloudFormation stacks by parameterizing them via Ansible.
  • Remove AWS Console (write) access to our Production AWS account in order to enforce the “Infrastructure as Code” paradigm
  • Bring more clicked infrastructure / ecosystem under IaC-control by writing more Ansible modules (e.g. GitHub Teams and Users, Fastly services, Heroku Apps, Pingdom checks)
  • Spinning up the VPC including some services in another region in order to prove we are fully single-sourced (e. g. no hardcoded IDs) and automated.
  • Trying out Ansible Tower for:
    • Regular convergence runs in order to avoid configuration drift and maybe even revert clicked settings (similar to “Simian army” approach)
    • A “single source of Infrastructure updates”
  • Practices like Game Days to actually test Disaster recovery scenarios

I hope this blog post has brought some new thoughts and inspirations to the readers. Happy holidays!


AWS Advent 2014 – Using Terraform to build infrastructure on AWS

Today’s post on using Terraform to build infrastructure on AWS comes from Justin Downing.


Building interconnected resources on AWS can be challenging. A simple web application can have a load balancer, application servers, DNS records, and a security group. While a sysadmin can launch and manage these resources from the AWS web console or a CLI tool like fog, doing so can be time consuming and tedious considering all the metadata you have to shuffle amongst the other resources.

An elegant solution to this problem has been built by the fine folks at Hashicorp: Terraform. This tool aims to take the concept of “infrastructure as code” and add the missing pieces that other provisioning tools like fog miss, namely the glue to interconnect your resources. For anyone with a background in software configuration management (Chef, Puppet), using Terraform should be a natural fit for describing and configuring infrastructure resources.

Terraform can be used with several different providers including AWS, GCE, and Digital Ocean. We will be discussing provisioning resources on AWS. You can read more about the built-in AWS provider here.


Terraform is written in Go and distributed as a package of binaries. You can download the appropriate package from the website. If you are using OS X and Homebrew, you can simply brew install terraform to get everything installed and set up.


Now that you have Terraform installed, let’s build some infrastructure! Terraform configuration files are text files that resemble JSON, but are more readable and can include comments. These files should end in .tf (more details on configuration are available here). Rather than invent an example to use Terraform with AWS, I’m going to step through the example published by Hashicorp.

NOTE: I am assuming here that you have AWS keys capable of creating/terminating resources. Also, it would help if you had the AWS CLI installed and configured, as Terraform will use those credentials to interact with AWS. The example below is using AWS region us-west-2.

Let’s use the AWS Two-Tier example to build an ELB and EC2 instance:

Here, we initialized a new directory with the example. Then, we created a new keypair and saved the private key to our directory. You will note the files with the .tf extension. These are the configuration files used to describe the resources we want to build. As the names indicate, one is the main configuration, one contains the variables used, and one describes the desired output. When you build this configuration, Terraform will combine all .tf files in the current directory to create the resource graph.

Make a Plan

I encourage you to review the configuration details in each of these .tf files. With the help of comments and descriptions, it’s very easy to learn how different resources are intended to work together. You can also run plan to see how Terraform intends to build the resources you declared.

This also doubles as a linter by checking the validity of your configuration files. For example, if I comment out the instance_type in the configuration, we receive an error:


You will note that some pieces of the configuration are parameterized. This is very useful when sharing your Terraform plans, committing them to source control, or protecting sensitive data like access keys. By using variables and setting defaults for some, you allow for better portability when you share your Terraform plan with other members of your team. If you define a variable that does not have a default value, Terraform will require that you provide a value before proceeding. You can either (a) provide the values on the command line or (b) write them to a terraform.tfvars file. This file acts like a “secrets” file with a key/value pair on each line. For example:

Due to the sensitive information included in this file, it is recommended that you include terraform.tfvars in your source control ignore list (e.g. echo terraform.tfvars >> .gitignore) if you want to share your plan.
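If you generate the variables file from a script, the two concerns above (one key/value pair per line, and keeping the file out of source control) can be handled together. The following is a small illustrative sketch in Python; the helper names and the variable names `key_name` and `key_path` are assumptions made for the example, not something the Terraform example requires.

```python
def render_tfvars(values):
    """Render a dict as terraform.tfvars text, one quoted key/value pair per line."""
    lines = []
    for key in sorted(values):
        # Escape backslashes and double quotes so the value stays one string.
        escaped = str(values[key]).replace("\\", "\\\\").replace('"', '\\"')
        lines.append('{} = "{}"'.format(key, escaped))
    return "\n".join(lines) + "\n"


def ensure_ignored(gitignore_text, filename="terraform.tfvars"):
    """Return .gitignore contents that list filename, without duplicating it."""
    entries = gitignore_text.splitlines()
    if filename not in entries:
        entries.append(filename)
    return "\n".join(entries) + "\n"


print(render_tfvars({"key_name": "terraform", "key_path": "/tmp/terraform.pem"}))
print(ensure_ignored("*.log\n"))
```

Both helpers return text rather than writing files, so the caller decides where the secrets actually end up.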

Build Your Infrastructure

Now, we can build the resources using apply:

The output above is truncated, but Terraform did a few things for us here:

  1. Created a ‘terraform-example’ security group allowing SSH and HTTP access
  2. Created an EC2 instance from the Ubuntu 12.04 AMI
  3. Created an ELB instance and used the EC2 instance as its backend
  4. Printed the ELB public DNS address in the Outputs section
  5. Saved the state of your infrastructure in a terraform.tfstate file

You should be able to open the ELB public address in a web browser and see “Welcome to Nginx!” (note: this may take a minute or two after initialization in order for the ELB health check to pass).

The terraform.tfstate file is very important as it tracks the status of your resources. As such, if you are sharing your configurations, it is recommended that you include this file in source control. This way, after initializing some resources, another member of your team will not try and re-initialize those same resources. In fact, she can see the status of the resources with terraform show. In the event the state has not been kept up-to-date, you can use terraform refresh to update the state file.

And…that’s it! With a few descriptive text files, Terraform is able to build cooperative resources on AWS in a matter of minutes. You no longer need complicated wrappers around existing AWS libraries/tools to orchestrate the creation or destruction of resources. When you are finished, you can simply run terraform destroy to remove all the resources described in your .tf configuration files.


With Terraform, building infrastructure resources is as simple as describing them in text. Of course, there is a lot more you can do with this tool, including managing DNS records and configuring Mailgun. You can even mix these providers together in a single plan (e.g. EC2 instances, DNSimple records, Atlas metadata) and Terraform will manage it all! Check out the documentation and examples for the details.

Terraform Docs:
Terraform Examples:

AWS Advent 2014 – Exploring AWS Lambda

Today’s post comes to us from Mark Nunnikhoven, who is the VP of Cloud & Emerging Technologies .

At this year’s re:Invent, AWS introduced a new service (currently in preview) called Lambda. Mitch Garnaat already introduced the service to the advent audience in the first post of the month.

Take a minute to read Mitch’s post if you haven’t already. He provides a great overview of the service and its goals, and he’s created a handy tool, Kappa, that simplifies using the new service.

Going Deeper

Of course Mitch’s tool is only useful if you already understand what Lambda does and where best to use it. The goal of this post is to provide that understanding.

I think Mitch is understating things when he says that “there are some rough edges”. Like any AWS service, Lambda is starting out small. Thankfully–like other services–the documentation for Lambda is solid.

There is little point in creating another walk through setting up a Lambda function. This tutorial from AWS does a great job of the step-by-step.

What we’re going to cover today are the current challenges, constraints, and where Lambda might be headed in the future.


1. Invocation vs Execution

During a Lambda workflow 2 IAM roles are used. This is the #1 area where people get caught up.

A role is an identity used in the permissions framework of AWS. Roles typically have policies attached that dictate what the role can do within AWS.

Roles are a great way to provide (and limit) access without passing access and secret keys around.

Lambda uses 2 IAM roles during its workflow, an invocation role and an execution role. While the terminology is consistent within computer science, it’s needlessly confusing for some people.

Here’s the layman’s version:

  • invocation role => the trigger
  • execution role => the one that does stuff

This is an important difference because while the execution role is consistent in the permissions it needs, the invocation role (the trigger) will need different permissions depending on where you’re using your Lambda function.

If you’re hooking your Lambda function to an S3 bucket, the invocation role will need the appropriate permissions to have S3 call your Lambda function. This typically includes the lambda:InvokeAsync permission and a trust policy that allows the bucket to assume the invocation role.

If you’re hooking your function into a Kinesis event stream, the same logic applies but in this case you’re going to have to allow the invocation role access to your Kinesis stream since it’s a pull model instead of the S3 push model.

The AWS docs sum this up with the following semi-helpful diagrams:

Figure: S3 push model for Lambda permissions

Figure: Kinesis pull model for Lambda permissions

Remember that your invocation role always needs to be able to assume a role (sts:AssumeRole) and access the event source (Kinesis stream, S3 bucket, etc.)
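To make the S3 push case concrete, here is a rough sketch of the two policy documents an invocation role might carry, built as Python dictionaries and printed as JSON. It is an illustration under assumptions, not a copy of the AWS documentation: the function names are made up, the wildcard resource is a placeholder, and `lambda:InvokeAsync` is the preview-era action named above.

```python
import json


def invocation_trust_policy(service="s3.amazonaws.com"):
    """Trust policy: who is allowed to assume the invocation role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": service},
            "Action": "sts:AssumeRole",
        }],
    }


def invocation_access_policy(function_arn="*"):
    """Access policy: what the invocation role may do once assumed."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["lambda:InvokeAsync"],
            # "*" is a placeholder; a real policy would name the function ARN.
            "Resource": [function_arn],
        }],
    }


print(json.dumps(invocation_trust_policy(), indent=2))
print(json.dumps(invocation_access_policy(), indent=2))
```

For the Kinesis pull model the same structure applies, but the access policy would also need read permissions on the stream, as described above.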

2. Deployment of libraries

TL;DR: Thank Mitch for starting Kappa.

The longer explanation is that packaging up the dependencies of your code can be a bit of a pain. That’s because we have little to no visibility into what’s happening.

Until the service and associated tooling matures a bit, we’re back to the world of printf or at least

For Lambda a deployment package is your javascript code and any supporting libraries. These need to be bundled into a .zip file. If you’re just deploying a simple .js file, .zip it and you’re good to go.

If you have additional libraries that you’re providing, buckle up. This ride is about to get real bumpy.

The closest thing we have to a step-by-step on providing additional libraries is this step from one of the AWS tutorials.

The instructions here are to install a separate copy of node.js, create a subfolder, and then install the required modules via npm.

Now you’re going to .zip your code file and the modules from the subfolder but not the folder itself. From all appearances the .zip needs to be a flat file.
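The “contents but not the folder” requirement is easy to get wrong with a GUI zip tool, so scripting it may help. The following is a minimal sketch using Python’s standard `zipfile` module; the function name and the example paths in the usage note are hypothetical, and it reflects my reading of the packaging layout rather than anything AWS documents.

```python
import os
import zipfile


def build_package(zip_path, code_file, modules_dir):
    """Zip code_file and the contents of modules_dir at the archive root."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # The handler sits at the top level of the archive.
        archive.write(code_file, os.path.basename(code_file))
        for root, _dirs, files in os.walk(modules_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # Paths are stored relative to modules_dir, so the enclosing
                # folder itself never appears in the archive.
                archive.write(full_path, os.path.relpath(full_path, modules_dir))
    return zip_path
```

For example, `build_package("function.zip", "handler.js", "build")` would place `handler.js` and `node_modules/...` side by side at the top of the archive, assuming npm installed the modules into `build/node_modules`.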

I’m hopeful there will be more robust documentation on this soon but in the meantime please share your experiences in the AWS forums or on Twitter.


As Lambda is in preview there are additional constraints beyond what you can expect when it is launched into production.

Current constraints:

  1. functions must execute in <= 1GB of memory
  2. functions must complete execution in <= 60 seconds
  3. functions must be written in Javascript (run on node.js)
  4. functions can only access 512 MB of temp disk space
  5. functions can only open 1024 file descriptors
  6. functions can only use 1024 threads+processes

These constraints also lead to some AWS recommendations that are worth reading and taking to heart. However, one stands out above all the others.

“Write your Lambda function code in a stateless style”, AWS Lambda docs.

This is by far the best piece of advice that one can offer when it comes to Lambda design patterns. Do not try to bolt state on using another service or data store. Treat Lambda as an opportunity to manipulate data mid-stream. Lambda functions execute concurrently. Thinking of it in functional terms will save you a lot of headaches down the road.

The Future?

One of the most common reactions I’ve heard about AWS Lambda is, “So what?”. That’s understandable but if you look at AWS’ track record, they ship very simple but useful services and iterate very quickly on them.

While Lambda may feel limited today, expect things to change quickly. Kinesis, DynamoDB, and S3 are just the beginning. The “custom” route today provides a quick and easy way to offload some data processing to Lambda but that will become exponentially more useful as “events” start popping up in other AWS services.

Imagine triggering Lambda functions based on SNS messages, CloudWatch Log events, Directory Service events, and so forth.

Look to tagging in AWS as an example. It started very simple in EC2 and over the past 24 months has expanded to almost every service and resource in the environment. Events will most likely follow the same trajectory, and with every new event Lambda gets even more powerful.

Additional Reading

Getting in on the ground floor of Lambda will allow you to shed more and more of your lower level infrastructure as more events are rolled out to production.

Here’s some holiday reading to ensure you’re up to speed:

AWS Advent 2014 – A Quick Look at AWS CodeDeploy

Today’s AWS Advent post comes to us from Mitch Garnaat, the creator of the AWS python library boto and who is currently herding clouds and devops over at Scopely. He’s gonna walk us through a quick look at AWS CodeDeploy.

Software deployment. It seems like such an easy concept. I wrote some new code and now I want to get it into the hands of my users. But there are few areas in the world of software development where you find a more diverse set of approaches to such a simple-sounding problem. In all my years in the software business, I don’t think I’ve ever seen two deployment processes that are the same. So many different tools. So many different approaches. It turns out it is a pretty complicated problem with a lot of moving parts.

But there is one over-arching trend in the world of software deployment that seems to have almost universal appeal. More. More deployments. More often.

Ten years ago it was common for a software deployment to happen a few times a year. Software changes would be batched up for weeks or months waiting for a release cycle and once the release process started, development stopped. All attention was focused on finding and fixing bugs and, eventually, releasing the code. It was very much a bimodal process: develop for a while and then release for a while.

Now the goal is to greatly shorten the time it takes to get a code change deployed, to make the software deployment process quick and easy. And the best way to get good at something is to do it a lot of times.

“Repetition is the mother of skill.” – Anthony Robbins

If we force ourselves to do software deployment frequently and repeatedly we will get better and better at it. The process I use to put up holiday lights is appallingly inefficient and cumbersome. But since I only do it once a year, I put up with it. If I had to put those lights up once a month or once a week or once a day, the process would get better in a hurry.

The ultimate goal is Continuous Deployment, a continuous pipeline where each change we commit to our VCS is pushed through a process of testing and then, if the tests succeed, is automatically released to production. This may be an aspirational goal for most people and there may be good reasons not to have a completely automated pipeline (e.g. dependencies on other systems) but the clear trend is towards frequent, repeatable software deployment without the bimodal nature of traditional deployment techniques.

Why AWS CodeDeploy Might Help Your Deployment Process

Which brings us to the real topic of today’s post. AWS CodeDeploy is a new service from AWS specifically designed to automate code deployment and eliminate manual operations.

This post will not be a tutorial on how to use AWS CodeDeploy. There is an excellent hands-on sample deployment available from AWS. What this post will focus on is some of the specific features provided by AWS CodeDeploy that might help you achieve the goal of faster and more automated software deployments.

Proven Track Record

This may seem like a contradiction given that this is a new service from AWS, but the underlying technology in AWS CodeDeploy is not new at all. This is a productization of an internal system called Apollo that has been used for software deployments within Amazon and AWS for many years.

Anyone who has worked at Amazon will be familiar with Apollo and will probably rave about it. It’s rock solid and has been used to deploy thousands of changes a day across huge fleets of servers within Amazon.

Customizable Deployment Configurations

You can control how AWS CodeDeploy will roll out the deployment to your fleet using a deployment configuration. There are three built-in configurations:

  • All At Once – Deploy the new revision to all instances in the deployment group at once. This is probably not a good idea unless you have a small fleet or you have very good acceptance tests for your new code.

  • Half At A Time – Deploy the new revision to half of the instances at once. If a certain number of those instances fail then fail the deployment.

  • One At A Time – Deploy the new revision to one instance at a time. If deployment to any instance fails, then fail the deployment.

You can also create custom deployment configurations if one of these models doesn’t fit your situation.
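To make the difference between the three built-in configurations concrete, here is a minimal Python sketch that splits a fleet into deployment batches. It is an illustration only: the function name `plan_batches`, the strategy strings, and the batching logic are assumptions, not how AWS CodeDeploy actually schedules work internally.

```python
def plan_batches(instances, strategy):
    """Split a list of instance IDs into deployment batches.

    Illustrative only: the strategy names mirror the built-in
    configurations, but the batching logic is a simplification.
    """
    if not instances:
        return []
    if strategy == "all_at_once":
        size = len(instances)
    elif strategy == "half_at_a_time":
        size = max(1, len(instances) // 2)
    elif strategy == "one_at_a_time":
        size = 1
    else:
        raise ValueError("unknown strategy: %s" % strategy)
    # Slice the fleet into consecutive chunks of the chosen size.
    return [instances[i:i + size] for i in range(0, len(instances), size)]


fleet = ["i-1", "i-2", "i-3", "i-4"]  # hypothetical instance IDs
for name in ("all_at_once", "half_at_a_time", "one_at_a_time"):
    print(name, plan_batches(fleet, name))
```

A smaller batch means a failure is caught after touching fewer instances, at the cost of a longer overall deployment.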

Auto Scaling Integration

If you are deploying your code to more than one instance and you are not currently using Auto Scaling, you should stop reading this article right now and go figure out how to integrate it into your deployment strategy. In fact, even if you are only using one instance you should use Auto Scaling. It’s a great service that can save you money and allow you to scale with demand.

Assuming that you are using Auto Scaling, AWS CodeDeploy can integrate with your Auto Scaling groups. By using lifecycle hooks in Auto Scaling, AWS CodeDeploy can automatically deploy the specified revision of your software on any new instances that Auto Scaling creates in your group.

Should Work With Most Apps

AWS CodeDeploy uses a YAML-format AppSpec file to drive the deployment process on each instance. This file allows you to map source files in the deployment package to their destination on the instance. It also allows a variety of hooks to be run at various times in the process such as:

  • Before Installation
  • After Installation
  • When the previous version of your Application Stops
  • When the Application Starts
  • After the service has been Validated

These hooks can be arbitrary executables such as BASH scripts or Python scripts and can do pretty much anything you need them to do.
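As a rough sketch of how hooks line up during a deployment, the Python below orders a set of hook scripts by lifecycle event. It assumes the AppSpec event names `ApplicationStop`, `BeforeInstall`, `AfterInstall`, `ApplicationStart` and `ValidateService`, run in that order; the script paths and the helper `ordered_hooks` are hypothetical, and the real agent does more than this (downloading and installing the bundle between hooks, for example).

```python
# Lifecycle events that accept hook scripts, in assumed execution order.
LIFECYCLE_ORDER = [
    "ApplicationStop",
    "BeforeInstall",
    "AfterInstall",
    "ApplicationStart",
    "ValidateService",
]


def ordered_hooks(hooks):
    """Return (event, script) pairs in lifecycle order, skipping unused events."""
    return [
        (event, script)
        for event in LIFECYCLE_ORDER
        for script in hooks.get(event, [])
    ]


# Hypothetical script paths inside the deployment package.
hooks = {
    "AfterInstall": ["scripts/configure.sh"],
    "ApplicationStop": ["scripts/stop.sh"],
    "ApplicationStart": ["scripts/start.sh"],
}
for event, script in ordered_hooks(hooks):
    print(event, "->", script)
```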

Below is an example AppSpec file.


AWS CodeDeploy can be driven either from the AWS Web Console or from the AWS CLI. In general, my feeling is that GUIs are great for monitoring and other read-only functions, but for command and control I strongly prefer CLIs and scripts, so it’s great that you can control every aspect of AWS CodeDeploy via the AWS CLI or any of the AWS SDKs. I will say that the Web GUI for AWS CodeDeploy is quite well done and provides a really nice view of what is happening during a deployment.

Free (like beer)

There is no extra charge for using AWS CodeDeploy. You obviously pay for all of the EC2 instances you are using just as you do now but you don’t have to pay anything extra to use AWS CodeDeploy.

Other Things You Should Know About AWS CodeDeploy

The previous section highlights some features of AWS CodeDeploy that I think could be particularly interesting to people considering a new deployment tool.

In this section, I want to mention a couple of caveats. These are not really problems but just things you want to be aware of in evaluating AWS CodeDeploy.

EC2 Deployments Only

AWS CodeDeploy only supports deployments on EC2 instances at this time.


AWS CodeDeploy requires an agent to be installed on any EC2 instance that it will be deploying code to. Currently, they support Amazon Linux, Ubuntu, and Windows Server.

No Real Rollback Capability

Because of the way AWS CodeDeploy works, there really isn’t a true rollback capability. You can’t deploy code to half of your fleet and then undeploy that latest revision. You can simulate a rollback by simply creating a new deployment of your previous version of software but there is no Green/Blue type rollback available.


We just created a new deployment pipeline at work that implements a type of Blue/Green deployment and is based on Golden AMIs. We are very happy with that and I don’t think we will be revisiting that anytime soon. However, if I were starting that project today, I would certainly give a lot of thought to using AWS CodeDeploy. It has a nice feature set, can be easily integrated into most environments and code bases, and is based on rock-solid, proven technology. And the price is right!


AWS Advent 2014 – Managing EC2 Security Groups using Puppet

Today’s post on managing EC2 Security Groups with Puppet comes to us from Gareth Rushgrove, the awesome curator of DevOps Weekly, who is currently an engineer at Puppet Labs.

At Puppet Labs we recently shipped a module to make managing AWS easier. This tutorial shows how it can be used to manage your security groups. EC2 Security groups act as a virtual firewall and are used to isolate instances and other AWS resources from each other and the internet.

An example

You can find the full details about installation and configuration for the module in the official README, but the basic version, assuming a working Puppet and Ruby setup, is:

You’ll also want to have your AWS API credentials in environment variables (or use IAM if you’re running from within AWS).

First let’s create a simple security group called test-sg in the us-east-1 region. Save the following to a file called securitygroup.pp:

Now let’s run Puppet to create the group:

You should see something like the following output:

We’re running here with apply and the --test flag so we can easily see what’s happening, but if you have a Puppet master setup you can run with an agent too.

You will probably change your security groups over time as your infrastructure evolves. And managing that evolution is where Puppet’s declarative approach really shines. You can have confidence in the description of your infrastructure in code because Puppet can tell you about any changes when it runs.

Next let’s add a new ingress rule to our existing group. Modify the securitygroup.pp file like so:

And again let’s run Puppet to modify the group:

You should see something like the following output:

Note the information about changes to the ingress rules as we expected. You can also check the changes in the AWS console.

The module also has full support for the Puppet resource command, so all of the functionality is available from the command line as well as the DSL. As an example, let’s clean up and delete the group created above.

Hopefully that’s given you an idea of what’s possible with the Puppet AWS module. You can see more examples of the module in action in the main repository.


Some of the advantages of using Puppet for managing AWS resources are:

  • The familiar DSL – if you’re already using Puppet the syntax will already be familiar; if you’re not already using Puppet you’ll find lots of good references and documentation
  • Puppet is a declarative tool – Puppet is used to declare the desired state of the world; this means it’s useful for maintaining state and changing resources over time, as well as creating new groups
  • Existing tool support – whether it’s the Geppetto IDE, testing tools like rspec-puppet or syntax highlighting for your favourite editor, lots of supporting tooling already exists

The future

The current preview release of the module supports EC2 instances, security groups and ELB load balancers, with support for VPC, Route53 and Auto Scaling groups coming soon. We’re looking for as much feedback as possible at the moment, so feel free to report issues on GitHub, ask questions on the puppet-user mailing list or contact me on Twitter at @garethr.

AWS Advent 2014 – Finding AWS Resources Across Regions, Services, and Accounts with skew

Our first AWS Advent post comes to us from Mitch Garnaat, the creator of the AWS Python library boto, who is currently herding clouds and devops over at Scopely. He’s gonna walk us through how we can discover more about our Amazon resources using the awesome tool he’s been building, called skew.

If you only have one account in AWS and you only use one service in one region, this article probably isn’t for you. However, if you are like me and manage resources in many accounts, across multiple regions, and in many different services, stick around.

There are a lot of great tools to help you manage your AWS resources. There is the AWS Web Console, the AWSCLI, various language SDKs like boto, and a host of third-party tools. The biggest problem I have with most of these tools is that they limit your view of resources to a single region, a single account, and a single service at a time. For example, you have to log in to the AWS Console with one set of credentials representing a single account. And once you are logged in, you have to select a single region. And then, finally, you drill into a particular service. The AWSCLI and the SDKs follow this same basic model.

But what if you want to look at resources across regions? Across accounts? Across services? Well, that’s where skew comes in.


Skew is a Python library built on top of botocore. The main purpose of skew is to provide a flat, uniform address space for all of your AWS resources.

The name skew is a homonym for SKU (Stock Keeping Unit). SKUs are the numbers that show up on the bar codes of just about everything you purchase, and that SKU number uniquely identifies the product in the vendor’s inventory. When you make a purchase they scan the barcode containing the SKU and can instantly find the pricing data for the item.

Similarly, skew uses a unique identifier for each one of your AWS resources and allows you to scan the SKU and quickly find the details for that resource. It also provides some powerful mechanisms to find sets of resources by allowing wildcarding and regular expressions within the SKUs.

ARN’t You Glad You Are Reading This?

So, what do we use for a unique identifier for all of our AWS resources? Well, as it turns out, AWS has already solved that problem for us. Each resource in AWS can be identified by an Amazon Resource Name or ARN. The general form of an ARN is:

arn:aws:service:region:account:resource

So, the ARN for an EC2 instance might look like this:

arn:aws:ec2:us-west-2:123456789012:instance/i-12345678

This tells us the instance is in the us-west-2 region, running in the account identified by the account number 123456789012 and the instance has an instance ID of i-12345678.
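To see how those pieces come apart, here is a small standard-library Python sketch that splits an ARN into its six colon-separated components. The helper `parse_arn` and the `ARN` tuple are names made up for this illustration and are not part of skew.

```python
from collections import namedtuple

ARN = namedtuple("ARN", "scheme provider service region account resource")


def parse_arn(text):
    """Split an ARN string into its six components."""
    # Split on the first five colons only; the resource part may contain colons.
    parts = text.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError("not an ARN: %r" % text)
    return ARN(*parts)


arn = parse_arn("arn:aws:ec2:us-west-2:123456789012:instance/i-12345678")
print(arn.service, arn.region, arn.account, arn.resource)
```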

Getting Started With Skew

The easiest way to install skew is via pip.

Because skew is based on botocore, as is AWSCLI, it will use the same credentials as those tools. You need to make a small addition to your ~/.aws/config file to help skew map AWS account ID’s to the profiles in the config file. Check the README for details on that.

Let’s Find Some Stuff

Once we have skew installed and configured, we can use it to find resources based on their ARN’s. For example, using the example ARN above:

Ok, that wasn’t very exciting. How do I get at my actual resource in AWS? Well, the scan method returns an ARN object and this object supports the iterator pattern in Python. This makes sense since as we will see later this ARN can actually return a lot of objects, not just one. So if we want to get our object we can:

Iterating on an ARN returns a list of Resource objects and each of these Resource objects represents one resource in AWS. Resource objects have a number of attributes like id and they also have an attribute called data that contains all of the data about that resource. This is the same information that would be returned by the AWSCLI or an SDK.

Wildcards And Regular Expressions

Finding a single resource in AWS is okay but one of the nice things about skew is that it allows you to quickly find lots of resources in AWS. And you don’t have to worry about which region those resources are in or in which account they reside.

For example, let’s say we want to find all EC2 instances running in all regions and in all of my accounts:

In that one little line of Python code, a lot of stuff is happening. Skew will iterate through all of the regions supported by the EC2 service and, in each region, will authenticate with each of the account profiles listed in your AWS config file. It will then find all EC2 instances and finally return the complete list of those instances as Resource objects.

In addition to wildcards, you can also use regular expressions as components in the ARN. For example:

This will find all DynamoDB tables in all US regions for all accounts.
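The matching idea can be sketched without touching AWS at all. The Python below compares a pattern to an ARN one component at a time, treating a bare `*` as a wildcard and anything else as a regular expression. This is a simplified stand-in and not skew’s implementation; the function `arn_matches` and the sample patterns and ARNs are assumptions chosen to mirror the two descriptions above.

```python
import re


def _matches(pattern, value):
    # A bare "*" is a wildcard; anything else is treated as a regular expression.
    return pattern == "*" or re.fullmatch(pattern, value) is not None


def arn_matches(pattern, arn):
    """Return True if every component of the ARN matches the pattern."""
    p = pattern.split(":", 5)
    a = arn.split(":", 5)
    if len(p) != 6 or len(a) != 6:
        return False
    # Split the resource component into type and ID, e.g. "instance/i-12345678".
    p_type, _, p_id = p[5].partition("/")
    a_type, _, a_id = a[5].partition("/")
    pairs = list(zip(p[:5], a[:5])) + [(p_type, a_type), (p_id or "*", a_id)]
    return all(_matches(pp, aa) for pp, aa in pairs)


arns = [
    "arn:aws:dynamodb:us-east-1:123456789012:table/users",
    "arn:aws:dynamodb:eu-west-1:123456789012:table/users",
    "arn:aws:ec2:us-west-2:123456789012:instance/i-12345678",
]
print([x for x in arns if arn_matches("arn:aws:dynamodb:us-.*:*:table/*", x)])
print([x for x in arns if arn_matches("arn:aws:ec2:*:*:instance/*", x)])
```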

Some Useful Examples

Here are some examples of things you can do quickly and easily with skew that would be difficult in most other tools.

Find all unattached EBS volumes across all regions and accounts and tally the size of wasted space.
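A minimal sketch of the tallying step, assuming each volume is a dict shaped like an EC2 DescribeVolumes entry (an `Attachments` list and a `Size` in GiB). The function `tally_unattached` and the sample data are hypothetical stand-ins for the `data` attribute of the Resource objects skew returns.

```python
def tally_unattached(volumes):
    """Return (count, total_size_gib) for volumes with no attachments."""
    count = 0
    total = 0
    for vol in volumes:
        # An empty or missing Attachments list means the volume is unattached.
        if not vol.get("Attachments"):
            count += 1
            total += vol["Size"]
    return count, total


# Sample data standing in for scanned volume resources.
sample = [
    {"VolumeId": "vol-1", "Size": 100, "Attachments": []},
    {"VolumeId": "vol-2", "Size": 8, "Attachments": [{"InstanceId": "i-12345678"}]},
    {"VolumeId": "vol-3", "Size": 50, "Attachments": []},
]
print(tally_unattached(sample))
```

With skew, the input would come from iterating over a scan of a volume ARN pattern and passing each resource’s `data`.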

Audit all EC2 security groups to find CIDR rules that are not whitelisted.
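The audit itself boils down to walking each group’s ingress rules. This sketch assumes security group data in the EC2 DescribeSecurityGroups shape (`IpPermissions`, then `IpRanges`, then `CidrIp`) and uses an exact-match whitelist for simplicity; `non_whitelisted_rules` and the sample groups are hypothetical.

```python
def non_whitelisted_rules(groups, whitelist):
    """Return (group_id, cidr) pairs for ingress CIDRs not in the whitelist."""
    allowed = set(whitelist)
    findings = []
    for group in groups:
        for perm in group.get("IpPermissions", []):
            for ip_range in perm.get("IpRanges", []):
                if ip_range["CidrIp"] not in allowed:
                    findings.append((group["GroupId"], ip_range["CidrIp"]))
    return findings


# Sample data standing in for scanned security group resources.
groups = [
    {"GroupId": "sg-1", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
    {"GroupId": "sg-2", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "10.0.0.0/8"}]}]},
]
print(non_whitelisted_rules(groups, ["10.0.0.0/8"]))
```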

Find all EC2 instances that are not tagged in any way.
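For the tagging check, a one-line filter is enough once you have the instance data. The sketch assumes dicts in the EC2 DescribeInstances shape, where untagged instances have a missing or empty `Tags` list; the function name and sample data are made up for illustration.

```python
def untagged_instance_ids(instances):
    """Return the IDs of instances with no tags at all."""
    return [inst["InstanceId"] for inst in instances if not inst.get("Tags")]


# Sample data standing in for scanned instance resources.
instances = [
    {"InstanceId": "i-aaaa1111", "Tags": [{"Key": "Name", "Value": "web"}]},
    {"InstanceId": "i-bbbb2222"},
    {"InstanceId": "i-cccc3333", "Tags": []},
]
print(untagged_instance_ids(instances))
```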

Building ARN’s Interactively

The ARN provides a great way to uniquely identify AWS resources but it doesn’t exactly roll off the tongue. Skew provides some help for constructing ARN’s interactively.

First, start off with a new ARN object.

Each ARN object contains 6 components:

  • scheme – for now this will always be arn
  • provider – again, for now always aws
  • service – the Amazon Web Service
  • region – the AWS region
  • account – the ID of the AWS account
  • resource – the resource type and resource ID

All of these are available as attributes of the ARN object.

If you want to build up the ARN interactively, you can ask each of the components what choices are available.

You can also try out your regular expressions to make sure they return the results you expect.

To set the value of a particular component, use the pattern attribute.

Once you have the ARN that you want, you can enumerate it like this:

Running Queries Against Returned Data

A recent feature of skew allows you to run queries against the resource data. This feature makes use of jmespath which is a really nice JSON query engine. It was originally written in Python for use on the AWSCLI but is now available in a number of other languages. If you have ever used the --query option of the AWSCLI, then you have used jmespath.

If you append a jmespath query to the end of the ARN (using a | as a separator) skew will send the data for each of the returned resources through the jmespath query and store the result in the filtered_data attribute of the resource object. The original data is still available as the data attribute. For example:

Then each resource returned would have the instance type stored in the filtered_data attribute of the Resource object. This is obviously a very simple example but jmespath is very powerful and its interactive query tool allows you to try your queries out beforehand to get exactly what you want.

CloudWatch Metrics

One other feature of skew is easy access to CloudWatch metrics for AWS resources. If we refer back to the very first interactive session in the post, we can show how you would access those CloudWatch metrics for the instance.

We can find the available CloudWatch metrics with the metric_names attribute and then we can retrieve the desired metric using the get_metric_data method. The README for skew contains a bit more information about accessing CloudWatch metrics.

Wrap Up

Skew is pretty new and is still changing a lot. It currently supports only a subset of available AWS resource types but more are being added all the time. If you manage a lot of AWS resources, I encourage you to give it a try. Feedback, as always, is very welcome as are pull requests!

AWS Advent 2014 – High-Availability in AWS with keepalived, EBS and Elastic Network Interfaces

Today’s post on how to achieve high availability in AWS with keepalived comes to us from Julian Dunn, who’s currently helping improve things at Chef.


By now, most everyone knows that running infrastructure in AWS is not the same as a traditional data center, thus putting a lie to claims that you can just “lift and shift to the cloud”. In AWS, one normally achieves “high-availability” by scaling horizontally. For example, if you have a WordPress site, you could create several identical WordPress servers and put them all behind an Elastic Load Balancer (ELB), and connect them all to the same database. That way, if one of these servers fails, the ELB will stop directing traffic to it, but your site will still be available.

But about that database – isn’t it also a single-point-of-failure? You can’t very well pull the same horizontal-redundancy trick for services that explicitly have one writer (and potentially many readers). For a database, you could probably use Amazon Relational Database Service (RDS), but suppose Amazon doesn’t have a handy highly-available Platform-as-a-Service variant for the service you need?

In this post, I’ll show you how to use that old standby, keepalived, in conjunction with Virtual Private Cloud (VPC) features, to achieve real high-availability in AWS for systems that can’t be horizontally replicated.

Kit of Parts

To create high-availability out of two (or more) systems, you need the following components:

  • A service IP (commonly referred to as a VIP, for virtual IP) that clients connect to and that can be moved between the systems
  • A block device containing data served by the currently-active system that can be detached and reattached to others, should the active one fail
  • Some kind of cluster coordination system to handle master/backup election, as well as doing all the housekeeping to move the service IP and block device to the active node.

In AWS, we’ll use:

  • Private secondary addresses on an Elastic Network Interface (ENI) as the service IP.
  • A separate Elastic Block Store (EBS) volume as the block device
  • keepalived as the cluster coordination system.

There are a few limitations to this approach in AWS. Most important is that all instances and the block storage device must live in the same VPC subnet, which implies that they live in the same availability zone (AZ).

Just Enough keepalived for HA

Keepalived for Linux has been around for over ten years, and while it is very robust and reliable, it can be very difficult to grasp because it is designed for a variety of use cases, some very distinct from the one we are going to implement. Software design diagrams like this one do not necessarily aid in understanding how it works.

For the purposes of building an HA system, you need only know a few things about keepalived:

  • As previously mentioned, keepalived serves as a cluster coordination system between two or more peers.
  • Keepalived uses the Virtual Router Redundancy Protocol (VRRP) for assigning the service IP to the active instance. It does this by talking to the Linux netlink layer directly. Thus, don’t try to use ifconfig to examine whether the master’s interface has the VIP, as ifconfig doesn’t use netlink system calls and the VIP won’t show up! Use ip addr instead.
  • VRRP is normally run over multicast in a closed network segment. However, in a cloud environment where multicast is not permitted, we must use unicast, which implies that we need to list all peers participating in the cluster.
  • Keepalived has the ability to invoke external scripts whenever a cluster member transitions from backup to master (or vice-versa). We will use this functionality to associate and mount the EBS block device (or the inverse, when transitioning from master to backup).

Building the HA System

We’ll spin up two identical systems in the same VPC subnet for our master and backup nodes. To avoid passing AWS access and secret keys to the systems, I’ve created an IAM instance profile & role called awsadvent-ha with a policy document to let the systems manage ENI addresses and EBS volumes:
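As a rough sketch of what such a policy could look like, the Python below builds a policy document and prints it as JSON. The action list is an assumption based on the EC2 operations needed to move a secondary private IP and an EBS volume between instances; it is not the policy used for this demo, and a real policy should restrict `Resource` more tightly.

```python
import json

# Assumed set of EC2 actions for managing ENI addresses and EBS volumes.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:AssignPrivateIpAddresses",
                "ec2:UnassignPrivateIpAddresses",
                "ec2:DescribeNetworkInterfaces",
                "ec2:AttachVolume",
                "ec2:DetachVolume",
                "ec2:DescribeVolumes",
                "ec2:DescribeInstances",
            ],
            "Resource": "*",
        }
    ],
}

print(json.dumps(policy, indent=2))
```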

For this exercise I used Fedora 21 AMIs, because Fedora has a recent-enough version of keepalived with VRRP-over-unicast support:

You’ll notice that one of the security groups I’ve placed the machines into is entitled internal-icmp, which is a group I created to allow the instances to ping each other (send ICMP Echo Request and receive ICMP Echo Reply). This is what keepalived will use as a heartbeat mechanism between nodes.

We also need a separate EBS volume for the data, so let’s create one in the same AZ as the instances:

Note that the volume needs to be partitioned and formatted at some point; I don’t do that in this tutorial.

Installing and configuring keepalived

Once the two machines are up and reachable, it’s time to install and configure keepalived. SSH to them and type:

I intend to write the external failover scripts called by keepalived in Ruby, so I’m going to install that, and the fog gem that will let me communicate with the AWS API:

keepalived is configured using the /etc/keepalived/keepalived.conf file. Here’s the configuration I used for this demo:

A couple of notes about this configuration:

  • is the current machine; is its peer. The peer has the IPs reversed in the unicast_srcip and unicast_peer clauses, so make sure to change this. (A configuration management system sure would help here…)
  • is the virtual IP address which will be bound as a secondary IP address to the active master’s Elastic Network Interface. You can pick anything unused in your subnet.
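Swapping source and peer addresses by hand is easy to get wrong, so here is a small Python sketch of how the per-node values could be derived from a single list of node addresses. The IP addresses and the function `unicast_settings` are hypothetical; the output would still need to be rendered into each node’s keepalived.conf.

```python
def unicast_settings(node_ips):
    """Map each node IP to its own source address and its list of peers."""
    settings = {}
    for ip in node_ips:
        settings[ip] = {
            "src": ip,
            # Every other node in the cluster is a unicast peer.
            "peers": [other for other in node_ips if other != ip],
        }
    return settings


nodes = ["10.0.0.10", "10.0.0.11"]  # hypothetical private addresses
for ip, cfg in unicast_settings(nodes).items():
    print(ip, cfg)
```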

The notify script, awsha.rb

As previously mentioned, the external script is invoked whenever a master-to-backup or backup-to-master event occurs, via the notify_backup and notify_master directives in keepalived.conf. Upon receiving an event, it will associate and mount (or unmount and disassociate) the EBS volume from the instance, and attach or release the ENI secondary address.
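The control flow is simple enough to sketch in a few lines of Python (the actual script is Ruby). The step names below are hypothetical placeholders for the AWS API calls and mount commands, and the ordering is an assumption: attach before mounting when becoming master, and unmount before detaching when becoming backup.

```python
# Hypothetical step names; each would wrap an AWS API call or a mount command.
STEPS = {
    "master": ["assign_secondary_ip", "attach_volume", "mount_volume"],
    "backup": ["unmount_volume", "detach_volume", "release_secondary_ip"],
}


def plan_transition(state):
    """Return the ordered steps for a keepalived state change."""
    try:
        return list(STEPS[state])
    except KeyError:
        raise ValueError("unexpected state: %r" % state)


for state in ("master", "backup"):
    print(state, "->", plan_transition(state))
```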

The script is too long to reproduce inline here, so I’ve included it as a separate Gist.

Note: For brevity, I’ve eliminated a lot of error-handling from the script, so it may or may not work out-of-the-box. In a real implementation, you need to check for many error conditions like open files on a disk volume, poll for the EC2 API to attach/release the volume, etc.

Putting it all together

Start keepalived on both servers:

One of them will elect itself the master, assign the ENI secondary IP to itself, and attach and mount the block device on /mnt. You can see which is which by checking the service status:

The other machine will say that it’s transitioned to backup state:

To force a failover, stop keepalived on the current master. The backup system will detect that the master went away, and transition to primary:

After a while, the backup should be reachable on the VIP, and have the disk volume mounted under /mnt.

If you now start keepalived on the old master, it should come back online as the new backup.

Wrapping Up

As we’ve seen, it’s not always possible to architect systems in AWS for horizontal redundancy. Many pieces of software, particularly those involving one writer and many readers, cannot be set up this way.

In other situations, it’s not desirable to build horizontal redundancy. One real-life example is a highly-available large shared cache system (e.g. squid or varnish) where it would be costly to rebuild terabytes of cache on instance failure. At Chef Software, we use an expanded version of the tools shown here to implement our Chef Server High-Availability solution.

Finally, I also found this presentation by an AWS solutions architect in Japan very useful in identifying what L2 and L3 networking technologies are available in AWS: