AWS Advent 2012 Recap

It’s hard to believe that the 2012 AWS Advent is drawing to a close. It all started on 11/30, when I found myself explaining EC2 and this “cloud business” to my father-in-law, an old-school C/C++ software developer. That conversation got me thinking that an advent calendar explaining and exploring AWS would be beneficial, so I made a Tumblr blog and a Twitter account and dove in.


In 24 days we’ve published 21 posts with great content: eighteen written by yours truly and three contributed, plus a number of tweets and RTs.

The topics covered were:


I’d like to thank everyone who followed this on Twitter, followed on Tumblr, posted your own tweets, or gave me feedback elsewhere.

A special thanks to Joshua Timberman, Erik Hollensbe, and Benjamin Krueger for contributing articles.

All the articles and sample code have been posted to a GitHub repository.

If you liked the content you saw here, follow me on Twitter or on my (hopefully updated more in 2013) blog.

Have a Merry Christmas and a Happy New Year!


If you have any feedback, please contact me on Twitter, @solarce, or email me at solarce+awsadvent2012 at gmail dot com. I’d love to hear what you liked, didn’t like, or would like to see, or to have you contribute, for next year.

Strategies for re-usable CloudFormation Templates

In day 7’s post we learned how CloudFormation (CFN) can help you automate the creation and management of your AWS resources. It supports a wide variety of AWS services, accepts user-supplied parameters, has a nice set of CLI tools, and provides a few handy functions you can use in the JSON templates.

In today’s post we’ll explore some strategies for getting the most out of your CFN stacks by creating re-usable templates.

Down with Monolithic

A lot of the CFN sample templates are monolithic, meaning that a single template defines all the resources needed for an application’s infrastructure, so they all get created as part of a single stack. Examples of this are the Multi-tier VPC example or the Redmine Multi-AZ with Multi-AZ RDS example.

In keeping with the ideas of agile operations or infrastructure as code, I think that the way we should use CFN templates is as re-usable bits of infrastructure code to manage our AWS resources.

Layer cake

The approach that I’ve come up with for this is a series of layers, as outlined below:

  • VPC template – this defines the VPC for a region, your set of subnets (private and public), an internet gateway, any NAT instances you may want, your initial security groups, and possibly network ACLs
  • EC2 instance template – here you define the kinds of instances you want to run, passing in a VPC id, one or more AMI ids, security groups, EBS volumes, etc. that are needed to run your infrastructure. Whether you make a template that takes everything as parameters or one that is more monolithic in defining your instances is up to you
  • ELB (+ Auto-scaling) template – this defines one or more ELB instances needed, passing in your EC2 instance ids, listener ports, subnets, etc. as parameters. Optionally, if you’re going to use auto-scaling, I’d include that in this template, along with the parameters needed for it, since AS makes the most sense when used with ELB for web-facing applications.
  • S3 (+ CloudFront) template – this defines the buckets, ACLs, lifecycle policies, etc. that are needed, again taking the specifics as parameters
  • RDS template – this defines your RDS instances, taking a VPC id, subnet, RDS instance class, etc as parameters
  • If you’re using Route53 for DNS, I recommend putting the needed Route53 resources in each layer’s template.
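As a sketch of how these layers chain together, here is how creating the VPC stack and then feeding its output into the EC2 stack might look from the command line. The stack names, template file names, and parameter key are hypothetical, and the exact flag syntax depends on which CloudFormation CLI and version you use:

```shell
# Create the VPC layer first.
aws cloudformation create-stack --stack-name prod-vpc \
  --template-body file://vpc-template.json

# Wait for CREATE_COMPLETE, then read the stack's outputs
# (including the new VPC id) from describe-stacks.
aws cloudformation describe-stacks --stack-name prod-vpc

# Feed the VPC id from the previous layer's outputs into the EC2 layer.
aws cloudformation create-stack --stack-name prod-ec2 \
  --template-body file://ec2-template.json \
  --parameters ParameterKey=VpcId,ParameterValue=vpc-12345678
```

In practice you’d script the "wait and copy the output" step rather than doing it by hand, which is exactly the automation discussed at the end of this post.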

These cover the most common resources you’re likely to use in a typical web application. If you’re using other services, like DynamoDB or Simple Notification Service, then you should make additional templates as needed.

The overall approach is that your templates should have sufficient parameters and outputs to be re-usable across environments like dev, stage, qa, or prod, and that each layer’s template builds on the one below it.

Some examples

As with any new technique, it is useful to have some examples.

Example VPC template

This template does not require any inputs. It will make a VPC network with a public subnet and a private subnet, default inbound rules allowing ports 22 and 80, and typical outbound rules.

It returns the newly created VPC’s id.
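As a sketch, the skeleton of such a template might look like the following. The CIDR blocks and logical resource names here are illustrative, not the ones from the original example, and the security group and NAT pieces are omitted for brevity:

```json
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "Example VPC with one public and one private subnet",
  "Resources": {
    "VPC": {
      "Type": "AWS::EC2::VPC",
      "Properties": { "CidrBlock": "10.0.0.0/16" }
    },
    "PublicSubnet": {
      "Type": "AWS::EC2::Subnet",
      "Properties": { "VpcId": { "Ref": "VPC" }, "CidrBlock": "10.0.0.0/24" }
    },
    "PrivateSubnet": {
      "Type": "AWS::EC2::Subnet",
      "Properties": { "VpcId": { "Ref": "VPC" }, "CidrBlock": "10.0.1.0/24" }
    },
    "InternetGateway": { "Type": "AWS::EC2::InternetGateway" },
    "GatewayAttachment": {
      "Type": "AWS::EC2::VPCGatewayAttachment",
      "Properties": {
        "VpcId": { "Ref": "VPC" },
        "InternetGatewayId": { "Ref": "InternetGateway" }
      }
    }
  },
  "Outputs": {
    "VpcId": {
      "Description": "The id of the newly created VPC",
      "Value": { "Ref": "VPC" }
    }
  }
}
```

The Outputs section is what makes the stack composable: the next layer takes that VpcId as one of its parameters.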

Example EC2 instance template

This template will create an EC2 instance. It requires an ssh keypair name, a VPC id, a subnet id within your VPC, an AMI id, and a security group.

It returns the EC2 instance id, the subnet, and the security group id.
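A skeleton for such a parameterized template might look like the following; the logical names are illustrative:

```json
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "Example EC2 instance taking its context as parameters",
  "Parameters": {
    "KeyName":         { "Type": "String" },
    "VpcId":           { "Type": "String" },
    "SubnetId":        { "Type": "String" },
    "AmiId":           { "Type": "String" },
    "SecurityGroupId": { "Type": "String" }
  },
  "Resources": {
    "Instance": {
      "Type": "AWS::EC2::Instance",
      "Properties": {
        "KeyName": { "Ref": "KeyName" },
        "SubnetId": { "Ref": "SubnetId" },
        "ImageId": { "Ref": "AmiId" },
        "SecurityGroupIds": [ { "Ref": "SecurityGroupId" } ]
      }
    }
  },
  "Outputs": {
    "InstanceId":      { "Value": { "Ref": "Instance" } },
    "SubnetId":        { "Value": { "Ref": "SubnetId" } },
    "SecurityGroupId": { "Value": { "Ref": "SecurityGroupId" } }
  }
}
```

Because everything comes in as parameters, the same template serves dev, stage, and prod by simply passing different values.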


Hopefully this has provided you with some strategies and examples for how to create re-usable CFN templates and build your infrastructure from a series of layered stacks.

As you build your templates, you’ll want to build some automation with a language library to drive the creation of each stack and manage passing the outputs from one stack into the next as inputs; see the earlier AWS Advent post on Automating AWS for ideas.

To explore this further I recommend you play with and tear apart the CloudFormation example templates Amazon has made available.

Exploring aws-cli

Yesterday Mitch Garnaat, a Senior Engineer at Amazon, announced the developer candidate release of a new AWS CLI tool, awscli.

The tool is open source, available under the Apache 2.0 license, written in Python, and the code is up on GitHub.

The goal of this new cli tool is to provide a unified command line interface to Amazon Web Services.

It currently supports the following AWS services:

  • Amazon Elastic Compute Cloud (Amazon EC2)
  • Elastic Load Balancing
  • Auto Scaling
  • AWS CloudFormation
  • AWS Elastic Beanstalk
  • Amazon Simple Notification Service (Amazon SNS)
  • Amazon Simple Queue Service (Amazon SQS)
  • Amazon Relational Database Service (Amazon RDS)
  • AWS Identity and Access Management (IAM)
  • AWS Security Token Service (STS)
  • Amazon CloudWatch
  • Amazon Simple Email Service (Amazon SES)

This tool is still new, but it looks very promising. Let’s explore some of the ways we can use it.

Getting Started

To get started with awscli you’ll install it, create a configuration file, and optionally add some bash shell completions.


awscli can be quickly installed with either easy_install or pip

Once it is installed, you should have an aws tool available. If you run it without any arguments, it prints a usage message listing the supported services and options.


You’ll need to make a configuration file for it. I am assuming you’ve already created and know your AWS access keys.

I created my configuration file as ~/.aws; when you create yours, it should be an INI-style file containing your access keys and a default region.

You’ll want to set the region to the region you have AWS resources running in.

Once you’ve created it, you’ll set the AWS_CONFIG_FILE environment variable to tell the aws tool where to find your configuration.
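Putting the configuration file and the environment variable together looks like this; the keys below are placeholders, and us-east-1 is just an example region:

```shell
# Create the configuration file with placeholder credentials.
cat > "$HOME/.aws" <<'EOF'
[default]
aws_access_key_id = AKIAEXAMPLEKEYID
aws_secret_access_key = EXAMPLESECRETKEYEXAMPLESECRETKEY
region = us-east-1
EOF

# Tell the aws tool where to find the configuration.
export AWS_CONFIG_FILE="$HOME/.aws"
```

You’d typically put the export line in your shell profile so it’s set in every session.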

bash Completions

If you’re a bash shell user, you can install some handy tab completions using the aws_completer program that ships with awscli (complete -C aws_completer aws).

zsh shell users should consult the project documentation for how to try to get completion working.

While I am a zsh user, I am still on 4.3.11, so I used bash for the purposes of testing out awscli.

Let’s test it out. The following command should return a bunch of JSON output describing any instances in the region you’ve put in your configuration file. You can also tell aws to return text output by adding the --output text argument to the end of your command.
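For example (this requires valid credentials in your configuration file, so no sample output is shown):

```shell
# List the instances in your configured region, as JSON.
aws ec2 describe-instances

# The same call, but with tab-delimited text output instead.
aws ec2 describe-instances --output text
```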

Since all the sample output is very instance specific, I don’t have a good example of the output to share, but if the command works, you’ll know you got the right output. 😉

Now that we have the aws tool installed and we know it’s working, let’s take a look at some of the ways we can use it for fun and profit.

Managing EC2

The primary way a lot of you may use the aws tool is to manage EC2 instances.

To do that with the aws command, you use the ec2 service name.

With the tab completion installed, you can quickly see that aws ec2 <tab><tab> has 144 possible functions to run.

To view your EC2 resources you use the describe- commands, such as describe-instances, which lists all your instances, describe-snapshots, which lists all your EBS snapshots, or describe-instance-status, to which you give the --instance-id argument to see a specific instance.

To create new resources you use the create- commands, such as create-snapshot to create a new snapshot of an EBS volume or create-vpc to create a new VPC.

To launch a new EC2 instance you use the run-instances command, to which you give a number of arguments, including --instance-type, --key-name (your ssh keypair), or --user-data. aws ec2 run-instances --<tab><tab> is a quick way to review the available options.
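A hypothetical launch might look like the following; the AMI id, keypair name, and security group id are placeholders, and the exact argument names may differ between awscli releases:

```shell
# Launch a single m1.small from a (made-up) AMI, with a keypair
# and security group that already exist in your account.
aws ec2 run-instances --image-id ami-12345678 \
  --instance-type m1.small \
  --key-name my-keypair \
  --security-group-ids sg-12345678
```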

There are a number of other kinds of commands available, including attach-, delete-, and modify-. You can use the bash completion or the documentation to learn and explore all the available commands and each command’s arguments.

Managing S3

Unfortunately the aws tool does not support S3 yet, but boto has great S3 support, s3cmd is popular, or you can use the AWS S3 Console.

Managing CloudFormation

The aws tool supports managing CloudFormation.

You can see your existing stacks with list-stacks or see a specific stack’s resources with list-stack-resources and the --stack-name argument.

You can create or delete a stack with the aptly named create-stack and delete-stack commands.

You can even use the handy estimate-template-cost command to get a template sent through the AWS calculator and you’ll get back a URL with all your potential resources filled out.
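Put together, a CloudFormation session might look like this; the stack name and template path are made up, and how you pass the template body may vary by release:

```shell
# See what stacks already exist.
aws cloudformation list-stacks

# Create a stack from a local template file.
aws cloudformation create-stack --stack-name my-vpc \
  --template-body file://vpc-template.json

# Inspect the resources the stack created.
aws cloudformation list-stack-resources --stack-name my-vpc

# Tear the stack down when you're done.
aws cloudformation delete-stack --stack-name my-vpc
```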

Managing ELB

The aws tool supports managing Elastic Load Balancer (ELB).

You can see your existing load balancers with the describe-load-balancers command. You can create a new load balancer with create-load-balancer, which takes a number of arguments, including --availability-zones, --listeners, --subnets, or --security-groups. You can delete an existing load balancer with the delete-load-balancer command.

You can add or remove listeners on an existing load balancer with the create-load-balancer-listeners and delete-load-balancer-listeners commands.
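For example (the load balancer name and zones are placeholders, and the listener argument syntax in particular may vary by release):

```shell
# List existing load balancers.
aws elb describe-load-balancers

# Create an HTTP load balancer across two zones.
aws elb create-load-balancer --load-balancer-name my-elb \
  --availability-zones us-east-1a us-east-1b \
  --listeners Protocol=HTTP,LoadBalancerPort=80,InstanceProtocol=HTTP,InstancePort=80

# Remove it again.
aws elb delete-load-balancer --load-balancer-name my-elb
```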

Managing CloudWatch

The aws tool supports managing CloudWatch.

You can review your existing metrics with the list-metrics command and your existing alarms with the describe-alarms command. You can look at the alarms for a specific metric by using describe-alarms-for-metric and the --metric-name argument.

You can enable and disable alarm actions with the enable-alarm-actions and disable-alarm-actions commands.

Where to go from here?

You should make sure you’ve read the README.

To get more familiar with the commands and arguments, you should use both the bash completions and the built-in help.

To see the help for a specific service or command, you invoke the tool’s built-in help for it.

You’ll get some details on each of the available commands for a given service.
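In current releases of the tool, that looks like:

```shell
# Help for a whole service.
aws ec2 help

# Help for a single operation.
aws ec2 describe-instances help
```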

From there, if you encounter issues or have ideas for feedback, you should file an issue on GitHub.

While not an official channel, I idle in ##aws on IRC and am happy to answer questions/provide help when I have time.

EC2 In-depth

In day 1’s post on AWS Key Concepts we learned a little about EC2, but as you’ve come to see in these past few posts, anyone seriously using AWS is likely using EC2 as a major part of their application infrastructure.

Let’s review what we learned about EC2 previously. EC2 is the Elastic Compute Cloud. It provides you with a variety of compute instances with set levels of CPU, RAM, and Disk allocations. You utilize these instances on demand, with hourly based pricing, but you can also pay to reserve instances.

An EC2 instance’s operating system is cloned from an AMI (Amazon Machine Image). These are the base from which your instances will be created. A number of operating systems are supported, including Linux, Windows Server, FreeBSD (on some instance types), and OmniOS.


Pricing

EC2 instance pricing is per hour and varies by instance type and class. You begin paying for an instance as soon as you launch it. You also pay AWS’s typical bandwidth charges for any traffic that leaves the region your EC2 instance is running in.

For more details, consult the EC2 FAQ on Pricing.

Storage Options

There are two types of storage available for your instance’s root device volume:

  1. Instance store: In this case the root device for an instance launched from the AMI is an instance store volume created from a template stored in Amazon S3. An instance store is not persistent and has a fixed size, but it uses storage local to the instance’s host server. You’re not able to derive new AMIs from instance store backed instances.

  2. Elastic Block Store (EBS): EBS is a separate AWS service, but one of its uses is for the root storage of instances. These are called EBS-backed instances. EBS volumes are block devices of N gigabytes that are available over the network and have some advanced snapshotting and performance features. This storage persists even if you terminate the instance, but this incurs additional costs as well. We’ll cover more EBS details below. If you choose to use EBS-optimized instance types, your instance will be provisioned with a dedicated NIC for your EBS traffic. Non-EBS-optimized instances share EBS traffic with all other traffic on the instance’s primary NIC.

There are also two types of storage available to instances for additional storage needs:

  1. Ephemeral storage: Ephemeral storage consists of disks that are local to the instance host; the number of disks you get depends on the size of your instance. This storage is wiped whenever there is an event that terminates an instance, whether an EC2 failure or an action by a user.

  2. EBS: As mentioned, EBS is a separate AWS service. You’re able to create EBS volumes of N gigabytes and attach them over the network, and to take advantage of their advanced snapshotting and performance features. This storage persists even if you terminate the instance, but it incurs additional costs as well.

Managing Instances

Managing instances can be done through the AWS console, the EC2 API tools, or the API itself.

The lifecycle of an EC2 instance is typically:

  • Creation from an AMI
  • The instance runs; you may attach EBS volumes or Elastic IPs, and you may also restart the instance
  • You may stop an instance.
  • Eventually you may terminate the instance or the instance may go away due to a host failure.

It’s important to note that EC2 instances are meant to be considered disposable and that you should use multiple EC2 instances in multiple Availability Zones to ensure the availability of your applications.

Instance IPs and ENIs

So once you’ve begun launching instances, you’ll want to login and access them, and you’re probably wondering what kind of IP addresses your instances come with.

The EC2 FAQ on IP addresses tells us:

By default, every instance comes with a private IP address and an internet routable public IP address. The private address is associated exclusively with the instance and is only returned to Amazon EC2 when the instance is stopped or terminated. The public address is associated exclusively with the instance until it is stopped, terminated or replaced with an Elastic IP address.

If you’re deploying your instances in a VPC, you’re also able to use Elastic Network Interfaces (ENIs). These are virtual network interfaces that let you add additional private IP addresses to your EC2 instances running in a VPC.

Security Groups

Ensuring your EC2 instances are secure at the network level is a vital part of your infrastructure’s overall security. Network-level security for EC2 instances is provided through the use of security groups.

A security group acts as a firewall that controls the traffic allowed to reach one or more instances. When you launch an Amazon EC2 instance, you associate it with one or more security groups. You can add rules to each security group that control the inbound traffic allowed to reach the instances associated with the security group. All other inbound traffic is discarded. Security group rules are stateful.

Your AWS account automatically comes with a default security group for your Amazon EC2 instances. If you don’t specify a different security group at instance launch time, the instance is automatically associated with your default security group.

The initial settings for the default security group are:

  • Allow no inbound traffic
  • Allow all outbound traffic
  • Allow instances associated with this security group to talk to each other

You can either choose to create new security groups with different sets of inbound rules, which you’ll need if you’re running a multi-tier infrastructure, or you can modify the default group.

In terms of the limitations of security groups, you can create up to 500 Amazon EC2 security groups in each region in an account, with up to 100 rules per security group. In Amazon VPC, you can have up to 50 security groups, with up to 50 rules per security group, in each VPC. The Amazon VPC security group limit does not count against the Amazon EC2 security group limit.

Spot and Reserved Instances

Besides paying for EC2 instances on-demand, you’re able to utilize instance capacity in two other ways, Spot instances and Reserved instances.

Spot Instances

If you have flexibility on when your application will run, you can bid on unused Amazon EC2 compute capacity, called Spot Instances, and lower your costs significantly. Set by Amazon EC2, the Spot Price for these instances fluctuates periodically depending on the supply of and demand for Spot Instance capacity.

To use Spot Instances, you place a Spot Instance request (your bid) specifying the maximum price you are willing to pay per hour per instance. If the maximum price of your bid is greater than the current Spot Price, your request is fulfilled and your instances run until you terminate them or the Spot Price increases above your maximum price. Your instance can also be terminated when your bid price equals the market price, even when there is no increase in the market price. This can happen when demand for capacity rises, or when supply fluctuates.

You will often pay less per hour than your maximum bid price. The Spot Price is adjusted periodically as requests come in and the available supply of instances changes. Everyone pays that same Spot Price for that period regardless of whether their maximum bid price was higher, and you will never pay more than your hourly maximum bid price.

Reserved Instances

You can use Reserved Instances to take advantage of lower costs by reserving capacity. With Reserved Instances, you pay a low, one-time fee to reserve capacity for a specific instance and get a significant discount on the hourly fee for that instance when you use it. Reserved Instances, which are essentially reserved capacity, can provide substantial savings over owning your own hardware or running only On-Demand instances. Reserved Instances are available from AWS in one- and three-year terms, and come in three varieties: Heavy Utilization, Medium Utilization, and Light Utilization.

Launching your Reserved Instance is the same as launching any On-Demand instance: You launch an instance with the same configuration as the capacity you reserved, and AWS will automatically apply the discounted hourly rate that is associated with your capacity reservation. You can use the instance and be charged the discounted rate for as long as you own the Reserved Instance. When the term of your Reserved Instance ends, you can continue using the instance without interruption. Only this time, because you no longer have the capacity reservation, AWS will start charging you the On-Demand rate for usage.

To purchase an Amazon EC2 Reserved Instance, you must select an instance type (such as m1.small), platform (Linux/UNIX, Windows, Windows with SQL Server), location (Region and Availability Zone), and term (either one year or three years). When you want your Reserved Instance to run on a specific Linux/UNIX platform, you must identify the specific platform when you purchase the reserved capacity.


Tagging

Tagging is a minor EC2 feature that I find interesting. Tags are key-value pairs that you can apply to one or many EC2 instances. They let you add your own metadata to your EC2 instances for use in inventory or lifecycle management of your instances.

The following basic restrictions apply to tags:

  • Maximum number of tags per resource—10
  • Maximum key length—128 Unicode characters
  • Maximum value length—256 Unicode characters
  • Unavailable prefixes: aws (we have reserved it for tag names and values)
  • Tag keys and values are case sensitive.

You’re able to tag a wide variety of resources.

Tagging can be done through the AWS console, the EC2 API tools, or the API itself.
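With the aws tool from the previous post, tagging might look like this; the instance id and tag values are made up, and the tag argument syntax reflects current awscli releases:

```shell
# Apply two tags to an instance.
aws ec2 create-tags --resources i-12345678 \
  --tags Key=environment,Value=production Key=role,Value=webserver

# Review the tags across your resources.
aws ec2 describe-tags
```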


EC2 instances are more than, but also different from, typical VPS instances. The flexibility of EC2 instances, in their many types and classes, coupled with hourly pricing, lets you do many things with your infrastructure that traditional data centers did not make possible. But the disposable nature of EC2 instances has some drawbacks. These should all be considered carefully as you decide how and when to use EC2 instances for your applications.

As we’ve seen, there are a number of options for how you can pay for your EC2 instances and how you manage the instance lifecycle.

Monitoring and AWS

A critical part of any application infrastructure is monitoring. Monitoring can mean a lot of things to different people, but for the purposes of this post we’re going to define monitoring as two things:

  1. Collecting metrics data to look at performance over time
  2. Alerting on metrics data based on thresholds

Let’s take a look at some of the tools and services available to accomplish this and some of their unique capabilities.

There are certainly many many options for this, as a search for “aws monitoring” will reveal, but I am going to focus on a few options that I am familiar with and see as representing the various classes of options available.

Amazon CloudWatch

Amazon CloudWatch is of course the first option you may think of when you’re using AWS resources for your application infrastructure, as it’s already able to automatically provide you with metric data for most AWS services.

CloudWatch is made up of three main components:

  • metrics: metrics are data points that are stored in a time series format.
  • graphs: which are visualizations of metrics over a time period.
  • alarms: an alarm watches a single metric over a time period you specify, and performs one or more actions based on the value of the metric relative to a given threshold over a number of time periods

Currently CloudWatch has built-in support for the following services:

  • AWS Billing
  • Amazon DynamoDB
  • Amazon ElastiCache
  • Amazon Elastic Block Store
  • Amazon Elastic Compute Cloud
  • Amazon Elastic MapReduce
  • Amazon Relational Database Service
  • Amazon Simple Notification Service
  • Amazon Simple Queue Service
  • Amazon Storage Gateway
  • Auto Scaling
  • Elastic Load Balancing

CloudWatch provides you with the API and storage to monitor and publish metrics for these AWS services, as well as to add your own custom data, through many interfaces, including CLI tools, an API, and many language libraries. They even provide some sample monitoring scripts for collecting OS information for Linux and Windows.
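As a sketch, publishing a single custom data point with the aws tool from the earlier post might look like this; the namespace, metric name, and value are made up:

```shell
# Push one custom data point into CloudWatch.
aws cloudwatch put-metric-data --namespace MyApp \
  --metric-name QueueDepth --value 42

# Confirm the custom metric now exists.
aws cloudwatch list-metrics --namespace MyApp
```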

Once your metric data is being stored by CloudWatch, you’re able to create alarms which use AWS SNS to send alerts via email, SMS, or posting to another web service.

Finally, you’re able to visualize these metrics in various ways through the AWS console.

CloudWatch pricing is straightforward. You pay per custom metric, per alarm, per API request, all on a monthly basis, and Basic Monitoring metrics (at five-minute frequency) for Amazon EC2 instances are free of charge, as are all metrics for Amazon EBS volumes, Elastic Load Balancers, and Amazon RDS DB instances.


Boundary

Boundary is a startup based out of San Francisco that takes an interesting approach to monitoring. Caveat: I am friends with a few of the folks there and am very excited about what they’re doing.

Boundary is a hosted offering whose goal is to provide in-depth application monitoring by looking at things from the point of view of the network. The service works by having you deploy an agent they call a meter to each of your servers; they currently support a variety of Linux distributions and Windows Server 2008. These meters send IPFIX data back to Boundary’s hosted platform, where the data is processed and stored.

The idea behind Boundary is that by looking at the data from the network, in real time, you’re able to quickly see patterns in the flow of data in and out of your infrastructure and between the tiers within it in a way that hasn’t been as easily done with traditional monitoring tools that are looking at OS metrics, SNMP data, etc. And by being able to annotate this data and monitor for changes, you can create a comprehensive and detailed real time and long term view into your infrastructure.

You’re then able to visualize your data in various ways, including annotating and overlaying it or adding your own custom data, and to have alerts sent by email natively or in a variety of ways through their supported integration with PagerDuty. Some more details on how you’re able to use Boundary are laid out in their 30 Days to Boundary page.

Boundary’s pricing is simple. It starts with a Free plan that lets you store up to 2GB of metric data per day. Paid plans begin at $199 US/month for commercial support, higher daily storage limits, and flexible long-term data storage options.

Boundary even has a couple of good resources focused on how they’re a good fit when you’re using AWS, including a video, See Inside the Amazon EC2 Black Box, and a PDF, Optimizing Performance in Amazon EC2, that are worth reviewing.


Datadog

Datadog is a hosted offering based out of New York City that aims to give you a unified place to store metrics, events, and logs from your infrastructure and third-party services, to visualize this data and alert on it, and to discuss and collaborate on it.

Datadog works by having you install an agent, which they currently support on a variety of Linux distributions, Windows, OS X, and SmartOS. They also support integration with a variety of open source tools, applications, languages, and some third-party services.

Once you’ve installed the agent and configured your desired integrations, you’ll begin seeing events and metrics flow into your account. You’re able to build your own agent-based checks and service checks and do custom integration through a number of libraries.

From there you can begin visualizing and using your data by making custom dashboards and then creating alerts, which can be sent via email or in a variety of ways through their supported integration with PagerDuty.

Datadog’s pricing starts with a free plan that includes 1 day retention and 5 hosts. Paid plans start at $15/host/month for up to 100 hosts with one year retention, alerts, and email support.


Not everyone wants to utilize a hosted service, and there are a number of open source tools for building your own monitoring solution.

The up and coming tool in this space is Sensu. Sensu is an open source project that is sponsored by Sonian and has a thriving developer and user community around it. Sensu was built by Sonian out of their need for a flexible solution that could handle how they dynamically scale their infrastructure tiers up and down on various public cloud providers, starting with AWS.

Sensu’s goal is to be a monitoring framework that lets you build a scalable solution to fit your needs. It is built from the following components:

  • The server, which aggregates data from the clients
  • The clients, which run checks
  • The API service
  • RabbitMQ, which is the message bus that glues everything together

The various Sensu components are all written in Ruby and open source. Sensu supports running checks written in Ruby, as well as existing Nagios checks.

This excellent getting started post by Joe Miller sums Sensu up nicely.

Sensu connects the output from “check” scripts run across many nodes with “handler” scripts run on Sensu servers. Messages are passed via RabbitMQ. Checks are used, for example, to determine if Apache is up or down. Checks can also be used to collect metrics such as MySQL statistics. The output of checks is routed to one or more handlers. Handlers determine what to do with the results of checks. Handlers currently exist for sending alerts to Pagerduty, IRC, Twitter, etc. Handlers can also feed metrics into Graphite, Librato, etc. Writing checks and handlers is quite simple and can be done in any language.
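To illustrate how simple a check can be, here is a minimal Nagios-style disk usage check, written as a shell function so it’s easy to exercise; the 80%/90% thresholds are arbitrary, and a real Sensu or Nagios check would be a standalone script that exits with 0 for OK, 1 for WARNING, and 2 for CRITICAL:

```shell
# check_disk: a minimal Nagios-style check. Prints one status line and
# returns 0 (OK), 1 (WARNING), or 2 (CRITICAL) based on root fs usage.
check_disk() {
    usage=$(df -P / | awk 'NR==2 { sub("%", "", $5); print $5 }')
    if [ "$usage" -ge 90 ]; then
        echo "DISK CRITICAL: / is ${usage}% full"
        return 2
    elif [ "$usage" -ge 80 ]; then
        echo "DISK WARNING: / is ${usage}% full"
        return 1
    else
        echo "DISK OK: / is ${usage}% full"
        return 0
    fi
}

# Run the check and report its exit status, as a monitoring agent would.
rc=0
check_disk || rc=$?
echo "check returned $rc"
```

The status line is what a handler would relay to an alerting service, and the exit code is what the framework uses to decide whether to alert at all.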


Nagios

Nagios is the granddaddy of open source monitoring tools. It’s primarily a server service you run that watches hosts and services you specify, alerting you when things go bad and when they get better. It supports a variety of checks for most services and software, as well as the ability to write your own. You’ll find checks, scripts, and extensions for Nagios for pretty much anything you can think of.

For metrics and alerting there are good tools for integrating with Graphite and PagerDuty.


There are a number of hosted and open source solutions available to match your infrastructure’s monitoring needs. While this post hasn’t covered them all, I hope you’ve gotten a nice overview of some of the options and some food for thought when considering how to gather the metrics and data needed to run an application on AWS and stay alerted to the changes and incidents that are important to you.

If you’re interested in learning more about open source monitoring, you should watch The State of Open Source Monitoring by Jason Dixon.

If you’re interested in the future of monitoring, you should keep an eye on the Monitorama conference coming up in March 2013.

AWS Direct Connect

Today’s post on AWS Direct Connect is a contribution by Benjamin Krueger, who is a Site Reliability Engineer for Sourcefire, Inc and is presently working with a highly talented team to build a flexible hybrid cloud infrastructure.

He enjoys a delicious cup of buzzword soup, and isn’t afraid to SOA his cloud with API driven platform metrics. His event streams offer high availability processing characteristics. Turbo Encabulator.

Deck the halls with single-mode fiber

I wish I could have my cake and eat it too.

Whether you are a fan or critic, the world of cloud computing has undeniably changed how many of us build and operate the services we offer. Also undeniable, however, is the fact that the reliability of your access to resources in the cloud is limited by the reliability of all the networks in between. In the networking world, one way that ISPs, carriers, and content providers often side-step this issue is by participating in Internet Exchanges; physical network meet up points where participants exchange network traffic directly between their respective networks. Another form of this is through direct network peering agreements where two parties maintain a direct physical network connection between each other to exchange traffic.

While the cloud offers lots of benefits, sometimes it just doesn’t make sense to run your entire operation there. You can’t run your own specialized network appliances in the cloud, for example. Perhaps your requirements specify a level of hardware control that can’t be met by anything other than an in-house datacenter. Maybe the cost-benefit of available cloud server instances makes sense for some workloads but not for others. Sure, you can write off the cloud entirely, but wouldn’t it be nice if you could build a hybrid solution and get a network connection direct from your own datacenter or home office to your cloud provider’s network? If you’re an Amazon Web Services customer then you can do this today with AWS Direct Connect. This article won’t be a how-to cookbook but will outline what Direct Connect is and how you can use it to improve the reliability and performance of your infrastructure when taking advantage of the benefits of cloud services.

AWS Direct Connect service

The AWS Direct Connect service lets you establish a network link, at 1Gb or 10Gb, from your datacenter to one of seven Amazon regional datacenters across the globe. At the highest level, you work with your infrastructure provider to establish a network link between your datacenter and an AWS Direct Connect Location. Direct Connect Locations are like meet-up points. Each is located in physical proximity to an Amazon region, and is the point where direct connections are brought in to Amazon’s network for that region.

AWS Direct Connect Locations

As an illustration, let’s explore a hypothetical Direct Connect link from a New Jersey datacenter to Amazon’s US-East region. Amazon maintains Direct Connect Locations for their US-East Northern Virginia region at CoreSite in New York City, and seven Equinix data centers in Northern Virginia. Being in New Jersey, it makes sense for us to explore a connection to their CoreSite location in New York. Assuming you don’t already have a presence in CoreSite, you would make arrangements to rent a cage and colocate. Then you would make arrangements, usually with a telco or other network infrastructure provider, to create a link between your datacenter and your gear in CoreSite. At that point, you can begin the process of cross-connecting between your CoreSite cage and Amazon’s CoreSite cage.

The example I just outlined has quite a few drawbacks. We need to interface with a lot of companies and sign a lot of contracts. That necessarily means quite a bit of involvement from your executive management and legal counsel. It also requires a significant investment of time and Capex, as well as ongoing Opex. Is there anything we can do to make this process simpler and more cost-effective?

An AWS Direct Connect Layout

As it turns out, there is something we can do. Amazon has established a group of what they call APN Technology and Consulting Partners. That’s quite a mouthful, but it boils down to a group of companies that can manage many of the details involved in the Direct Connect process. In the example layout above, we work with an APN Partner who establishes a link between our datacenter and the Direct Connect Location. They take care of maintaining a presence there, as well as the details involved in interfacing with Amazon’s cage. The end result is a single vendor helping us establish our Direct Connect link to Amazon.

So what’s this gonna cost me?

At the time of this publication, Amazon charges $0.30 per port-hour for 1Gb connections and $2.25 per port-hour for 10Gb connections. Since the ports are always on while you use the service, that works out to approximately $220/mo for 1Gb and $1650/mo for 10Gb. In addition, Amazon charges $0.03 per GB of outbound transfers while inbound transfers are free. That means a Direct Connect link makes the most sense for pushing large quantities of data towards Amazon. This works out well for scenarios where systems in the cloud make small requests to systems in your datacenter which then return larger results.
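As a sanity check on those numbers, here is the port-hour arithmetic, assuming roughly 730 hours in a month:

```python
# Approximate monthly Direct Connect port charges (ports are always on)
HOURS_PER_MONTH = 730  # 24 hours * 365 days / 12 months

cost_1g = 0.30 * HOURS_PER_MONTH   # ~$219/mo for a 1Gb port
cost_10g = 2.25 * HOURS_PER_MONTH  # ~$1642.50/mo for a 10Gb port

# Outbound transfer at $0.03/GB; e.g. pushing 5 TB out in a month
transfer_5tb = 0.03 * 5000

print(cost_1g, cost_10g, transfer_5tb)
```

which matches the approximate $220/mo and $1650/mo figures above.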

Costs when dealing with an APN Partner can vary. In my own environment, the vendor costs approximately $3k/mo. The vendor takes care of the connection between our North Virginia datacenter and Amazon’s Equinix Direct Connect Location, and we get a single-mode fiber drop straight into our cage. All we have to do is plug it in to our router. For more complex links, costs will obviously be higher. You could direct connect your Toronto datacenter to Amazon through CoreSite in New York but with getting fiber out of your cage, working with a network carrier for the trip between cities, cage rental, and cross connect charges, don’t be surprised if the bill is significant!

Get your packets runnin’

Once you have a physical path to Amazon, you need to plug it in to something. Amazon requires that your router support 802.1Q VLANs, BGP, and BGP MD5 authentication. While this often means using a traditional network router from a company like Cisco or Juniper, you could also build a router using Linux and a BGP implementation like OpenBGP or Zebra. If you have an ops team, but the idea of BGP makes you shiver, don’t fret. Once you give them some details, Amazon will generate a sample configuration for common Cisco and Juniper routers.

To begin routing traffic to AWS public services over your Direct Connect link, you will need to create a Virtual Interface in Amazon’s configuration console. You only need a few pieces of information to set this up: A VLAN number, a BGP ASN, your router’s peer IP address (which Amazon will provide), Amazon’s peer IP address (also provided), and the network prefixes that you want to advertise. Some of this is straightforward, and some less so. If you do not have a BGP ASN then you can choose an arbitrary number between 64512 and 65534, which is a range of BGP ASNs reserved by IANA similar to RFC1918 address space. The prefix is a public address block which Amazon will know to route over your virtual interface; this could be as small as a /32 for a NAT server that your systems live behind. It should be noted that at this time, Direct Connect does not support IPv6.

Amazon has authored some excellent documentation for most of their AWS services, and the process for creating Virtual Interfaces is no exception. Your configuration may require some subtle changes, and of course you should never promote any system to production status without fully understanding the operational and security consequences of its configuration.

But once you’ve reached that point and your virtual interface is online, Amazon will begin routing your packets over the link and you now have a direct connection straight in to Amazon’s network!

So what does a hybrid infrastructure look like?

In addition to using Direct Connect to access AWS public services, you can also use it to run your own infrastructure on Amazon’s platform. One of the most polished and well-supported ways to do this is by using Amazon’s Virtual Private Cloud service. A VPC environment allows you to take your server instances out of Amazon’s public network. If you are familiar with Amazon’s EC2 platform, you will recall that server instances live on a network of private address space alongside every other EC2 customer. VPC takes that same concept, but puts your instances on one or more private address spaces of your choosing by themselves. Additionally, it offers fine-grained control over which machines get NAT to the public internet, which subnets can speak to each other, and other routing details. Another benefit offered by VPC is the ability for your Direct Connect Virtual Interface to drop straight on to your VPC’s private network. This means that the infrastructure in both your datacenter and Amazon VPC can live entirely on private address space, and communicate directly. Your network traffic never traverses the public internet. In essence, your VPC becomes a remote extension of your own datacenter.

When to use all this?

So what kind of situations can really benefit from this kind of hybrid infrastructure? There are myriad possibilities, but one that might be a common case is to take advantage of Amazon’s flexible infrastructure for service front-ends while utilizing your own hardware and datacenter for IO intensive or sensitive applications. In our hypothetical infrastructure, taking advantage of Amazon’s large bandwidth resources, ability to cope with DDoS, and fast instance provisioning, you bring up new web and application servers as demand requires. This proves to be cost effective, but your database servers are very performance sensitive and do not cope well in Amazon’s shared environment. Additionally, your VP really wants the master copy of your data to be on resources you control. Running your database on Amazon is right out, but using Direct Connect your app servers can connect right to your database in your datacenter. This works well, but all of your read requests are traversing the link and you’d like to eliminate that. So you set up read slaves inside Amazon, and configure your applications to only send writes to the master. Now only writes and your replication stream traverse the link, taking advantage of Amazon’s Direct Connect pricing and free inbound traffic.

How’s it work?

So how well can Direct Connect perform? Here is an example of the latency between the router in my own datacenter in Northern Virginia, and the router on Amazon’s US-East network. This is just about a best-case scenario, of course, and the laws of physics apply.

One millisecond, which is the lowest precision result our router provides! Due to a misconfiguration, I don’t presently have throughput stats but when measured in the past we have been able to match the interface speed that the router is capable of. In other words, Direct Connect performs exactly as you would expect a fiber link between two locations would.

Wrapping up

There are caveats to using Direct Connect, especially in a production environment. Because it is a single line of fiber, your network path is exposed to a few single points of failure. These include your router, the fiber in between yourself and the Direct Connect Location, and the infrastructure between the Direct Connect Location and Amazon’s regional datacenter. Additionally, Amazon does not offer an SLA on Direct Connect at this time and reserves the right to take down interfaces on their end for maintenance. Because of this, Amazon recommends ensuring that you can fail over to your primary internet link or ordering a second Direct Connect link. If your requirements include low latency and high throughput, and failing over to your default internet provider link will not suffice, a second Direct Connect link may be justified.

While I’ve outlined Direct Connect’s benefits for a single organization’s hybrid infrastructure, that certainly isn’t the only group who can take advantage of this service. Hosting companies, for example, might wish to maintain Direct Connect links to Amazon datacenters so that their customers can take advantage of Amazon’s Web Services in a low latency environment. Organizations with a traditional datacenter might use AWS as a low cost disaster recovery option, or as a location for off-site backup storage.

I hope this article has helped illuminate Amazon’s Direct Connect service. Despite a few drawbacks this service is a powerful tool in the system administrator’s toolbox, allowing us to improve the reliability and performance of our infrastructures while taking advantage of the benefits of Amazon’s cloud platform. Hopefully we will soon start seeing similar offerings from other cloud providers. Perhaps there may even be dedicated cloud exchanges in the future, allowing direct communication between providers and letting us leverage the possibility of a truly distributed infrastructure on multiple clouds.

Automating Backups in AWS

In Day 9’s post we learned about some ideas for how to do proper backups when using AWS services.

In today’s post we’ll take a hands-on approach to automating the creation of resources and performing the actions needed to achieve these kinds of backups, using some bash scripts and the Boto Python library for AWS.

Ephemeral Storage to EBS volumes with rsync

Since IO performance is key for many applications and services, it is common to use your EC2 instance’s ephemeral storage and Linux software RAID for your instance’s local data storage. While EBS volumes can have erratic performance, they are useful for providing backup storage that’s not tied to your instance but is still accessible through a filesystem.

The approach we’re going to take is as follows:

  1. Make a software RAID1 from two EBS volumes and mount it as /backups
  2. Make a shell script to rsync /data to /backups
  3. Set the shell script up to run as a cron job

Making the EBS volumes

Adding the EBS volumes to your instance can be done with a simple Boto script
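The script itself isn’t reproduced in this excerpt; a minimal sketch with boto 2.x might look like the following. The region, zone, volume size, and device names are all assumptions to adjust for your environment:

```python
def create_backup_volumes(instance_id, region='us-east-1', zone='us-east-1a',
                          size_gb=10, devices=('/dev/sdf', '/dev/sdg')):
    """Create two EBS volumes and attach them to the given instance."""
    import time
    import boto.ec2  # boto 2.x

    conn = boto.ec2.connect_to_region(region)
    volume_ids = []
    for device in devices:
        vol = conn.create_volume(size_gb, zone)
        # Wait until the volume is ready before attaching it
        while vol.update() != 'available':
            time.sleep(5)
        conn.attach_volume(vol.id, instance_id, device)
        volume_ids.append(vol.id)
    return volume_ids

# Example usage (instance ID is a placeholder):
# create_backup_volumes('i-12345678')
```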

Once you’ve run this script you’ll have two new volumes attached as local devices on your EC2 instance.

Making the RAID1

Now you’ll want to make a two volume RAID1 from the EBS volumes and make a filesystem on it.

The following shell script takes care of this for you
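The script isn’t included in this excerpt; a sketch is below. The device names are assumptions (modern kernels may expose the volumes as /dev/xvdf and /dev/xvdg), and by default it only prints what it would do — set RUN=1 to execute for real:

```shell
#!/bin/bash
set -e
DEV1=${DEV1:-/dev/xvdf}
DEV2=${DEV2:-/dev/xvdg}

# Dry-run helper: only execute commands when RUN=1
run() { if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "would run: $*"; fi; }

# Build the RAID1, put a filesystem on it, and mount it at /backups
run mdadm --create /dev/md0 --level=1 --raid-devices=2 "$DEV1" "$DEV2"
run mkfs.ext4 -q /dev/md0
run mkdir -p /backups
run mount /dev/md0 /backups

# Persist the mount across reboots
run sh -c "echo '/dev/md0 /backups ext4 defaults,noatime 0 0' >> /etc/fstab"
```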

Now you have a /backups filesystem you can rsync files and folders to as part of your backup process.

rsync shell script

rsync is the standard tool for syncing data between Linux servers.

The following shell script will use rsync to make backups for you.
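The original script isn’t reproduced here; a sketch follows. The source and destination paths are assumptions, and as with the RAID script it echoes the command unless RUN=1 is set:

```shell
#!/bin/bash
set -e
SRC=${SRC:-/data/}
DEST=${DEST:-/backups/}

# Dry-run helper: only execute commands when RUN=1
run() { if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "would run: $*"; fi; }

# -a preserves permissions, ownership, and timestamps;
# --delete makes the backup an exact mirror of the source
run rsync -a --delete --stats "$SRC" "$DEST"
```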

making a cron job

To make this a cron job that runs once a day, you can add a file like the following, which assumes you put the script in /usr/local/bin
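The cron file isn’t shown in this excerpt; a sketch for /etc/cron.d (the script name is an assumption):

```cron
# /etc/cron.d/backup-rsync
15 0 * * * root /usr/local/bin/backup-rsync.sh
```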

This cron job will run as root, at 12:15AM in the timezone of the instance.


Data Rotation, Retention, Etc

To improve on how your data is rotated and retained you can explore a number of open source tools, including:

EBS Volumes to S3 with boto-rsync

Now that you’ve got your data backed up to EBS volumes, or you’re using EBS volumes as your primary datastore, you’re going to want to ensure a copy of your data exists elsewhere. This is where S3 is a great fit.

As you’ve seen, rsync is often the key tool in moving data around on and between Linux filesystems, so it makes sense that we’d use an rsync style utility that talks to S3.

For this we’ll look at how we can use boto-rsync.

boto-rsync is a rough adaptation of boto’s s3put script which has been reengineered to more closely mimic rsync. Its goal is to provide a familiar rsync-like wrapper for boto’s S3 and Google Storage interfaces.

By default, the script works recursively and differences between files are checked by comparing file sizes (e.g. rsync’s --recursive and --size-only options). If the file exists on the destination but its size differs from the source, then it will be overwritten (unless the -w option is used).

boto-rsync is simple to use, being as easy as boto-rsync [OPTIONS] /local/path/ s3://bucketname/remote/path/, which assumes you have your AWS credentials in ~/.boto or set as environment variables.

boto-rsync has a number of options you’ll be familiar with from rsync and you should consult the README to get more familiar with this.

As you can see, you can easily couple boto-rsync with a cron job and some script to get backups going to S3.

Lifecycle policies for S3 to Glacier

One of the recent features added to S3 was the ability to use lifecycle policies to archive your S3 objects to Glacier.

You can create a lifecycle policy to archive data in an S3 bucket to glacier very easily with the following boto code.
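The code isn’t reproduced in this excerpt; a sketch with boto 2.x follows — the bucket name and transition age are assumptions:

```python
def archive_bucket_to_glacier(bucket_name, days=30):
    """Add a lifecycle rule that moves objects to Glacier after `days` days."""
    import boto
    from boto.s3.lifecycle import Lifecycle, Rule, Transition

    conn = boto.connect_s3()
    bucket = conn.get_bucket(bucket_name)

    lifecycle = Lifecycle()
    # An empty prefix applies the rule to every object in the bucket
    lifecycle.append(Rule(id='archive-to-glacier', prefix='', status='Enabled',
                          transition=Transition(days=days,
                                                storage_class='GLACIER')))
    bucket.configure_lifecycle(lifecycle)

# Example usage (bucket name is a placeholder):
# archive_bucket_to_glacier('my-backup-bucket')
```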


As you can see, there are many options for automating your backups on AWS in comprehensive and flexible ways, and this post is only the tip of the iceberg.

Using ELB and Auto-Scaling

Load balancing is a critical piece of any modern web application infrastructure, and Amazon’s Elastic Load Balancer (ELB) service provides an API-driven and integrated solution for load balancing when using AWS services. Building on top of Amazon’s CloudWatch monitoring and metrics solution, and easily coupled with ELB, Amazon’s Auto-Scaling service provides you with the ability to dynamically scale parts of your web application infrastructure on the fly, based on performance or user demand.

Elastic Load Balancer

ELB is a software load balancer solution that provides you with public IPs, SSL termination, and the ability to do layer 4 and 7 load balancing, with session stickiness, as needed. It is managed through the AWS console, CLI tools, or the ELB API, all while paying by the hour, only for the resources and bandwidth used.


Auto-Scaling

Auto-Scaling lets you define CloudWatch metrics for dynamically scaling EC2 instances up and down, completely automatically. You’re able to utilize On-Demand or Spot instances, inside or outside of your VPC, and it couples easily with ELB to allow auto-scaled instances to begin serving traffic for web applications. It is managed through the AS CLI tools or the AS API, all while paying by the hour, only for the CloudWatch metrics used. You’re also able to use AWS SNS to get alerted as auto-scaling policies take actions.

Getting Started with ELB

ELB is composed of ELB instances. An ELB instance has the following elements:

To get started with ELB you’ll build an ELB instance:

  1. Login to the AWS console
  2. Click Load Balancers
  3. On the DEFINE LOAD BALANCER page, make the following selections:
  4. Enter a name for your load balancer (e.g., MyLoadBalancer).
  5. Leave Create LB inside set to EC2 because in this example you’ll create your load balancer in Amazon EC2. The default settings require that your Amazon EC2 HTTP servers are active and accepting requests on port 80.
  6. On the CONFIGURE HEALTH CHECK page of the Create a New Load Balancer wizard, set the following configurations:
  7. Leave Ping Protocol set to its default value of HTTP.
  8. Leave Ping Port set to its default value of 80.
  9. In the Ping Path field, replace the default value with a single forward slash (“/”). Elastic Load Balancing sends health check queries to the path you specify in Ping Path. This example uses a single forward slash so that Elastic Load Balancing sends the query to your HTTP server’s default home page, whether that default page is named index.html, default.html, or a different name.
  10. Leave the Advanced Options set to their default values.
  11. On the ADD INSTANCES page, check the boxes in the Select column to add instances to your load balancer.
  12. On the Review page of the Create a New Load Balancer wizard, check your settings. You can make changes to the settings by clicking the edit link for each setting.
  13. Now that you’ve made your configuration choices, added your instances, and reviewed your selections, click Create to create your load balancer.
  14. After you click the Create button on the REVIEW page, a confirmation window opens. Click Close. When the confirmation window closes, the Load Balancers page opens, and your new load balancer now appears in the list.
  15. You can test your load balancer after you’ve verified that at least one of your EC2 instances is InService. To test your load balancer, copy the DNS Name value that is listed in the Description tab and paste it into the address field of an Internet-connected web browser. If your load balancer is working, you will see the default page of your HTTP server.
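The console steps above can also be scripted with the ELB CLI tools. A sketch follows — the load balancer name, availability zone, and instance ID are examples, and the commands are echoed rather than executed so you can review them first:

```shell
#!/bin/bash
LB_NAME=MyLoadBalancer

# Remove the leading "echo" on each command to run it for real
echo elb-create-lb "$LB_NAME" \
    --listener "lb-port=80,instance-port=80,protocol=http" \
    --availability-zones us-east-1a
echo elb-configure-healthcheck "$LB_NAME" \
    --target "HTTP:80/" --interval 30 --timeout 5 \
    --healthy-threshold 2 --unhealthy-threshold 4
echo elb-register-instances-with-lb "$LB_NAME" --instances i-12345678
```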

Now that you’ve created an ELB instance, some of the things you may want to do include:

Getting Started with Auto-Scaling

Auto-Scaling is built from two things, a launch configuration and an auto-scaling group.

To build an auto-scaling configuration, do the following

  1. Download and Install the AS CLI tools
  2. Create a launch configuration, e.g. as-create-launch-config MyLC --image-id ami-2272864b --instance-type m1.large
  3. Create an Auto-Scaling group, e.g. as-create-auto-scaling-group MyGroup --launch-configuration MyLC --availability-zones us-east-1a --min-size 1 --max-size 1
  4. You can list your auto-scaling group with as-describe-auto-scaling-groups --headers

At this point you have a basic auto-scaling group.

To make this useful you’ll probably want to do some of the following
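For example, a common next step is attaching a scale-up policy driven by a CloudWatch alarm. A sketch with the AS and CloudWatch CLI tools follows — the names, threshold, and ARN are examples (the first command prints the policy ARN you feed to the alarm), and the commands are echoed rather than executed:

```shell
#!/bin/bash
GROUP=MyGroup

# Remove the leading "echo" on each command to run it for real
echo as-put-scaling-policy MyScaleUpPolicy \
    --auto-scaling-group "$GROUP" \
    --adjustment=1 --type ChangeInCapacity --cooldown 300

# Use the policy ARN printed by the command above as the alarm action
echo mon-put-metric-alarm MyHighCPUAlarm \
    --metric-name CPUUtilization --namespace AWS/EC2 \
    --statistic Average --period 300 --threshold 80 \
    --comparison-operator GreaterThanThreshold --evaluation-periods 2 \
    --dimensions "AutoScalingGroupName=$GROUP" \
    --alarm-actions "<policy-arn-from-above>"
```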


In conclusion, ELB and Auto-Scaling provide a number of options for managing and scaling your web application infrastructure based on traffic growth and user demand, and let you easily mix and match them with other AWS services.

Using IAM to Increase Flexibility and Security

Today’s post is a contribution from Erik Hollensbe, an active member of the Chef and Operations communities online and a practicing Operations Engineer.

AWS IAM (Identity and Access Management) is a tool to apply ACLs to AWS credentials – it’s not much more than that. While this sounds pretty banal, it can be used to solve a number of problems with both the flexibility and security of your network.

Scare Tactics Time

A lot of companies and groups use AWS exclusively, where previously they would have used racks of machines in a data center. Short of having a working proximity card and a large bucket of water, there wasn’t much you were going to be able to do to cause irreparable damage to every component of your company’s network. Presuming you did that, and didn’t kill yourself by electrocution, you still have to evade the DC cameras to get away with it.

That all changes with something like AWS. The master keys to your account can literally be used to destroy everything. Your machines, your disks, your backups, your assets. Everything. While vendor partitioning, off site backups, etc, is an excellent strategy (aside from other, separate gains) to mitigate the long-term damage, it doesn’t change this. Plus since the credentials are shared, it’s not exactly a feat to do this anonymously.

While my intent isn’t to scare you into using IAM, it’s important to understand that in more than a few organizations, not only will many members of your staff have these credentials, but frequently enough they will also live on the servers as part of code deploys, configuration management systems, or one-off scripts. So you don’t even have to work at the company in that situation, you simply need to find a hole to poke open to tear down an entire network.

Security Theatre in a Nutshell

Before I go into how to use IAM to solve these problems, I’m going to balance this out with a little note about security theatre.

Know the problem you’re actually solving. If you’re not clear on what you’re solving, or it’s not a full solution, you’re practicing the worst kind of security theatre, wasting everyone’s time as a result. Good security is as much about recognizing and accepting risk as mitigating it. Some of these solutions may not apply to you and some of them may not be enough. Use your head.

IAM as a tool to mitigate turnover problems

This is the obvious one, so I’ll mention it first. Managing turnover is something that’s familiar to anyone with significant ops experience, whether or not they had any hand in the turnover itself. Machines and networks are expected to be protected from reprisals and a good ops person is thinking about this way ahead of when it happens for the first time.

Just to be clear, no human likes discussing this subject, but it is a necessity and an unfortunate consequence of both business and ops. Ignoring it isn’t a solution.

IAM solves these problems in a number of ways:

  • Each user gets their own account to both log in to the web interface and associated credentials to use.
  • Users are placed in groups which have policies (ACLs). Users individually have policies as well and these can cascade. Policies are just JSON objects, so they’re easy to re-use, keep in source control, etc.

Most users have limited needs and it would be wise to (without engaging in security theatre) assess what those needs are and design policies appropriately. Even if you don’t assign restrictive policies, partitioning access by user makes credential revocation fast and easy, which is exactly what you want and need in an unexpected turnover situation… which is usually the time when it actually matters.

And who watches the watchers? Let’s be honest with ourselves for a second. You may be behind the steering wheel, but you probably aren’t the final word on the route to take, and anyone who thinks they are because they hold the access keys needs more friends in the legal profession. Besides, it’s just not that professional. Protect your network against yourself too. It’s just the right thing to do.

So, here’s the shortest path to solving turnover problems with AWS credentials:

  • Bootstrap IAM – click on IAM in the AWS control panel. Set up an admin group (the setup will offer to create this group for you) and a custom URL for your users to log in to.
  • Set up users for everyone who needs to do anything with AWS. Make them admins. (Admins still can’t modify the owner settings, but they can affect other IAM users.)
  • Tell your most senior technical staff to create a new set of owner credentials, to change the login password, and to revoke the old credentials.

Now you’re three clicks away (or an API call) from dealing with any fear of employee reprisal short of the CTO, and you have traceable legal recourse in case it took you too long to do that. Congratulations.

IAM as a tool to scope automated access

Turnover is not a subject I enjoy discussing when it comes to security, but it’s the easier introduction. While I think the above is important, it’s arguably the lesser concern.

As our applications and automation tooling, like configuration management, become more involved and elaborate, we start integrating parts of the AWS API. Whether that’s a web app uploading files to an S3 bucket, a deploy script that starts new EC2 machines, or a Chef recipe that allocates new volumes from EBS for a service to use, we become dependent on the API. This is a good thing, of course – the API is really where the win is in using a service like AWS.

However, those credentials have to live somewhere. On disk, in a central configuration store, encrypted, unencrypted, it doesn’t matter. If your automation or app can access it, an attacker that wants it will get it.

Policies let us scope what credentials can do. Does your app syncing assets with S3 and CloudFront need to allocate EBS volumes, or manage Route53 zones? Prrrrrroobably not. If it’s easier to think about this in unix terms, does named need to access the contents of /etc/shadow?

“Well, duh!”, you might say, yet many companies plug owner credentials directly into their S3 or EBS allocation tooling, and then run on EC2 under the same account. We preach loudly about not running as root, but then expose our entire network (not just that machine) to plunder.

Instead, consider assigning policies to different IAM accounts that allow exactly what that tool needs to do, and making those credentials available to that tool. Not only will you mitigate access issues, but it will be clearer when your tooling is trying to do something you didn’t expect it to do by side-effect, just like a service or user on your machine messing with a file you didn’t expect it to.

You can populate these credentials with your favorite configuration management system, or credentials can also be associated with EC2 instances directly, where the metadata is available from an internally-scoped HTTP request.

Creating a Policy

An IAM policy is just a JSON-formatted file with a simple schema that looks something like this:

{ "Statement": [ { "Sid": "Stmt1355374500916", "Action": [ "ec2:CreateImage" ], "Effect": "Allow", "Resource": "*" } ] }

Some highlights:

  • A Statement is a hash describing a rule.
  • Actions are a 1:1 mapping to AWS API calls. For example, the above statement references the CreateImage API call from the ec2 API.
  • Effect is just how to restrict the Action. Valid values are Allow and Deny.
  • A Resource is an ARN, which is just a qualified namespace. In the EC2 case ARNs have no effect, but you’d use one if you were referring to something like a S3 bucket.
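As an illustration of scoping by resource, a policy for a tool that only syncs assets with a single S3 bucket might look like this (the bucket name is an example):

```json
{
  "Statement": [
    {
      "Action": ["s3:ListBucket", "s3:GetObject", "s3:PutObject"],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::my-asset-bucket",
        "arn:aws:s3:::my-asset-bucket/*"
      ]
    }
  ]
}
```

Credentials attached to this policy can read and write that one bucket and nothing else.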

For extended policy documentation, look here.

One of my favorite things about AWS policies is that they’re JSON. This JSON file can be saved in source control and re-used for reference, repeat purposes, or in a DR scenario.

AWS itself provides a pretty handy Policy Generator for making this a little easier. You will still want to become familiar with the API calls to write effective policies, but there is also a small collection of example policies while you get your feet wet.

Happy Hacking!

AWS EC2 Configuration Management with Chef

Today’s post is a contribution from Joshua Timberman, a Technical Community Manager at Opscode, an avid RPGer and DM extraordinaire, a talented home-brewer, who is always Internet meme and Buzzword compliant.

He shares with us how Chef can help manage your EC2 instances.

In a previous post, we saw some examples about how to get started managing AWS EC2 instances with Puppet and Chef. In this post, we’ll take a deeper look into how to manage EC2 instances with Chef. It is outside the scope of this post to go into great detail about building cookbooks. If you’re looking for more information on working with cookbooks, see the following links


There are a number of prerequisites for performing the tasks outlined in this post, including

  • Workstation Setup
  • Authentication Credentials
  • Installing Chef

Workstation Setup

We assume that all commands and work will originate from a local workstation — for example, a company-issued laptop running a supported platform. You’ll need some authentication credentials, and to configure knife.

Authentication Credentials

You’ll need the Amazon AWS credentials for your account. You’ll also need to create an SSH key pair to use for your instances. Finally, if you’re using a Chef Server, you’ll need your user key and the “validation” key.

Install Chef

If your local workstation system doesn’t already have Chef installed, Opscode recommends using the “Omnibus package” installers.

Installing the knife-ec2 Plugin

Chef comes with a plugin-based administrative command-line tool called knife. Opscode publishes the knife-ec2 plugin, which extends knife with the fog library to interact with the EC2 API. This plugin will be used in further examples, and it can be installed as a RubyGem into the “Omnibus” Ruby environment that comes with Chef.


sudo /opt/chef/embedded/bin/gem install knife-ec2

If you’re using a different Ruby environment, you’ll need to use the proper gem command.


In order to use knife with your AWS account, it must be configured. The example below uses Opscode Hosted Chef as the Chef Server. It includes the AWS credentials as read in from shell environment variables. This is so the actual credentials aren’t stored in the config file directly.

Normally, the config file lives in ./.chef/knife.rb, where the current directory is a “Chef Repository.” See the knife.rb documentation for more information.
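The knife.rb contents aren’t included in this excerpt; a sketch along those lines follows — the organization name, user name, AMI, and key paths are all examples:

```ruby
# .chef/knife.rb
log_level                :info
log_location             STDOUT
node_name                "jdoe"
client_key               "#{ENV['HOME']}/.chef/jdoe.pem"
validation_client_name   "myorg-validator"
validation_key           "#{ENV['HOME']}/.chef/myorg-validator.pem"
chef_server_url          "https://api.opscode.com/organizations/myorg"

# AWS credentials read from the environment so they are not
# stored in the config file directly
knife[:aws_access_key_id]     = ENV['AWS_ACCESS_KEY_ID']
knife[:aws_secret_access_key] = ENV['AWS_SECRET_ACCESS_KEY']

# These can also be passed as options to `knife ec2 server create`
# knife[:image]           = "ami-9a873ff3"
# knife[:flavor]          = "m1.small"
# knife[:security_groups] = ["default", "www"]
```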


The additional commented lines can all be passed to the knife ec2 server create command through its options, see --help for full options list.

Launching Instances

Launch instances using knife-ec2’s “server create” command. This command will do the following:

  1. Create an instance in EC2 using the options supplied to the command, and in the knife.rb file.
  2. Wait for the instance to be available on the network, and then wait until SSH is available.
  3. SSH to the instance as the specified user (see command-line options), and perform a “knife bootstrap,” which is a built-in knife plugin that installs Chef and configures it for the Chef Server.
  4. Run chef-client with a specified run list, connecting to the Chef Server configured in knife.rb.

In this example, we’re going to use an Ubuntu 12.04 AMI provided by Canonical in the default region and availability zone (us-east-1, us-east-1d). We’ll use the default instance size (m1.small). We must specify the user that we’ll connect with over SSH (-x ubuntu), because it is not the default (root). We also specify the AWS SSH keypair (-S jtimberman). As a simple example, we’ll set up an Apache web server with Opscode’s apache2 cookbook using a simple run list (-r 'recipe[apt],recipe[apache2]'), and use the apt cookbook to ensure the APT cache is updated. Then, we specify the security groups so the right firewall rules are opened (-G default,www).

knife ec2 server create -x ubuntu -I ami-9a873ff3 -S jtimberman -G default,www -r 'recipe[apt],recipe[apache2]'

The first thing this command does is talk to the EC2 API and provision a new instance.

The “Bootstrap” Process

What follows will be the output of the knife bootstrap process. That is, it installs Chef, and then runs chef-client with the specified run list.

The registration step is where the “validation” key is used to create a new client key for this instance. On the Chef Server:

On the EC2 instance:

The client.pem file was created by the registration. We can safely delete the validation.pem file now, it is not needed, and there’s actually a recipe for that.

The client.rb looks like this:
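The file isn’t reproduced in this excerpt; a generated client.rb typically looks something like this — the server URL, validator name, and node name are examples:

```ruby
# /etc/chef/client.rb (written by the knife bootstrap)
log_level        :info
log_location     STDOUT
chef_server_url  "https://api.opscode.com/organizations/myorg"
validation_client_name "myorg-validator"
node_name "i-abcd1234"
```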

The chef_server_url and validation_client_name came from the knife.rb file above. The node name came from the instance ID assigned by EC2. Node names on a Chef Server must be unique; EC2 instance IDs are unique, whereas the FQDN (Chef’s default here) may be recycled from terminated instances.

The ohai directory contains hints for EC2. This is a new feature of the knife-ec2 plugin and ohai, to help better identify cloud instances, since certain environments make it difficult to auto-detect (including EC2 VPC and OpenStack).

Now that the instance has finished Chef, it has a corresponding node object on the Chef Server.

In this output, the FQDN is the private internal name, but the IP is the public address. This is so when viewing node data, one can copy/paste the public IP easily.

Managing Instance Lifecycle

There are many strategies out there for managing instance lifecycle in Amazon EC2. They all use different tools and workflows. The knife-ec2 plugin includes a simple “purge” option that will remove the instance from EC2, and if the node name in Chef is the instance ID, will remove the node and API client objects from Chef, too.


AWS EC2 is a wonderful environment to spin up new compute quickly and easily. Chef makes it even easier than ever to configure those instances to do their job. The scope of this post was narrow, to introduce some of the concepts behind the knife-ec2 plugin and how the bootstrap process works, and there’s much more that can be learned.

Head over to the Chef Documentation to read more about how Chef works.

Find cookbooks shared by others on the Chef Community Site.

If you get stuck, the community has great folks available via the IRC channels and mailing lists.