AWS Advent Day – Deploying SQL Server in AWS

Our third AWS Advent post comes to us by way of Jeremiah Peschka. You can find him fiddling with databases for Brent Ozar Unlimited.

For one reason or another, you’re deploying SQL Server into AWS. There are a few different ways to think about deploying SQL Server and a few concerns that you have to address for each one. This doesn’t have to be difficult and, in some ways, it’s a lot easier than buying a physical server.

We’re going to take a look at three ways to deploy SQL Server in AWS. In two situations we’ll look at renting the licensing from Amazon and in two situations we’ll look at running our own instances. There’s some overlap here, but rest assured that’s a good thing.

SQL Server as a Service

One of the easiest ways to run SQL Server in AWS is to not run it at all. Or, at least, to make AWS run it for you. Amazon has a hosted database-as-a-service product – Amazon RDS.

Benefits of SQL Server RDS

Good operational database administrators are hard to come by. SQL Server RDS doesn’t provide a good database administrator, but it does turn a large portion of database administration into a service.

Amazon provides:

  • An operating system and SQL Server
  • Automated configuration tools
  • Regular backups
  • The ability to clone a database
  • High availability (if you check the box)

In addition, you can provision new SQL Servers in response to customer demand as needed. The ability to rapidly spin up multiple SQL Server installations can’t be overstated – new SQL Servers on demand are critical for multi-tenant companies. Abstracting away the creation, patching, and other operational tasks is a boon for small companies without experienced DBAs.

The Downside of SQL Server RDS

It’s easy to think that getting someone else to handle SQL Server is the way to go. After all, Amazon is responsible for just about everything apart from your code, right?

They’re not. While AWS is responsible for a lot of plumbing, you’re still responsible for writing software, designing data structures, monitoring SQL Server performance, and performing capacity planning.

Even worse, you’re responsible for the maintenance of all of this functionality and making sure that your index structures are free of fragmentation and corruption. It is still necessary for someone to set up jobs to monitor and address:

  • Index fragmentation
  • Database corruption

Even though AWS is doing some of the work, there’s still a lot left to do.

AWS Licensing

It’s possible to rent your licensing from AWS. This happens automatically with SQL Server RDS, but you can also rent licensing for instances you run yourself if you’re using SQL Server Standard Edition. For many companies, this is an easy way to get into SQL Server licensing, and AWS can offer a competitive price.

For teams who don’t need Enterprise Edition features, renting the licenses from AWS is an easy on-ramp. SQL Server Standard Edition supports a number of high availability features that are good enough for most applications. Many AWS instance sizes are small enough that the limitations of SQL Server Standard Edition – 16 cores and 64GB of memory (128GB for SQL Server 2014) – aren’t limitations at all.

Enterprise Edition and AWS

Sometimes you need more than 128GB of memory, or more than 16 cores. In these cases, you can buy your own licensing for SQL Server Enterprise Edition. Although it’s expensive, this is the only way to take advantage of the larger AWS instance types with their high core counts and reasonably large volumes of memory.

Many aspects of SQL Server Enterprise Edition are the same as they’d be for a physical SQL Server. The most important thing to realize is that this giant SQL Server is subject to some very finite scaling limitations – AWS doesn’t always have the fastest CPUs and the maximum instance sizes are limited. Scaling up indefinitely isn’t always an option. DBAs need to carefully watch the CPU utilization of different SQL Server features.

Embrace the limitations of AWS hardware and use that to guide your tuning efforts.

Lessons Learned

Follow a Set Up Guide

No, really. Do it. My coworkers and I maintain a SQL Server Setup Checklist. Don’t deploy SQL Server without one.

Script Everything

I can’t stress it enough – script everything.

Instance spin up time in AWS is fast enough that it makes sense to have a scripted installation process. Whether you’re scripting the configuration of an RDS SQL Server or you’re installing SQL Server on your own VMs, the ability to rapidly configure new instances is powerful.

Script your SQL Server setup and keep it in version control.

Use Solid State

  1. Don’t cheap out and use rotational storage. 
    Just stop it. Even if you’re hosting a data warehouse, the savings aren’t worth it. The throughput available from AWS SSDs is more than worth the extra cost.
  2. Use the local SSDs. 
    The ephemeral SSDs won’t last between reboots, but they are still local SSDs. SQL Server can take advantage of low latency drives for temporary workloads. Careful configuration makes it possible to house the tempdb database on the local AWS SSDs. Since tempdb is recreated on every boot, who cares if it goes away on instance restart?

Plan Carefully

Measure your storage needs in advance. In How Much Memory Does SQL Server Need?, I did some digging to help DBAs figure out the amount of I/O that SQL Server is performing. Measuring disk use (both in IOPS and throughput) will help your capacity planning efforts. It’s okay to get this wrong; most of us do.

SQL Server tracks enough data that you can make an educated guess about IOPS, throughput, and future trends. Just make sure you look at the metrics you’re given, figure out the best route forward, and have plans in place to deal with a mistake in your capacity planning.

Hardware is Limited

It’s easy enough to buy a big 4- or 8-socket server with terabytes of RAM. But that’s not possible in AWS. Embrace the limitations.

Instead of scaling up SQL Server as high as the budget will allow, embrace the constraints of AWS hardware. Spin off different parts of your application into other AWS services, or bring your own services into the mix. Don’t think of it as abandoning SQL Server; think of it as scaling your architecture. By moving away from a monolithic database server, you’re able to scale individual portions of the application separately.

SQL Server Full Text Search is one example of a feature that can be scaled out. Finding the resources to create a full text search server using ElasticSearch or SOLR can be difficult in a physical data center. With AWS, you can spin up a new VM, with software installed, in a few minutes and be ready to index and query data.

Licensing is Tricky

Starting with SQL Server 2012, licensing became core-based rather than socket-based. And, because of how Microsoft licenses SQL Server in virtual environments, those core-based licenses may not work the way you expect. Check with Amazon or your licensing reseller to make sure you’re licensed correctly.

Wrapping Up

There’s no reason to fear deploying SQL Server in AWS, or any cloud provider. For many applications, SQL Server RDS fits the bill. The more customized your deployment, the more likely you are to need SQL Server in an EC2 instance. As long as you keep these guidelines in mind, you’re likely to be successful.


AWS Advent 2014 – CoreOS and Kubernetes on AWS

Our second AWS Advent Post comes to us from Tim Dysinger. He walks us through exploring CoreOS and Kubernetes.

There’s a copy of the source and example code from this post on Github

What’s a CoreOS?

CoreOS is a fork of CrOS, the operating system that powers Google’s Chromebook laptops. CrOS is a highly customized flavor of Gentoo that can be built entirely in one shot on a host Linux machine. CoreOS is a minimal Linux/systemd operating system with no package manager. It is intended for servers that will be hosting containers.

CoreOS has “Fast Patch” and Google’s Omaha updating system as well as CoreUpdate from the CoreOS folks. The A/B upgrade system from CrOS means updated OS images are downloaded to the non-active partition. If the upgrade works, great! If not, we roll back to the partition that still exists with the old version. CoreUpdate also has a web interface to allow you to control what gets updated on your cluster & when that action happens.

While not tied specifically to LXC, CoreOS comes with Docker “batteries included”. Docker runs out of the box with ease. The team may add support for an array of other virtualization technologies on Linux, but today CoreOS is known for its Docker integration.

CoreOS also includes Etcd, a useful Raft-based key/value store. You can use this to store cluster-wide configuration & to provide look-up data to all your nodes.

Fleet is another CoreOS built-in service that can optionally be enabled. Fleet takes systemd and stretches it so that it is multi-machine aware. You can define services or groups of services in systemd unit syntax and deploy them to your cluster.

CoreOS has alpha, beta & stable streams of their OS images and the alpha channel gets updates often. The CoreOS project publishes images in many formats, including AWS images in all regions. They additionally share a ready-to-go basic AWS CloudFormation template from their download page.

Prerequisites

Today we are going to show how you can launch Google’s Kubernetes on Amazon using CoreOS. In order to play along you need the following checklist completed:

  • AWS account acquired
  • AWS_ACCESS_KEY_ID environment variable exported
  • AWS_SECRET_ACCESS_KEY environment variable exported
  • AWS_DEFAULT_REGION environment variable exported
  • Amazon awscli tools http://aws.amazon.com/cli installed
  • JQ CLI JSON tool http://stedolan.github.io/jq/ installed

You should be able to execute the following, to print a list of your EC2 Key-Pairs, before continuing:
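
For example:

```
aws ec2 describe-key-pairs
```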

CoreOS on Amazon EC2

Let’s launch a single instance of CoreOS just so we can see it work by itself. Here we create a small YAML file for AWS ‘userdata’. In it we tell CoreOS that we don’t want an automatic reboot after an update (we may prefer to manage that manually in our prod cluster; if you like automatic then don’t specify anything & you’ll get the default).

Our super-basic cloud-config.yml file looks like so:
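
A minimal sketch of that file (the exact keys can vary a little between CoreOS releases):

```
#cloud-config

coreos:
  update:
    # don't reboot automatically after an update; we'll manage reboots ourselves
    reboot-strategy: "off"
```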

Here we use ‘awscli’ to create a new Key-Pair:
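
Something along these lines works, using jq (from the prerequisites) to pull the private key out of the response; the key name is just an example:

```
aws ec2 create-key-pair --key-name coreos-advent \
  | jq -r .KeyMaterial > coreos-advent.pem
chmod 600 coreos-advent.pem
```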

We’ll also need a security group for CoreOS instances:
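
For example (the group name is a placeholder we’ll reuse below):

```
aws ec2 create-security-group \
  --group-name coreos-advent \
  --description "CoreOS advent demo instances"
```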

Let’s allow traffic from our laptop/desktop to SSH:
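
A sketch that grabs your current public IP and opens port 22 from it only:

```
MY_IP=$(curl -s http://checkip.amazonaws.com)
aws ec2 authorize-security-group-ingress \
  --group-name coreos-advent \
  --protocol tcp --port 22 --cidr ${MY_IP}/32
```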

Now let’s launch a single CoreOS Amazon Instance:
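
Roughly like this; substitute a current CoreOS AMI ID for your region (from the CoreOS download page) for the placeholder:

```
aws ec2 run-instances \
  --image-id ami-xxxxxxxx \
  --instance-type m3.medium \
  --key-name coreos-advent \
  --security-groups coreos-advent \
  --user-data file://cloud-config.yml
```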

Running a Docker Instance The Old Fashioned Way

Login to our newly launched CoreOS EC2 node:
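
CoreOS images ship with a core user, so:

```
ssh -i coreos-advent.pem core@<public-ip-of-the-instance>
```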

Start a Docker instance interactively in the foreground:
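
For example:

```
docker run -it busybox /bin/sh
```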

OK. Now terminate that machine (AWS Console or CLI). We need more than just plain ol’ docker. To run a cluster of containers we need something to schedule & monitor the containers across all our nodes.

Starting Etcd When CoreOS Launches

The next thing we’ll need is to have etcd started with our node. Etcd will help our nodes with cluster configuration & discovery. It’s also needed by Fleet.

Here is a (partial) Cloud Config userdata file showing etcd being configured & started:
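
A sketch of the relevant bits (the discovery token is a placeholder; generate your own):

```
#cloud-config

coreos:
  etcd:
    # generate a fresh token for every cluster: https://discovery.etcd.io/new
    discovery: https://discovery.etcd.io/<your-new-token>
    addr: $private_ipv4:4001
    peer-addr: $private_ipv4:7001
  units:
    - name: etcd.service
      command: start
```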

You need to use a different discovery URL (above) for every cluster launch. This is noted in the etcd documentation. Etcd uses the discovery URL to hint to nodes about peers for a given cluster. You can (and probably should if you get serious) run your own internal etcd cluster just for discovery. Here’s the project page for more information on etcd.

Starting Fleetd When CoreOS Launches

Once we have etcd running on every node we can start up Fleet, our low-level cluster-aware systemd coordinator.
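
In cloud-config terms that’s just one more unit to start alongside etcd, roughly:

```
coreos:
  units:
    - name: etcd.service
      command: start
    - name: fleet.service
      command: start
```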

We need to open internal traffic between nodes so that etcd & fleet can talk to peers:
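
For example, allowing the CoreOS security group to talk to itself on all TCP ports:

```
aws ec2 authorize-security-group-ingress \
  --group-name coreos-advent \
  --source-group coreos-advent \
  --protocol tcp --port 0-65535
```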

Let’s launch a small cluster of 3 coreos-with-fleet instances:
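
The same run-instances call as before, just with a count and the updated userdata:

```
aws ec2 run-instances \
  --image-id ami-xxxxxxxx \
  --count 3 \
  --instance-type m3.medium \
  --key-name coreos-advent \
  --security-groups coreos-advent \
  --user-data file://cloud-config.yml
```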

Using Fleet With CoreOS to Launch a Container

Starting A Docker Instance Via Fleet

Login to one of the nodes in our new 3-node cluster:

Now use fleetctl to start your service on the cluster:
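
A minimal unit file and the fleetctl calls look roughly like this (the service name and container are just examples):

```
cat > hello.service <<'EOF'
[Unit]
Description=Hello World container
After=docker.service
Requires=docker.service

[Service]
ExecStartPre=-/usr/bin/docker rm -f hello
ExecStart=/usr/bin/docker run --name hello busybox /bin/sh -c "while true; do echo hello; sleep 10; done"
ExecStop=/usr/bin/docker stop hello
EOF

fleetctl start hello.service
fleetctl list-units
```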

NOTE: There’s a way to use the FLEETCTL_TUNNEL environment variable in order to use fleetctl locally on your laptop/desktop. I’ll leave this as a viewer exercise.

Fleet is capable of tracking containers that fail (via systemd signals). It will reschedule a container for another node if needed. Read more about HA services with fleet here.

Registry/Discovery feels a little clunky to me (no offense CoreOS folks). I don’t like having to manage separate “sidekick” or “ambassador” containers just so I can discover & monitor containers. You can read more about Fleet discovery patterns here.

There’s no “volume” abstraction with Fleet. There’s not really a cohesive “pod” definition. Well there is a way to make a “pod” but the config would be spread out in many separate systemd unit files. There’s no A/B upgrade/rollback for containers (that I know of) with Fleet.

For these reasons, we need to keep on looking. Next up: Kubernetes.

What’s Kubernetes?

Kubernetes is a higher-level platform-as-a-service than CoreOS currently offers out of the box. It was born out of the experience of running GCE at Google. It is still in its early stages, but I believe it will become a stable, useful tool, like CoreOS, very quickly.

Kubernetes has an easy-to-configure “Pods” abstraction where all containers that work together are defined in one YAML file. Go get some more information here. Pods can be given Labels in their configuration. Labels can be used in filters & actions in a way similar to AWS.

Kubernetes has an abstraction for volumes. These volumes can be shared to Pods & containers from the host machine. Find out more about volumes here.

To coordinate replicas (for scaling) of Pods, Kubernetes has the Replication Controller, which maintains N Pods in place on the running cluster. All of the information needed for the Pod & its replication is maintained in the replication controller’s configuration. Going from 8 replicas to 11 is just incrementing a number. It’s the equivalent of AWS AutoScale groups, but for Docker Pods. Additionally there are features that allow for rolling upgrades to a new version of a Pod (and the ability to roll back an unhealthy upgrade). More information is found here.

Kubernetes Services are used to load-balance across all the active replicas for a pod. Find more information here.

A Virtual Network for Kubernetes With CoreOS Flannel

By default a local private network interface (docker0) is configured for Docker guest instances when Docker is started. This network routes traffic to & from the host machine & all docker guest instances. It doesn’t route traffic to other host machines or other host machines’ docker containers, though.

To really have pods communicating easily across machines, we need a route-able sub-net for our docker instances across the entire cluster of our Docker hosts. This way every docker container in the cluster can route traffic to/from every other container. This also means registry & discovery can contain IP addresses that work & no fancy proxy hacks are needed to get from point A to point B.

Kubernetes expects this route-able internal network. Thankfully the people at CoreOS came up with a solution (currently in Beta). It’s called “Flannel” (formerly known as “Rudder”).

To enable a Flannel private network just download & install it on CoreOS before starting Docker. Also you must tell Docker to use the private network created by flannel in place of the default.

Below is a (partial) cloud-config file showing flanneld being downloaded & started. It also shows a custom Docker config added (to override the default systemd configuration for Docker). This is needed to make Docker use the Flannel network.
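
A rough sketch of the pattern (the flannel download URL is a placeholder, and the exact flags varied across flannel and Docker releases):

```
coreos:
  units:
    - name: flanneld.service
      command: start
      content: |
        [Unit]
        After=etcd.service
        Requires=etcd.service

        [Service]
        # flannel reads its network range from etcd, e.g. set
        # /coreos.com/network/config with etcdctl before this starts
        ExecStartPre=/usr/bin/wget -N -P /opt/bin <flannel-release-url>/flanneld
        ExecStartPre=/usr/bin/chmod +x /opt/bin/flanneld
        ExecStart=/opt/bin/flanneld
    - name: docker.service
      command: start
      content: |
        [Unit]
        After=flanneld.service
        Requires=flanneld.service

        [Service]
        # flanneld writes the subnet it allocated for this host to this file
        EnvironmentFile=/run/flannel/subnet.env
        ExecStart=/usr/bin/docker -d --bip=${FLANNEL_SUBNET} --mtu=${FLANNEL_MTU}
```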

Flannel can be configured to use a number of virtual networking strategies. Read more about flannel here.

Adding Kubernetes To CoreOS

Now that we have a private network that can route traffic for our docker containers easily across the cluster, we can add Kubernetes to CoreOS. We’ll want to follow the same pattern for cloud-config of downloading the binaries that didn’t come with CoreOS & adding systemd configuration for their services.

The download part (seen first below) is common enough to reuse across Master & Minion nodes (the two main roles in a Kubernetes cluster). From there the Master does most of the work, while the Minion just runs the kubelet & kube-proxy & does what it’s told.

Download Kubernetes (Partial) Cloud Config (both Master & Minion):
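
A rough sketch of that shared download unit (the release URL is a placeholder for whichever Kubernetes build you want to run):

```
coreos:
  units:
    - name: download-kubernetes.service
      command: start
      content: |
        [Unit]
        After=network-online.target
        Requires=network-online.target

        [Service]
        Type=oneshot
        RemainAfterExit=yes
        ExecStart=/usr/bin/wget -N -P /opt/bin <kubernetes-release-url>/kube-apiserver
        ExecStart=/usr/bin/wget -N -P /opt/bin <kubernetes-release-url>/kube-controller-manager
        ExecStart=/usr/bin/wget -N -P /opt/bin <kubernetes-release-url>/kube-scheduler
        ExecStart=/usr/bin/wget -N -P /opt/bin <kubernetes-release-url>/kubelet
        ExecStart=/usr/bin/wget -N -P /opt/bin <kubernetes-release-url>/kube-proxy
        ExecStart=/usr/bin/chmod -R +x /opt/bin
```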

Master-Specific (Partial) Cloud Config:

Minion-Specific (Partial) Cloud Config:

Kube-Register

Kube-Register bridges discovery of nodes from CoreOS Fleet into Kubernetes. This gives us no-hassle discovery of other Minion nodes in a Kubernetes cluster. We only need this service on the Master node. The Kube-Register project can be found here. (Thanks, Kelsey Hightower!)

Master Node (Partial) Cloud Config:

All Together in an AWS CFN Template with AutoScale

Use the CloudFormation template below. It’s the culmination of our progression of launch configurations from above.

In the CloudFormation template we add some things. We add 3 security groups: 1 common to all Kubernetes nodes, 1 for the Master & 1 for Minions. We also configure 2 AutoScale groups: 1 for the Master & 1 for Minions. This is so we can have different assertions over each node type. We only need 1 Master node for a small cluster, but we could grow our Minions to, say, 64 without a problem.

I used YAML here for two reasons: 1. You can add comments at will (unlike JSON). 2. It converts to JSON in the blink of an eye.

Converting To JSON Before Launch

If you have another tool you prefer for converting YAML to JSON, use that. I usually have Ruby & Python installed on my machines from other DevOps activities; either one could be used.
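
For example (the template filename here is just an example):

```
# with Ruby's standard library
ruby -ryaml -rjson -e 'puts JSON.pretty_generate(YAML.load(ARGF.read))' \
  kubernetes.template.yml > kubernetes.template.json

# or with Python (requires PyYAML)
python -c 'import sys, json, yaml; json.dump(yaml.safe_load(sys.stdin), sys.stdout, indent=2)' \
  < kubernetes.template.yml > kubernetes.template.json
```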

Launching with AWS Cloud Formation
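
Creating the stack is one awscli call; pass any parameters your template defines (key pair name, instance counts, and so on) with --parameters:

```
aws cloudformation create-stack \
  --stack-name kubernetes \
  --template-body file://kubernetes.template.json
```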

SSH into the master node on the cluster:

We can still use Fleet if we want:
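
For example:

```
fleetctl list-machines
fleetctl list-units
```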

But now we can use Kubernetes also:


Here’s the Kubernetes 101 documentation as a next step. Happy deploying!

Cluster Architecture

Just like organizations of people, these clusters change as they scale. For now it works to have every node run etcd. For now it works to have a top-of-cluster master that can die & get replaced inside 5 minutes. These allowances work at small scale.

At larger scale, we may need a dedicated etcd cluster. We may need more up-time from our Kubernetes Master nodes. The nice thing about using containers is that re-configuring things feels a bit like moving chess pieces on a board (not repainting the scene by hand).

Personal Plug

I’m looking for contract work to fill the gaps next year. You might need help with Amazon (I’ve been using AWS FT since 2007), virtualization, or DevOps. I also like programming & new start-ups. I prefer to program in Haskell & Purescript. I’m actively using Purescript with Amazon’s JS SDK (& soon with AWS Lambda). If you need the help, let’s work it out. I’m @dysinger on twitter, dysinger on IRC, or send e-mail to tim on the domain dysinger.net

P.S. You should really learn Haskell. 🙂


AWS Advent Day 1 – Kappa: Simplifying AWS Lambda Deployments

Our first AWS Advent post comes to us from Mitch Garnaat, the creator of the AWS Python library boto, who is currently herding clouds and devops over at Scopely. He’s gonna walk us through exploring AWS Lambda and some tooling he built to help use it.

AWS Lambda is an interesting new service from Amazon Web Services. It allows you to write Lambda Functions and associate these functions with events such as new files appearing in an S3 bucket or new records being written to an Amazon Kinesis stream. The details of how the functions get executed and how they are scaled to meet demand are handled completely by the AWS Lambda service. So, as the developer, you don’t have to worry about instances or load balancers or auto scaling groups, etc. It all just happens automatically for you.

Sound too good to be true? Well, there are some caveats. The main one is that the AWS Lambda service is in Preview right now so there are some rough edges. The good news is that AWS has made the service available for testing and evaluation and your input can have a big impact on the future of the service. I encourage you to give it a try.

My first impression of AWS Lambda (aside from the obvious wow factor) was that the process of creating and deploying a Lambda Function is more complicated than I imagined. For example, having a small Javascript function called whenever a record is written to an Amazon Kinesis stream requires quite a few steps.

  • Write the Javascript function (AWS Lambda only supports Javascript right now)
  • Create an IAM Role that will be used to allow the Lambda Function to access any AWS resources it needs when executing.
  • Zip up the Javascript function and any dependencies
  • Upload the zip file to the AWS Lambda service
  • Send test data to the Lambda Function
  • Create an IAM Role that will be used by the service invoking your Lambda Function
  • Retrieve the output of the Lambda Function from Amazon CloudWatch Logs
  • Add an event source to the Lambda Function
  • Monitor the output of the live function

Each of these steps actually requires multiple steps involving different services. For example, the roles are created in IAM but you then need to know the ARN of those roles when uploading the function or adding the event source. The bottom line is that using AWS Lambda at the moment requires a lot of knowledge about other Amazon Web Services.

Sounds Like We Need Some Tools

Whenever I’m faced with a task that is complicated, fiddly, and repetitive, my first reaction is always to think about what kind of tool I can create to make it easier. That’s where kappa comes in.

Kappa is a command line tool written in Python. The goal of kappa is to make it easier to deploy Lambda Functions. It tries to handle many of the fiddly details and hopefully lets you focus more on what your Lambda Functions are actually doing.

Getting Kappa

You can install kappa using PyPI but for our purposes the best way is to simply clone the github repo. Once you have cloned it (hopefully inside a virtualenv) simply run:
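
From the top of the cloned repo, a standard Python install should do it, roughly:

```
cd kappa
pip install -e .
```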

and you should be all set.

A Simple Kinesis Example

To get an idea of how kappa works, let’s try a simple example of a Lambda Function that gets called each time a new record is written to a Kinesis stream. The function we will write doesn’t really do anything except log some debug output but the possibilities are endless. For example, I have a modified version of this basic function that indexes the payload in the Kinesis record to an ElasticSearch server. So, you can basically send any kind of JSON data into a Kinesis stream and it will get indexed in ElasticSearch. And I don’t have to create any EC2 instances or any other compute resources to make it happen.

The actual example is bundled with the kappa github repo so if you have cloned the repo as described above, simply cd into the samples/kinesis directory to find the example files.

Roles and Policies

First, let’s handle the IAM Roles we will need. We need an execution role and an invocation role. The former represents the permissions granted to our function when it is being executed. The latter represents the permissions granted to whichever service is actually responsible for invoking our function.

IAM roles and policies are complex and rather arcane. The best approach is usually to find a working example similar to what you want and modify it. I stole the policies (and some other things) from the example AWS provides in the Lambda docs. I then repackaged these as a CloudFormation template (see roles.cf in the sample directory).

The benefit of using CloudFormation to handle the IAM roles and policies is that it provides a transactional approach to creating and updating the roles and policies and lets you version control the policies easily in git. Kappa takes care of all of the details of dealing with CloudFormation for you. You should be able to use these roles and policies directly although over time you may need to modify them or further restrict them.

Config

Kappa is driven from a YAML config file. There is a sample config file in the sample directory. You will have to make a couple of changes.

  • The profile attribute refers to the profile within your AWS config file (e.g. the one used by botocore and AWSCLI). These are the credentials that kappa will use.
  • The region attribute refers to the AWS region used. You may need to adjust this.
  • The event_source attribute refers to the source of events driving our Lambda Function. In our case, that should be the ARN of the Kinesis stream you have already created for use in this sample.

Make It So

Now we are ready to go. To deploy your sample app:

This will create the stack in CloudFormation containing our IAM policies and roles. It will wait for the stack creation to complete and then it will retrieve the ARN for the execution policy from the stack resources. Finally, it will also zip up the Javascript function and upload that to AWS Lambda.

At this point our Lambda Function is available in Lambda, but it’s not hooked up to any event sources yet. Before we do that, we can test it out with some sample data. The input.json file in the sample directory contains data similar to what our function will receive, and this file is referenced in our config file, so we can easily send this test data to our function like this:

This calls the InvokeAsync request of the AWS Lambda service. Our Lambda Function will get called with the test data and we should be able to see the output in the Amazon CloudWatch Logs service. To see the output:

Kappa takes care of finding the right log group name and stream in the CloudWatch Logs service that contains our output. It then prints the most recent log events from that stream. Note that it can take a few minutes for the log output to become available so you might have to make this call a few times before the data shows up.

Assuming that our test looks good, we can now configure our Lambda Function to start getting live events from our Kinesis stream. To do this:

Kappa finds the invocation role we created with CloudFormation earlier and finds the ARN of the Kinesis stream in our config file and then calls the AddEventSource request of the AWS Lambda service to hook our Lambda Function up to the Kinesis stream.

At this point, you can send some real data to your Kinesis stream and use the kappa tail command to see the output of your function based on those new events.

If you need to make changes to your roles, your policies, or the function itself, just call kappa deploy again and kappa will take care of updating the CloudFormation stack and uploading the new version of your Javascript function.

Next Steps

The kappa tool is very new. I hope it’s useful, but I’m sure it will become even more useful with feedback from folks who are actually using it. Don’t be shy! Give it a try and create some issues.

Finally, here are some useful links related to AWS Lambda.


AWS Advent is near

All the welcome/reminder emails for current volunteer authors have been sent out. Please get in touch with @solarce if you volunteered and didn’t get one.

If you still want to volunteer, see below and then add yourself to the Google Spreadsheet

We still have slots available for the following days:

  • 12/7
  • 12/8
  • 12/9
  • 12/14
  • 12/16
  • 12/20

And I would love posts on the following topics

  • Kinesis
  • Amazon Lambda
  • Amazon Aurora
  • S3 And Glacier for Scalable Storage and Archiving
  • Monitoring, Metrics, and Logging in the Cloud (CloudWatch and CloudTrail best practices)
  • Managing AWS Billing (Comparing Netflix ICE and/or Hosted Services?)

AWS Advent 2014 Call for Participation

The end of 2014 is upon us and folks seem interested in having an AWS Advent this year.

I’m willing to curate and run one if I can get at least 15 submissions from folks.

Potential Post Topics we’d love to see

  • DynamoDB Intro
  • Kinesis Intro
  • Lambda Intro
  • CodeDeploy/CodeCommit/CodePipeline (ALM) Intro
  • Amazon Aurora Intro
  • CloudFormation Best Practices for 2014
  • Securing your AWS Credentials (IAM, MFA, IAM Roles, Launch Configs, Sharing Secrets, using KMS)
  • S3 And Glacier for Scalable Storage and Archiving
  • Monitoring, Metrics, and Logging in the Cloud (CloudWatch and CloudTrail best practices)
  • Managing AWS Billing (Netflix ICE, Hosted Services?)

If you’re willing to contribute a post, please add yourself to the Google Spreadsheet: https://docs.google.com/spreadsheets/d/1uWwxJlDR9EzGdbh26LCZ9_YUYPDd8pal0JldO74yfiU/edit?usp=sharing


AWS Advent 2012 Recap

It’s hard to believe that the 2012 AWS Advent is drawing to a close. This all started because on 11/30 I found myself explaining what EC2 and this “cloud business” were to my father-in-law, an old school C/C++ software developer. That got me thinking that an advent calendar explaining and exploring AWS would be beneficial, so I made a Tumblr blog and a Twitter account and dove in.

Topics

In 24 days we’ve done 21 posts with great content: eighteen were written by yours truly and three were contributed, along with a number of Tweets and RTs.

The topics covered were:

Thanks

I’d like to thank everyone who followed this on Twitter, followed on Tumblr, posted your own tweets, or gave me feedback elsewhere.

A special thanks to Joshua Timberman, Erik Hollensbe, and Benjamin Krueger for contributing articles.

All the articles and sample code have been posted to a Github repository.

If you liked the content you saw here, follow me on Twitter or on my (hopefully updated more in 2013) blog.

Have a Merry Christmas and a Happy New Year!

Feedback

If you have any feedback, please contact me on Twitter, @solarce, or email me at solarce+awsadvent2012 at gmail dot com. I’d love to hear what you liked, didn’t like, would like to see, or would maybe like to contribute for next year.


Strategies for re-usable CloudFormation Templates

In day 7’s post we learned about how CloudFormation (CFN) can help you to automate the creation and management of your AWS resources. It supports a wide variety of AWS services, includes the ability to pass in user supplied parameters, has a nice set of CLI tools, and offers a few handy functions you are able to use in the JSON templates.

In today’s post we’ll explore some strategies for getting the most out of your CFN stacks by creating re-usable templates.

Down with Monolithic

A lot of the CFN sample templates are monolithic, meaning that the template defines all the resources needed for an application’s infrastructure in a single template and so they all get created as part of a single stack. Examples of this are the Multi-tier VPC example or the Redmine Multi-AZ with Multi-AZ RDS example.

In keeping with the ideas of agile operations or infrastructure as code, I think that the way we should use CFN templates is as re-usable bits of infrastructure code to manage our AWS resources.

Layer cake

The approach that I’ve come up with for this is a series of layers, as outlined below:

  • VPC template – this defines the VPC for a region, your set of subnets (private and public), an internet gateway, any NAT instances you may want, your initial security groups, and possibly network ACLs
  • EC2 instance template – here you define the kinds of instances you want to run, passing in a VPC id, one or more AMI ids, security groups, EBS volumes, etc that are needed to run your infrastructure. Whether you make a template that takes everything as parameters or one that is more monolithic in defining your instances is up to you
  • ELB (+ Auto-scaling) template – this defines one or more ELB instances needed, passing in your EC2 instances ids, listener ports, subnets, etc as parameters. Optionally, if you’re going to use auto-scaling, I’d include that in this template, along with the parameters needed it for it, since AS makes most sense when used with ELB for web facing applications.
  • S3 (+ CloudFront) template – this defines the buckets, ACLs, lifecycle policies, etc that are needed, again taking the values that vary between environments as parameters
  • RDS template – this defines your RDS instances, taking a VPC id, subnet, RDS instance class, etc as parameters
  • If you’re using Route53 for DNS, I recommend putting the needed Route53 resources in each layer’s template.

These cover the most common resources you’re likely to use in a typical web application. If you’re using other services like DynamoDB or Simple Notification Service, then you should make additional templates as needed.

The overall approach is that your templates should have sufficient parameters and outputs to be re-usable across environments like dev, stage, qa, or prod, and that each layer’s template builds on the one before it.

Some examples

As with any new technique, it is useful to have some examples.

Example VPC template

This template does not require any inputs. It will make a VPC with a network of 10.20.0.0/16, a public subnet of 10.20.10.0/24, a private subnet of 10.20.20.0/24, default inbound rules for ports 22 and 80, and typical outbound rules.

It returns the newly created VPC’s id.
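
The full template isn’t small, but a trimmed sketch of its shape (security groups, routes, and NAT pieces omitted) looks roughly like this:

```
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "Re-usable VPC layer: 10.20.0.0/16 with a public and a private subnet",
  "Resources": {
    "VPC": {
      "Type": "AWS::EC2::VPC",
      "Properties": { "CidrBlock": "10.20.0.0/16" }
    },
    "PublicSubnet": {
      "Type": "AWS::EC2::Subnet",
      "Properties": {
        "VpcId": { "Ref": "VPC" },
        "CidrBlock": "10.20.10.0/24"
      }
    },
    "PrivateSubnet": {
      "Type": "AWS::EC2::Subnet",
      "Properties": {
        "VpcId": { "Ref": "VPC" },
        "CidrBlock": "10.20.20.0/24"
      }
    },
    "InternetGateway": { "Type": "AWS::EC2::InternetGateway" },
    "GatewayAttachment": {
      "Type": "AWS::EC2::VPCGatewayAttachment",
      "Properties": {
        "VpcId": { "Ref": "VPC" },
        "InternetGatewayId": { "Ref": "InternetGateway" }
      }
    }
  },
  "Outputs": {
    "VpcId": {
      "Description": "The id of the newly created VPC",
      "Value": { "Ref": "VPC" }
    }
  }
}
```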

Example EC2 instance template

This template will create an EC2 instance, it requires you give it an ssh keypair name, a VPC id, a Subnet id within your VPC, an AMI id, and a Security Group.

It returns the EC2 instance id, the subnet, and the security group id.
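
Again, a trimmed sketch of what such a template might look like (the instance type here is arbitrary, and the VPC id parameter is simply carried through for higher layers):

```
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "Re-usable EC2 instance layer",
  "Parameters": {
    "KeyName":         { "Type": "String" },
    "VpcId":           { "Type": "String" },
    "SubnetId":        { "Type": "String" },
    "AmiId":           { "Type": "String" },
    "SecurityGroupId": { "Type": "String" }
  },
  "Resources": {
    "Instance": {
      "Type": "AWS::EC2::Instance",
      "Properties": {
        "ImageId": { "Ref": "AmiId" },
        "InstanceType": "m1.small",
        "KeyName": { "Ref": "KeyName" },
        "SubnetId": { "Ref": "SubnetId" },
        "SecurityGroupIds": [ { "Ref": "SecurityGroupId" } ]
      }
    }
  },
  "Outputs": {
    "InstanceId":      { "Value": { "Ref": "Instance" } },
    "SubnetId":        { "Value": { "Ref": "SubnetId" } },
    "SecurityGroupId": { "Value": { "Ref": "SecurityGroupId" } }
  }
}
```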

Conclusion

Hopefully this has provided you with some strategies and examples for how to create re-usable CFN templates and build your infrastructure from a series of layered stacks.

As you build your templates, you’ll want to build some automation with a language library to drive the creation of each stack and to manage passing outputs from one stack as inputs to the next; see the earlier AWS Advent post on Automating AWS.

To explore this further I recommend you play with and tear apart the CloudFormation example templates Amazon has made available.


Exploring aws-cli

Yesterday Mitch Garnaat, a Senior Engineer at Amazon, announced the developer candidate release of a new AWS cli tool, awscli.

The tool is open source, available under the Apache 2.0 license, written in Python, and the code is up on Github.

The goal of this new cli tool is to provide a unified command line interface to Amazon Web Services.

It currently supports the following AWS services:

  • Amazon Elastic Compute Cloud (Amazon EC2)
  • Elastic Load Balancing
  • Auto Scaling
  • AWS CloudFormation
  • AWS Elastic Beanstalk
  • Amazon Simple Notification Service (Amazon SNS)
  • Amazon Simple Queue Service (Amazon SQS)
  • Amazon Relational Database Service (Amazon RDS)
  • AWS Identity and Access Management (IAM)
  • AWS Security Token Service (STS)
  • Amazon CloudWatch
  • Amazon Simple Email Service (Amazon SES)

This tool is still new, but it looks very promising. Let’s explore some of the ways we can use it.

Getting Started

To get started with awscli you’ll install it, create a configuration file, and optionally add some bash shell completions.

Installation

awscli can be quickly installed with either easy_install or pip
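
For example:

```
pip install awscli
```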

Once it is installed, you should have an aws tool available to use. You can confirm this with the command shown below:
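
```
which aws
```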

If you run it without any arguments, it should print usage help listing the supported services and options.

Configuration

You’ll need to make a configuration file for it. I am assuming you’ve already created and know your AWS access keys.

I created my configuration file as ~/.aws, and when you create yours, it should look like this:
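
Something like this, with your own keys and region substituted in:

```
[default]
aws_access_key_id = <your access key>
aws_secret_access_key = <your secret key>
region = us-east-1
```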

You’ll want to set the region to the region you have AWS resources running in.

Once you’ve created it, you’ll set an environment variable to tell the aws tool where to find your configuration. You can do this with the following command:
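
Assuming the ~/.aws path used above:

```
export AWS_CONFIG_FILE=~/.aws
```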

bash Completions

If you’re a bash shell user, you can install some handy tab completions with the following command
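
The package ships a completer program; wiring it up looks like this:

```
complete -C aws_completer aws
```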

zsh shell users should look at https://github.com/aws/aws-cli/issues/5 for how to try to get completion working.

While I am a zsh user, I am still on 4.3.11 so I used bash for the purposes of testing out the awscli.

Let’s test it out. The following command should return a bunch of JSON output describing any instances in the region you’ve put in your configuration file. You can also tell aws to return text output by using the --output text argument at the end of your command.
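
That is:

```
aws ec2 describe-instances

aws ec2 describe-instances --output text
```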

Since all the sample output is very instance specific, I don’t have a good example of the output to share, but if the command works, you’ll know you got the right output. 😉

Now that we have the aws tool installed and we know it’s working, let’s take a look at some of the ways we can use it for fun and profit.

Managing EC2

The primary way a lot of you may use the aws tool is to manage EC2 instances.

To do that with the aws command, you use the ec2 service name.

With the tab completion installed, you can quickly see that aws ec2 <tab><tab> has 144 possible functions to run.

To view your EC2 resources you use the describe- commands, such as describe-instances which lists all your instances, describe-snapshots which lists all your EBS snapshots, or describe-instance-status which you give the argument --instance-id to see a specific instance.

To create new resources you use the create- commands, such as create-snapshot to create a new snapshot of an EBS volume or create-vpc to create a new VPC.

To launch a new EC2 instance you use the run-instances command, which you give a number of arguments including --instance-type, --key-name (your ssh keypair), or --user-data. aws ec2 run-instances --<tab><tab> is a quick way to review the available options.

There are a number of other kinds of commands available, including attach-, delete-, and modify-. You can use the bash completion or the documentation to learn and explore all the available commands and each command’s arguments.

Managing S3

Unfortunately the aws tool does not support S3 yet, but boto has great S3 support, s3cmd is popular, or you can use the AWS S3 Console.

Managing CloudFormation

The aws tool supports managing CloudFormation.

You can see your existing stacks with list-stacks or see a specific stack’s resources with list-stack-resources and the --stack-name argument.

You can create or delete a stack with the aptly named create-stack and delete-stack commands.

You can even use the handy estimate-template-cost command to get a template sent through the AWS calculator and you’ll get back a URL with all your potential resources filled out.

Managing ELB

The aws tool supports managing Elastic Load Balancer (ELB).

You can see your existing load balancers with the describe-load-balancers command. You can create a new load balancer with the create-load-balancer command, which takes a number of arguments, including --availability-zones, --listeners, --subnets, or --security-groups. You can delete an existing load balancer with the delete-load-balancer command.

You can add or remove listeners on an existing load balancer with the create-load-balancer-listeners and delete-load-balancer-listeners commands.

Managing CloudWatch

The aws tool supports managing CloudWatch.

You can review your existing metrics with the list-metrics command and your existing alarms with the describe-alarms command. You can look at the alarms for a specific metric by using describe-alarms-for-metric and the --metric-name argument.

You can enable and disable alarm actions with the enable-alarm-actions and disable-alarm-actions commands.

Where to go from here?

You should make sure you’ve read the README.

To get more familiar with the commands and arguments, you should use both the bash completions and the built-in help.

To see the help for a specific service, you invoke the built-in help command. An example is:
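
(The exact layout of the help output has shifted between awscli versions.)

```
aws ec2 help
```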

You’ll get some details on each of the available commands for a given service.

From there, if you encounter issues or had ideas for feedback you should file an issue on Github.

While not an official channel, I idle in ##aws on irc.freenode.net and am happy to answer questions/provide help when I have time.


EC2 In-depth

In day 1’s post on AWS Key Concepts we learned a little about EC2, but as you’ve come to see in these past few posts, anyone seriously using AWS is likely using EC2 as a major part of their application infrastructure.

Let’s review what we learned about EC2 previously. EC2 is the Elastic Compute Cloud. It provides you with a variety of compute instances with set levels of CPU, RAM, and Disk allocations. You utilize these instances on demand, with hourly based pricing, but you can also pay to reserve instances.

An EC2 instance’s operating system is cloned from an AMI (Amazon Machine Image). AMIs are the base from which your instances are created. A number of operating systems are supported, including Linux, Windows Server, FreeBSD (on some instance types), and OmniOS.

Pricing

EC2 instance pricing is per hour and varies by instance type and class. You begin paying for an instance as soon as you launch it. You also pay AWS’s typical bandwidth charges for any traffic that leaves the region your EC2 instance is running in.

For more details, consult the EC2 FAQ on Pricing.

Storage Options

There are two types of storage available for your instance’s root device volume:

  1. Instance store: In this case the root device for an instance launched from the AMI is an instance store volume created from a template stored in Amazon S3. An instance store is not persistent and has a fixed size, but it uses storage local to the instance’s host server. You’re not able to derive new AMIs from instance store backed instances.

  2. Elastic Block Store (EBS): EBS is a separate AWS service, but one of its uses is for the root storage of instances. These are called EBS backed instances. EBS volumes are block devices of N gigabytes that are available over the network and have some advanced snapshotting and performance features. This storage persists even if you terminate the instance, but it incurs additional costs as well. We’ll cover more EBS details below. If you choose to use EBS optimized instance types, your instance will be provisioned with a dedicated NIC for your EBS traffic. Non-EBS optimized instances share EBS traffic with all other traffic on the instance’s primary NIC.

There are also two types of storage available to instances for additional storage needs:

  1. Ephemeral storage: Ephemeral storage consists of disks that are local to the instance host; the number of disks you get depends on the size of your instance. This storage is wiped whenever an instance is terminated, whether by an EC2 failure or an action by a user.

  2. EBS: As mentioned, EBS is a separate AWS service; you’re able to create EBS volumes of N gigabytes and attach them over the network. You’re also able to take advantage of their advanced snapshotting and performance features. This storage persists even if you terminate the instance, but it incurs additional costs as well.

Managing Instances

Managing instances can be done through the AWS console, the EC2 API tools, or the API itself.

The lifecycle of an EC2 instance is typically:

  • Creation from an AMI
  • The instance runs; you may attach EBS volumes or Elastic IPs, and you may also restart the instance
  • You may stop an instance.
  • Eventually you may terminate the instance or the instance may go away due to a host failure.

It’s important to note that EC2 instances are meant to be considered disposable and that you should use multiple EC2 instances in multiple Availability Zones to ensure the availability of your applications.

Instance IPs and ENIs

So once you’ve begun launching instances, you’ll want to login and access them, and you’re probably wondering what kind of IP addresses your instances come with.

The EC2 FAQ on IP addresses tells us:

By default, every instance comes with a private IP address and an internet routable public IP address. The private address is associated exclusively with the instance and is only returned to Amazon EC2 when the instance is stopped or terminated. The public address is associated exclusively with the instance until it is stopped, terminated or replaced with an Elastic IP address.

If you’re deploying your instances in a VPC, you’re also able to use Elastic Network Interfaces (ENIs). These are virtual network interfaces that let you add additional private IP addresses to your EC2 instances running in a VPC.

Security Groups

Ensuring your EC2 instances are secure at the network level is a vital part of your infrastructure’s overall security assurance. Network level security for EC2 instances is done through the use of security groups.

A security group acts as a firewall that controls the traffic allowed to reach one or more instances. When you launch an Amazon EC2 instance, you associate it with one or more security groups. You can add rules to each security group that control the inbound traffic allowed to reach the instances associated with the security group. All other inbound traffic is discarded. Security group rules are stateful.

Your AWS account automatically comes with a default security group for your Amazon EC2 instances. If you don’t specify a different security group at instance launch time, the instance is automatically associated with your default security group.

The initial settings for the default security group are:

  • Allow no inbound traffic
  • Allow all outbound traffic
  • Allow instances associated with this security group to talk to each other

You can either choose to create new security groups with different sets of inbound rules, which you’ll need if you’re running a multi-tier infrastructure, or you can modify the default group.

In terms of the limitations of security groups, you can create up to 500 Amazon EC2 security groups in each region in an account, with up to 100 rules per security group. In Amazon VPC, you can have up to 50 security groups, with up to 50 rules per security group, in each VPC. The Amazon VPC security group limit does not count against the Amazon EC2 security group limit.

Spot and Reserved Instances

Besides paying for EC2 instances on-demand, you’re able to utilize instance capacity in two other ways, Spot instances and Reserved instances.

Spot Instances

If you have flexibility on when your application will run, you can bid on unused Amazon EC2 compute capacity, called Spot Instances, and lower your costs significantly. Set by Amazon EC2, the Spot Price for these instances fluctuates periodically depending on the supply of and demand for Spot Instance capacity.

To use Spot Instances, you place a Spot Instance request (your bid) specifying the maximum price you are willing to pay per hour per instance. If the maximum price of your bid is greater than the current Spot Price, your request is fulfilled and your instances run until you terminate them or the Spot Price increases above your maximum price. Your instance can also be terminated when your bid price equals the market price, even when there is no increase in the market price. This can happen when demand for capacity rises, or when supply fluctuates.

You will often pay less per hour than your maximum bid price. The Spot Price is adjusted periodically as requests come in and the available supply of instances changes. Everyone pays that same Spot Price for that period regardless of whether their maximum bid price was higher, and you will never pay more than your hourly maximum bid price.

Reserved Instances

You can use Reserved Instances to take advantage of lower costs by reserving capacity. With Reserved Instances, you pay a low, one-time fee to reserve capacity for a specific instance and get a significant discount on the hourly fee for that instance when you use it. Reserved Instances, which are essentially reserved capacity, can provide substantial savings over owning your own hardware or running only On-Demand instances. Reserved Instances are available from AWS in one- and three-year terms. Reserved Instances are available in three varieties—Heavy Utilization, Medium Utilization, and Light Utilization

Launching your Reserved Instance is the same as launching any On-Demand instance: You launch an instance with the same configuration as the capacity you reserved, and AWS will automatically apply the discounted hourly rate that is associated with your capacity reservation. You can use the instance and be charged the discounted rate for as long as you own the Reserved Instance. When the term of your Reserved Instance ends, you can continue using the instance without interruption. Only this time, because you no longer have the capacity reservation, AWS will start charging you the On-Demand rate for usage.

To purchase an Amazon EC2 Reserved Instance, you must select an instance type (such as m1.small), platform (Linux/UNIX, Windows, Windows with SQL Server), location (Region and Availability Zone), and term (either one year or three years). When you want your Reserved Instance to run on a specific Linux/UNIX platform, you must identify the specific platform when you purchase the reserved capacity.

Tagging

Tagging is a minor EC2 feature that I find interesting. Tags are a key-value pair that you can apply to one or many EC2 instances. These let you add your own metadata to your EC2 instances for use in inventory or lifecycle management of your instances.

The following basic restrictions apply to tags:

  • Maximum number of tags per resource: 10
  • Maximum key length: 128 Unicode characters
  • Maximum value length: 256 Unicode characters
  • Unavailable prefix: aws (reserved by AWS for tag names and values)
  • Tag keys and values are case sensitive.

You’re able to tag a wide variety of resources.

Tagging can be done through the AWS console, the EC2 API tools, or the API itself.
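
For example, using the aws tool from the earlier aws-cli post (the instance ID and tag values are placeholders):

```
aws ec2 create-tags \
  --resources i-0123abcd \
  --tags Key=environment,Value=production Key=role,Value=webserver
```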

Conclusion

EC2 instances are more than, but also different from, typical VPS instances. The flexibility of EC2 instances, in their many types and classes, coupled with hourly pricing, lets you do many things with your infrastructure that traditional data centers did not make possible. But the disposable nature of EC2 instances has some drawbacks. These should all be considered carefully as you decide how and when to use EC2 instances for your applications.

As we’ve seen, there are a number of options for how you can pay for your EC2 instances and how you manage the instance lifecycle.


Monitoring and AWS

A critical part of any application infrastructure is monitoring. Now monitoring can mean a lot of things to different people, but for the purposes of this post we’re going to define monitoring as two things

  1. Collecting metrics data to look at performance over time
  2. Alerting on metrics data based on thresholds

Let’s take a look at some of the tools and services available to accomplish this and some of their unique capabilities.

There are certainly many many options for this, as a search for “aws monitoring” will reveal, but I am going to focus on a few options that I am familiar with and see as representing the various classes of options available.

Amazon CloudWatch

Amazon CloudWatch is of course the first option you may think of when you’re using AWS resources for your application infrastructure, as it’s already able to automatically provide you with metric data for most AWS services.

CloudWatch is made up of three main components:

  • metrics: metrics are data points that are stored in a time series format.
  • graphs: which are visualizations of metrics over a time period.
  • alarms: an alarm watches a single metric over a time period you specify, and performs one or more actions based on the value of the metric relative to a given threshold over a number of time periods

Currently CloudWatch has built-in support for the following services:

  • AWS Billing
  • Amazon DynamoDB
  • Amazon ElastiCache
  • Amazon Elastic Block Store
  • Amazon Elastic Compute Cloud
  • Amazon Elastic MapReduce
  • Amazon Relational Database
  • Amazon Simple Notification Service
  • Amazon Simple Queue Service
  • Amazon Storage Gateway
  • Auto Scaling
  • Elastic Load Balancing

CloudWatch provides you with the API and storage to monitor and publish metrics for these AWS services, as well as to add your own custom data, through many interfaces, including CLI tools, an API, and many language libraries. Amazon even provides some sample monitoring scripts for collecting OS information on Linux and Windows.
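
For example, publishing a custom data point with the aws CLI covered earlier in this series (the namespace and metric name are made up):

```
aws cloudwatch put-metric-data \
  --namespace "MyApp" \
  --metric-name PendingJobs \
  --value 42 \
  --unit Count
```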

Once your metric data is being stored by CloudWatch, you’re able to create alarms which use AWS SNS to send alerts via email, SMS, or posting to another web service.

Finally, you’re able to visualize these metrics in various ways through the AWS console.

CloudWatch pricing is straightforward. You pay per custom metric, per alarm, per API request, all on a monthly basis, and Basic Monitoring metrics (at five-minute frequency) for Amazon EC2 instances are free of charge, as are all metrics for Amazon EBS volumes, Elastic Load Balancers, and Amazon RDS DB instances.

Boundary

Boundary is a startup based out of San Francisco which takes an interesting approach to monitoring. Caveat: I am friends with a few of the folks there and am very excited about what they’re doing.

Boundary is a hosted offering whose goal is to provide in-depth application monitoring by looking at things from the point of view of the network. The service works by having you deploy an agent they call a meter to each of your servers; they currently support a variety of Linux distributions and Windows Server 2008. These meters send IPFIX data back to Boundary’s hosted platform, where the data is processed and stored.

The idea behind Boundary is that by looking at the data from the network, in real time, you’re able to quickly see patterns in the flow of data in and out of your infrastructure and between the tiers within it in a way that hasn’t been as easily done with traditional monitoring tools that are looking at OS metrics, SNMP data, etc. And by being able to annotate this data and monitor for changes, you can create a comprehensive and detailed real time and long term view into your infrastructure.

You’re then able to visualize your data in various ways, including annotating and overlaying it or adding your own custom data. You can have alerts sent by email natively or in a variety of ways through their supported integration with PagerDuty. More detail on how you can use Boundary is laid out in their 30 Days to Boundary page.

Boundary’s pricing is simple. It starts with a Free plan that lets you store up to 2GB of metric data/day. Paid plans begin at $199 US/month for commercial support, higher daily storage limits, and flexible long-term data storage options.

Boundary even has a couple good resources focused on how they’re a good fit for when you’re using AWS, including a video, See Inside the Amazon EC2 Black Box and a PDF, Optimizing Performance in Amazon EC2, that are worth reviewing.

Datadog

Datadog is a hosted offering based out of New York City that aims to give you a unified place to store metrics, events and logs from your infrastructure and third party services, to visualize this data, and alert on it, as well as discuss and collaborate on this data.

Datadog works by having you install an agent, which they currently support running on a variety of Linux distributions, Windows, OS X, and SmartOS. They also support integration with a variety of open source tools, applications, languages, and some third party services.

Once you’ve installed the agent and configured your desired integrations, you’ll begin seeing events and metrics flow into your account. You’re able to build your own agent based checks and service checks and do custom integration through a number of libraries.

From there you can begin visualizing and using your data by making custom dashboards and then creating alerts, which can be sent via email or in a variety of ways through their supported integration with PagerDuty.

Datadog’s pricing starts with a free plan that includes 1 day retention and 5 hosts. Paid plans start at $15/host/month for up to 100 hosts with one year retention, alerts, and email support.

Sensu

Not everyone wants to utilize a hosted service, and there are a number of open source tools for building your own monitoring solution.

The up and coming tool in this space is Sensu. Sensu is an open source project that is sponsored by Sonian and has a thriving developer and user community around it. Sensu was built by Sonian out of their need for a flexible solution that could handle how they dynamically scale their infrastructure tiers up and down on various public cloud providers, starting with AWS.

Sensu’s goal is to be a monitoring framework that lets you build a scalable solution to fit your needs. It is built from the following components:

  • The server, which aggregates data from the clients
  • The clients, which run checks
  • The API service
  • RabbitMQ, which is the message bus that glues everything together

The various Sensu components are all written in Ruby and open source. Sensu supports running checks written in Ruby, as well as existing Nagios checks.

This excellent getting started post by Joe Miller sums Sensu up nicely.

Sensu connects the output from “check” scripts run across many nodes with “handler” scripts run on Sensu servers. Messages are passed via RabbitMQ. Checks are used, for example, to determine if Apache is up or down. Checks can also be used to collect metrics such as MySQL statistics. The output of checks is routed to one or more handlers. Handlers determine what to do with the results of checks. Handlers currently exist for sending alerts to Pagerduty, IRC, Twitter, etc. Handlers can also feed metrics into Graphite, Librato, etc. Writing checks and handlers is quite simple and can be done in any language.

Nagios

Nagios is the granddaddy of open source monitoring tools. It’s primarily a server service you run that watches hosts and services you specify, alerting you when things go bad and when they get better. It supports a variety of checks for most services and software, as well as the ability to write your own. You’ll find checks, scripts, and extensions for Nagios for pretty much anything you can think of.

For metrics and alerting there are good tools for integrating with Graphite and PagerDuty.

Conclusion

There are a number of hosted and open source solutions available to match your infrastructure’s monitoring needs. While this post hasn’t covered them all, I hope you’ve gotten a nice overview of some of the options and some food for thought when considering how to gather the metrics and data needed to run an application on AWS and stay alerted to the changes and incidents that are important to you.

If you’re interested in learning more about open source monitoring, you should watch The State of Open Source Monitoring by Jason Dixon.

If you’re interested in the future of monitoring, you should keep an eye on the Monitorama conference coming up in March 2013.