AWS Advent 2014 – SparkleFormation: Build infrastructure with CloudFormation without losing your sanity.

Today’s post on taming CloudFormation with SparkleFormation comes to us from Cameron Johnston of Heavy Water Operations.

The source code for this post can be found at


This article assumes some familiarity with CloudFormation concepts such as stack parameters, resources, mappings and outputs. See the AWS Advent CloudFormation Primer for an introduction.

Although CloudFormation templates are billed as reusable, many users will attest that as these monolithic JSON documents grow larger, they become “all encompassing JSON file[s] of darkness,” and actually reusing code between templates becomes a frustrating copypasta exercise.

From another perspective, these JSON documents are actually just hashes, and with a minimal DSL we can build these hashes programmatically. SparkleFormation provides a Ruby DSL for merging and compiling hashes into CFN templates, and helpers which invoke CloudFormation’s intrinsic functions (e.g. Ref, Fn::GetAtt, Fn::Join, Fn::FindInMap).

SparkleFormation’s DSL implementation is intentionally loose, imposing little of its own opinion on how your template should be constructed. Provided you are already familiar with CloudFormation template concepts and some minimal amount of Ruby, the rest is merging hashes.


Just as with CloudFormation, the template is the high-level object. In SparkleFormation we instantiate a new template like so:

But an empty template isn’t going to help us much, so let’s step into it and at least insert the required AWSTemplateFormatVersion specification:

In the above case we use the _set helper method because we are setting a top-level key with a string value. When we are working with hashes we can use a block syntax, as shown here adding a parameter to the top-level Parameters hash that CloudFormation expects:
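SparkleFormation itself is Ruby, but since a template is ultimately just a hash, the shape of what the snippets above build can be sketched with plain Python dicts. This is not SparkleFormation's API: the `empty_template` and `add_parameter` helpers and the `Environment` parameter are hypothetical, chosen only to show the resulting structure.

```python
import json


def empty_template():
    # A CloudFormation template is just a hash; the format version is a top-level string
    return {"AWSTemplateFormatVersion": "2010-09-09"}


def add_parameter(template, name, **properties):
    # Parameters is a top-level hash keyed by parameter name
    template.setdefault("Parameters", {})[name] = properties
    return template


template = empty_template()
add_parameter(
    template,
    "Environment",  # hypothetical parameter name
    Type="String",
    AllowedValues=["dev", "stage", "prod"],
    Default="dev",
)

# The compiled output is ordinary JSON that CloudFormation accepts
print(json.dumps(template, indent=2))
```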


SparkleFormation provides primitives to help you build templates out of reusable code, namely:

  • Components
  • Dynamics
  • Registries


Here’s a component we’ll name environment which defines our allowed environment parameter values:

Resources, parameters and other CloudFormation configuration written into a SparkleFormation component are statically inserted into any templates using the load method. Now all our stack templates can reuse the same component so updating the list of environments across our entire infrastructure becomes a snap. Once a template has loaded a component, it can then step into the configuration provided by the component to make modifications.

In this template example we load the environment component (above) and override the allowed values for the environment parameter the component provides:
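Loading a component and then overriding part of it amounts to a recursive (deep) merge of two hashes. As a rough sketch of that idea, here is a deep merge in plain Python; it is not how SparkleFormation implements `load`, and the contents of the `environment` component below are hypothetical.

```python
import copy


def deep_merge(base, override):
    """Return a new dict with `override` merged into `base`, recursing into nested dicts.

    Non-dict values in `override` (strings, lists, numbers) replace the base value outright.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical "environment" component shared by every stack template
environment_component = {
    "Parameters": {
        "Environment": {
            "Type": "String",
            "AllowedValues": ["dev", "stage", "prod"],
        }
    }
}

# This template keeps the component's parameter but narrows its allowed values
template_overrides = {"Parameters": {"Environment": {"AllowedValues": ["dev", "qa"]}}}

template = deep_merge(environment_component, template_overrides)
```

In this sketch nested hashes are merged key by key, while leaf values such as the `AllowedValues` list are replaced outright, so the template narrows the component's list rather than appending to it, and the shared component itself is left untouched.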


Whereas components are loaded once at the instantiation of a SparkleFormation template, dynamics can be inserted one or more times throughout a template. They iteratively generate unique resources based on the name and optional configuration they are passed when inserted.

In this example we insert a launch_config dynamic and pass it a config object containing a run list:

The launch_config dynamic (not pictured) can then use intrinsic functions like Fn::Join to insert data passed in the config deep inside a launch configuration, as in this case where we want our template to tell Chef what our run list should be.
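Because the launch_config dynamic is not shown, here is a hypothetical Python analogue of what such a dynamic might produce: a function that takes a name and a config hash and returns a uniquely named `AWS::AutoScaling::LaunchConfiguration` resource whose UserData uses `Fn::Join` to embed the run list for Chef. The `ImageId`, `InstanceType` and `Environment` parameters, the logical-ID scheme and the first-boot file path are all assumptions for illustration.

```python
import json


def ref(name):
    # CloudFormation's Ref intrinsic function, expressed as a hash
    return {"Ref": name}


def join(delimiter, *parts):
    # CloudFormation's Fn::Join intrinsic function, expressed as a hash
    return {"Fn::Join": [delimiter, list(parts)]}


def launch_config(name, config):
    """Hypothetical analogue of a dynamic: the logical ID is derived from `name`,
    so the same generator can be inserted many times without collisions."""
    logical_id = "".join(word.capitalize() for word in name.split("_")) + "LaunchConfig"
    first_boot = json.dumps({"run_list": config.get("run_list", [])})
    user_data = join(
        "",
        "#!/bin/bash\n",
        "echo '", first_boot, "' > /etc/chef/first-boot.json\n",  # hypothetical path
        "chef-client -j /etc/chef/first-boot.json -E ", ref("Environment"), "\n",
    )
    return {
        logical_id: {
            "Type": "AWS::AutoScaling::LaunchConfiguration",
            "Properties": {
                "ImageId": ref("ImageId"),
                "InstanceType": ref("InstanceType"),
                # UserData must be base64-encoded
                "UserData": {"Fn::Base64": user_data},
            },
        }
    }


# Inserting the "dynamic" twice yields two distinct resources
resources = {}
resources.update(launch_config("app_server", {"run_list": ["role[app]"]}))
resources.update(launch_config("worker", {"run_list": ["role[worker]"]}))
```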


Similar to dynamics, a registry entry can be inserted at any point in a SparkleFormation template or dynamic. For example, a registry entry can be used to share the same metadata between both AWS::AutoScaling::LaunchConfiguration and AWS::EC2::Instance resources.

Translating a ghost of AWS Advent past

This JSON template from a previous AWS Advent article provisions a single EC2 instance into an existing VPC subnet and security group:

Not terrible, but the JSON is a little hard on the eyes. Here’s the same thing in Ruby, using SparkleFormation:

Without taking advantage of any of SparkleFormation’s special capabilities, this translation is already a few lines shorter and easier to read as well. That’s a good start, but we can do better.

The template format version specification and parameters required for this template are common to any stack where EC2 compute resources may be used, whether they be single EC2 instances or Auto Scaling Groups, so let’s take advantage of some SparkleFormation features to make them reusable.

Here we have a base component that inserts the common parameters into templates which load it:

Now that the template version and common parameters have moved into the new base component, we can make use of them by loading that component as we instantiate our new template, specifying that the template will override any pieces of the component where the two intersect.

Let’s update the SparkleFormation template to make use of the new base component:

Because the base component includes the parameters we need, the template no longer explicitly describes them.

Advanced tips and tricks

Since SparkleFormation is Ruby, we can get a little fancy. Let’s say we want to build 3 subnets into an existing VPC. If we know the VPC’s /16 subnet we can provide it as an environment variable (export VPC_SUBNET=""), and then call that variable in a template that generates additional subnets:
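As a sketch of the same idea in Python rather than SparkleFormation's Ruby, the standard `ipaddress` module can carve /24 subnets out of the VPC's /16 read from `VPC_SUBNET`, generating one subnet and one route table association per subnet. The `VpcId` and `RouteTableId` parameters, the logical IDs and the fallback CIDR are hypothetical.

```python
import ipaddress
import itertools
import os


def subnet_resources(vpc_cidr, count=3, new_prefix=24):
    """Carve `count` subnets out of the VPC CIDR and return Subnet and
    SubnetRouteTableAssociation resources keyed by (hypothetical) logical IDs."""
    network = ipaddress.ip_network(vpc_cidr)
    resources = {}
    subnets = itertools.islice(network.subnets(new_prefix=new_prefix), count)
    for index, subnet in enumerate(subnets):
        subnet_id = f"Subnet{index}"
        resources[subnet_id] = {
            "Type": "AWS::EC2::Subnet",
            "Properties": {
                "VpcId": {"Ref": "VpcId"},  # hypothetical parameter
                "CidrBlock": str(subnet),
                # Spread the subnets across the region's availability zones
                "AvailabilityZone": {"Fn::Select": [str(index), {"Fn::GetAZs": ""}]},
            },
        }
        resources[f"{subnet_id}RouteTableAssociation"] = {
            "Type": "AWS::EC2::SubnetRouteTableAssociation",
            "Properties": {
                "SubnetId": {"Ref": subnet_id},
                "RouteTableId": {"Ref": "RouteTableId"},  # hypothetical parameter
            },
        }
    return resources


# VPC_SUBNET comes from the environment; the fallback is only a placeholder for this sketch
resources = subnet_resources(os.environ.get("VPC_SUBNET") or "10.0.0.0/16")
```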

Of course we could place the subnet and route table association resources into a dynamic, so that we could just call the dynamic with some config:

Okay, this all sounds great! But how do I operate it?

SparkleFormation by itself does not implement any means of sending its output to the CloudFormation API. In this simple case, a SparkleFormation template named ec2_example.rb is output to JSON which you can use with CloudFormation as usual:

The knife-cloudformation plugin for Chef’s knife command adds sub-commands for creating, updating, inspecting and destroying CloudFormation stacks described by SparkleFormation code or plain JSON templates. Using knife-cloudformation does not require Chef to be part of your toolchain; it simply leverages knife as an execution platform.

Advent readers may recall a previous article on strategies for reusable CloudFormation templates which advocates a “layer cake” approach to deploying infrastructure using CloudFormation stacks:

The overall approach is that your templates should have sufficient parameters and outputs to be re-usable across environments like dev, stage, qa, or prod and that each layer’s template builds on the next.

Of course this is all well and good, until we find ourselves, once again, copying and pasting. This time it’s stack outputs instead of JSON, but again, we can do better.

The recent 0.2.0 release of knife-cloudformation adds a new --apply-stack parameter which makes operating “layer cake” infrastructure much easier.

When passed one or more instances of --apply-stack STACKNAME, knife-cloudformation will cache the outputs of the named stack and use the values of those outputs as the default values for parameters of the same name in the stack you are creating.
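The defaulting behaviour described above can be sketched in a few lines of Python. This illustrates the idea only and is not knife-cloudformation's code; the output values, the `InstanceType` parameter and the rule that later stacks win on name clashes are assumptions.

```python
def apply_stack_defaults(parameters, *applied_stack_outputs):
    """Hypothetical sketch of the --apply-stack idea: when a parameter shares its
    name with an output of an applied stack, that output's value becomes its default.
    In this sketch, later stacks win if two stacks export the same output name."""
    merged = {name: dict(spec) for name, spec in parameters.items()}
    for outputs in applied_stack_outputs:
        for output_name, value in outputs.items():
            if output_name in merged:
                merged[output_name]["Default"] = value
    return merged


# Hypothetical cached outputs of the "coolapp-elb" stack
coolapp_elb_outputs = {
    "ElbName": "coolapp-elb-1234567890",
    "ElbSecurityGroup": "sg-12345678",
}

# Parameters of a hypothetical coolapp_asg template; two names match the outputs
coolapp_asg_parameters = {
    "ElbName": {"Type": "String"},
    "ElbSecurityGroup": {"Type": "String"},
    "InstanceType": {"Type": "String", "Default": "m3.medium"},
}

parameters = apply_stack_defaults(coolapp_asg_parameters, coolapp_elb_outputs)
```

Parameters without a matching output, such as `InstanceType` here, keep whatever default the template already gave them.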

For example, a stack “coolapp-elb” which provisions an ELB and an associated security group has been configured with the following outputs:

The values of the ElbName and ElbSecurityGroup outputs would be of use to us in attaching an app server auto scaling group to this ELB, and we could use those values automatically by setting parameter names in the app server template which match the ELB stack’s output names:

Once our coolapp_asg template uses parameter names that match the output names from the coolapp-elb stack, we can deploy the app server layer “on top” of the ELB layer using --apply-stack:

Similarly, if we use a SparkleFormation template to build our VPC, we can set a number of VPC outputs that will be useful when building stacks inside the VPC:

This ‘apply stack’ approach is just the latest way in which the SparkleFormation tool chain can help you keep your sanity when building infrastructure with CloudFormation.

Further reading

I hope this brief tour of SparkleFormation’s capabilities has piqued your interest. For some AWS users, the combination of SparkleFormation and knife-cloudformation helps to address a real pain point in the infrastructure-as-code tool chain, easing the development and operation of layered infrastructure.

Here’s some additional material to help you get started:

AWS Advent 2014 – Integrating AWS with Active Directory

Today’s post on Integrating AWS with Active Directory comes to us from Roger Siggs, who currently helps architect clouds at DataLogix.


One of the most popular directory services available is Microsoft’s Active Directory. Active Directory serves as the authoritative system to coordinate access between users and their devices to other resources, which could include internal applications, servers, or cloud-based systems and applications. The challenge with AD is not that it is inherently a bad piece of software, but that the dynamics of the IT landscape have changed. In the past few years there has been a dramatic shift to cloud-based infrastructure: IT applications and an organization’s server infrastructure are increasingly based in the cloud. As a result, how does AD manage that remote infrastructure?

Previous security models were built with the idea of protecting the on-premises environment from outside attacks. How does an Active Directory-based model provide that protection when the environment spans a far larger footprint? The second major trend is toward heterogeneous computing environments. AD was introduced to answer the problems of enterprises that were primarily Windows-based.

That is no longer true with Macs and Linux devices infiltrating all sizes of organizations, not to mention tablets and phones. Trends such as these are significant issues and IT admins are struggling with how to leverage them with legacy software and infrastructure. A key part of that struggle is with how to connect and manage their employees and their devices and IT applications. AD doesn’t let them connect everything together – at least not without significant effort.  

Amazon Web Services offers several different methods to provision user access and permissions management through its Identity and Access Management (IAM) service but as a user repository it is lacking in many important features for larger enterprises. The need for more granular levels of access and control on the actual instance, as well as the need to connect and manage employees, devices, and legacy applications, often requires the use of an on-premises directory to provide a centralized and authoritative list of employees, their roles, and their access rights. For many organizations, this on-premises directory is Active Directory. To support their growing customer base, Amazon has released several different methods to integrate your existing directory services with AWS.

Integration Types

The ‘simplest’ form of leveraging AD services in AWS is to extend your existing footprint into your Amazon environment. This addresses some latency issues for logins, and provides relatively quick and easy support for disaster recovery and scalability. Using cloud-init and various configuration management tools (Puppet, Chef, PowerShell, etc.), instances can be deployed and automatically join an existing domain, centralizing user management at the instance level. A good working knowledge of AWS services (in particular security group configuration and DHCP Option Sets) is required to ensure replication and other AD-specific functionality is supported. The AWS Reference Architecture for this scenario provides much greater detail on both the exact process and the step-by-step methodology for this type of integration. This method, while quick and fairly simple, does not allow for access to the back-end systems of AWS. Extending your AD infrastructure in this fashion replicates your existing management processes, but does not provide API or console access to AWS.

Another common form of Directory Service integration is an SSO-based federation model. Federation allows for delegated access to AWS resources using a third-party authentication provider. With identity federation, external identities (federated users) are granted secure access to resources in your AWS account without having to create IAM users. These external identities can come from your corporate identity provider (e.g. Active Directory) or from a web identity provider, such as Amazon Cognito, Login with Amazon, Facebook, Google or any OpenID Connect (OIDC) compatible provider. This allows users to retain their existing usernames, passwords and authentication credentials, while still accessing the AWS resources they need to perform their roles. Depending upon the roles allowed to an authenticated user, this method can provide console access, API access (through the STS GetFederationToken API call), and even WorkSpaces and Zocalo access.

Federation with Active Directory is configured using SAML (Security Assertion Markup Language) to create a connection between an Identity Provider (IDP) and a Service Provider (SP). In this instance, Active Directory is the IDP, and AWS the SP. This process, detailed below, allows for secure and granular access based on the requesting user’s role within the organization, and the capabilities that role is allowed within the AWS environment.




  1. The user browses to the internal federation resource server.

  2. If the user is logged into a computer joined to the AD domain and their web browser supports Windows authentication, they will be authenticated using Windows integrated authentication. If the user is not logged into a computer joined to the domain, they will be prompted for their Windows username and password. The proxy determines the Windows username from the web request and uses this when making the session request.

After an AD user is authenticated by the proxy the following occurs:

  1. The proxy retrieves a list of the user’s AD group membership.

  2. The proxy retrieves IAM user credentials from a web configuration file (web.config) configured during setup. By default, the sample encrypts the secret access key using Windows Cryptographic Services. The proxy uses these credentials to call the ListRoles API requesting a list of all the IAM roles in the AWS account created during setup.

  3. The response includes a list of all the IAM roles available within the AWS account for the requesting user.

  4. The proxy determines the user’s entitlements by intersecting the list of AD groups with the list of IAM roles, matching on name. It populates a drop-down box with all the roles available to the current user, and the user selects the role they want to use while logging into the AWS Management Console. Note: if the user is not a member of any AD groups that match a corresponding IAM role, the user will be notified that no roles are available and access will be denied.

  5. Using the Amazon Resource Name (ARN) of the selected role, the proxy uses the credentials of the IAM user to make an AssumeRole request. The request sets the ExternalId property to the security identifier (SID) of the AD group that matches the name of the role. This adds an additional layer of verification in the event the AD group is ever deleted and recreated using the same display name. By default the expiration is set to the maximum of 3600 seconds.

  6. The proxy receives a session from Amazon Security Token Service (STS) that includes temporary credentials: access key, secret key, expiration and session token.

  7. (ADFS specific) The proxy uses the session token along with the SignInURL and ConsoleURL from the web configuration file (web.config) to generate a temporary sign-in URL.

  8. Finally, the user is redirected to the temporary sign-in URL, which automatically logs them into the AWS Management Console (or starts an API session) until the session expires.
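Steps 4 and 5 can be sketched in Python as follows. This is not the code of the proxy described above: the exact, case-sensitive name match, the group and role names, the account ID and the SID are hypothetical. The request keys are the parameter names of the STS `AssumeRole` operation, so the dict could be passed to boto3's `assume_role`.

```python
def available_roles(ad_groups, iam_role_arns):
    """Step 4: keep only the IAM roles whose name matches one of the user's AD groups.
    Assumes an exact, case-sensitive match between group name and role name."""
    group_names = set(ad_groups)
    # A role ARN looks like arn:aws:iam::<account>:role/<optional path/>RoleName
    return sorted(arn for arn in iam_role_arns if arn.rsplit("/", 1)[-1] in group_names)


def assume_role_request(role_arn, username, group_sid, duration_seconds=3600):
    """Step 5: parameters for STS AssumeRole, e.g. boto3.client("sts").assume_role(**request).
    The matching AD group's SID is sent as the ExternalId."""
    return {
        "RoleArn": role_arn,
        "RoleSessionName": username,
        "ExternalId": group_sid,
        "DurationSeconds": duration_seconds,
    }


roles = available_roles(
    ["Domain Users", "AWS-Developers"],
    [
        "arn:aws:iam::123456789012:role/AWS-Developers",
        "arn:aws:iam::123456789012:role/AWS-Admins",
    ],
)
if not roles:
    # No AD group matches an IAM role: access is denied
    raise PermissionError("no roles available")
request = assume_role_request(roles[0], "jdoe", "S-1-5-21-1004336348-1177238915-682003330-1105")
```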

Federation methods can become very complex, depending on the individual use case. Amazon has a large amount of documentation around this feature, but a good starting point is the ‘Manage Federation’ topic in the IAM documentation.

The third method of integration is the new AWS Directory Service. This is a cloud-based, managed service that allows for a direct connection between your existing AD environment and your AWS resources. The service offers two directory types: ‘AD Connector’ for existing systems, and ‘SimpleAD’ for new, cloud-only environments. The AD Connector serves as a proxy between your on-premises infrastructure and AWS and eliminates the need for federation services. To use the AD Connector, you must have AWS Direct Connect or another secure VPN connection into an AWS VPC (Virtual Private Cloud). The AD Connector allows you to provision access to Amazon WorkSpaces and Amazon Zocalo, and to provide access to the AWS Console to existing groups in your Active Directory structure. Access is also automatically updated in the event of organizational changes (employee terminations, promotions, team changes) to your AD environment. Additionally, your existing security policies (password expiration, password history, account lockouts and the like) are all enforced and managed from a central location.


There are almost as many methods to address authentication and authorization needs as there are companies who need the problem resolved. With AWS, existing organizations have a number of resources available to custom tailor their hybrid infrastructure to meet the needs of their employees and customers moving forward, without sacrificing the security, stability, and governance that is the hallmark of an on-premises environment. This overview of the topic will hopefully provide some direction for IT Administrators looking to answer the question of how their identity management systems will bridge the gap between yesterday and today.

AWS Advent 2014 – Advanced Network Resilience in VPCs with Consul

Today’s post on building Advanced Network Resilience in AWS VPCs with Consul comes to us from Sam Bashton.

At my company, we’ve been using AWS + VPC for three years or so. On day one of starting to build out an infrastructure within it we sent an email to our Amazon contact asking for ‘a NAT equivalent of an Internet Gateway’ – an AWS managed piece of infrastructure that would do NAT for us. We’re still waiting.

In the meantime, we’ve been through a couple of approaches to providing network resilience for NAT devices. As we’re now using Consul for service discovery everywhere, when we came to revisit how to provide resilience at the network layer, it made sense for us to utilise the feature-set it provides.

Autoscaling NAT/bastion instances

For our application to function, it needs to have outbound Internet connectivity at all times. Originally, we provided for this by having one NAT instance per AZ, and having health checks fail if this was not available. This meant that a failed NAT instance took down a whole AZ – something that the infrastructure had been designed to cope with, but not ideal, as it meant losing half or a third of capacity until the machine was manually re-provisioned.

The approach I set out below allows us to have NAT provided by instances in an autoscaling group, with minimal downtime in the event of instance failure. This means we now don’t need to worry about machines ‘scheduled for retirement’, being able to terminate them at will.

In this example, we set up a three-node Consul cluster. One node will be elected as the NAT instance, and will take over NAT duties. A simplistic health check is provided to ensure this instance has Internet access; it sends a ping to and checks for a response. In the event of the node failing in any way, another will quickly step in and take over routing.

In practice, if you already have a Consul cluster, you would only need two NAT instances to be running and retain fast failover.

You can try out this setup by using the CloudFormation template at

The template only has AMIs defined for us-west-2 and eu-west-1, so you’ll need to launch in one of those regions.

This setup relies on a Python script ( ) as a wrapper around Consul. It discovers the other nodes for Consul to connect to via the AWS API, and uses Consul session locking to get the cluster to agree on which machine should be the NAT device.
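The script itself is not reproduced here, but the lock semantics it relies on can be modelled in memory. The sketch below is a toy, not a Consul client: `ToyLock` mimics a KV key acquired with `?acquire=<session>`, and the node names are hypothetical.

```python
class ToyLock:
    """In-memory stand-in for a Consul KV key used as a lock (not a Consul client)."""

    def __init__(self):
        self.holder = None

    def acquire(self, session):
        # Mirrors Consul's ?acquire=<session>: succeeds if the key is free
        # or already held by the same session
        if self.holder in (None, session):
            self.holder = session
            return True
        return False

    def invalidate(self, session):
        # When a session is invalidated (e.g. its node fails a health check),
        # Consul's default session behaviour releases any lock it holds
        if self.holder == session:
            self.holder = None


def elect(lock, sessions):
    """Every candidate tries to take the lock; whoever holds it does NAT."""
    for session in sessions:
        lock.acquire(session)
    # The winner would then point the private route table's default route at
    # itself, for example with the EC2 ReplaceRoute API
    return lock.holder


lock = ToyLock()
first_nat = elect(lock, ["nat-a", "nat-b", "nat-c"])  # nat-a wins
lock.invalidate(first_nat)  # nat-a fails and its session is invalidated
second_nat = elect(lock, ["nat-b", "nat-c"])  # nat-b takes over
```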

Hopefully this example gives you enough building blocks to go and implement something similar for your environment.

AWS Advent 2014 – An introduction to DynamoDB

Today’s awesome post on DynamoDB comes to us from Jharrod LaFon.

DynamoDB is a powerful, fully managed, low latency, NoSQL database service provided by Amazon. DynamoDB allows you to pay for dedicated throughput, with predictable performance for “any level of request traffic”. Scalability is handled for you, and data is replicated across multiple availability zones automatically. Amazon handles all of the pain points associated with managing a distributed datastore for you, including replication, load balancing, provisioning, and backups. All that is left is for you to take your data, and its access patterns, and make it work in the denormalized world of NoSQL.

Modeling your data

The single most important part of using DynamoDB begins before you ever put data into it: designing the table(s) and keys. Keys (Amazon calls them primary keys) can be composed of one attribute, called a hash key, or a compound key called the hash and range key. The key is used to uniquely identify an item in a table. The choice of the primary key is particularly important because of the way that Amazon stores and retrieves the data. Amazon shards (partitions) your data internally, based on this key. When you pay for provisioned throughput, that throughput is divided across those shards. If you create keys based on data with too little entropy, then your key values will be similar. If your key values are too similar, so that they hash to the same shard, then you are limiting your own throughput.
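To make the entropy point concrete, the sketch below hashes keys onto a fixed number of partitions and counts the load. MD5 and the four partitions are stand-ins chosen for illustration; DynamoDB's actual hash function and partitioning are internal and not exposed.

```python
import hashlib
from collections import Counter


def partition_for(key, partitions):
    # Illustrative only: DynamoDB's real hash function and partition count are internal
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def partition_load(keys, partitions=4):
    """Count how many requests land on each partition for a stream of hash keys."""
    return Counter(partition_for(key, partitions) for key in keys)


# Low entropy: every request uses the same date as its hash key, so one partition takes it all
hot = partition_load(["2014-12-15"] * 1000)
# High entropy: distinct session IDs spread the requests across partitions
spread = partition_load([f"session-{i}" for i in range(1000)])
```

If provisioned throughput is divided evenly across four partitions, the hot key in the first case can only ever use about a quarter of what you pay for.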

Choosing an appropriate key requires that you structure your DynamoDB table appropriately. A relational database uses a schema that defines the primary key, columns, and indexes. DynamoDB on the other hand, only requires that you define a schema for the keys. The key schema must be defined when you create the table. Individual parts of an item in DynamoDB are called attributes, and those attributes have data types (basic scalar types are Number, String, Binary, and Boolean). When you define a key schema, you specify which attributes to use for a key, and their data types.

DynamoDB supports two types of primary keys, a Hash Key and a Hash and Range Key.

  • Hash Key consists of a single attribute that uniquely identifies an item.
  • Hash and Range Key consists of two attributes that, together, uniquely identify an item.


If you’ve spent some time with relational databases, then you have probably heard of normalization, which is the process of structuring your data to avoid storing information in more than one place. Normalization is accomplished by storing data in separate tables and then defining relationships between those tables. Data retrieval is possible because you can join all of that data using the flexibility of a query language (such as SQL).

DynamoDB is a NoSQL database and therefore does not support SQL. So instead of normalizing our data, we denormalize it (and eliminate the need to join). A full discussion of denormalizing is beyond the scope of this introductory tutorial, but you can read more about it in DynamoDB’s developer guide.

Accessing data

Performing operations in DynamoDB consumes throughput (which you pay for), so you should structure your application with that in mind. Individual items in DynamoDB can be retrieved, updated, and deleted. Conditional updates are also supported, which means that the write or update only succeeds if the specified condition is met. Operations can also be batched for efficiency.

Two multi-item read operations are also supported: query and scan. A query returns the items that share a given hash key value, optionally narrowed by a range key value or condition. A scan operation examines every item in a table, optionally filtering items before returning them.
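The difference between query and scan can be seen in a toy in-memory model, sketched in Python below. `ToyTable` is hypothetical and not a DynamoDB client; it simply groups items by hash key, keeps each group sorted by range key, and counts how many items a scan has to examine.

```python
from collections import defaultdict


class ToyTable:
    """In-memory model of a hash-and-range-key table (not a DynamoDB client)."""

    def __init__(self, hash_key, range_key):
        self.hash_key = hash_key
        self.range_key = range_key
        self._by_hash = defaultdict(list)

    def put(self, item):
        # Like PutItem, an item with the same full key replaces the old one
        hash_value, range_value = item[self.hash_key], item[self.range_key]
        bucket = [i for i in self._by_hash[hash_value] if i[self.range_key] != range_value]
        bucket.append(item)
        bucket.sort(key=lambda i: i[self.range_key])
        self._by_hash[hash_value] = bucket

    def query(self, hash_value, range_condition=None):
        # Query: read only the items sharing one hash key value,
        # optionally narrowed by a condition on the range key
        bucket = self._by_hash.get(hash_value, [])
        return [i for i in bucket
                if range_condition is None or range_condition(i[self.range_key])]

    def scan(self, filter_fn=None):
        # Scan: examine every item in the table; the filter only trims the result
        examined, results = 0, []
        for bucket in self._by_hash.values():
            for item in bucket:
                examined += 1
                if filter_fn is None or filter_fn(item):
                    results.append(item)
        return results, examined


table = ToyTable(hash_key="user_id", range_key="created")
for user_id, created in [(1, "2014-12-01"), (1, "2014-12-02"), (1, "2014-12-03"),
                         (2, "2014-12-01"), (2, "2014-12-05")]:
    table.put({"user_id": user_id, "created": created})

# Query reads user 1's items only, from 2014-12-02 onwards
recent = table.query(1, lambda created: created >= "2014-12-02")
# Scan reads all five items to find user 1's three
matches, examined = table.scan(lambda item: item["user_id"] == 1)
```

Because a scan examines every item, it consumes read throughput in proportion to the table size rather than the result size, which is why queries are preferred wherever the key design allows them.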


Items are accessed using their primary key, but you can also use indexes. Indexes provide an alternative (and performant) way of accessing data. Each index has its own primary key and that key is used when performing index lookups. Tables can have multiple indexes, allowing your application to retrieve table data according to its needs. DynamoDB supports two types of indexes.

  • Local secondary index: An index that uses the table’s Hash Key, but can use an alternate range key. Using these indexes consumes throughput capacity from the table.
  • Global secondary index: An index that uses a Hash and Range Key that can be different from the table’s. These indexes have their own throughput capacity, separate from the table’s.

A global secondary index is called global because it applies to the entire table, and secondary because the first real index is the primary hash key. In contrast, local secondary indexes are said to be local to a specific hash key. In that case you could have multiple items with the same hash key, but different range keys, and you could access those items using only the hash key.


DynamoDB is a distributed datastore, storing replicas of your data to ensure reliability and durability. Synchronizing those replicas takes time, and may not always be immediately necessary. Because of this, DynamoDB allows the user to specify the desired consistency for reading data. There are two types of consistency available.

  • Eventually consistent reads: This is better for read throughput, but you might read stale data.
  • Strongly consistent reads: Used when you absolutely need the latest result.


DynamoDB is a great service, but it does have limits.

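  • Individual items cannot exceed 400 KB.
  • An item collection (all items sharing the same hash key) cannot exceed 10 GB in a table that has local secondary indexes; tables themselves have no size limit.
  • If you exceed your provisioned throughput, your requests may be throttled.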

A simple example

To help understand how to use DynamoDB, let’s look at an example. Suppose that you wanted to store web session data in DynamoDB. An item in your table might have the following attributes.

Attribute Name    Data Type
session_id        String
user_id           Number
session_data      String
last_updated      String
created           String

In this example, each item consists of a unique session ID, an integer user ID, the content of the session as a String, and timestamps for the creation of the session and the last time it was updated.

The simplest way to access our table is to use a Hash Key. We can use the session_id attribute for the Hash Key because it is unique. To look up any session in our session table, we can retrieve it by using the session_id.
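For reference, the key schema for this session table can be expressed as the parameters of DynamoDB's `CreateTable` operation. The sketch below builds them as a Python dict that could be passed to boto3's `create_table`; the table name and capacity values are hypothetical.

```python
def session_table_definition(table_name="sessions", read_units=1, write_units=1):
    """CreateTable parameters for the session table (name and capacity are hypothetical).
    With boto3 this could be used as boto3.client("dynamodb").create_table(**params)."""
    return {
        "TableName": table_name,
        # Only key attributes are declared; user_id, session_data, etc. need no schema
        "AttributeDefinitions": [{"AttributeName": "session_id", "AttributeType": "S"}],
        "KeySchema": [{"AttributeName": "session_id", "KeyType": "HASH"}],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }


params = session_table_definition()
```

Only `session_id` appears in `AttributeDefinitions`, since DynamoDB expects definitions only for attributes used in the table's or its indexes' key schemas.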

Accessing DynamoDB

DynamoDB is provided as an HTTP API. There are multiple libraries that provide a higher level abstraction over the HTTP API, and in many different languages. Python is my primary language, and so I’ll be using Python for this tutorial. Amazon has created boto, their official Python interface. For this example however, I will be using PynamoDB, which has succinct, ORM-like syntax (disclaimer: I wrote PynamoDB).

Installing PynamoDB

You can install PynamoDB directly from PyPI.

Specifying your table and key schema

PynamoDB allows you to specify your table attributes and key schema by defining a class with attributes.

The Session class defined above specifies the schema of our table, and its primary key. In just a few lines of code we’ve defined the attributes for a session item as discussed in the table above. PynamoDB provides customized attributes such as the UTCDateTimeAttribute as a convenience, which stores the timestamp as a String in DynamoDB.

The Meta class attributes specify the name of our table, as well as our desired capacity. With DynamoDB, you pay for read and write capacity (as well as data storage), so you need to decide how much capacity you want initially. It’s not fixed, however: you can always scale the capacity up or down for your table using the API. It’s worth reading more about capacity in the official documentation if you plan on using DynamoDB for your project.
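Capacity is easier to size with a little arithmetic. As a sketch, assuming the documented unit sizes (one read capacity unit covers one strongly consistent read per second of an item up to 4 KB, or two eventually consistent reads; one write capacity unit covers one write per second of an item up to 1 KB), the helpers below estimate the units a workload needs. The function names and workload figures are hypothetical.

```python
import math


def read_capacity_units(item_size_bytes, reads_per_second, strongly_consistent=True):
    """Estimate read capacity: each read is billed in 4 KB chunks, rounded up."""
    units_per_read = math.ceil(max(item_size_bytes, 1) / 4096)
    units = units_per_read * reads_per_second
    # Eventually consistent reads cost half as much as strongly consistent ones
    return units if strongly_consistent else math.ceil(units / 2)


def write_capacity_units(item_size_bytes, writes_per_second):
    """Estimate write capacity: each write is billed in 1 KB chunks, rounded up."""
    return math.ceil(max(item_size_bytes, 1) / 1024) * writes_per_second


# 10 strongly consistent reads/s of a 6 KB item: 2 units per read, 20 units in total
strong_reads = read_capacity_units(6 * 1024, 10)
# The same reads, eventually consistent: half, so 10 units
eventual_reads = read_capacity_units(6 * 1024, 10, strongly_consistent=False)
# 5 writes/s of a 1.5 KB item: 2 units per write, 10 units in total
writes = write_capacity_units(1536, 5)
```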

Creating a table

Now that we know how our data is structured, and what type of key we will use, let’s create a table.

Tables created in DynamoDB may not be available for use immediately (as Amazon is provisioning resources for you) and the wait argument above specifies that we would like the function to block until the table is ready.

Reading & Writing Data

Now that we have a table we can use, let’s store a session in it.

Our session is now saved in the fully managed DynamoDB service, and can be retrieved just as easily.

A less simple example

Building upon the previous example, let’s make it more useful. Suppose that in addition to being able to retrieve an individual session, you wanted to be able to retrieve all sessions belonging to a specific user. The simple answer is to create a global secondary index.

Here is how we can define the table with a global secondary index using PynamoDB.

This might seem complicated, but it really isn’t. The Session class is defined as before, but with an extra user_index attribute. That attribute is defined by the UserIndex class, which defines the key schema of the index as well as the throughput capacity for the index.

We can create the table, with its index, just as we did previously.

Now, assuming that our table has data in it, we can use the index to query every session for a given user.
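For readers not using PynamoDB, roughly the same table, index and query can be expressed with the parameters of DynamoDB's low-level `CreateTable` and `Query` operations, sketched below as Python dicts that could be passed to boto3. The `user_index` name, the table name and the capacities are hypothetical.

```python
def session_table_with_user_index(table_name="sessions", units=1):
    """CreateTable parameters with a global secondary index on user_id
    (table name, index name and capacities are hypothetical)."""
    return {
        "TableName": table_name,
        # Every key attribute of the table and of its indexes must be declared
        "AttributeDefinitions": [
            {"AttributeName": "session_id", "AttributeType": "S"},
            {"AttributeName": "user_id", "AttributeType": "N"},
        ],
        "KeySchema": [{"AttributeName": "session_id", "KeyType": "HASH"}],
        "ProvisionedThroughput": {"ReadCapacityUnits": units, "WriteCapacityUnits": units},
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "user_index",
                "KeySchema": [{"AttributeName": "user_id", "KeyType": "HASH"}],
                "Projection": {"ProjectionType": "ALL"},
                # A global secondary index has its own throughput, separate from the table's
                "ProvisionedThroughput": {"ReadCapacityUnits": units, "WriteCapacityUnits": units},
            }
        ],
    }


def sessions_for_user_query(user_id, table_name="sessions"):
    """Query parameters that read every session for one user through the index."""
    return {
        "TableName": table_name,
        "IndexName": "user_index",
        "KeyConditionExpression": "user_id = :uid",
        # The low-level API sends numbers as strings tagged with type "N"
        "ExpressionAttributeValues": {":uid": {"N": str(user_id)}},
    }


create_params = session_table_with_user_index()
query_params = sessions_for_user_query(42)
```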


DynamoDB isn’t perfect, but it is a great service if you need a scalable, highly available NoSQL database, without having to manage any of it yourself. This tutorial shows how easy it is to get started, but there is much more to it than what is mentioned here.

If you are interested, you should definitely check out these awesome resources.

AWS Advent 2014 – Using IAM to secure your account and resources

Today’s AWS Advent post comes to us from Craig Bruce.

AWS Identity and Access Management (IAM) is a service from AWS to aid you in securing your AWS resources. This is accomplished by creating users and roles with specific permissions for both API endpoints and AWS resources.

Ultimately, IAM is a security tool, and like all security tools there is a balance between security and practicality (no one wants to enter an MFA code for every single API request). IAM is an optional and free service, but users that do not use IAM have been bitten, most recently BrowserStack. If BrowserStack had followed the IAM best practices it could have avoided this incident.

This article will cover a few areas of IAM.

  • Best practices. There is really no excuse not to follow these.
  • Where your user identities can originate from. AWS offers multiple options now.
  • Various tips and tricks for using IAM.

Best practices

IAM has a best practices guide which is easy to implement. Also, when you access IAM via the AWS management console, it highlights some best practices and whether you have implemented them.

Here is a brief summary of the IAM best practices:

  • Lock away your AWS account access keys

    Ideally just delete the root access keys. Your root user (with a physical MFA) is only required to perform IAM actions via the AWS management console. Create power users for all other tasks, everything except IAM.

  • Create individual IAM users

    Every user has their own access keys/credentials.

  • Use groups to assign permissions to IAM users

    Even if you think a policy is just for one user, still make a group. You’ll be amazed how quickly user policies become forgotten.

  • Grant least privilege

    If a user asks for read access to a single S3 bucket, do not grant s3:* on all resources; be specific with s3:GetObject and select the specific resource. It is easier to add further access later than to restrict a wildcard.

  • Configure a strong password policy for your users

    Do your users even need access to the AWS Management Console? If they do, make sure the passwords are strong (and preferably stored in a password manager, not on a post-it).

  • Use roles for applications that run on Amazon EC2 instances

    Roles on EC2 remove the need to ever, ever include access keys on the instance or in code (where they can all too easily end up in version control). Roles give the same permissions as a user, but AWS rotates the keys three times a day. All AWS SDKs can obtain credentials from the instance metadata, so you do not need any extra code.

  • Delegate by using roles instead of by sharing credentials

    If you have multiple AWS accounts (common in larger companies), you can let users from another account assume a role to use your resources, instead of sharing credentials.

  • Rotate credentials regularly

    For users with access keys, rotate them. This is a manual step, but each user can have two active keys, which enables a more seamless transition.

  • Remove unnecessary credentials

    If a user has left, or their access requirements change, delete the user, alter their group memberships or edit the group policies (so much easier when all your policies are attached to groups, not users).

  • Use policy conditions for extra security

    Conditions can require requests to come from specific IP ranges or to be authenticated via MFA. You can apply these to specific actions to ensure, for example, that they are only performed from behind your corporate firewall.

  • Keep a history of activity in your AWS account

    CloudTrail is a separate service, but having a log of all API access (which includes IAM user information) is incredibly useful as an audit log and even for debugging issues with your policies.
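Several of the practices above come down to small JSON policy documents. The minimal sketch below builds four of them in Python so they can be printed and checked before being pasted into IAM: a least-privilege S3 read grant, the trust policy that lets EC2 instances assume a role, a cross-account trust policy, and a policy with IP and MFA conditions. The bucket name, account ID and IP range are hypothetical placeholders.

```python
import json

VERSION = "2012-10-17"

def s3_read_object_policy(bucket):
    """Least privilege: read objects from one bucket, nothing else."""
    return {"Version": VERSION, "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject"],
        # Objects live under bucket/*, not under the bucket ARN itself.
        "Resource": [f"arn:aws:s3:::{bucket}/*"],
    }]}

def ec2_role_trust_policy():
    """Trust policy letting EC2 instances assume a role (no keys on disk)."""
    return {"Version": VERSION, "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }]}

def cross_account_trust_policy(account_id):
    """Trust policy letting principals in another account assume a role."""
    return {"Version": VERSION, "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
        "Action": "sts:AssumeRole",
    }]}

def mfa_and_ip_restricted_policy(actions, cidrs):
    """Allow `actions` only from `cidrs` and only when signed in with MFA."""
    return {"Version": VERSION, "Statement": [{
        "Effect": "Allow",
        "Action": list(actions),
        "Resource": "*",
        # Different condition operators must all match (logical AND).
        "Condition": {
            "IpAddress": {"aws:SourceIp": list(cidrs)},
            "Bool": {"aws:MultiFactorAuthPresent": "true"},
        },
    }]}

if __name__ == "__main__":
    # Hypothetical bucket, account ID and corporate IP range.
    for doc in (
        s3_read_object_policy("example-reports"),
        ec2_role_trust_policy(),
        cross_account_trust_policy("111122223333"),
        mfa_and_ip_restricted_policy(["ec2:TerminateInstances"], ["203.0.113.0/24"]),
    ):
        print(json.dumps(doc, indent=2))
```

Trust policies only say who may assume a role; the role's permissions are attached separately, and EC2 instances receive a role through an instance profile. In the cross-account case, the other account must also allow its users to call sts:AssumeRole on the role's ARN.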

Federated users

The default in IAM is to create a new user, which is internal to AWS. You can use this user for any part of AWS. When an IAM user logs in, they get a special login screen (via an account-specific URL) to provide their username and password (not an email address, as with the root account). For flexibility, IAM can also use third-party identity providers: web identity providers such as Amazon, Facebook and Google, as well as SAML providers. Another recent product is AWS Directory Service, which lets you use your on-premises corporate identity (Microsoft Active Directory, for example) as your identity provider. For mobile applications you should explore Amazon Cognito, which is designed specifically for mobile and includes IAM integration. Regardless of your identity source, IAM is still core to managing access to your AWS resources.

General tips

MFA (multi factor authentication) is available with IAM and highly recommended. One approach you could adopt is:

  • Physical MFA devices for the root account; they are not expensive.
  • Power users (whatever you define as a power user) use a virtual MFA (like Google Authenticator or Duo Security).
  • Users with less potentially destructive access have no MFA.

A power user could have access to everything except IAM. Learn more about the policy grammar, which is JSON based, in this blog post.
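A minimal sketch of such a power-user policy, assuming the common NotAction pattern; it is built in Python only so the JSON can be printed and checked.

```python
import json

# Allow every action whose name does NOT match "iam:*", on every resource.
POWER_USER_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "NotAction": "iam:*", "Resource": "*"},
    ],
}

print(json.dumps(POWER_USER_POLICY, indent=2))
```

Because this is an Allow with NotAction, IAM actions are simply not granted by this policy; they are not explicitly denied, so another policy attached to the same user or group could still grant them.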

Advanced policies can use resource-level permissions and tags. A good example comes from the EC2 documentation.

This policy allows the user to describe any EC2 instance, to stop or start two instances (i-123abc12 and i-4c3b2a1) and to terminate any instance with the tag purpose set to test. This is particularly useful if you want to give your users access to your development EC2 instances but not to your production instances. Resource-level permissions offer a great deal of flexibility in your policies. A combination of tags and resource-level permissions is AWS's preferred approach to writing these more complex policies.
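A sketch of a policy matching that description, built in Python so it can be printed as JSON. The instance IDs and the purpose=test tag come from the text; the region and account ID in the ARNs are hypothetical placeholders.

```python
import json

REGION = "us-east-1"          # hypothetical
ACCOUNT_ID = "123456789012"   # hypothetical

def instance_arn(instance_id):
    return f"arn:aws:ec2:{REGION}:{ACCOUNT_ID}:instance/{instance_id}"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # EC2 Describe* actions do not support resource-level
            # permissions, so this is granted on every resource.
            "Effect": "Allow",
            "Action": "ec2:DescribeInstances",
            "Resource": "*",
        },
        {   # Start/stop only these two instances.
            "Effect": "Allow",
            "Action": ["ec2:StartInstances", "ec2:StopInstances"],
            "Resource": [instance_arn("i-123abc12"), instance_arn("i-4c3b2a1")],
        },
        {   # Terminate any instance tagged purpose=test.
            "Effect": "Allow",
            "Action": "ec2:TerminateInstances",
            "Resource": f"arn:aws:ec2:{REGION}:{ACCOUNT_ID}:instance/*",
            "Condition": {"StringEquals": {"ec2:ResourceTag/purpose": "test"}},
        },
    ],
}

print(json.dumps(policy, indent=2))
```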

While policies can get complex, here are some final tips:

  • When writing your policies AWS provides templates which can be a good starting place.
  • The IAM Policy Simulator is very handy at testing your policies.
  • Changes to IAM can take a few minutes to propagate, but IAM is not a service you should be changing constantly.
  • Third-party services that require access to your AWS resources should each use their own IAM user and access keys.
  • Use IAM in preference to S3 ACLs or bucket policies (although there are specific exceptions, such as CloudFront access).
  • IAM support is not complete across all AWS products; get the latest information here.


IAM is a powerful service that can help you manage and restrict access to your AWS resources. While getting the policies right can make the initial setup tricky, once they are saved as groups you will be set. Time invested in IAM now could save you from an embarrassing situation later. Hopefully this article has touched on the various aspects of IAM, and some of them apply directly to your use case.

AWS Advent Day – Deploying SQL Server in AWS

Our third AWS Advent post comes to us by way of Jeremiah Peschka. You can find him fiddling with databases at Brent Ozar Unlimited.

For one reason or another, you’re deploying SQL Server into AWS. There are a few different ways to think about deploying SQL Server and a few concerns that you have to address for each one. This doesn’t have to be difficult and, in some ways, it’s a lot easier than buying a physical server.

We’re going to take a look at three ways to deploy SQL Server in AWS. In two situations we’ll look at renting the licensing from Amazon and in two situations we’ll look at running our own instances. There’s some overlap here, but rest assured that’s a good thing.

SQL Server as a Service

One of the easiest ways to run SQL Server in AWS is to not run it at all. Or, at least, to make AWS run it for you. Amazon has a hosted database-as-a-service product: Amazon RDS.

Benefits of SQL Server RDS

Good operational database administrators are hard to come by. SQL Server RDS doesn’t provide a good database administrator, but it does turn a large portion of database administration into a service.

Amazon provides:

  • An operating system and SQL Server
  • Automated configuration tools
  • Regular backups
  • The ability to clone a database
  • High availability (if you check the box)

In addition, you can provision new SQL Servers in response to customer demand as needed. The ability to rapidly spin up multiple SQL Server installations can’t be overstated: new SQL Servers on demand are critical for multi-tenant companies. Abstracting away the creation, patching, and other operational tasks is a boon for small companies without experienced DBAs.

The Downside of SQL Server RDS

It’s easy to think that getting someone else to handle SQL Server is the way to go. After all, Amazon is responsible for just about everything apart from your code, right?

They’re not. While AWS is responsible for a lot of plumbing, you’re still responsible for writing software, designing data structures, monitoring SQL Server performance, and performing capacity planning.

Even worse, you’re responsible for the maintenance of all of this functionality and making sure that your index structures are free of fragmentation and corruption. It is still necessary for someone to set up jobs to monitor and address:

  • Index fragmentation
  • Database corruption

Even though AWS is doing some of the work, there’s still a lot left to do.

AWS Licensing

It’s possible to rent your licensing from AWS. This is how SQL Server RDS works, but you can also rent licensing for your own instances if you’re using SQL Server Standard Edition. For many companies, this is an easy way into SQL Server licensing, and AWS can offer a competitive price.

For teams who don’t need Enterprise Edition features, renting the licenses from AWS is an easy on-ramp. SQL Server Standard Edition supports a number of high availability features that are good enough for most applications. Many AWS instance sizes are small enough that the limits of SQL Server Standard Edition (16 cores and 64GB of memory, or 128GB for SQL Server 2014) aren’t a limitation at all.

Enterprise Edition and AWS

Sometimes you need more than 128GB of memory. Or more than 16 cores. In these cases, you can buy your own licensing for SQL Server Enterprise Edition. Although it’s expensive, this is the only way to take advantage of the larger AWS instance types with their high core counts and large volumes of memory.

Many aspects of SQL Server Enterprise Edition are the same as they’d be for a physical SQL Server. The most important thing to realize is that this giant SQL Server is subject to hard scaling limits: AWS doesn’t always have the fastest CPUs, and the maximum instance sizes are limited. Scaling up indefinitely isn’t always an option. DBAs need to carefully watch the CPU utilization of different SQL Server features.

Embrace the limitations of AWS hardware and use that to guide your tuning efforts.

Lessons Learned

Follow a Set Up Guide

No, really. Do it. My coworkers and I maintain a SQL Server Setup Checklist. Don’t deploy SQL Server without one.

Script Everything

I can’t stress it enough – script everything.

Instance spin up time in AWS is fast enough that it makes sense to have a scripted installation process. Whether you’re scripting the configuration of an RDS SQL Server or you’re installing SQL Server on your own VMs, the ability to rapidly configure new instances is powerful.

Script your SQL Server setup and keep it in version control.

Use Solid State

  1. Don’t go cheap and use rotational storage.
    Just stop it. Even if you’re hosting a data warehouse, the savings aren’t worth it. The throughput available from AWS SSDs is more than worth the cost.
  2. Use the local SSDs.
    The ephemeral SSDs don’t survive the instance being stopped or terminated, but they are still local SSDs. SQL Server can take advantage of low-latency drives for temporary workloads. Careful configuration makes it possible to house the tempdb database on the local AWS SSDs. Since tempdb is recreated every time SQL Server starts, who cares if it goes away when the instance stops?

Plan Carefully

Measure your storage needs in advance. In How Much Memory Does SQL Server Need?, I did some digging to help DBAs figure out the amount of I/O that SQL Server is performing. Measuring disk use (both in IOPS and throughput) will help your capacity planning efforts. It’s okay to get this wrong; most of us do.

SQL Server tracks enough data that you can make an educated guess about IOPS, throughput, and future trends. Just make sure you look at the metrics you’re given, figure out the best route forward, and have plans in place to deal with a mistake in your capacity planning.
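SQL Server's sys.dm_io_virtual_file_stats function reports cumulative per-file counters (num_of_reads, num_of_writes, num_of_bytes_read, num_of_bytes_written). A minimal Python sketch of turning two samples of those counters into IOPS and throughput follows; the sample values are made up, and querying SQL Server for them is left out.

```python
from dataclasses import dataclass

@dataclass
class FileStats:
    """Cumulative counters as reported by sys.dm_io_virtual_file_stats."""
    num_of_reads: int
    num_of_writes: int
    num_of_bytes_read: int
    num_of_bytes_written: int

def io_rates(before: FileStats, after: FileStats, seconds: float) -> dict:
    """Average IOPS and MiB/s between two samples taken `seconds` apart."""
    reads = after.num_of_reads - before.num_of_reads
    writes = after.num_of_writes - before.num_of_writes
    bytes_moved = (after.num_of_bytes_read - before.num_of_bytes_read) + (
        after.num_of_bytes_written - before.num_of_bytes_written
    )
    return {
        "iops": (reads + writes) / seconds,
        "mib_per_sec": bytes_moved / seconds / (1024 * 1024),
    }

# Made-up samples taken 60 seconds apart.
t0 = FileStats(1_000, 500, 64 * 1024**2, 32 * 1024**2)
t1 = FileStats(7_000, 3_500, 664 * 1024**2, 332 * 1024**2)
print(io_rates(t0, t1, 60))  # {'iops': 150.0, 'mib_per_sec': 15.0}
```

Sampling at peak and off-peak times, and summing across data and log files, gives the numbers to compare against what an AWS volume type can deliver.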

Hardware is Limited

It’s easy enough to buy a big 4- or 8-socket server with terabytes of RAM. But that’s not possible in AWS. Embrace limitations.

Instead of scaling up SQL Server as high as the budget will allow, embrace the constraints of AWS hardware. Spin off different services into other AWS services or bring your services into the mix. Don’t think of it as abandoning SQL Server, think of it as scaling your architecture. By moving away from a monolithic database server, you’re able to scale individual portions of the application separately.

SQL Server Full Text Search is one example of a feature that can be scaled out. Finding the resources to create a full text search server using Elasticsearch or Solr can be difficult in a physical data center. With AWS, you can spin up a new VM, with software installed, in a few minutes and be ready to index and query data.

Licensing is Tricky

Starting with SQL Server 2012, licensing became core-based rather than socket-based. And, because of how Microsoft licenses SQL Server in virtual environments, those core-based licenses may not be what you think they are. Check with Amazon or your licensing reseller to make sure you’re licensed correctly.

Wrapping Up

There’s no reason to fear deploying SQL Server in AWS, or any cloud provider. For many applications, SQL Server RDS fits the bill. The more customized your deployment, the more likely you are to need SQL Server in an EC2 instance. As long as you keep these guidelines in mind, you’re likely to be successful.

AWS Advent 2014 – CoreOS and Kubernetes on AWS

Our second AWS Advent Post comes to us from Tim Dysinger. He walks us through exploring CoreOS and Kubernetes.

There’s a copy of the source and example code from this post on GitHub.

What’s a CoreOS?

CoreOS is a fork of CrOS, the operating system that powers Google Chrome laptops. CrOS is a highly customized flavor of Gentoo that can be entirely built in one shot on a host Linux machine. CoreOS is a minimal Linux/Systemd operating system with no package manager. It is intended for servers that will be hosting virtual machines.

CoreOS has “Fast Patch” and Google’s Omaha updating system as well as CoreUpdate from the CoreOS folks. The A/B upgrade system from CrOS means updated OS images are downloaded to the non-active partition. If the upgrade works, great! If not, we roll back to the partition that still exists with the old version. CoreUpdate also has a web interface to allow you to control what gets updated on your cluster & when that action happens.

While not being tied specifically to LXC, CoreOS comes with Docker “batteries included”. Docker runs out of the box with ease. The team may add support for an array of other virtualization technologies on Linux, but today CoreOS is known for its Docker integration.

CoreOS also includes Etcd, a useful Raft-based key/value store. You can use this to store cluster-wide configuration & to provide look-up data to all your nodes.
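Raft only makes progress while a majority of members agree, which is why etcd clusters are usually sized at an odd number of nodes, such as the 3-node cluster launched later in this post. The quorum arithmetic in a short Python sketch:

```python
def quorum(n: int) -> int:
    """Votes needed for a Raft majority in a cluster of n members."""
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    """Members that can fail while the cluster still has a majority."""
    return n - quorum(n)

for n in (1, 2, 3, 4, 5):
    print(n, quorum(n), tolerated_failures(n))
```

A 4-node cluster tolerates no more failures than a 3-node one (one each), so the extra node adds load without adding resilience.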

Fleet is another CoreOS built-in service that can optionally be enabled. Fleet takes systemd and stretches it so that it is multi-machine aware. You can define services or groups of services in a systemd syntax and deploy them to your cluster.

CoreOS has alpha, beta & stable streams of their OS images and the alpha channel gets updates often. The CoreOS project publishes images in many formats, including AWS images in all regions. They additionally share a ready-to-go basic AWS CloudFormation template from their download page.


Today we are going to show how you can launch Google’s Kubernetes on Amazon using CoreOS. In order to play along you need the following checklist completed:

  • AWS account acquired
  • AWS_ACCESS_KEY_ID environment variable exported
  • AWS_SECRET_ACCESS_KEY environment variable exported
  • AWS_DEFAULT_REGION environment variable exported
  • Amazon awscli tools installed
  • JQ CLI JSON tool installed

You should be able to execute the following, to print a list of your EC2 Key-Pairs, before continuing:

CoreOS on Amazon EC2

Let’s launch a single instance of CoreOS just so we can see it work by itself. Here we create a small YAML file for AWS ‘userdata’. In it we tell CoreOS that we don’t want an automatic reboot after an update (we may prefer to manage reboots manually in our prod cluster; if you like automatic reboots, don’t specify anything & you’ll get the default).

Our super-basic cloud-config.yml file looks like so:
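A cloud-config matching that description could look like the text below, assuming CoreOS's coreos.update.reboot-strategy setting with the value off. It is wrapped in Python here only so it can be written to cloud-config.yml and checked.

```python
# CoreOS reads this from EC2 user data. reboot-strategy "off" stops the node
# from rebooting itself after an update has been applied.
CLOUD_CONFIG = """#cloud-config
coreos:
  update:
    reboot-strategy: off
"""

with open("cloud-config.yml", "w") as f:
    f.write(CLOUD_CONFIG)
```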

Here we use ‘awscli’ to create a new Key-Pair:

We’ll also need a security group for CoreOS instances:

Let’s allow traffic from our laptop/desktop to SSH:

Now let’s launch a single CoreOS Amazon Instance:

Running a Docker Instance The Old Fashioned Way

Log in to our newly launched CoreOS EC2 node:

Start a Docker instance interactively in the foreground:

OK. Now terminate that machine (AWS Console or CLI). We need more than just plain ol’ docker. To run a cluster of containers we need something to schedule & monitor the containers across all our nodes.

Starting Etcd When CoreOS Launches

The next thing we’ll need is to have etcd started with our node. Etcd will help our nodes with cluster configuration & discovery. It’s also needed by Fleet.

Here is a (partial) Cloud Config userdata file showing etcd being configured & started:

You need to use a different discovery URL (above) for every cluster launch. This is noted in the etcd documentation. Etcd uses the discovery URL to hint to nodes about peers for a given cluster. You can (and probably should if you get serious) run your own internal etcd cluster just for discovery. Here’s the project page for more information on etcd.
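A minimal Python sketch of requesting a fresh discovery URL per launch and rendering it into the etcd part of the cloud-config. The addr and peer-addr values assume etcd 0.4-era settings (client port 4001, peer port 7001) and CoreOS's $private_ipv4 substitution; the token in the example is made up, and new_discovery_url() needs network access.

```python
import urllib.request

def new_discovery_url() -> str:
    """Ask the public etcd discovery service for a brand-new cluster token."""
    with urllib.request.urlopen("https://discovery.etcd.io/new") as resp:
        return resp.read().decode().strip()

def etcd_cloud_config(discovery_url: str) -> str:
    """Partial cloud-config that configures and starts etcd."""
    return f"""#cloud-config
coreos:
  etcd:
    discovery: {discovery_url}
    addr: $private_ipv4:4001
    peer-addr: $private_ipv4:7001
  units:
    - name: etcd.service
      command: start
"""

# In practice call new_discovery_url() once per cluster launch;
# a made-up token is used here so the example runs offline.
config = etcd_cloud_config("https://discovery.etcd.io/0123456789abcdef")
print(config)
```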

Starting Fleetd When CoreOS Launches

Once we have etcd running on every node we can start up Fleet, our low-level cluster-aware systemd coordinator.

We need to open internal traffic between nodes so that etcd & fleet can talk to peers:
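The post does this with awscli; a hypothetical equivalent using the IpPermissions structure of boto3's authorize_security_group_ingress is sketched below. The rule's source is the security group itself, so only instances in that group can reach each other. The ports assume etcd 0.4's defaults (4001 for clients, 7001 for peers); fleet coordinates through etcd, so it needs no extra port here. The group ID is a placeholder.

```python
def peer_ingress_permissions(group_id: str, ports=(4001, 7001)) -> list:
    """IpPermissions letting members of `group_id` reach each other on `ports`."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            # Source is the group itself: only fellow cluster nodes match.
            "UserIdGroupPairs": [{"GroupId": group_id}],
        }
        for port in ports
    ]

perms = peer_ingress_permissions("sg-0123abcd")  # hypothetical group ID
print(perms)
# With boto3 installed and credentials configured, this could be applied as:
# boto3.client("ec2").authorize_security_group_ingress(
#     GroupId="sg-0123abcd", IpPermissions=perms)
```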

Let’s launch a small cluster of 3 coreos-with-fleet instances:

Using Fleet With CoreOS to Launch a Container

Starting A Docker Instance Via Fleet

Log in to one of the nodes in our new 3-node cluster:

Now use fleetctl to start your service on the cluster:

NOTE: There’s a way to use the FLEETCTL_TUNNEL environment variable in order to use fleetctl locally on your laptop/desktop. I’ll leave this as an exercise for the reader.

Fleet is capable of tracking containers that fail (via systemd signals). It will reschedule a container for another node if needed. Read more about HA services with fleet here.

Registry/Discovery feels a little clunky to me (no offense CoreOS folks). I don’t like having to manage separate “sidekick” or “ambassador” containers just so I can discover & monitor containers. You can read more about Fleet discovery patterns here.

There’s no “volume” abstraction with Fleet. There’s not really a cohesive “pod” definition. Well there is a way to make a “pod” but the config would be spread out in many separate systemd unit files. There’s no A/B upgrade/rollback for containers (that I know of) with Fleet.

For these reasons, we need to keep on looking. Next up: Kubernetes.

What’s Kubernetes?

Kubernetes is a higher-level platform-as-a-service than CoreOS currently offers out of the box. It was born out of Google's experience running containers at scale. It is still in its early stages, but I believe it will become a stable, useful tool, like CoreOS, very quickly.

Kubernetes has an easy-to-configure “Pods” abstraction where all containers that work together are defined in one YAML file. Go get some more information here. Pods can be given Labels in their configuration. Labels can be used in filters & actions in a way similar to AWS tags.

Kubernetes has an abstraction for volumes. These volumes can be shared to Pods & containers from the host machine. Find out more about volumes here.

To coordinate replicas (for scaling) of Pods, Kubernetes has the Replication Controller, which maintains N Pods on the running cluster. All of the information needed for the Pod & its replication is kept in the configuration for the replication controller. Going from 8 replicas to 11 is just a matter of incrementing a number. It’s the equivalent of AWS AutoScale groups, but for Docker Pods. Additionally, there are features that allow for rolling upgrades to a new version of a Pod (and the ability to roll back an unhealthy upgrade). More information is found here.
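The idea behind a Replication Controller is a control loop that keeps comparing the desired replica count with what is actually running. A toy Python sketch of one pass of that loop follows; it is not Kubernetes code, and the pod names are hypothetical.

```python
from typing import List, Tuple

def reconcile(desired: int, running: List[str]) -> Tuple[int, List[str]]:
    """One pass of a replication-controller-style control loop.

    Returns (how many new pods to start, which running pods to stop) so that
    exactly `desired` replicas remain afterwards.
    """
    if len(running) < desired:
        return desired - len(running), []
    # Too many (or exactly enough): stop any surplus, here simply the tail.
    return 0, running[desired:]

pods = [f"web-{i}" for i in range(8)]
print(reconcile(11, pods))  # scale 8 -> 11: (3, [])
print(reconcile(6, pods))   # scale 8 -> 6: (0, ['web-6', 'web-7'])
```

Because the loop always works from the current state, a pod that dies simply shows up as a shortfall on the next pass and gets replaced.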

Kubernetes Services are used to load-balance across all the active replicas of a pod. Find more information here.

A Virtual Network for Kubernetes With CoreOS Flannel

By default a local private network interface (docker0) is configured for Docker guest instances when Docker is started. This network routes traffic to & from the host machine & all Docker guest instances. It doesn’t route traffic to other host machines or to other host machines’ Docker containers, though.

To really have pods communicating easily across machines, we need a routable subnet for our Docker instances across the entire cluster of our Docker hosts. This way every Docker container in the cluster can route traffic to/from every other container. This also means registry & discovery can contain IP addresses that work, & no fancy proxy hacks are needed to get from point A to point B.
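The addressing idea can be sketched with Python's standard ipaddress module: carve one cluster-wide network into a separate subnet per host, so container addresses never collide across hosts. This only illustrates the scheme; flannel itself records its subnet leases in etcd and handles the actual packet routing. The 10.1.0.0/16 range and the host names are hypothetical.

```python
import ipaddress
from typing import Dict, List

def allocate_host_subnets(network: str, hosts: List[str],
                          prefix: int = 24) -> Dict[str, ipaddress.IPv4Network]:
    """Give each Docker host its own non-overlapping subnet of `network`."""
    subnets = list(ipaddress.ip_network(network).subnets(new_prefix=prefix))
    if len(hosts) > len(subnets):
        raise ValueError("not enough subnets for every host")
    return dict(zip(hosts, subnets))

leases = allocate_host_subnets("10.1.0.0/16", ["node-a", "node-b", "node-c"])
for host, subnet in leases.items():
    print(host, subnet)  # node-a 10.1.0.0/24, node-b 10.1.1.0/24, node-c 10.1.2.0/24
```

Each host's docker0 bridge would then hand out container addresses only from its own /24, so any container IP also identifies which host to route it to.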

Kubernetes expects this routable internal network. Thankfully the people at CoreOS came up with a solution (currently in Beta). It’s called “Flannel” (formerly known as “Rudder”).

To enable a Flannel private network just download & install it on CoreOS before starting Docker. Also you must tell Docker to use the private network created by flannel in place of the default.

Below is a (partial) cloud-config file showing flanneld being downloaded & started. It also shows a custom Docker config added (to override the default systemd configuration for Docker). This is needed to use the Flannel network for Docker.

Flannel can be configured to use a number of virtual networking strategies. Read more about flannel here.

Adding Kubernetes To CoreOS

Now that we have a private network that can route traffic for our docker containers easily across the cluster, we can add Kubernetes to CoreOS. We’ll want to follow the same pattern for cloud-config of downloading the binaries that didn’t come with CoreOS & adding systemd configuration for their services.

The download part (seen 1st below) is common enough to reuse across Master & Minion nodes (the 2 main roles in a Kubernetes cluster). From there the Master does most of the work, while the Minion just runs kubelet & kube-proxy & does what it’s told.

Download Kubernetes (Partial) Cloud Config (both Master & Minion):

Master-Specific (Partial) Cloud Config:

Minion-Specific (Partial) Cloud Config:


Kube-Register bridges discovery of nodes from CoreOS Fleet into Kubernetes. This gives us no-hassle discovery of other Minion nodes in a Kubernetes cluster. We only need this service on the Master node. The Kube-Register project can be found here. (Thanks, Kelsey Hightower!)

Master Node (Partial) Cloud Config:

All Together in an AWS CFN Template with AutoScale

Use this CloudFormation template below. It’s the culmination of our progression of launch configurations from above.

In the CloudFormation template we add some things. We add 3 security groups: 1 Common to all Kubernetes nodes, 1 for Master & 1 for Minion. We also configure 2 AutoScale groups: 1 for Master & 1 for Minion. This is so we can have different assertions over each node type. We only need 1 Master node for a small cluster but we could grow our Minions to, say, 64 without a problem.

I used YAML here for reasons: 1. You can add comments at will (unlike JSON). 2. It converts to JSON in a blink of an eye.

Converting To JSON Before Launch

If you have another tool you prefer to convert YAML to JSON, then use that. I have Ruby & Python usually installed on my machines from other DevOps activities. Either one could be used.

Launching with AWS Cloud Formation

SSH into the master node on the cluster:

We can still use Fleet if we want:

But now we can use Kubernetes also:


Here’s the Kubernetes 101 documentation as a next step. Happy deploying!

Cluster Architecture

Just like human organizations, these clusters change as they scale. For now it works to have every node run etcd. For now it works to have a top-of-cluster master that can die & get replaced inside 5 minutes. These allowances work at small scale.

In the larger scale, we may need a dedicated etcd cluster. We may need more up-time from our Kubernetes Master nodes. The nice thing about our using containers is that re-configuring things feels a bit like moving chess pieces on a board (not repainting the scene by hand).

Personal Plug

I’m looking for contract work to fill the gaps next year. You might need help with Amazon (I’ve been using AWS full-time since 2007), Virtualization or DevOps. I also like programming & new start-ups. I prefer to program in Haskell & Purescript. I’m actively using Purescript with Amazon’s JS SDK (& soon with AWS Lambda). If you need the help, let’s work it out. I’m @dysinger on twitter, dysinger on IRC or send e-mail to tim on the domain

P.S. You should really learn Haskell. 🙂