Protecting AWS Credentials

15. December 2016

Author: Brian Nuszkowski
Editors: Chris Henry, Chris Castle

AWS provides its users with the opportunity to leverage their vast offering of advanced data center resources via a tightly integrated API. While the goal is to provide easy access to these resources, we must do so with security in mind. Peak efficiency via automation is the pinnacle of our industry. At the core of our automation and operation efforts lie the ‘keys’ to the kingdom. Actually, I really do mean keys; access keys. As an AWS Administrator or Power User, we’ve probably all used them, and there is probably at least 1 (valid!) forgotten copy somewhere in your home directory. Unauthorized access to AWS resources via compromised access keys usually occurs via:

  • Accidental commit to version control systems
  • Machine compromise
  • Unintentional sharing/capturing during live demonstrations or recordings

Without the proper controls, if your credentials are obtained by an unauthorized party, they can be used by anyone with internet access. So, we’ll work to transform how we look at our access keys, by treating them less as secrets that we guard with great care, and more like disposable items. We’ll do that by embracing Multi-factor Authentication (MFA, but also referred to as Two Factor Authentication or 2FA).

In this scenario, we’re looking to protect IAM users who are members of the Administrators IAM group. We’ll do this by:

  1. Enabling MFA for IAM Users
  2. Authoring and applying an MFA Enforced IAM Policy
  3. Leveraging the Security Token Service to create MFA enabled credentials

1. Enable MFA for applicable IAM Users

This can be done by adding a Multi-Factor Authentication Device in each user’s Security Credentials section. I prefer Duo Mobile, but any TOTP application will work. Your MFA device will be uniquely identified by its ARN, which will look something like: arn:aws:iam::1234567889902:mfa/hugo

2. Apply an MFA Enforced Policy
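A minimal MFA-enforced policy, one that allows all actions only when the request was authenticated with MFA, looks something like this:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowAllActionsWhenMFAPresent",
      "Effect": "Allow",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "Bool": {
          "aws:MultiFactorAuthPresent": "true"
        }
      }
    }
  ]
}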

Create a Managed or Inline policy using the JSON above and attach it to the IAM User or Group whose credentials you wish to protect. The policy allows all actions against any resource if the request’s credentials are labeled as having successfully performed MFA.

3. Create MFA Enabled Credentials via the Security Token Service

Now that you’re enforcing MFA for API requests via Step 2, your existing access keys are no longer primarily used for making requests. Instead, you’ll use these keys in combination with your MFA passcode to create a new set of temporary credentials that are issued via the Security Token Service.

The idea now is to keep your temporary, privileged credentials valid for only as long as you need them, e.g. the life of an administrative task or action. I like to recommend creating credentials that have a valid duration of less than or equal to 1 hour. Shrinking the timeframe for which your credentials are valid limits the risk of their exposure. Credentials that provide administrative level privileges on Friday, from 10am to 11am, aren’t very useful to an attacker on Friday evening.

To create temporary credentials, you reference the current Time-based One-Time Password (TOTP) in your MFA application and perform one of the following operations:

3a. Use a credential helper tool such as aws-mfa to fetch and manage your AWS credentials file
3b. If you’re an aws cli user, you can run:

aws sts get-session-token --duration-seconds 3600 --serial-number <ARN of your MFA Device> --token-code 783462

Then use its output to manually update your AWS credentials file or environment variables.

3c. Write your own application that interfaces with STS using one of AWS’s SDKs!
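As a sketch of option 3c, the same STS call with boto3 looks roughly like this (the MFA device ARN and token code are placeholders):

# Exchange long-lived access keys plus an MFA code for temporary credentials
import boto3

sts = boto3.client('sts')
response = sts.get_session_token(
    DurationSeconds=3600,
    SerialNumber='arn:aws:iam::123456789012:mfa/hugo',  # ARN of your MFA device
    TokenCode='123456',                                 # current TOTP from your MFA app
)
credentials = response['Credentials']
print(credentials['AccessKeyId'], credentials['SecretAccessKey'], credentials['SessionToken'])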

AWS Console and MFA

Implementing MFA for console usage is a much simpler process. By performing Step 1, the console automatically prompts for your MFA passcode upon login. Awesome, and easy!

Service Accounts

There are scenarios where temporary credentials do not fit the workload of long running tasks. Having to renew credentials every 60 minutes for long-running or recurring automated processes seems highly counterintuitive. In this case, it’s best to create what I like to call an IAM Service Account. An IAM Service Account is just a normal IAM User, but it’s functionally used by an application or process instead of a human being. Because the service account won’t use MFA, you’ll want to reduce the risk associated with its credentials in the event of their exposure. You do this by combining a least privilege policy, meaning you only grant access to what’s absolutely necessary, with additional controls, such as source IP address restrictions.

Here is an example Service Account IAM Policy that only allows EC2 instance termination from an allowed IP address range:
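A sketch (the CIDR range is a placeholder for your allowed addresses):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ec2:TerminateInstances",
      "Resource": "*",
      "Condition": {
        "IpAddress": {
          "aws:SourceIp": "203.0.113.0/24"
        }
      }
    }
  ]
}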

MFA Protection on Identity Providers and Federation

While AWS offers MFA Protection for Cross-Account Delegation, this only applies to requests originating from an AWS account. AWS does not have visibility into the MFA status of external identity providers (IdP). If your organization uses an external Identity Provider to broker access to AWS, either via SAML or a custom federation solution, it is advised that you implement an MFA solution, such as Duo, in your IdP’s authentication workflow.

Stay safe, have fun, and keep building!

About the Author

Brian Nuszkowski (@nuszkowski) is a Software Engineer at Uber’s Advanced Technologies Center. He is on the organizing committee for DevOpsDays Detroit and has spoken at several conferences throughout North America, such as DevOps Days Austin, Pittsburgh, Toronto, and O’Reilly’s Velocity conference in New York.

About the Editors

Chris Henry is a technologist who has devoted his professional life to building technology and teams that create truly useful products. He believes in leveraging the right technologies to keep systems up and safe, and teams productive. Previously, Chris led technology teams at Behance, the leading online platform for creatives to showcase and discover creative work, and later at Adobe, which acquired Behance in 2012. A large part of his time has been spent continually examining and improving systems and processes with the goal of providing the best experience to end users. He’s currently building IssueVoter.org, a simple way to send Congress opinions about current legislature and track their results. He occasionally blogs at http://chr.ishenry.com about tech, travel, and his cat.

Chris Castle is a Delivery Manager within Accenture’s Technology Architecture practice. During his tenure, he has spent time with major financial services and media companies. He is currently involved in the creation of a compute request and deployment platform to enable migration of his client’s internal applications to AWS.


Serverless everything: One-button serverless deployment pipeline for a serverless app

14. December 2016

Author: Soenke Ruempler
Editors: Ryan S. Brown

Update: Since AWS recently released CodeBuild, things got much simpler. Please also read my follow-up post AWS CodeBuild: The missing link for deployment pipelines in AWS.

Infrastructure as Code is the new default: With tools like Ansible, Terraform, CloudFormation, and others it is getting more and more common. A multitude of services and tools can be orchestrated with code. The main advantages of automation are reproducibility, fewer human errors, and exact documentation of the steps involved.

With infrastructure expressed as code, it’s not a stretch to also want to codify deployment pipelines. Luckily, AWS has its own service for that named CodePipeline, which in turn can be fully codified and automated by CloudFormation (“Pipelines as Code”).

This article will show you how to create a deploy pipeline for a serverless app with a “one-button” CloudFormation template. The more concrete goals are:

  • Fully serverless: neither the pipeline nor the app itself involves server, VM or container setup/management (and yes, there are still servers, just not managed by us).
  • Demonstrate a fully automated deployment pipeline blueprint with AWS CodePipeline for a serverless app consisting of a sample backend powered by the Serverless framework and a sample frontend powered by “create-react-app”.
  • Provide a one-button quick start for creating deployment pipelines for serverless apps within minutes. Nothing should be run from a developer machine, not even an “inception script”.
  • Show that it is possible to lower complexity by leveraging AWS components so you don’t need to configure/click third party providers (e.g. TravisCi/CircleCi) as pipeline steps.

We will start with a repository consisting of a typical small web application with a front end and a back end. The deployment pipeline described in this article makes some assumptions about the project layout (see the sample project):

  • a frontend/ folder with a package.json which will produce a build into build/ when npm run build is called by the pipeline.
  • a backend/ folder with a serverless.yml. The pipeline will call serverless deploy (the Serverless framework). It should have at least one http event so that the Serverless framework creates a service endpoint which can then be used in the frontend to call the APIs.

For a start, you can just clone or copy the sample project into your own GitHub account.

As soon as you have your project ready, we can continue to create a deployment pipeline with CloudFormation.

The actual CloudFormation template we will use here to create the deployment pipeline does not reside in the project repository. This allows us to evolve the pipeline code and the projects using the pipeline independently of each other. It is published to an S3 bucket so we can build a one-click launch button. The launch button will direct users to the CloudFormation console with the URL to the template prefilled:

Launch Stack

After you click on the link (you need to be logged in to the AWS Console) and click “Next” to confirm that you want to use the predefined template, some CloudFormation stack parameters have to be specified:

CloudFormation stack parameters

First you need to specify the GitHub Owner/Repository of the project (the one you copied earlier), a branch (usually master) and a GitHub Oauth Token as described in the CodePipeline documentation.

The other parameters specify where to find the Lambda function source code for the deployment steps; we can live with the defaults for now, that's a topic for another blog post. (Update: the Lambda functions became obsolete with the move to AWS CodeBuild, and so did the template parameters regarding the Lambda source code location.)

The next step of the CloudFormation stack setup allows you to specify advanced settings like tags, notifications and so on. We can leave these as-is as well.

On the last assistant page you need to acknowledge that CloudFormation will create IAM roles on your behalf:

CloudFormation IAM confirmation

The IAM roles are needed to give the Lambda functions the right permissions to run and to write logs to CloudWatch. Once you press the “Create” button, CloudFormation will create the following AWS resources:

  • An S3 Bucket containing the website assets with website hosting enabled.
  • A deployment pipeline (AWS CodePipeline) consisting of the following steps:
    • A source step that checks out the source code from GitHub and saves it as an artifact.
    • Back end deployment: a Lambda function build step which takes the source artifact, installs and calls the Serverless framework.
    • Front end deployment: another Lambda function build step which takes the source artifact, runs npm run build and deploys the build to the website S3 bucket.

(Update: in the meantime, I replaced the Lambda functions with AWS CodeBuild).

No servers harmed so far, and also no workstations: No error-prone installation steps in READMEs to be followed, no curl | sudo bash or other awkward setup instructions. Also no hardcoded AWS access key pairs anywhere!

A platform team in an organization could provide several of these types of templates for particular use cases, then development teams could get going just by clicking the link.

Ok, back to our example: Once the CloudFormation stack creation is fully finished, the created CodePipeline is going to run for the first time. On the AWS console:

CodePipeline running

As soon as the initial pipeline run is finished:

  • the back end CloudFormation stack has been created by the Serverless framework, depending on what you defined in the backend/serverless.yml configuration file.
  • the front end has been built and put into the website bucket.

To find out the URL of our website hosted in S3, open the resources of the CloudFormation stack and expand the outputs. The WebsiteUrl output will show the actual URL:

CloudFormation Stack output

Click on the URL link and view the website:

Deployed sample website

Voila! We are up and running!

As you might have seen in the picture above, there is some JSON output: it’s actually the result of an HTTP call the front end made against the back end: the hello function, which simply returns the Lambda event object.

Let’s dig a bit deeper into this detail, as it shows the integration of frontend and backend: to pass the ServiceEndpoint URL to the front end build step, the back end build step exports all CloudFormation Outputs of the Serverless-created stack as a CodePipeline build artifact, which the front end build step in turn passes to npm run build (in our case via a react-specific environment variable). This is how the API call looks in react:
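The snippet boils down to a plain fetch against the exported endpoint; a sketch, assuming the build step exposes it as REACT_APP_SERVICE_ENDPOINT and the back end serves a /hello route:

// Call the backend's hello function and keep the JSON result in component state
fetch(`${process.env.REACT_APP_SERVICE_ENDPOINT}/hello`)
  .then(response => response.json())
  .then(json => this.setState({ backendResponse: json }));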

This cross-site request actually works because we specified CORS to be on in the serverless.yml:
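In Serverless framework terms that roughly means an http event with cors enabled (function and path names are illustrative):

functions:
  hello:
    handler: handler.hello
    events:
      - http:
          path: hello
          method: get
          cors: true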

Here is a high-level overview of the created CloudFormation stack:

Overview of the CloudFormation Stack

With the serverless pipeline and serverless project running, change something in your project, commit it and view the change propagated through the pipeline!

Additional thoughts:

I want to setup my own S3 bucket with my own CloudFormation templates/blueprints!

In case that you don’t trust me as a template provider, or you want to change the one-button CloudFormation template, you can of course host your own S3 bucket. The scope of doing that is beyond this article but you can start by looking at my CloudFormation template repo.

I want to have testing/staging in the pipeline!

The sample pipeline does not have any testing or staging steps. You can add more steps to the pipeline, for example another Lambda step which calls npm test on your source code.

I need a database/cache/whatever for my app!

No problem, just add additional resources to the serverless.yml configuration file.

Summary

In this blog post I demonstrated a CloudFormation template which bootstraps a serverless deployment pipeline with AWS CodePipeline. This enables rapid application development and deployment, as development teams can use the template in a “one-button” fashion for their projects.

We have deployed a sample project with a deployment pipeline with a front and back end.

AWS gives us all the lego bricks we need to create such pipelines in an automated, codified and (almost) maintenance-free way.

Known issues / caveats

  • I am describing an experiment / working prototype here. Don’t expect high quality, battle tested code (esp. the JavaScript parts 🙂 ). It’s more about the architectural concept. Issues and Pull requests to the mentioned projects are welcome 🙂 (Update: luckily I could delete all the JS code with  the move to AWS CodeBuild)
  • All deployment steps currently run with full access roles (AdministratorAccess) which is a bad practice but it was not the focus of this article.
  • The website could also be backed by a CloudFront CDN with HTTPS and a custom domain.
  • Beware of the 5 minute execution limit in Lambda functions (e.g. more complex serverless.yml setups might take longer; this could be worked around by moving resource creation into a CloudFormation pipeline step, Michael Wittig has blogged about that). (Update: this point became invalid with the move to AWS CodeBuild)
  • The build steps are currently not optimized, e.g. installing npm/serverless every time is not necessary. It could use an artifact from an earlier CodePipeline step (Update: this point became invalid with the move to AWS CodeBuild)
  • The CloudFormation stack created by the Serverless framework is currently suffixed with “dev”, because that’s their default environment. The suffix should be omitted or made configurable.

Acknowledgements

Special thanks goes to the folks at Stelligent.

First for their open source work on serverless deploy pipelines with Lambda, especially the “dromedary-serverless” project. I adapted much from the Lambda code.

Second for their “one-button” concept which influenced this article a lot.

About the Author

Along with 18 years of web software development and web operations experience, Soenke Ruempler is an expert in AWS technologies (6 years of experience in development and operations), and in moving on-premise/legacy systems to the Cloud without service interruptions.

His special interests and fields of knowledge are Cloud/AWS, Serverless architectures, Systems Thinking, Toyota Kata (Kaizen), Lean Software Development and Operations, High Performance/Reliability Organizations, Chaos Engineering.

You can find him on Twitter, Github and occasionally blogging on ruempler.eu.

About the Editors

Ryan Brown is a Sr. Software Engineer at Ansible (by Red Hat) and contributor to the Serverless Framework. He’s all about using the best tool for the job, and finds simplicity and automation are a winning combo for running in AWS.


Modular cfn-init Configsets with SparkleFormation

13. December 2016

Author: Michael F. Weinberg
Editors: Andreas Heumaier

This post lays out a modular, programmatic pattern for using CloudFormation Configsets in SparkleFormation codebases. This technique may be beneficial to:

  • Current SparkleFormation users looking to streamline EC2 instance provisioning
  • Current CloudFormation users looking to manage code instead of JSON/YAML files
  • Other AWS users needing an Infrastructure as Code solution for EC2 instance provisioning

Configsets are a CloudFormation specific EC2 feature that allow you to configure a set of instructions for cfn-init to run upon instance creation. Configsets group collections of specialized resources, providing a simple solution for basic system setup and configuration. An instance can use one or many Configsets, which are executed in a predictable order.

Because cfn-init is triggered on the instance itself, it is an excellent solution for Autoscaling Group instance provisioning, a scenario where external provisioners cannot easily discover underlying instances, or respond to scaling events.

SparkleFormation is a powerful Ruby library for composing CloudFormation templates, as well as orchestration templates for other cloud providers.

The Pattern

Many CloudFormation examples include a set of cfn-init instructions in the instance Metadata using the config key. This is an effective way to configure instances for a single template, but in an infrastructure codebase, doing this for each service template is repetitious and introduces the potential for divergent approaches to the same problem in different templates. If no config key is provided, cfn-init will automatically attempt to run a default Configset. Configsets in CloudFormation templates are represented as an array. This pattern leverages Ruby’s concat method to construct a default Configset in SparkleFormation’s compilation step. This allows us to use Configsets to manage the instance Metadata in a modular fashion.

To start, any Instance or Launch Config resources should include an empty array as the default Configset in their metadata, like so:
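Whatever the SparkleFormation source looks like in your codebase, the compiled CloudFormation metadata it needs to produce is simply this:

"Metadata": {
  "AWS::CloudFormation::Init": {
    "configSets": {
      "default": []
    }
  }
}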

Additionally, the Instance or Launch Config UserData should run the cfn-init command. A best practice is to place this in a SparkleFormation registry entry. A barebones example:
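The registry entry boils down to UserData that runs a standard cfn-init call against the resource's own metadata (stack name, logical resource id and region come from CloudFormation references):

#!/bin/bash
/opt/aws/bin/cfn-init -v \
  --stack <stack-name> \
  --resource <launch-config-logical-id> \
  --region <region>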

With the above code, cfn-init will run the empty default Configset. Using modular registry entries, we can expand this Configset to meet our needs. Each registry file should add the defined configuration to the default Configset, like this:
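A rough sketch of such a registry entry; the entry name is hypothetical and the cfn-init config block that accompanies the key is left out:

SfnRegistry.register(:install_base_packages) do
  # Append this entry's config key to the default Configset...
  sets.default.concat(['install_base_packages'])

  # ...and define the matching cfn-init config block for that key here.
end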

A registry entry can also include more than one config block:

Calling these registry entries in the template will add them to the default Configset in the order they are called:
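In the template this is just a series of registry calls (names are hypothetical):

registry!(:install_base_packages)
registry!(:configure_users)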

Note that other approaches to extending the array will also work:

sets.default += [ 'key_to_add' ], sets.default.push('key_to_add'), sets.default << 'key_to_add', etc.

Use Cases

Extending the default Configset rather than setting the config key directly makes it easy to build out cfn-init instructions in a flexible, modular fashion. Modular Configsets, in turn, create opportunities for better Infrastructure as Code workflows. Some examples:

Development Instances

This cfn-init pattern is not a substitute for full-fledged configuration management solutions (Chef, Puppet, Ansible, Salt, etc.), but for experimental or development instances cfn-init can provide just enough configuration management without the increased overhead or complexity of a full CM tool.

I use the Chef users cookbook to manage users across my AWS infrastructure. Consequently, I very rarely make use of AWS EC2 keypairs, but I do need a solution to access an instance without Chef. My preferred solution is to use cfn-init to fetch my public keys from Github and add them to the default ubuntu (or ec2-user) user. The registry for this:
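The registry entry essentially wraps a single command: fetch the public keys GitHub serves for a user and append them to authorized_keys (GITHUB_USER stands in for the template parameter):

curl -fsS "https://github.com/${GITHUB_USER}.keys" >> /home/ubuntu/.ssh/authorized_keys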

In the template, I just set a github_user parameter and include the registry, and I get access to an instance in any region without needing to do any key setup or configuration management.

This could also be paired with a configuration management registry entry and the Github user setup can be limited to development:
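A sketch of that compile-time switch, with hypothetical registry entry names:

if ENV['development'] == 'true'
  registry!(:github_public_keys)
else
  registry!(:chef_client)
end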

Compiling this with the environment variable development=true will include the Github Configset, in any other case it will run the full configuration management.

In addition to being a handy shortcut, this approach is useful for on-boarding other users/teams to an Infrastructure codebase and workflow. Even with no additional automation in place, it encourages system provisioning using a code-based workflow, and provides a groundwork to layer additional automation on top of.

Incremental Automation Adoption

Extending the development example, a modular Configset pattern is helpful for incrementally introducing automation. Attempting to introduce automation and configuration management to an infrastructure that is actively being architected can be very frustrating—each new component requires not just understanding the component and its initial configuration, but also determining how best to automate and abstract that into code. This can lead to expedient, compromise implementations that add to technical debt, as they aren’t flexible enough to support emergent needs.

An incremental approach can mitigate these issues, while maintaining a focus on code and automation. Well understood components are fully automated, while some emergent features are initially implemented with a mixture of automation and manual experimentation. For example, an engineer approaching a new service might perform some baseline user setup and package installation via an infrastructure codebase, but configure the service manually while determining the ideal configuration. Once that configuration matures, the automation resources necessary to achieve it are included in the codebase.

CloudFormation Configsets are effective options for package installation and are also good for fetching private assets from S3 buckets. An engineer might use a Configset to setup her user on a development instance, along with the baseline package dependencies and a tarball of private assets. By working with the infrastructure codebase from the outset, she has the advantage of knowing that any related AWS components are provisioned and configured as they would be in a production environment, so she can iterate directly on service configuration. As the service matures, the Configset instructions that handled user and package installation may be replaced by more sophisticated configuration management tooling, but this is a simple one-line change in the template.

Organization Wide Defaults

In organizations where multiple engineers or teams contribute discrete application components in the same infrastructure, adopting standard approaches across the organization is very helpful. Standardization often hinges on common libraries that are easy to include across a variety of contexts. The default Configset pattern makes it easy to share registry entries across an organization, whether in a shared repository or internally published gems. Once an organizational pattern is codified in a registry entry, including it is a single line in the template.

This is especially useful in organizations where certain infrastructure-wide responsibilities are owned by a subset of engineers (e.g. Security or SRE teams). These groups can publish a gem (SparklePack) containing a universal configuration covering their concerns that the wider group of engineers can include by default, essentially offering these in an Infrastructure as a Service model. Monitoring, Security, and Service Discovery are all good examples of the type of universal concerns that can be solved this way.

Conclusion

cfn-init Configsets can be a powerful tool for Infrastructure as Code workflows, especially when used in a modular, programmatic approach. The default Configset pattern in SparkleFormation provides an easy to implement, consistent approach to managing Configsets across an organization–either within a single codebase or vendored in as gems/SparklePacks. Teams looking to increase the flexibility of their AWS instance provisioning should consider this pattern, and a programmatic tool such as SparkleFormation.

For working examples, please check out this repo.

About the Author

Michael F. Weinberg is an Infrastructure & Automation specialist, with a strong interest in cocktails and jukeboxes. He currently works at Hired as a Systems Engineer. His open source projects live at http://github.com/reverseskate.


Providing Static IPs for Non-Trivial Architectures

12. December 2016

Author: Oli Wood
Editors: Seth Thomas, Scott Francis

An interesting problem landed on my desk a month ago that seemed trivial to begin with, but once we started digging into the problem it turned out to be more complex than we thought.  A small set of our clients needed to restrict outgoing traffic from their network to a whitelist of IP addresses.  This meant providing a finite set of IPs which we could use to provide a route into our data collection funnel.

Traditionally this has not been too difficult, but once you take into account the ephemeral nature of cloud infrastructures and the business requirements for high availability and horizontal scaling (within reason) it gets more complex.

We also needed to take into account that our backend system (api.example.com) is deployed in a blue/green manner (with traffic being switched by DNS), and that we didn’t want to incur any additional management overhead with the new system.  For more on Blue/Green see http://martinfowler.com/bliki/BlueGreenDeployment.html.

Where we ended up looks complex but is actually several small systems glued together.  Let’s describe the final setup and then dig into each section.

The Destination

A simplified version of the final solution.

 

The View from the Outside World

Our clients can address our system by two routes:

  • api.example.com – our previous public endpoint.  This is routed by Route 53 to either api-blue.example.com or api-green.example.com
  • static.example.com – our new address which will always resolve to a finite set of IP addresses (we chose 4).  This will eventually route through to the same blue or green backend.

The previous infrastructure

api-blue.example.com is an autoscaling group deployed (as part of a wider system) inside its own VPC.  When we blue/green deploy an entire new VPC is created (this is something we’re considering revisiting).  It is fronted by an ELB.  Given the nature of ELBs, the IP addresses of this instance will change over time, which is why we started down this road.

The proxying infrastructure

static.example.com is a completely separate VPC which houses 4 autoscaling groups set to a minimum size of 1 and a maximum size of 1.  The EC2 instances are assigned an EIP on boot (more on this later) and have HAProxy 1.6 installed.  HAProxy is set up to provide two things:

  • A TCP proxy endpoint on port 443
  • A healthcheck endpoint of port 9000

The DNS configuration

The new DNS entry for static.example.com is configured so that it only returns IP addresses for up to 4 of the EIPs, based on the results of their healthcheck (as provided by HAProxy).

How we got there

The DNS setup

static.example.com is based on a set of four Health Checks which form a Traffic Policy that creates the Policy Record (which is the equivalent of your normal DNS entry).

Steps to create Health Checks:

  1. Log into the AWS Console
  2. Head to Route 53
  3. Head to Health Checks
  4. Create new Health Check
    1. What to monitor => Endpoint
    2. Specify endpoint by => IP Address
    3. Protocol => HTTP
    4. IP Address => [Your EIP]
    5. Host name => Ignore
    6. Port => 9001
    7. Path => /health

Repeat four times.  Watch until they all go green.

Steps to create Traffic Policy:

  1. Route 53
  2. Traffic Policies
  3. Create Traffic Policy
    1. Policy name => something sensible
    2. Version description => something sensible

This opens up the GUI editor

  1. Choose DNS type A: IP address
  2. Connect to => Weighted Rule
  3. Add 2 more Weights
  4. On each choose “Evaluate target health” and then one of your Health Checks
  5. Make sure the Weights are all set the same (I chose 10)
  6. For each click “Connect to” => New Endpoint
    1. Type => Value
    2. Value => EIP address
The traffic policy in the GUI

Adding the Policy record

  1. Route 53
  2. Policy Record
  3. Create new Policy Record
    1. Traffic policy => Your new policy created above
    2. Version => it’ll probably be version 1 because you just created it
    3. Hosted zone => chose the domain you’re already managing in AWS
    4. Policy record => add static.example.com equivalent
    5. TTL => we chose 60 seconds

And there you go, static.example.com will route traffic to your four EIPs, but only if they are available.

The Autoscaling groups

The big question you’re probably wondering here is “why did they create four separate Autoscaling groups?  Why not just use one?”  It’s a fair question, and our choice might not be right for you, but the reasoning is that we didn’t want to build something else to manage which EIPs were assigned to each of the 4 instances.  By using 4 separate Autoscaling groups we can use 4 separate Launch Configurations, and then use EC2 tags to manage how an instance knows which EIP to associate with itself on launch.

The key things here are:

  • Each of the Autoscaling Groups is defined separately in our CloudFormation stack
  • Each of the Autoscaling Groups has its own Launch Configuration
  • We place two Autoscaling Groups in each of our Availability Zones
  • We place two Autoscaling Groups in each Public Subnet
  • Tags on the Autoscaling Group are set with “PropagateAtLaunch: true” so that the instances they launch end up with the EIP reference on them
  • Each of the four Launch Configurations includes the same UserData script (Base64 encoded in our CloudFormation template)
  • The LaunchConfiguration includes an IAM Role giving enough permissions to be able to tag the instance

The UserData script
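The idea is roughly the following; the eip_allocation_id tag key is an assumption, so adapt it to however your groups are tagged:

#!/bin/bash
# Which instance are we, and in which region?
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
REGION=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone | sed 's/[a-z]$//')

# Look up the EIP allocation id propagated from the Autoscaling Group's tags
ALLOCATION_ID=$(aws ec2 describe-tags --region "$REGION" \
  --filters "Name=resource-id,Values=$INSTANCE_ID" "Name=key,Values=eip_allocation_id" \
  --query 'Tags[0].Value' --output text)

# Attach the EIP to ourselves
aws ec2 associate-address --region "$REGION" \
  --instance-id "$INSTANCE_ID" --allocation-id "$ALLOCATION_ID"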

The IAM Role statement
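A sketch of the sort of statement involved, enough for the UserData above plus instance tagging; tighten the resources as appropriate:

{
  "Effect": "Allow",
  "Action": [
    "ec2:DescribeTags",
    "ec2:DescribeAddresses",
    "ec2:AssociateAddress",
    "ec2:CreateTags"
  ],
  "Resource": "*"
}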

The EC2 instances

We chose c4.xlarge instances to provide a good amount of network throughput.  Because HAProxy is running in TCP mode we struggle to monitor the traffic levels and so we’re using CloudWatch to alert on very high or low Network Output from the four instances.

The EC2 instances themselves are launched from a custom AMI which includes very little except a version of HAProxy (thanks to ITV for https://github.com/ITV/rpm-haproxy).  We’re using this fork because it supplies the slightly newer HAProxy version 1.6.4.

Unusually for us, we’ve baked the config for HAProxy into the AMI.  This is a decision I suspect we will revisit at a later date, and have the config pulled from S3 at boot time instead.

HAProxy is set to start on boot.  Something we shall probably add at a later date is to have the Autoscaling Group use the same healthcheck endpoint that HAProxy provides to Route 53 to determine instance health. That way we’ll replace an instance if it comes up but does not provide a healthy HAProxy for some reason.

The HAProxy setup

HAProxy is a fabulously flexible beast and we had a lot of options on what to do here.  We did however wish to keep it as simple as possible.  With that in mind, we opted to not offload SSL at this point but to act as a passthrough proxy direct to our existing architecture.

Before we dive into the config, however, it’s worth mentioning our choice of backend URL.  We opted to route back to api.example.com because this means that when we blue/green deploy our existing setup we don’t need to make any changes to our HAProxy setup.  By using its own health check mechanism and “resolvers” entry we can make sure that the IP addresses that it is routing to (the new ELB) aren’t more than a few seconds out of date.  This loopback took us a while to figure out and is (again) something we might revisit in the future.

Here are the important bits of the config file:

The resolver

Makes use of AWS’s internal DNS service. This has to be used in conjunction with a health check on the backend server.
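A minimal resolvers section along these lines; 169.254.169.253 is the VPC-provided DNS at its link-local address, and the timings are illustrative:

resolvers mydns
    nameserver vpcdns 169.254.169.253:53
    resolve_retries 3
    timeout retry 1s
    hold valid 10s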

The front end listener

Super simple.  This would be more complex if you wanted to route traffic from different source addresses to different backends using SNI (see http://blog.haproxy.com/2012/04/13/enhanced-ssl-load-balancing-with-server-name-indication-sni-tls-extension/).
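A sketch of the frontend, listening on 443 in TCP mode:

frontend https_in
    bind *:443
    mode tcp
    option tcplog
    default_backend api_backend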

The backend listener

The key thing here is the inclusion of the resolver (mydns, as defined above). It’s the combination of the resolver and the server health check which causes HAProxy to re-evaluate the DNS entry.
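A sketch of the backend, combining the resolver with a server health check:

backend api_backend
    mode tcp
    server api api.example.com:443 check resolvers mydns resolve-prefer ipv4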

The outwards facing health check

This will return a 200 if everything is ok, 503 if the backend is down, and will return a connection failure if HAProxy is down. This will correctly inform the Route 53 health checks and if needed R53 will not include the IP address.
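A sketch of that health check endpoint, bound to the port and path the Route 53 health checks were pointed at earlier:

frontend health_check
    bind *:9001
    mode http
    monitor-uri /health
    acl backend_down nbsrv(api_backend) lt 1
    monitor fail if backend_down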

What we did to test it

We ran through various scenarios to check how the system coped:

  • Deleting one of the proxy instances and seeing it vanish from the group returned from static.example.com
  • Doing a blue/green deployment and seeing HAProxy update its backend endpoint
  • Blocking access to one AZ with a tweak to the Security Group to simulate the AZ becoming unavailable
  • Forcing 10 times our normal load through using Vegeta
  • Running a soak test at sensible traffic levels over several hours (also with Vegeta)

The end result

While this is only providing 4 EC2 instances which proxy traffic, it’s a pattern which could be scaled out very easily, with each section bringing another piece of the resilience pie to the table.

  • Route 53 does a great job of only including EIPs that are associated with healthy instances
  • The Autoscaling Groups make sure that our proxy instances will bounce back if something nasty happens to them
  • UserData and Tags provide a neat way for the instances to self-manage the allocation of EIPs
  • HAProxy provides both transparent routing and health checks.
  • Route 53 works really well for Blue/Greening our traffic to our existing infrastructure.

It’s not perfect (I imagine we’ll have issues with some clients caching DNS records for far too long at some point), and I’ll wager we’ll end up tuning some of the timeouts and HAProxy config at some point in the future, but for now it’s out there and happily providing an endpoint for our customers (and not taking up any of our time).  We’ve tested how to deploy updates (deploy a new CloudFormation stack and let the new instance “steal” the EIPs) successfully too.

About the Author:

Oli Wood has been deploying systems into AWS since 2010 in businesses ranging from 2 people startups to multi-million dollar enterprises. Previous to that he mostly battled with deploying them onto other service providers, cutting his teeth in a version control and deployment team on a Large Government Project back in the mid 2000s.

Inside of work he spends time, train tickets and shoe leather helping teams across the business benefit from DevOps mentality.

Outside of work he can mostly be found writing about food on https://www.omnomfrickinnom.com/ and documenting the perils of poor posture at work at http://goodcoderbadposture.com/

Online he’s @coldclimate

About the Editors:

Scott Francis has been designing, building and operating Internet-scale infrastructures for the better part of 20 years. He likes BSD, Perl, AWS, security, cryptography and coffee. He’s a good guy to know in a zombie apocalypse. Find him online at  https://linkedin.com/in/darkuncle and https://twitter.com/darkuncle.


L4 vs L7 Showdown

10. December 2016

Author: Atif Siddiqui

Editors: Vinny Carpenter, Brian O’Rourke

Objective

This article will explain the role and types of load balancers before delving into it through the prism of Amazon Web Services (AWS). This post wraps up with a lab exercise on AWS Load Balancer migration.

Introduction

A load balancer is a device that in its simplest form acts as a funnel for traffic before redistributing it. This is achieved by playing the role of reverse proxy server (RPS). While a load balancer can be a hardware device or a software component, this article will focus on a Software Defined Networking (SDN) load balancer.

Load Balancer dictating traffic distribution

OSI 101

The Open Systems Interconnection (OSI) model is a conceptual illustration of networking. It shows how each layer serves the one above it. When discussing load balancers, the transport and application layers hold our interest.

Open Systems Interconnection model – high level

There are two types of load balancers.

1. A Layer 4 load balancer works at the networking transport layer. This confines the criteria to IP addresses and ports as only the packet header is being inspected without reviewing its contents.

2. A Layer 7 load balancer works at the application layer. It has higher intelligence because it can inspect packet contents, as it understands protocols such as HTTP, HTTPS and WebSockets. This gives it the ability to perform advanced routing.

Open Systems Interconnection model – close up [1]

AWS Perspective

Elastic Load Balancer (ELB) is one of the cornerstones of designing resilient applications. A walk down memory lane shows that its beta release happened back in May 2009. Being a Layer 4 (L4) load balancer, ELB makes routing decisions without inspecting the contents of the packet.

The abstraction and simplicity of use remain its core strengths: provisioning can be done through one click of a button. On the flip side, one of the features that is conspicuously missing is support for Server Name Indication (SNI). While wildcard and SAN certificates are supported, hopefully multiple certificate support is around the corner.

As a new offering in this space, AWS recently came out with a Layer 7 load balancer aptly named Application Load Balancer (ALB). This was announced in August this year with availability across all AWS commercial regions. Along with this announcement, the original load balancer was rebranded as Classic Load Balancer.

Building blocks of an AWS application load balancer

AWS has also introduced the target group as new nomenclature. A target group is used to register EC2 instances mapped to port numbers. Target groups are linked to the ALB via a Listener, which in turn can have rules associated with it.

Register/de-register instance for Target group
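For example, the CLI calls to create a target group, register an instance and attach a path-based rule look roughly like this (names, IDs and ARNs are placeholders):

aws elbv2 create-target-group --name my-targets --protocol HTTP --port 80 --vpc-id vpc-12345678
aws elbv2 register-targets --target-group-arn <target-group-arn> --targets Id=i-0123456789abcdef0
aws elbv2 create-rule --listener-arn <listener-arn> --priority 10 \
    --conditions Field=path-pattern,Values='/api/*' \
    --actions Type=forward,TargetGroupArn=<target-group-arn>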

Some other noteworthy aspects about ALB are:

1. ALB supports HTTP and WebSockets.

2. While the AWS CLI command for the Classic Load Balancer is aws elb, for the Application Load Balancer it is aws elbv2.

3. ALB allows routing via path matching only, with a ceiling of 10 URL-based rules.

4. Like Classic, pre-warming for ALB is recommended in preparation for a major traffic spike.

5. ALB’s hourly rate is 10% lower than ELB’s.

6. CloudFormation supports ALB though, interestingly, it is referred to as ElasticLoadBalancingV2.

Migration Guide: ELB -> ALB

While an ELB cannot be converted to an ALB, migration is supported [2]. AWS provides a python script [3], available on GitHub. The following exercise was done on an Amazon AMI to test such a migration. Each command is preceded with a comment to indicate its purpose. It is assumed that the reader already has the AWS CLI installed and their credentials set up to be able to manipulate AWS objects from the command line.

— grab migration utility [4]

— verify existing ELB name via cli

— Conduct dry run for load balancer migration (specified incorrect region first time around). As the python script needs boto3, a prerequisite step is to run pip install boto3

— create application load balancer
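The sequence looks roughly like this; the copy script’s exact flags may differ, so check its --help (it prints the ARNs of the target groups it creates):

# grab migration utility [4]
wget https://raw.githubusercontent.com/aws/elastic-load-balancing-tools/master/copy_classic_load_balancer.py

# verify existing ELB name via cli
aws elb describe-load-balancers --query 'LoadBalancerDescriptions[].LoadBalancerName'

# the script needs boto3
pip install boto3

# dry run, then the actual copy
python copy_classic_load_balancer.py --name my-classic-elb --region us-east-1 --debug
python copy_classic_load_balancer.py --name my-classic-elb --region us-east-1 --register-targets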

Target group ARNs:

Considerations:

1. If your Classic load balancer is attached to an Auto Scaling group, attach the target groups to the Auto Scaling group.

2. All HTTPS listeners use the predefined security policy.

3. To use Amazon EC2 Container Service (Amazon ECS), register your containers as targets.

On November 22, the product team published [5] a new ALB feature for request tracing. This will provide the ability to trace through individual requests. I can’t wait to play with it.

References

  1. https://mplsnet.files.wordpress.com/2014/06/osi-model.gif
  2. http://docs.aws.amazon.com/elasticloadbalancing/latest/userguide/migrate-to-application-load-balancer.html
  3. https://github.com/aws/elastic-load-balancing-tools
  4. https://raw.githubusercontent.com/aws/elastic-load-balancing-tools/master/copy_classic_load_balancer.py
  5. https://aws.amazon.com/blogs/aws/application-performance-percentiles-and-request-tracing-for-aws-application-load-balancer/

 

About the Author:

Atif Siddiqui is a certified AWS Solutions Architect. He works as an Architect at General Electric (GE) in enterprise applications space. His responsibilities encompass critical applications with global footprint where he brings solutions and infrastructure expertise for his customers. He is also an avid blogger on GE’s internal social media.

About the Editors:

Brian O’Rourke is the co-founder of RedisGreen, a highly available and highly instrumented Redis service. He has more than a decade of experience building and scaling systems and happy teams, and has been an active AWS user since S3 was a baby.

Four ways AWS Lambda makes me happy

09. December 2016

Author: Tal Perry
Editors: Jyrki Puttonen, Bill Weiss

Intro

What is Lambda

Side projects are my way of learning new technology. One that I’ve been anxious to try is AWS Lambda. In this article, I will focus on the things that make Lambda a great service in my opinion.

For the uninitiated, Lambda is a service that allows you to essentially upload a function and AWS will make sure the hardware is there to run it. You pay for the compute time in hundred millisecond increments instead of by the hour, and you can run as many copies of your lambda function as needed.

You can think of Lambda as a natural extension to containers. Containers (like Docker) allow you to easily deploy multiple workloads to a fleet of servers. You no longer deploy to a server, you deploy to the fleet and if there is enough room in the fleet your container runs. Lambda takes this one step further by abstracting away the management of the underlying server fleet and containerization. You just upload code, AWS containerizes it and puts it on their fleet.

Why did I choose Lambda?

My latest side project is SmartScribe, an automated transcription service. SmartScribe transcribes hours of audio in minutes, a feat which requires considerable memory and parallel processing of audio. While a fleet of containers could get the job done, I didn’t want to manage a fleet, integrate it with other services nor did I want to pay for my peak capacity when my baseline usage was far lower. Lambda abstracts away these issues, which made it a very satisfying choice.

How AWS Lambda makes me happy

It’s very cheap

I love to invest my time in side projects, I get to create and learn. Perhaps irrationally, I don’t like to put a lot of money into them from the get go. When I start building a project I want it up all the time so that I can show it around. On the other hand I know that 98% of the time, my resources will not be used.

Serverless infrastructure saves me that 98% by allowing me to pay by the millisecond instead of by the hour. 98% is a lot of savings by any account.

I don’t have to think about servers

As I mentioned, I like to invest my time in side projects but I don’t like to invest it in maintaining or configuring infrastructure. A thousand little things can go wrong on your server and any one of those will bring your product to a halt. I’m more than happy to never think about another server again.

Here are a few things that have slowed me down before that Lambda has abstracted away:

  1. Having to reconfigure because I forgot to set the IP address of an instance to elastic and the address went away when I stopped it (to save money)
  2. Worrying about disk space. My processes write to the disk. Were I to use a traditional architecture I’d have to worry about multiple concurrent processes consuming the entire disk, a subtle and aggravating bug. With lambda, each function invocation is guaranteed a (small) chunk of tmp space which reduces my concern.
  3. Running out of memory. This is a fine point because a single lambda function can only use 1.5 GB of memory.

Two caveats:

  1. Applications that hold large data sets in memory might not benefit from Lambda. Applications that hold small to medium sized data sets in memory are prime candidates.
  2. 512MB of provisioned tmp space is a major bottleneck to writing larger files to disk.

Smart Scribe works with fairly large media files and we need to store them in memory with overhead. Even a few concurrent users can easily lead to problems with available memory – even with a swap file (and we hate configuring servers so we don’t want one). Lambda guarantees that every call to my endpoints will receive the requisite amount of memory. That’s priceless.

I use Apex to deploy my functions, which happens in one line
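That one line, assuming a standard Apex project layout, is simply:

apex deploy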

Apex is smart enough to only deploy the functions that have changed. And in that one line, my changes and only them reach every “server” I have. Compare that to the time it takes to do a blue green deployment or, heaven forbid, sshing into your server and pulling the latest changes.

But wait, there is more. Pardon last year’s buzzword, but AWS Lambda induces or at least encourages a microservice architecture. Since each function exists as its own unit, testing becomes much easier and more isolated which saves loads of time.

Tight integration with other AWS services

What makes microservices hard is the overhead of orchestration and communications between all of the services in your system. What makes Lambda so convenient is that it integrates with other AWS services, abstracting away that overhead.

Having AWS invoke my functions based on an event in S3 or SNS means that I don’t have to create some channel of communication between these services, nor monitor that channel. I think that this fact is what makes Lambda so convenient, the overhead you pay for a scalable, maintainable and simple code base is virtually nullified.

The punch line

One of the deep axioms of the world is “Good, Fast, Cheap : Choose two”. AWS Lambda takes a stab at challenging that axiom.

About the Author:

By day, Tal is a data science researcher at Citi’s Innovation lab Tel Aviv focusing on NLP. By night he is the founder of SmartScribe, a fully serverless automated transcription service hosted on AWS. Previously Tal was CTO of Superfly where he and his team leveraged AWS technologies and good Devops to scale the data pipeline 1000x. Check out his projects and reach out on Twitter @thetalperry

About the Editors:

Jyrki Puttonen is Chief Solutions Executive at Symbio Finland (@SymbioFinland) who tries to keep on track what happens in cloud.

Bill Weiss is a senior manager at Puppet in the SRE group.  Before his move to Portland to join Puppet he spent six years in Chicago working for Backstop Solutions Group, prior to which he was in New Mexico working for the Department of Energy.  He still loves him some hardware, but is accepting that AWS is pretty rad for some things.


Building custom AMIs with Packer

08. December 2016

Author: Andreas Rütten
Editors: Steve Button, Joe Nuspl

Amazon machine images (AMIs) are the basis of every EC2 instance launched. They contain the root volume and thereby define what operating system or application will run on the instance.

There are two types of AMIs:

  • public AMIs are provided by vendors, communities or individuals. They are available on the AWS Marketplace and can be paid or free.
  • private AMIs belong to a specific AWS account. They can be shared with other AWS accounts by granting launch permissions. Usually they are either a copy of a public AMI or created by the account owner.

There are several reasons to create your own AMI:

  • predefine a template of the software, which runs on your instances. This provides a major advantage for autoscaling environments. Since most, if not all, of the system configuration has already been done, there is no need to run extensive provisioning steps on boot. This drastically reduces the amount of time from instance start to service ready.
  • provide a base AMI for further usage by others. This can be used to ensure a specific baseline across your entire organization.

What is Packer

Packer is software from the HashiCorp universe, like Vagrant, Terraform or Consul. From a single source, you can create machine and container images for multiple platforms.

For that, Packer has the concepts of builders and provisioners.

Builders exist for major cloud providers (like Amazon EC2, Azure or Google), for container environments (like Docker) or classical virtualization environments (like QEMU, VirtualBox or VMware). They manage the build environment and perform the actual image creation.

Provisioners on the other hand are responsible for installing and configuring all components, that will be part of the final image. They can be simple shell commands or fully featured configuration management systems like Puppet, Chef or Ansible.

How to use Packer for building AWS EC2 AMIs

The heart of every Packer build is the template, a JSON file which defines the various steps of each Packer run. Let’s have a look at a very simple Packer template:
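A reconstruction of such a template, with region, AMI ID and names as placeholders (the line references below map onto it):

{
  "builders": [
    {
      "type": "amazon-ebs",
      "access_key": "YOUR_ACCESS_KEY",
      "secret_key": "YOUR_SECRET_KEY",
      "region": "us-east-1",
      "source_ami": "ami-xxxxxxxx",
      "instance_type": "t2.micro",
      "ssh_username": "ubuntu",
      "ami_name": "my-base-ami {{timestamp}}"
    }
  ],
  "provisioners": [
    {
      "type": "shell",
      "script": "setup_things.sh"
    }
  ]
}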

There is just one builder and one simple provisioner. On line 4, we specify the amazon-ebs builder which means Packer will create an EBS-backed AMI by:

  • launching the source AMI
  • running the provisioner
  • stopping the instance
  • creating a snapshot
  • converting the snapshot into a new AMI
  • terminating the instance

As this all occurs under your AWS account, you need to specify your access_key and secret_key. Lines 7-9 specify the region, source AMI and instance type that will be used during the build. ssh_username specifies which user Packer will use to ssh into the build instance. This is specific to the source AMI: Ubuntu based AMIs use ubuntu, while Amazon Linux based AMIs use ec2-user.

Packer will create temporary keypairs and security groups to connect the local system to the build instance to run the provisioner. If you are running packer prior to 0.12.0 watch out for GitHub issue #4057.

The last line of the builder defines the name of the resulting AMI. We use the {{timestamp}} function of the Packer template engine which generates the current timestamp as part of the final AMI name.

The provisioner section defines one provisioner of the type shell. The local script “setup_things.sh” will be transferred to the build instance and executed. This is the easiest and most basic way to provision an instance.

A more extensive example

The requirements of a real world scenario usually need something more than just executing a simple shell script during provisioning. Let’s add some more advanced features to the simple template.

Optional sections

The first thing somebody could add is a description and a variables section to the top of our template, like:
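Something along these lines; the variable names are illustrative, and null marks variables that must be supplied at build time:

"description": "Base AMI build template",
"variables": {
  "region": "us-east-1",
  "source_ami": "ami-xxxxxxxx",
  "ssh_username": "ubuntu",
  "vpc_id": null,
  "subnet_id": null
}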

The first one is just a simple and optional description of the template, while the second adds some useful functionality: it defines variables, which can later be used in the template. Some of them have a default value, others are optional and can be set during the Packer call. Using packer inspect shows that:

Overriding can be done like this:
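For example, on the command line (the IDs are placeholders):

packer build -var 'vpc_id=vpc-12345678' -var 'subnet_id=subnet-12345678' template.json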

Multiple builders

The next extension could be to define multiple builders:
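A sketch of such a builders section, one amazon-ebs builder and one docker builder, with illustrative values:

"builders": [
  {
    "name": "aws",
    "type": "amazon-ebs",
    "region": "{{user `region`}}",
    "source_ami": "{{user `source_ami`}}",
    "instance_type": "t2.micro",
    "ssh_username": "{{user `ssh_username`}}",
    "vpc_id": "{{user `vpc_id`}}",
    "subnet_id": "{{user `subnet_id`}}",
    "associate_public_ip_address": true,
    "ami_name": "my-base-ami {{timestamp}}",
    "ami_description": "Base AMI built with Packer"
  },
  {
    "name": "docker",
    "type": "docker",
    "image": "ubuntu:16.04",
    "pull": true,
    "commit": true
  }
]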

The amazon-ebs builder was extended by using some of the previously introduced variables. It got a bit more specific about the build environment on the AWS side by defining a VPC and a subnet and attaching a public IP address to the build instance, and it also defines how the description of the resulting AMI will look.

The second builder defines a build with docker. This is quite useful for testing the provisioning part of the template. Creating an EC2 instance and an AMI afterwards takes some time and resources while building in a local docker environment is faster.

The pull option ensures that the base docker image is pulled if it isn’t already in the local repository, while the commit option is set so that the container will be committed to an image in the local repository after provisioning instead of being exported.

By default, Packer will execute all builders which have been defined. This can be useful if you want to build the same image for a different cloud provider or in different AWS regions at the same time. In our example we have a special test builder and the actual AWS builder. The following command tells Packer to use only a specific builder:
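For example, to run only the docker builder defined above:

packer build -only=docker template.json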

Provisioner

Provisioners are executed sequentially during the build phase. Using the only option you can restrict the provisioner to be called only by the corresponding builder.

This is useful if you need different provisioners or different options for a provisioner. In this example both call the same script to do some general bootstrap actions. One is for the amazon-ebs builder, where we call the script with sudo, and the other is for the docker builder where we don’t need to call sudo, as being root is the default inside a docker container.
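A sketch of that provisioner section; the script name is illustrative and the execute_command shown is the usual sudo wrapper:

"provisioners": [
  {
    "type": "shell",
    "only": ["aws"],
    "execute_command": "{{ .Vars }} sudo -E bash '{{ .Path }}'",
    "script": "bootstrap.sh"
  },
  {
    "type": "shell",
    "only": ["docker"],
    "script": "bootstrap.sh"
  }
]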

The script itself is about upgrading all installed packages and installing Ansible to prepare the next provisioner:
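A sketch of bootstrap.sh, assuming an Ubuntu base image:

#!/bin/bash
set -e
apt-get update
DEBIAN_FRONTEND=noninteractive apt-get -y upgrade
apt-get -y install software-properties-common
apt-add-repository -y ppa:ansible/ansible
apt-get update
apt-get -y install ansible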

Now a provisioner of the type ansible-local can be used. Packer will copy the defined Ansible Playbook from the local system into the instance and then execute it locally.
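For example (the playbook path is illustrative):

{
  "type": "ansible-local",
  "playbook_file": "ansible/playbook.yml"
}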

The last one is another simple shell provisioner to do some cleanup:
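Something like the following, restricted here to the AWS builder since the docker image has no sudo:

{
  "type": "shell",
  "only": ["aws"],
  "inline": [
    "sudo apt-get -y autoremove",
    "sudo apt-get clean"
  ]
}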

Post-Processors

Post-processors are run right after the image is built. For example to tag the docker image in the local docker repository:
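A sketch, with repository and tag as placeholders:

"post-processors": [
  [
    {
      "type": "docker-tag",
      "only": ["docker"],
      "repository": "my-org/base-image",
      "tag": "latest"
    }
  ]
]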

Or to trigger the next step of a CI/CD pipeline.

Pitfalls

While building AMIs with packer is quite easy in general, there are some pitfalls to be aware of.

The most common is differences between the build system and the instances which will be created from the AMI. It could be anything from simple things like a different instance type to running in a different VPC. This means thinking about what can already be done at build time and what is specific to the environment where an EC2 instance is created from the built AMI. Other examples: an Apache worker-thread configuration based on the number of available CPU cores, or a VPC-specific endpoint the application communicates with, for example an S3 VPC endpoint or the CloudWatch endpoint where custom metrics are sent.

This can be addressed by running a script or any real configuration management system at first boot time.

Wrapping up

As we have seen, building an AMI with a pre-installed configuration is not that hard. Packer is an easy-to-use and powerful tool for doing exactly that. We have discussed the basic building blocks of a Packer template and some of the more advanced options. Go ahead and check out the great Packer documentation, which explains all of this and much more in detail.

All code examples can be found at https://github.com/aruetten/aws-advent-2016-building-amis

About the Author

Andreas Rütten is a Senior Systems Engineer at Smaato, a global real-time advertising platform for mobile publishers & app developers. He also is the UserGroup leader of the AWS UG Meetup in Hamburg.

About the Editors

Steve Button is a Linux admin geek / DevOps practitioner who likes messing around with Raspberry Pi, Ruby, and Python. He loves technology and hates technology.

Joe Nuspl is a Portland, OR based DevOps Kung Fu practitioner. He is a senior operations engineer at Workday and one of the DevOpsDays Portland 2016 organizers. Author of the zap Chef community cookbook. Aspiring culinary chef. Occasionally he rambles on http://nvwls.github.io/ or @JoeNuspl on Twitter.


Are you getting the most out of IAM?

07. December 2016

Author: Jon Topper
Editors: Bill Weiss, Alfredo Cambera

Identity Concepts

Identity is everywhere, whether we’re talking about your GitHub id, Twitter handle, or email address. A strong notion of identity is important in information systems, particularly where security and compliance are involved, and a good identity system supports access control, trust delegation, and audit trails.

AWS provides a number of services for managing identity, and today we’ll be looking at their main service in this area: IAM – Identity and Access Management.

IAM Concepts

Let’s take a look at the building blocks that IAM provides.

First of all, there’s the root user. This is how you’ll log in when you’ve first set up your AWS account. This identity is permitted to do anything and everything to any resource you create in that account, and – like the unix root user – you should really avoid using it for day to day work.

As well as the root user, IAM supports other users. These are separate identities which will typically be used by the people in your organization. Ideally, you’ll have just one user per person; and only one person will have access to that user’s credentials – sharing usernames and passwords is bad form. Users can have permissions granted to them by the use of policies.

Policies are JSON documents which, when attached to another entity, dictate what those entities are allowed to do.

Just like a unix system, we also have groups. Groups pull together lists of users, and any policies applied to the group are available to the members.

IAM also provides roles. In the standard AWS icon set, an IAM Role is represented as a hard hat. This is fairly appropriate, since other entities can “wear” a role, a little like putting on a hat. You can’t log directly into a role – they can’t have passwords – but users and instances can assume a role, and when they do so, the policies associated with that role dictate what they’re allowed to do.

Finally, we have tokens. These are sets of credentials you can hold, either permanent or temporary. If you have a token, you can present it with API calls to prove who you are.

IAM works across regions, so any IAM entity you create is available everywhere. Unlike other AWS services, IAM itself doesn’t cost anything – though obviously anything created in your account by an IAM user will incur costs in the same way as if you’ve done it yourself.

Basic Example

In a typical example we may have three members of staff: Alice, Bob and Carla. Alice is the person who runs the AWS account, and to stop her using the root account for day to day work, she can create herself an IAM user, and assign it one of the default IAM Policies: AdministratorAccess.

As we said earlier, IAM Policies are JSON documents. The AdministratorAccess policy looks like this:
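At the time of writing, that managed policy reads roughly as follows:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "*",
      "Resource": "*"
    }
  ]
}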

The Version number here establishes which version of the JSON policy schema we’re using and this will likely be the same across all of your policies. For the purpose of this discussion it can be ignored.

The Statement list is the interesting bit: here, we’re saying that anyone using this policy is permitted to call any Action (the * is a wildcard match) on any Resource. Essentially, the holder of this policy has the same level of access as the root account, which is what Alice wants, because she’s in charge.

Bob and Carla are part of Alice’s team. We want them to be able to make changes to most of the AWS account, but we don’t want to let them manipulate users – otherwise they might disable Alice’s account, and she doesn’t want that! We can create a group called PowerUsers to put Bob and Carla in, and assign another default policy to that group, PowerUserAccess, which looks like this:
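At the time of writing, the managed PowerUserAccess policy looks roughly like this:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "NotAction": "iam:*",
      "Resource": "*"
    }
  ]
}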

Here you can see that we’re using a NotAction match in the Statement list. We’re saying that users with this policy are allowed to access all actions that don’t match the iam:* wildcard. When we give this policy to Bob and Carla, they’re no longer able to manipulate users with IAM, either in the console, on the CLI or via API calls.

This, though, presents a problem. Now Bob and Carla can’t make changes to their own users either! They won’t be able to change their passwords, for a start, which isn’t great news.

So, we want to allow PowerUsers to perform certain IAM activities, but only on their own users – we shouldn’t let Bob change Carla’s password. IAM provides us with a way to do that. See, for example, this ManageOwnCredentials policy:
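A cut-down sketch of such a policy – the exact list of actions you grant is up to you, and the ones shown here are just a reasonable starting point:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "iam:ChangePassword",
        "iam:GetUser",
        "iam:ListAccessKeys",
        "iam:CreateAccessKey",
        "iam:UpdateAccessKey",
        "iam:DeleteAccessKey"
      ],
      "Resource": "arn:aws:iam::*:user/${aws:username}"
    }
  ]
}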

The important part of this policy is the ${aws:username} variable expansion. This is expanded when the policy is evaluated, so when Bob is making calls against the IAM service, that variable is expanded to bob.

There’s a great set of example policies for administering IAM resources in the IAM docs, and these cover a number of other useful scenarios.

Multi-Factor Authentication

You can increase the level of security in your IAM accounts by requiring users to make use of a multi-factor authentication token.

Your password is something that you know. An MFA token adds a possession factor: it’s something that you have. You’re then only granted access to the system when both of these factors are present.

If someone finds out your password, but they don’t have access to your MFA token, they still won’t be able to get into the system.

There are instructions on how to set up MFA tokens in the IAM documentation. For most types of user, a “virtual token” such as the Google Authenticator app is sufficient.

Once this is set up, we can prevent non-MFA users from accessing certain policies by adding this condition to IAM policy statements:
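The standard way to express this is a Bool condition on the aws:MultiFactorAuthPresent key, added to each statement you want to protect:

"Condition": {
  "Bool": {
    "aws:MultiFactorAuthPresent": "true"
  }
}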

As an aside, several other services permit the use of MFA tokens (they may refer to it as 2FA) – enabling MFA where available is a good practice to get into. I use it with my Google accounts, with GitHub, Slack, and Dropbox.

Instance Profiles

If your app needs to write to an S3 bucket, or use DynamoDB, or otherwise make AWS API calls, you may have AWS access credentials hard-coded in your application config. There is a better way!

In the Roles section of the IAM console, you can create a new AWS Service Role, and choose the “Amazon EC2” type. On creation of that role, you can attach policy documents to it, and define what that role is allowed to do.

As a real life example, we host application artefacts as package repositories in an S3 bucket. We want our EC2 instances to be able to install these packages, and so we create a policy which allows our instances read-only access to our S3 bucket.

When we create new EC2 instances, we can attach our new role to them. Code running on an instance can then request temporary tokens associated with that role.

These tokens are served by the Instance Metadata Service. They can be used to call actions on AWS resources as dictated by the policies attached to the role.

[Diagram: an application on an EC2 instance obtains temporary credentials from the instance metadata service and uses them to access S3]

The diagram shows the flow of requests. At step 1, the application connects to the instance metadata service with a request to assume a role. In step 2, the metadata service returns a temporary access token back to the application. In step 3, the application connects to S3 using that token.

The official AWS SDKs are all capable of obtaining credentials from the Metadata Service without you needing to worry about it. Refer to the documentation for details.
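If you’re curious what the SDKs do under the hood, you can query the metadata service yourself from an instance; the role name here is a placeholder:

curl http://169.254.169.254/latest/meta-data/iam/security-credentials/my-server-role

The response is a JSON document containing an AccessKeyId, SecretAccessKey, Token, and Expiration.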

The benefit of this approach is that if your application is compromised and your AWS tokens leak out, these can only be used for a short amount of time before they’ll expire, reducing the amount of damage that can be caused in this scenario. With hard-coded credentials you’d have to rotate these yourself.

Cross-Account Role Assumption

One other use of roles is really useful if you use multiple AWS accounts. It’s considered best practice to use separate AWS accounts for different environments (eg. live and test). In our consultancy work, we work with a number of customers, who each have four or more accounts, so this is invaluable to us.

In our main account (in this example, account ID 00001), we create a group for our users who are allowed to access customer accounts. We create a policy for that group, AssumeRoleCustomer, that looks like this:
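Using the shorthand account IDs from this example (real account IDs are twelve digits), the policy is a sketch along these lines:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "arn:aws:iam::00005:role/ScaleFactoryUser"
    }
  ]
}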

In this example, our customer’s account is ID 00005, and they have a role in that account called ScaleFactoryUser. This AssumeRoleCustomer policy permits our users to call sts:AssumeRole to take on the ScaleFactoryUser role in the customer’s account.

sts:AssumeRole is an API call which will return a temporary token for the role specified in the resource, which we can then use in order to behave as that role.

Of course, the other account (00005) also needs a policy to allow this, and so we set up a Trust Relationship Policy:
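A sketch of that trust policy, again using the example’s shorthand account ID:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::00001:root" },
      "Action": "sts:AssumeRole",
      "Condition": {
        "Bool": { "aws:MultiFactorAuthPresent": "true" }
      }
    }
  ]
}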

This policy allows any entity in the 00001 account to call sts:AssumeRole in our account, as long as it is using an MFA token (remember we saw that conditional in the earlier example?).

Having set that up, we can now log into our main account in the web console, click our username in the top right of the console, and choose “Switch Role”. By filling in the account number of the other account (00005) and the name of the role we want to assume (ScaleFactoryUser), the web console calls sts:AssumeRole in the background and uses that to start accessing the customer account.

Role assumption doesn’t have to be cross-account, by the way. You can also allow users to assume roles in the same account – and this can be used to allow unprivileged users occasional access to superuser privileges, in the same way you might use sudo on a unix system.

Federated Identity

When we’re talking about identity, it’s important to know the difference between the two “auth”s: authentication and authorization.

Authentication is used to establish who you are. So, when we use a username and password (and optionally an MFA token) to connect to the web console, that’s authentication at work.

This is distinct from Authorization which is used to establish what you can do. IAM policies control this.

In IAM, these two concepts are separate. It is possible to configure an Identity Provider (IdP) which is external to IAM, and use that for authentication. Users authenticated against the external IdP can then be assigned IAM roles, which control the authorization part of the story.

IdPs can speak either SAML or OpenID Connect. Google Apps (or are we calling it G Suite now?) can be set up as a SAML provider, and I followed this blog post with some success. I can now jump straight from my Google account into my AWS console, taking on a role I’ve called GoogleSSO, without having to give any other credentials.

Wrapping Up

I hope I’ve given you a flavour of some of the things you can do with IAM. If you’re still logging in with the root account, if you’re not using MFA, or if you’re hard-coding credentials in your application config, you should now be armed with the information you need to level up your security practice.

In addition to that, you may benefit from using role assumption, cross-account access, or an external IdP. As a bonus hint, you should also look into CloudTrail logging, so that your Alice can keep an eye on what Bob and Carla are up to!

However you’re spending the rest of this year, I wish you all the best.

About the Author

Jon Topper has been building Linux infrastructure for fifteen years. His UK-based consultancy, The Scale Factory, are a team of DevOps and infrastructure specialists, helping organizations of various sizes design, build, operate and scale their platforms.

About the Editors

Bill is a senior manager at Puppet in the SRE group. Before his move to Portland to join Puppet he spent six years in Chicago working for Backstop Solutions Group, prior to which he was in New Mexico working for the Department of Energy. He still loves him some hardware, but is accepting that AWS is pretty rad for some things.

Alfredo Cambera is a Venezuelan outdoorsman, passionate about DevOps, AWS, automation, Data Visualization, Python and open source technologies. He works as Senior Operations Engineer for a company that offers Mobile Engagement Solutions around the globe.


Just add Code: Fun with Terraform Modules and AWS

06. December 2016

Author: Chris Marchesi

Editors: Andrew Langhorn, Anthony Elizondo

This article is going to show you how you can use Terraform, with a little help from Packer and Chef, to deploy a fully-functional sample web application, complete with auto-scaling and load balancing, in under 50 lines of Terraform code.

You will need the sample project to follow along, so make sure you load that up before continuing with reading this article.

The Humble Configuration

Check out the code in the terraform/main.tf file.

It might be hard to believe that this mere smattering of Terraform sets up:

  • An AWS VPC
  • 2 subnets, each in different availability zones, fully routed
  • An AWS Application Load Balancer
  • A listener for the ALB
  • An AWS Auto Scaling group
  • An ALB target group attached to the ALB
  • Configured security groups for both the ALB and backend instances

So what’s the secret?

Terraform Modules

This example is using a powerful feature of Terraform – the modules feature, providing a semantic and repeatable way to manage AWS infrastructure. The modules hide most of the complexity of setting up a full VPC behind a relatively small set of code, and an even smaller set of changes going forward (generally, to update this application, all that is needed is to update the AMI).

Note that this example is composed entirely of modules – no root module resources exist. That’s not to say that they can’t exist – and in fact one of the secondary examples demonstrates how you can use the outputs of one of the modules to add extra resources on an as-needed basis.

The example is composed of three visible modules, and one module that operates under the hood as a dependency:

  • terraform_aws_vpc, which sets up the VPC and subnets
  • terraform_aws_alb, which sets up the ALB and listener
  • terraform_aws_asg, which configures the Auto Scaling group, and ALB target group for the launched instances
  • terraform_aws_security_group, which is used by the ALB and Auto Scaling modules to set up security groups to restrict traffic flow.

These modules will be explained in detail later in the article.

How Terraform Modules Work

Terraform modules work very similarly to basic Terraform configuration. In fact, each Terraform module is a standalone configuration in its own right and, depending on its prerequisites, can run completely on its own. Indeed, a top-level Terraform configuration without any modules being used is still a module – the root module. You sometimes see this mentioned in various parts of the Terraform workflow, such as in error messages and the state file.

Module Sources and Versioning

Terraform supports a wide variety of remote sources for modules, such as simple, generic locations like HTTP, or Git, or well-known locations like GitHub, Bitbucket, or Amazon S3.

You don’t even need to put a module in a remote location. In fact, a good habit to get into is this: if you need to re-use Terraform code in a local project, put that code in a module. That way you can re-use it several times to create the same kind of resources in the same – or, even better, different – environments.

Declaring a module is simple. Let’s look at the VPC module from the example:
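A hedged sketch of such a declaration – the version pin and parameter names here are illustrative, since the real ones come from the module in the sample project:

module "vpc" {
  source                  = "github.com/paybyphone/terraform_aws_vpc?ref=v0.1.0"
  vpc_network_address     = "10.0.0.0/24"
  public_subnet_addresses = ["10.0.0.0/25", "10.0.0.128/25"]
  project_path            = "your-org/your-project"
}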

The location of the module is specified with the source parameter. The style of the parameter dictates how Terraform will go about fetching the module.

The rest of the options here are module parameters, which translate to variables within the module. Note that any variable that does not have a default value in the module is a required parameter, and Terraform will not start if these are not supplied.

The last item worth mentioning is versioning. Most module sources that work off of source control have a versioning parameter you can supply to get a specific revision or tag – with Git and GitHub sources, this is ref, which can translate to most Git references, be it a branch or a tag.

Versioning is a great way to keep things under control. You might find yourself iterating very fast on certain modules as you learn more about Terraform or your internal infrastructure design patterns change – versioning your modules ensures that you don’t need to constantly refactor otherwise stable stacks.

Module Tips and Tricks

Terraform and HCL are works in progress, and some things that seem like they should make sense don’t necessarily work 100% – yet. There are some things that you might want to keep in mind when you are designing your modules that may reduce the complexity that ultimately gets presented to the user:

Use Data Sources

Terraform 0.7+’s data sources feature can go a long way in reducing the amount of data that needs to go into your module.

In this project, data sources are used for things such as obtaining VPC IDs from subnets (aws_subnet) and getting the security groups assigned to an ALB (using the aws_alb_listener and aws_alb data sources chained together). This allows us to create ALBs based off of subnet IDs alone, and to attach auto-scaling groups to ALBs while knowing only the listener ARN we need to attach to.
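For example, looking up the VPC ID from a subnet ID with the aws_subnet data source looks something like this (the variable name is illustrative):

data "aws_subnet" "public" {
  id = "${var.subnet_id}"
}

# The VPC ID is then available elsewhere as:
#   "${data.aws_subnet.public.vpc_id}"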

Exploit Zero Values and Defaults

Terraform follows the rules of the language it was created in (Go) regarding zero values. Hence, most of the time, supplying an empty parameter is the same as supplying none at all.

This can be advantageous when designing a module to support different kinds of scenarios. For example, the alb module supports TLS via supplying a certificate ARN. Here is the variable declaration:
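Something along these lines – the variable name is an assumption:

variable "listener_certificate_arn" {
  type    = "string"
  default = ""
}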

And here it is referenced in the listener block:
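Roughly like this, trimmed down to the relevant argument:

resource "aws_alb_listener" "alb_listener" {
  # ... other listener arguments ...
  certificate_arn = "${var.listener_certificate_arn}"
}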

Now, when this module parameter is not supplied, its default value becomes an empty string, which is passed in to aws_alb_listener.alb_listener. This is, most times, exactly the same as if the parameter is not passed in at all. This allows you to not have to worry about this parameter when you just want to use HTTP on this endpoint (the default for the ALB module as a whole).

Pseudo-Conditional Logic

Terraform does not support conditional logic yet, but through creative use of count and interpolation, one can create semi-conditional logic in your resources.

Consider the fact that the terraform_aws_autoscaling module supports attaching the ASG to an ALB, but does not explicitly require it. How can you get away with that, though?

To get the answer, check one of the ALB resources in the module:
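A sketch of the trick; the exact resource and default values in the module may differ:

resource "aws_alb_target_group" "alb_target_group" {
  # One instance of the resource when var.enable_alb is "true",
  # zero instances (removed from the graph) otherwise.
  count = "${lookup(map("true", "1"), var.enable_alb, "0")}"

  # ... target group arguments ...
}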

Here, we make use of the map interpolation function, nested in a lookup function, to provide what is essentially an if/then/else control structure. This is used to control a resource’s instance count, adding an instance if var.enable_alb is true, and completely removing the resource from the graph otherwise.

This conditional logic does not necessarily need to be limited to count either. Let’s go back to the aws_alb_listener.alb_listener resource in the ALB module, looking at a different parameter:
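Along these lines – the protocol variable name and the chosen policy are assumptions:

resource "aws_alb_listener" "alb_listener" {
  # ...
  # Supply a real SSL policy only when the protocol is not plain HTTP;
  # for HTTP we pass the empty string, i.e. the zero value.
  ssl_policy = "${lookup(map("HTTP", ""), var.listener_protocol, "ELBSecurityPolicy-2015-05")}"
}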

Here, we are using this trick to supply the correct SSL policy to the listener if the listener protocol is not HTTP. If it is, we supply the zero value, which as mentioned before, makes it as if the value was never supplied.

Module Limitations

Terraform does have some not-necessarily-obvious limitations that you will want to keep in mind when designing both modules and Terraform code in general. Here are a couple:

Count Cannot be Computed

This is a big one that can really get you when you are writing modules. Consider the following scenario that totally did not happen to me even though I knew of such things beforehand 😉

  • An ALB listener is created with aws_alb_listener
  • The arn of this resource is passed as an output
  • That output is used as both the ARN to attach an auto-scaling group to, and the pseudo-conditional in the ALB-related resources’ count parameter

What happens? You get this lovely message:

value of 'count' cannot be computed

Actually, it used to be worse (a strconv error was displayed instead), but luckily that changed recently.

Unfortunately, there is no nice way to work around this right now. Extra parameters need to be supplied, or you need to structure your modules in a way that avoids computed values being passed into count directives in your workflow. (This is pretty much exactly why the terraform_aws_asg module has an enable_alb parameter.)

Complex Structures and Zero Values

Complex structures are not necessarily good candidates for zero values, even though it may seem like a good idea. But by defining a complex structure in a resource, you are by nature supplying it a non-zero value, even if most of the fields you supply are empty.

Most resources don’t handle this scenario gracefully, so it’s best to avoid using complex structures in a scenario where you may be designing a module for re-use, and expect that you won’t be using the functionality defined by such a structure often.

The Application in Brief

As our focus in this article is on Terraform modules, and not on other parts of the pattern such as using Packer or Chef to build an AMI, we will only touch briefly on the non-Terraform parts of this project, so that we can focus on the Terraform code and the AWS resources it sets up.

The Gem

The Ruby gem in this project is a small “hello world” application running with Sinatra. This is self-contained within this project and mainly exists to give us an artifact to put on our base AMI to send to the auto-scaling group.

The server prints out the system’s hostname when fetched. This will allow us to see each node in action as we boot things up.

Packer

The built gem is loaded on to an AMI using Packer, for which the code is contained within packer/ami.json. We use chef-solo as a provisioner, which works off a self-contained cookbook named packer_payload in the cookbooks directory. This gives us a somewhat higher-level workflow than plain shell scripts would, including the ability to integration-test things more easily and possibly support multiple build targets.

Note that the Packer configuration takes advantage of a new Packer 0.12.0 feature that allows us to fetch an AMI to use as the base right from Packer. This is the source_ami_filter directive. Before Packer 0.12.0, you would have needed to resort to a helper, such as ubuntu_ami.sh, to get the AMI for you.

The Rakefile

The Rakefile is the build runner. It has tasks for Packer (ami), Terraform (infrastructure), and Test Kitchen (kitchen). It also has prerequisite tasks to stage cookbooks (berks_cookbooks), and Terraform modules (tf_modules). It’s necessary to pre-fetch modules when they are being used in Terraform – normally this is handled by terraform get, but the tf_modules task does this for you.

It also handles some parameterization of Terraform commands, which allows us to specify when we want to perform something else other than an apply in Terraform, or use a different configuration.

All of this is in addition to standard Bundler gem tasks like build, etc. Note that install and release tasks have been explicitly disabled so that you don’t install or release the gem by mistake.

The Terraform Modules

Now that we have that out of the way, we can talk about the fun stuff!

As mentioned at the start of the article, this project has four different Terraform modules. Also as mentioned, one of them (the Security Group module) is hidden from the end user, as it is consumed by two of the parent modules to create security groups to work with. This exploits the fact that Terraform can, of course, nest modules within each other, allowing for any level of re-usability when designing a module layout.

The AWS VPC Module

The first module, terraform_aws_vpc, creates not only a VPC, but also public subnets as well, complete with route tables and internet gateway attachments.

We’ve already hidden a decent amount of complexity just by doing this, but as an added bonus, redundancy is baked right into the module: any network addresses passed in as subnets are distributed across all availability zones available in the particular region, via the aws_availability_zones data source. This process does not require any previous knowledge of the zones available to the account.

The module passes out pertinent information, such as the VPC ID, the ID of the default network ACL, the created subnet IDs, the availability zones for those subnets as a map, and the ID of the route table created.

The ALB Module

The second module, terraform_aws_alb allows for the creation of AWS Application Load Balancers. If all you need is the defaults, use of this module is extremely simple, creating an ALB that will answer requests on port 80. A default target group is also created that can be used if you don’t have anything else mapped, but we want to use this with our auto-scaling group.

The Auto Scaling Module

The third module, terraform_aws_asg, is arguably the most complex of the three that we see in the sample configuration, but even at that, its required options are very slim.

The beauty of this module is that, thanks to all the aforementioned logic, you can attach more than one ASG to the same ALB with different path patterns (mentioned below), or not attach it to an ALB at all! This allows this same module to be used for a number of scenarios. This is on top of the plethora of options available to you to tune, such as CPU thresholds, health check details, and session stickiness.

Another thing to note is how the AMI for the launch configuration is being fetched from within this module. We work off the tag that we used within Packer, which is supplied as a module variable. This is then searched for within the module via an aws_ami data source. This means that no code or variables need to change when the AMI is updated – the next Terraform run will pick up the most recent AMI with the tag.

Lastly, this module supports the rolling update mechanism laid out by Paul Hinze in this post oh so long ago now. When a new AMI is detected and the auto-scaling group needs to be updated, Terraform will bring up the new ASG, attach it, wait for it to have minimum capacity, and then bring down the old one.

The Security Group Module

The last module to be mentioned, terraform_aws_security_group, is not shown anywhere in our example, but is actually used by the ALB and ASG modules to create Security Groups.

Not only does it create security groups though – it also allows for the creation of 2 kinds of ICMP allow rules. One for all ICMP, if you so choose, but more importantly, allow rules for ICMP type 3 (host unreachable) are always created, as this is how path MTU discovery works. Without this, we might end up with unnecessarily degraded performance.

Give it a Shot

After all this talk about the internals of the project and the Terraform code, you might be eager to bring this up and see it working. Let’s do that now.

Assuming you have the project cloned and AWS credentials set appropriately, do the following:

  • Run bundle install --binstubs --path vendor/bundle to load the project’s Ruby dependencies.
  • Run bundle exec rake ami. This builds the AMI.
  • Run bundle exec rake infrastructure. This will deploy the project.

After this is done, Terraform should return an alb_hostname value to you. You can now load this up in your browser. Load it once, then wait about 1 second, then load it again! Or even better, just run the following in a prompt:

while true; do curl http://ALBHOST/; sleep 1; done

And watch the hostname change between the two hosts.

Tearing it Down

Once you are done, you can destroy the project simply by passing a TF_CMD environment variable in to rake with the destroy command:

TF_CMD=destroy bundle exec rake infrastructure

And that’s it! Note that this does not delete the AMI artifact, you will need to do that yourself.

More Fun

Finally, a few items for the road. These are things that are otherwise important to note or should prove to be helpful in realizing how powerful Terraform modules can be.

Tags

You may have noticed the modules have a project_path parameter that is filled out in the example with the path to the project in GitHub. This is something that I think is important for proper AWS resource management.

Several of our resources have machine-generated names or IDs which make them hard to track on their own. Having an easy-to-reference tag alleviates that. Having the tag reference the project that consumes the resource is even better – I don’t think it gets much clearer than that.

SSL/TLS for the ALB

Try this: create a certificate using Certificate Manager, and change the alb module to the following:
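A hedged sketch of what that change could look like – the parameter names are assumptions based on how the module has been described, and the certificate ARN is a placeholder for the one Certificate Manager gives you:

module "alb" {
  source                   = "github.com/paybyphone/terraform_aws_alb?ref=v0.1.0"
  listener_subnet_ids      = ["${module.vpc.public_subnet_ids}"]
  listener_port            = "443"
  listener_protocol        = "HTTPS"
  listener_certificate_arn = "arn:aws:acm:us-east-1:123456789012:certificate/00000000-0000-0000-0000-000000000000"
  project_path             = "your-org/your-project"
}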

Better yet, see the complete example here. It can be brought up with the same rake infrastructure task, and torn down by passing TF_CMD=destroy, exactly as shown earlier – the Rakefile’s parameterization lets you point it at the example’s configuration.

You now have SSL for your ALB! Of course, you will need to point DNS to the ALB (either via external DNS, CNAME records, or Route 53 alias records – the example includes this), but it’s that easy to change the ALB into an SSL load balancer.

Adding a Second ASG

You can also use the ASG module to create two auto-scaling groups.

There is an example for the above here. Again, it can be brought up with the rake infrastructure task and destroyed by passing TF_CMD=destroy, just as before.

You now have two auto-scaling groups, one handling requests for /foo/*, and one handling requests for /bar/*. Give it a go by reloading each URL and see the unique instances you get for each.

Acknowledgments

I would like to take a moment to thank PayByPhone for allowing me to use their existing Terraform modules as the basis for the publicly available ones at https://github.com/paybyphone. Writing this article would have been a lot more painful without them!

Also thanks to my editors, Anthony Elizondo and Andrew Langhorn, for their feedback and help with this article, and to the AWS Advent Team for the chance to stand on their soapbox for my 15 minutes! 🙂

About the Author:

Chris Marchesi (@vancluever) is a Systems Engineer working out of Vancouver, BC, Canada. He currently works for PayByPhone, designing tools and patterns to help its engineers and developers work with AWS. He is also a regular contributor to the Terraform project. You can view his work at https://github.com/vancluever, and also his previous articles at https://vancluevertech.com/.

About the Editors:

Andrew Langhorn is a senior consultant at ThoughtWorks. He works with clients large and small on all sorts of infrastructure, security and performance problems. Previously, he was up to no good helping build, manage and operate the infrastructure behind GOV.UK, the simpler, clearer and faster way to access UK Government services and information. He lives in Manchester, England, with his beloved gin collection, blogs at ajlanghorn.com, and is a firm believer that mince pies aren’t to be eaten before December 1st.

Anthony Elizondo is an SRE at Adobe. He enjoys making things, breaking things, and burritos. You can find him at http://twitter.com/complexsplit