Taming and controlling storage volumes

18. December 2016

Author: James Andrews

Introduction

It’s easy to use scripts in AWS to generate a lot of EC2 virtual machines quickly. Along with the creation of EC2 instances, persistent block storage volumes may be created and attached via AWS EBS. These EC2 instances might be used briefly and then discarded, but by default the EBS volumes are not destroyed.

Continuous Integration

Continuous Integration (CI) is now a standard technique in software development. When doing CI it’s usually a good idea to build at least some of the infrastructure used in the so-called ‘pipelines’ from scratch. Fortunately, AWS is great at this and offers several ways to do it.

When CI pipelines are mentioned, Docker is often in the same sentence, although Docker and other container environments are not universally adopted. Many businesses still run software-controlled infrastructure with a separate virtual machine for each host, so I’ll be looking at this type of environment, which uses EC2 instances as separate virtual machines.

A typical CI run will generate some infrastructure such as an EC2 virtual machine, load some programs onto it, run some tests and then tear the EC2 instance down afterwards. It’s during this teardown that care needs to be taken on AWS to avoid leaving the attached EBS volumes behind.

AWS allows the use of multiple environments – as many as you’d like to pay for. This allows different teams or projects to run their own CI as independent infrastructure, making it possible to test multiple, different streams of development. Unfortunately, multiple environments can also multiply the problem of remnant EBS volumes that are not correctly removed after use.

Building CI Infrastructure

I’ve adapted a couple of methods for making CI infrastructure.

1. CloudFormation + Troposphere

AWS CloudFormation is a great tool, but one drawback is the use of JSON (or, more recently, YAML) configuration files to drive it. These are somewhat human readable but are excessively long and tend to contain “unrolled” repeated phrases if several similar EC2 instances are being made.

To mitigate this problem I have started using Troposphere, a Python module for generating CloudFormation templates. I should point out that similar CloudFormation template-making libraries are available in other languages, including Go and Ruby.

It may seem like more work to use a program to make what is essentially another program (the template), but by abstracting the in-house rules for making an EC2 instance just how you need it, template development is quicker and the generator programs are easier to understand than the templates themselves.

Here is an example troposphere program – a minimal sketch that creates a single EC2 instance (the AMI ID, key pair, and instance type are placeholders):
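
```python
# Sketch of a troposphere generator. The AMI ID, key name, and instance
# type are placeholder assumptions.
from troposphere import Tags, Template
import troposphere.ec2 as ec2


def doInstance(template, name):
    # The in-house rules for building an EC2 instance live here.
    instance = ec2.Instance(
        name,
        ImageId="ami-0abcdef1234567890",  # placeholder AMI
        InstanceType="t2.micro",
        KeyName="ci-keypair",  # placeholder key pair
        Tags=Tags(Project="ci-demo"),
    )
    template.add_resource(instance)
    return instance


t = Template()
t.add_description("CI test instance")
doInstance(t, "CiTestInstance")
print(t.to_json())
```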

When the program is run, it prints a JSON template to stdout.

Once the templates are made, they can be deployed via either the console or the AWS CLI.
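
For example, a generated template could be deployed with a CLI call along these lines (the stack and file names are illustrative):

```sh
aws cloudformation create-stack \
    --stack-name ci-test \
    --template-body file://template.json
```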

2. Boto and the AWS CLI

There is also a Python module for calling the AWS API directly: boto. Using it, the EC2 instance is created as soon as the boto program runs. Again, similar libraries are available in other languages. With the troposphere generator there is an intermediate template step, whereas boto programs create the AWS infrastructure items directly.

If you have used the AWS CLI, boto is very similar but allows all the inputs and outputs to be manipulated in Python. Typical boto programs are shorter than AWS CLI shell scripts and contain fewer API calls, because it is much easier to extract the required output data in Python than in shell with the AWS CLI.
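
As a rough illustration, creating an instance directly with boto3 (the current incarnation of boto) looks like this; the region, AMI, and key name are placeholder assumptions:

```python
import boto3

ec2 = boto3.resource("ec2", region_name="eu-west-1")
instances = ec2.create_instances(
    ImageId="ami-0abcdef1234567890",  # placeholder AMI
    InstanceType="t2.micro",
    KeyName="ci-keypair",  # placeholder key pair
    MinCount=1,
    MaxCount=1,
)
print(instances[0].id)  # the instance object is available immediately
```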

Monitoring EBS volumes

If you are managing an environment where it is difficult to apply controls to scripts, then there can be a buildup of “dead volumes” that are not attached to any live instance. Some of these might be of historical interest, but mostly they are just sitting there accruing charges.

Volumes that should be kept should be tagged appropriately so that they can be charged to the correct project, or deleted when their time is past. However, when scripts are written quickly to get the job done, this is often not the case.

This is exactly the situation our team found itself in following an often frantic migration of a legacy environment to AWS. To audit the volumes, I wrote a boto script that finds all volumes which are not attached to an EC2 instance and which have no tags; a sketch of it is below. It runs as a cron job on a management node in our infrastructure.
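
```python
# Sketch of the audit script: report EBS volumes that are unattached
# ("available") and carry no tags. Region and output format are assumptions.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")
paginator = ec2.get_paginator("describe_volumes")
pages = paginator.paginate(Filters=[{"Name": "status", "Values": ["available"]}])
for page in pages:
    for volume in page["Volumes"]:
        if not volume.get("Tags"):
            print(volume["VolumeId"], volume["Size"], volume["CreateTime"])
```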

3. AWS Config

A third option builds on AWS tags, an important technique in a complex AWS environment being managed by a team. Tags attach metadata to almost any type of AWS object, including the EBS volumes we are looking at in this article. The tags can be used to help apply the correct policy processes to volumes and so prevent wasteful use of storage.

Establishing this sort of policy is (you might have guessed) something that Amazon has thought of already. They provide a tool for exactly this, named AWS Config, for monitoring, tagging, and applying other types of policy rules.

AWS Config provides a framework for monitoring compliance with rules you define. It can run in response to system events (such as the termination of an EC2 instance) or on a timer, like cron.

It has a built-in rule for finding objects that are missing required tags (REQUIRED_TAGS; see http://docs.aws.amazon.com/config/latest/developerguide/evaluate-config_use-managed-rules.html). This rule doesn’t do quite the same as my script but is a viable alternative. If you need slightly different rules from the predefined ones, AWS Config also has a system for adding custom rules implemented as Lambda functions.

Stopping the problem at source

If you are writing new automation scripts now, the best thing to do is to add the options that make cleanup happen automatically. EBS volumes attached to EC2 instances have a setting for exactly this: “delete on termination”.

Volumes created through CloudFormation or the API do not default to delete on termination. I was surprised to find that deleting a CloudFormation stack (which contained default settings) did not remove the EBS volumes it had made.

A good way to overcome this is to set the delete on termination flag.

With the AWS CLI, get the instance ID and add a command like this after “run-instances” (the device name depends on the AMI – often /dev/sda1 or /dev/xvda):
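
```sh
# Device name depends on the AMI (often /dev/sda1 or /dev/xvda).
aws ec2 modify-instance-attribute \
    --instance-id "$INSTANCE_ID" \
    --block-device-mappings \
    '[{"DeviceName": "/dev/sda1", "Ebs": {"DeleteOnTermination": true}}]'
```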

In CloudFormation, the “Type”: “AWS::EC2::Instance” sections make volumes from the AMI automatically. To set delete on termination, add a block like this (the device name is again AMI-dependent) in the Properties section:
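
```json
"BlockDeviceMappings": [
    {
        "DeviceName": "/dev/sda1",
        "Ebs": { "DeleteOnTermination": true }
    }
]
```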

To do the same thing in troposphere, make an ec2.Instance object and use the BlockDeviceMappings attribute, along these lines:
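
```python
# Sketch; the resource name, AMI, and device name are placeholders.
import troposphere.ec2 as ec2

instance = ec2.Instance(
    "CiTestInstance",
    ImageId="ami-0abcdef1234567890",
    InstanceType="t2.micro",
    BlockDeviceMappings=[
        ec2.BlockDeviceMapping(
            DeviceName="/dev/sda1",
            Ebs=ec2.EBSBlockDevice(DeleteOnTermination=True),
        )
    ],
)
```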

This attribute could be added to the example troposphere program in the “doInstance” method.

Summary

The problem of uncontrolled EBS volume use can be brought under control by tagging volumes and by using a script or AWS Config to scan for unused ones.

Unattached, unused EBS volumes can be prevented in a software-controlled infrastructure by setting the “DeleteOnTermination” flag – this does not happen by default in CloudFormation.

About the Author

James Andrews has been working for 25 years as a Systems Administrator. On the weekend he rides his bike around Wales.


Serverless everything: One-button serverless deployment pipeline for a serverless app

14. December 2016

Author: Soenke Ruempler
Editors: Ryan S. Brown

Update: since AWS recently released CodeBuild, things have become much simpler. Please also read my follow-up post, AWS CodeBuild: The missing link for deployment pipelines in AWS.

Infrastructure as Code is the new default: with tools like Ansible, Terraform, CloudFormation, and others it is getting more and more common, and a multitude of services and tools can be orchestrated with code. The main advantages of automation are reproducibility, fewer human errors, and exact documentation of the steps involved.

With infrastructure expressed as code, it’s not a stretch to also want to codify deployment pipelines. Luckily, AWS has its own service for that, named CodePipeline, which in turn can be fully codified and automated by CloudFormation (“Pipelines as Code”).

This article will show you how to create a deploy pipeline for a serverless app with a “one-button” CloudFormation template. The more concrete goals are:

  • Fully serverless: neither the pipeline nor the app itself involves any server, VM, or container setup/management (and yes, there are still servers, just not managed by us).
  • Demonstrate a fully automated deployment pipeline blueprint with AWS CodePipeline for a serverless app consisting of a sample backend powered by the Serverless framework and a sample frontend powered by “create-react-app”.
  • Provide a one-button quick start for creating deployment pipelines for serverless apps within minutes. Nothing should be run from a developer machine, not even an “inception script”.
  • Show that it is possible to lower complexity by leveraging AWS components, so you don’t need to configure/click third-party providers (e.g. TravisCI/CircleCI) as pipeline steps.

We will start with a repository consisting of a typical small web application with a front end and a back end. The deployment pipeline described in this article makes some assumptions about the project layout (see the sample project):

  • a frontend/ folder with a package.json which will produce a build into build/ when npm run build is called by the pipeline.
  • a backend/ folder with a serverless.yml. The pipeline will call serverless deploy (the Serverless framework). It should have at least one http event so that the Serverless framework creates a service endpoint, which the frontend can then use to call the APIs.

For a start, you can just clone or copy the sample project into your own GitHub account.

As soon as you have your project ready, we can continue to create a deployment pipeline with CloudFormation.

The actual CloudFormation template we use here to create the deployment pipeline does not reside in the project repository. This allows us to evolve the pipeline code and the projects using the pipeline independently of each other. The template is published to an S3 bucket so we can build a one-click launch button. The launch button will direct users to the CloudFormation console with the URL of the template prefilled:

Launch Stack

After you click on the link (you need to be logged in to the AWS Console) and click “Next” to confirm that you want to use the predefined template, some CloudFormation stack parameters have to be specified:

CloudFormation stack parameters

First you need to specify the GitHub owner/repository of the project (the one you copied earlier), a branch (usually master), and a GitHub OAuth token as described in the CodePipeline documentation.

The other parameters specify where to find the Lambda function source code for the deployment steps; we can live with the defaults for now – that’s material for another blog post. (Update: the Lambda functions became obsolete with the move to AWS CodeBuild, and so did the template parameters regarding the Lambda source code location.)

The next step of the CloudFormation stack setup allows you to specify advanced settings like tags, notifications and so on. We can leave these as-is as well.

On the last page of the wizard you need to acknowledge that CloudFormation will create IAM roles on your behalf:

CloudFormation IAM confirmation

The IAM roles are needed to give the Lambda functions the permissions to run and to write logs to CloudWatch. Once you press the “Create” button, CloudFormation will create the following AWS resources:

  • An S3 Bucket containing the website assets with website hosting enabled.
  • A deployment pipeline (AWS CodePipeline) consisting of the following steps:
    • Checks out the source code from GitHub and saves it as an artifact.
    • Back end deployment: A Lambda function build step which takes the source artifact, installs and calls the Serverless framework.
    • Front end deployment: Another Lambda function build step which takes the source artifact, runs npm build, and deploys the build to the website S3 bucket.

(Update: in the meantime, I replaced the Lambda functions with AWS CodeBuild).

No servers harmed so far, and no workstations either: no error-prone installation steps in READMEs to follow, no curl | sudo bash, and no other awkward setup instructions. Also, no hardcoded AWS access key pairs anywhere!

A platform team in an organization could provide several of these types of templates for particular use cases, then development teams could get going just by clicking the link.

Ok, back to our example: once the CloudFormation stack creation has fully finished, the newly created CodePipeline runs for the first time. On the AWS console:

CodePipeline running

As soon as the initial pipeline run is finished:

  • the back end CloudFormation stack has been created by the Serverless framework, according to what you defined in the backend/serverless.yml configuration file.
  • the front end has been built and put into the website bucket.

To find out the URL of our website hosted in S3, open the CloudFormation stack in the console and expand its outputs. The WebsiteUrl output shows the actual URL:

CloudFormation Stack output

Click on the URL link and view the website:

Deployed sample website

Voila! We are up and running!

As you might have seen in the picture above, there is some JSON output: it’s actually the result of an HTTP call the front end made against the back end – the hello function, which just echoes back the Lambda event object.

Let’s dig a bit deeper into this detail, as it shows the integration of frontend and backend: to pass the ServiceEndpoint URL to the front end build step, the back end build step exports all CloudFormation outputs of the Serverless-created stack as a CodePipeline build artifact, which the front end build step in turn passes to npm build (in our case via a React-specific environment variable). This is roughly how the API call looks in React (a sketch; the variable name follows create-react-app conventions):
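
```javascript
// Sketch of the frontend call, inside a React component. The environment
// variable name is an assumption following create-react-app conventions.
fetch(process.env.REACT_APP_SERVICE_ENDPOINT + '/hello')
  .then(response => response.json())
  .then(json => this.setState({ message: JSON.stringify(json) }));
```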

This cross-site request actually works because we turned CORS on in the serverless.yml, with a fragment like this (function and handler names are illustrative):
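
```yaml
# Sketch of the relevant serverless.yml fragment; function and handler
# names are assumptions.
functions:
  hello:
    handler: handler.hello
    events:
      - http:
          path: hello
          method: get
          cors: true
```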

Here is a high-level overview of the created CloudFormation stack:

Overview of the CloudFormation Stack

With the serverless pipeline and serverless project running, change something in your project, commit it and view the change propagated through the pipeline!

Additional thoughts:

I want to set up my own S3 bucket with my own CloudFormation templates/blueprints!

In case you don’t trust me as a template provider, or you want to change the one-button CloudFormation template, you can of course host your own S3 bucket. Doing so is beyond the scope of this article, but you can start by looking at my CloudFormation template repo.

I want to have testing/staging in the pipeline!

The sample pipeline does not have any testing or staging steps. You can add more steps to the pipeline, e.g. another Lambda step that calls npm test on your source code.

I need a database/cache/whatever for my app!

No problem, just add additional resources to the serverless.yml configuration file.

Summary

In this blog post I demonstrated a CloudFormation template which bootstraps a serverless deployment pipeline with AWS CodePipeline. This enables rapid application development and deployment, as development teams can use the template in a “one-button” fashion for their projects.

Using it, we deployed a sample project with a front end and a back end through its own deployment pipeline.

AWS gives us all the Lego bricks we need to create such pipelines in an automated, codified and (almost) maintenance-free way.

Known issues / caveats

  • I am describing an experiment / working prototype here. Don’t expect high-quality, battle-tested code (esp. the JavaScript parts 🙂 ). It’s more about the architectural concept. Issues and pull requests to the mentioned projects are welcome 🙂 (Update: luckily I could delete all the JS code with the move to AWS CodeBuild.)
  • All deployment steps currently run with full access roles (AdministratorAccess), which is bad practice, but that was not the focus of this article.
  • The website could also be backed by a CloudFront CDN with HTTPS and a custom domain.
  • Beware of the 5-minute execution limit of Lambda functions (e.g. more complex serverless.yml setups might take longer). This could be worked around by moving resource creation into a CloudFormation pipeline step; Michael Wittig has blogged about that. (Update: this point became invalid with the move to AWS CodeBuild.)
  • The build steps are currently not optimized; e.g. installing npm/serverless every time is not necessary. They could use an artifact from an earlier CodePipeline step. (Update: this point became invalid with the move to AWS CodeBuild.)
  • The CloudFormation stack created by the Serverless framework is currently suffixed with “dev”, because that’s their default environment. The suffix should be omitted or made configurable.

Acknowledgements

Special thanks goes to the folks at Stelligent.

First for their open source work on serverless deploy pipelines with Lambda, especially the “dromedary-serverless” project, from which I adapted much of the Lambda code.

Second for their “one-button” concept which influenced this article a lot.

About the Author

Along with 18 years of web software development and web operations experience, Soenke Ruempler is an expert in AWS technologies (6 years of development and operations experience), and in moving on-premise/legacy systems to the cloud without service interruptions.

His special interests and fields of knowledge are Cloud/AWS, Serverless architectures, Systems Thinking, Toyota Kata (Kaizen), Lean Software Development and Operations, High Performance/Reliability Organizations, Chaos Engineering.

You can find him on Twitter, Github and occasionally blogging on ruempler.eu.

About the Editors

Ryan Brown is a Sr. Software Engineer at Ansible (by Red Hat) and contributor to the Serverless Framework. He’s all about using the best tool for the job, and finds simplicity and automation are a winning combo for running in AWS.


Modular cfn-init Configsets with SparkleFormation

13. December 2016

Author: Michael F. Weinberg
Editors: Andreas Heumaier

This post lays out a modular, programmatic pattern for using CloudFormation Configsets in SparkleFormation codebases. This technique may be beneficial to:

  • Current SparkleFormation users looking to streamline EC2 instance provisioning
  • Current CloudFormation users looking to manage code instead of JSON/YAML files
  • Other AWS users needing an Infrastructure as Code solution for EC2 instance provisioning

Configsets are a CloudFormation-specific EC2 feature that allows you to configure a set of instructions for cfn-init to run upon instance creation. Configsets group collections of specialized resources, providing a simple solution for basic system setup and configuration. An instance can use one or many Configsets, which are executed in a predictable order.

Because cfn-init is triggered on the instance itself, it is an excellent solution for Auto Scaling group instance provisioning, a scenario where external provisioners cannot easily discover the underlying instances or respond to scaling events.

SparkleFormation is a powerful Ruby library for composing CloudFormation templates, as well as orchestration templates for other cloud providers.

The Pattern

Many CloudFormation examples include a set of cfn-init instructions in the instance Metadata using the config key. This is an effective way to configure instances for a single template, but in an infrastructure codebase, doing this for each service template is repetitious and introduces the potential for divergent approaches to the same problem in different templates. If no config key is provided, cfn-init will automatically attempt to run a default Configset. Configsets in CloudFormation templates are represented as an array. This pattern leverages Ruby’s concat method to construct a default Configset in SparkleFormation’s compilation step, which allows us to use Configsets to manage the instance Metadata in a modular fashion.

To start, any Instance or Launch Config resources should include an empty array as the default Configset in their metadata, like so (a sketch; the resource name is illustrative):
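
```ruby
# Sketch: an instance resource with an empty default Configset.
# _camel_keys_set(:auto_disable) keeps the cfn-init keys from being camelized.
resources.ci_instance do
  type 'AWS::EC2::Instance'
  metadata('AWS::CloudFormation::Init') do
    _camel_keys_set(:auto_disable)
    configSets.default []
  end
end
```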

Additionally, the Instance or Launch Config UserData should run the cfn-init command. A best practice is to place this in a SparkleFormation registry entry. A barebones example (the resource name and cfn-init path are assumptions):
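
```ruby
# Sketch of a UserData registry entry that runs cfn-init against this stack
# and resource. The resource name and the cfn-init path (Amazon Linux) are
# assumptions.
SfnRegistry.register(:cfn_init_user_data) do
  user_data base64!(
    join!(
      "#!/bin/bash\n",
      '/opt/aws/bin/cfn-init -v',
      ' --stack ', ref!('AWS::StackName'),
      ' --resource CiInstance',
      ' --region ', ref!('AWS::Region'),
      "\n"
    )
  )
end
```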

With the above code, cfn-init will run the empty default Configset. Using modular registry entries, we can expand this Configset to meet our needs. Each registry file should add the defined configuration to the default Configset, like this (a sketch; the package name is illustrative):
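
```ruby
# Sketch: a registry entry appends its config key to the default Configset
# and defines the matching config block.
SfnRegistry.register(:install_nginx) do
  metadata('AWS::CloudFormation::Init') do
    _camel_keys_set(:auto_disable)
    configSets do
      default.concat ['install_nginx']
    end
    install_nginx do
      packages.apt.nginx []
    end
  end
end
```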

A registry entry can also include more than one config block, sketched here with illustrative package and service names:
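
```ruby
# Sketch: one registry entry contributing two config blocks.
SfnRegistry.register(:app_setup) do
  metadata('AWS::CloudFormation::Init') do
    _camel_keys_set(:auto_disable)
    configSets do
      default.concat ['app_packages', 'app_services']
    end
    app_packages do
      packages.apt.nginx []
    end
    app_services do
      services.sysvinit.nginx do
        enabled true
        ensureRunning true
      end
    end
  end
end
```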

Calling these registry entries in the template will add them to the default Configset in the order they are called, for example (sketch):
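
```ruby
# Sketch: the template pulls the registry entries in, in order.
resources.ci_instance do
  type 'AWS::EC2::Instance'
  metadata('AWS::CloudFormation::Init') do
    _camel_keys_set(:auto_disable)
    configSets.default []
  end
  registry!(:install_nginx)
  registry!(:app_setup)
end
```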

Note that other approaches to extending the array will also work:

sets.default += [ 'key_to_add' ], sets.default.push('key_to_add'), sets.default << 'key_to_add', etc.

Use Cases

Extending the default Configset rather than setting the config key directly makes it easy to build out cfn-init instructions in a flexible, modular fashion. Modular Configsets, in turn, create opportunities for better Infrastructure as Code workflows. Some examples:

Development Instances

This cfn-init pattern is not a substitute for full-fledged configuration management solutions (Chef, Puppet, Ansible, Salt, etc.), but for experimental or development instances cfn-init can provide just enough configuration management without the increased overhead or complexity of a full CM tool.

I use the Chef users cookbook to manage users across my AWS infrastructure. Consequently, I very rarely make use of AWS EC2 keypairs, but I do need a solution to access an instance without Chef. My preferred solution is to use cfn-init to fetch my public keys from GitHub and add them to the default ubuntu (or ec2-user) user. The registry for this looks roughly as follows (a sketch; GitHub serves a user’s public keys at https://github.com/<user>.keys):
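
```ruby
# Sketch: fetch the public keys for the github_user parameter and append
# them to the default user's authorized_keys. Paths and names are assumptions.
SfnRegistry.register(:github_user_keys) do
  metadata('AWS::CloudFormation::Init') do
    _camel_keys_set(:auto_disable)
    configSets do
      default.concat ['github_user_keys']
    end
    github_user_keys do
      commands.fetch_github_keys do
        command join!(
          'curl -sf https://github.com/',
          ref!(:github_user),
          '.keys >> /home/ubuntu/.ssh/authorized_keys'
        )
      end
    end
  end
end
```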

In the template, I just set a github_user parameter and include the registry, and I get access to an instance in any region without needing to do any key setup or configuration management.

This could also be paired with a configuration management registry entry so that the GitHub user setup is limited to development:
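
```ruby
# Sketch: choose the Configset at compile time. The registry entry names
# are illustrative.
if ENV['development'] == 'true'
  registry!(:github_user_keys)
else
  registry!(:configuration_management)
end
```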

Compiling this with the environment variable development=true will include the GitHub Configset; in any other case it will run the full configuration management.

In addition to being a handy shortcut, this approach is useful for onboarding other users/teams to an infrastructure codebase and workflow. Even with no additional automation in place, it encourages system provisioning through a code-based workflow, and provides groundwork to layer additional automation on top of.

Incremental Automation Adoption

Extending the development example, a modular Configset pattern is helpful for incrementally introducing automation. Attempting to introduce automation and configuration management to an infrastructure that is actively being architected can be very frustrating – each new component requires not just understanding the component and its initial configuration, but also determining how best to automate and abstract it into code. This can lead to expedient, compromise implementations that add to technical debt, as they aren’t flexible enough to support emergent needs.

An incremental approach can mitigate these issues, while maintaining a focus on code and automation. Well understood components are fully automated, while some emergent features are initially implemented with a mixture of automation and manual experimentation. For example, an engineer approaching a new service might perform some baseline user setup and package installation via an infrastructure codebase, but configure the service manually while determining the ideal configuration. Once that configuration matures, the automation resources necessary to achieve it are included in the codebase.

CloudFormation Configsets are effective options for package installation and are also good for fetching private assets from S3 buckets. An engineer might use a Configset to set up her user on a development instance, along with the baseline package dependencies and a tarball of private assets. By working with the infrastructure codebase from the outset, she has the advantage of knowing that any related AWS components are provisioned and configured as they would be in a production environment, so she can iterate directly on service configuration. As the service matures, the Configset instructions that handled user and package installation may be replaced by more sophisticated configuration management tooling, but this is a simple one-line change in the template.

Organization Wide Defaults

In organizations where multiple engineers or teams contribute discrete application components in the same infrastructure, adopting standard approaches across the organization is very helpful. Standardization often hinges on common libraries that are easy to include across a variety of contexts. The default Configset pattern makes it easy to share registry entries across an organization, whether in a shared repository or internally published gems. Once an organizational pattern is codified in a registry entry, including it is a single line in the template.

This is especially useful in organizations where certain infrastructure-wide responsibilities are owned by a subset of engineers (e.g. Security or SRE teams). These groups can publish a gem (SparklePack) containing a universal configuration covering their concerns that the wider group of engineers can include by default, essentially offering these in an Infrastructure as a Service model. Monitoring, Security, and Service Discovery are all good examples of the type of universal concerns that can be solved this way.

Conclusion

cfn-init Configsets can be a powerful tool for Infrastructure as Code workflows, especially when used in a modular, programmatic approach. The default Configset pattern in SparkleFormation provides an easy-to-implement, consistent approach to managing Configsets across an organization – either within a single codebase or vendored in as gems/SparklePacks. Teams looking to increase the flexibility of their AWS instance provisioning should consider this pattern, and a programmatic tool such as SparkleFormation.

For working examples, please check out this repo.

About the Author

Michael F. Weinberg is an Infrastructure & Automation specialist, with a strong interest in cocktails and jukeboxes. He currently works at Hired as a Systems Engineer. His open source projects live at http://github.com/reverseskate.


Deploy your AWS Infrastructure Continuously

01. December 2016

Author: Michael Wittig

Continuously integrating and deploying your source code is the new standard in many successful internet companies. But what about your infrastructure? Can you deploy a change to your infrastructure in an automated way? Can you run automated tests on your infrastructure to ensure that a change has no unintended side effects? In this post I will show you how you can apply the same processes to your AWS infrastructure that you apply to your source code. You will learn how the AWS services CloudFormation, CodePipeline and Lambda can be combined to continuously deploy infrastructure.

Precondition

You may think: “Source code is text files, but my infrastructure is different. I don’t have a source file for my infrastructure.” But Infrastructure as Code, as defined by Martin Fowler, is a concept that brings software development practices to infrastructure:

Infrastructure as code is the approach to defining computing and network infrastructure through source code that can then be treated just like any software system.
– Martin Fowler

AWS CloudFormation is one implementation of Infrastructure as Code. CloudFormation is a high-quality, free service offered by AWS. To understand CloudFormation you need to know about templates and stacks. The template is the source code, a textual representation of your infrastructure. The stack is the actual running infrastructure described by the template. So a CloudFormation template is exactly what we need: a plain text file. The CloudFormation service interprets the template and turns it into a running infrastructure.

Now our infrastructure is defined by a text file, which is exactly what we need to apply the same processes to it that we use for source code.

The Pipeline

The pipeline to build and deploy is the sequence of steps necessary to ship changes to your users, starting with a change in the code repository and ending in your production environment. The following figure shows a pipeline that runs inside AWS CodePipeline, the AWS continuous delivery service.

AWS CodePipeline - Deploying infrastructure continuously

Whenever a git push is made to a repository hosted on GitHub, the pipeline starts to run by fetching the current version of the repository. After that, the pipeline creates or updates itself, because the pipeline definition is also treated as source code. Next, the up-to-date pipeline creates or updates the test environment; after this step, the infrastructure in the test environment looks exactly as defined in the template. This is also a good place to deploy the application to the test environment – I’m using Elastic Beanstalk to host the demo application. Now it’s time to check if the infrastructure is still in good shape. We want to make sure that everything runs as defined in the tests: the tests may check if a certain port is reachable, if a certain user can log in via SSH, if a certain port is NOT reachable, and so on. If the tests are successful, the production environment is adapted to the new template and the new application version is deployed.

Implementation

From Source to Deploy Pipeline

CodePipeline has native support for GitHub, CloudFormation, Elastic Beanstalk, and Lambda, so I can use all of these services and tie them together using CodePipeline. You can find the full source code and detailed setup instructions in this GitHub repository: michaelwittig/automation-for-the-people

The following template snippet shows an excerpt of the full pipeline description. Here you see how the pipeline can be configured to check out the GitHub repository and create/update itself (a sketch; resource and parameter names are illustrative rather than the repository’s exact code):
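
```yaml
# Sketch of the pipeline excerpt in CloudFormation YAML. The role, bucket,
# and parameter names are assumptions, not the exact code from the repo.
Pipeline:
  Type: 'AWS::CodePipeline::Pipeline'
  Properties:
    RoleArn: !GetAtt 'PipelineRole.Arn'
    ArtifactStore:
      Type: S3
      Location: !Ref ArtifactBucket
    Stages:
    - Name: Source
      Actions:
      - Name: FetchSource
        ActionTypeId:
          Category: Source
          Owner: ThirdParty
          Provider: GitHub
          Version: '1'
        Configuration:
          Owner: !Ref GitHubOwner
          Repo: !Ref GitHubRepo
          Branch: master
          OAuthToken: !Ref GitHubOAuthToken
        OutputArtifacts:
        - Name: Source
    - Name: PipelineUpdate
      Actions:
      - Name: UpdatePipeline
        ActionTypeId:
          Category: Deploy
          Owner: AWS
          Provider: CloudFormation
          Version: '1'
        Configuration:
          ActionMode: CREATE_UPDATE
          StackName: !Ref 'AWS::StackName'
          TemplatePath: 'Source::pipeline.yaml'
          Capabilities: CAPABILITY_IAM
          RoleArn: !GetAtt 'CloudFormationRole.Arn'
        InputArtifacts:
        - Name: Source
```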

 

Summary

Infrastructure as Code enables you to apply the same CI & CD processes to infrastructure that you already know from software development. On AWS, you can use CloudFormation to turn a text representation of your infrastructure into a running environment stack. CodePipeline can be used to orchestrate the deployment process, and you can implement custom logic, such as infrastructure tests, in a programming language that you can run on AWS Lambda. Finally, you can treat your infrastructure as code and deploy each commit with confidence into production.

About the Author

Michael Wittig is the author of Amazon Web Services in Action (Manning) and writes frequently about AWS on cloudonaut.io. He helps his clients gain value from Amazon Web Services. As a software engineer he develops cloud-native real-time web and mobile applications. He migrated the complete IT infrastructure of the first bank in Germany to AWS. He has expertise in distributed system development and architecture, with experience in algorithmic trading and real-time analytics.