Exploring Concurrency in Python & AWS

04. December 2016

From Threads to Lambdas (and lambdas with threads)

Author: Mohit Chawla

Editors: Jesse Davis, Neil Millard

This article demonstrates multiple approaches to a seemingly simple problem, intra-S3 file transfers, using pure Python and a hybrid approach of Python and cloud-based constructs (specifically AWS Lambda), and compares the two concurrency approaches.

Problem Background

The problem was to transfer 250 objects daily, each of size 600-800 MB, from one S3 bucket to another. In addition, an initial bulk backup of 1500 objects (6 months of data) had to be taken, totaling 1 TB.

Attempt 1

The easiest way to do this appears to be to loop over all the objects and transfer them one by one:
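
A minimal sketch of that sequential approach, assuming boto3 and placeholder bucket names:

    import boto3

    s3 = boto3.client('s3')

    SRC_BUCKET = 'source-bucket'   # placeholder names
    DST_BUCKET = 'backup-bucket'

    def list_keys(bucket):
        # Page through the bucket and yield every object key.
        paginator = s3.get_paginator('list_objects_v2')
        for page in paginator.paginate(Bucket=bucket):
            for obj in page.get('Contents', []):
                yield obj['Key']

    def copy_object(key):
        # Server-side copy, so the object data never leaves S3.
        s3.copy({'Bucket': SRC_BUCKET, 'Key': key}, DST_BUCKET, key)

    for key in list_keys(SRC_BUCKET):
        copy_object(key)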

This had a runtime of 1 hour 45 minutes. Oops.

Attempt 2

Let's use some threads!

Python offers multiple concurrency methods:

  • asyncio, based on event loops and asynchronous I/O.
  • concurrent.futures, which provides high level abstractions like ThreadPoolExecutor and ProcessPoolExecutor.
  • threading, which provides low level abstractions to build your own solution using threads, semaphores and locks.
  • multiprocessing, which is similar to threading, but for processes.

I used the concurrent.futures module, specifically the ThreadPoolExecutor, which seems to be a good fit for I/O tasks.

Note about the GIL:

Python implements a GIL (Global Interpreter Lock), which allows only a single thread to execute Python bytecode at a time inside a single interpreter. This is not a limitation for an I/O-intensive task such as the one discussed in this article. For more details about how it works, see http://www.dabeaz.com/GIL/.

Here’s the code when using the ThreadPoolExecutor:
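
A sketch of the threaded version, reusing the helpers from the sequential example above (the worker count is an assumption):

    from concurrent.futures import ThreadPoolExecutor

    with ThreadPoolExecutor(max_workers=20) as executor:
        futures = [executor.submit(copy_object, key)
                   for key in list_keys(SRC_BUCKET)]
        for future in futures:
            future.result()   # re-raise any exception from the worker threads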

This code took 1 minute 40 seconds to execute, woo!

Concurrency with Lambda

I was happy with this implementation until, at an AWS meetup, there was a discussion about using AWS Lambda and SNS for the same task, and I thought of trying that out.

AWS Lambda is a compute service that lets you run code without provisioning or managing servers. It can be combined with AWS SNS, a message push notification service that can deliver and fan out messages to several endpoints, including e-mail, HTTP and Lambda, which allows the decoupling of components.

To use Lambda and SNS for this problem, a simple pipeline was devised: One Lambda function publishes object names as messages to SNS and another Lambda function is subscribed to SNS for copying the objects.

The following piece of code publishes names of objects to copy to an SNS topic. Note the use of threads to make this faster.
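
A hypothetical sketch of that publisher (the topic ARN and bucket names are placeholders, and the helpers mirror the ones used earlier):

    import boto3
    from concurrent.futures import ThreadPoolExecutor
    # (on the python2.7 Lambda runtime, concurrent.futures needs the
    # `futures` backport bundled with the function)

    s3 = boto3.client('s3')
    sns = boto3.client('sns')

    TOPIC_ARN = 'arn:aws:sns:eu-west-1:123456789012:copy-objects'   # placeholder
    SRC_BUCKET = 'source-bucket'                                    # placeholder

    def publish(key):
        sns.publish(TopicArn=TOPIC_ARN, Message=key)

    def handler(event, context):
        paginator = s3.get_paginator('list_objects_v2')
        keys = (obj['Key']
                for page in paginator.paginate(Bucket=SRC_BUCKET)
                for obj in page.get('Contents', []))
        with ThreadPoolExecutor(max_workers=20) as executor:
            # Publishing is pure I/O, so threads speed it up considerably.
            list(executor.map(publish, keys))

The subscribed Lambda then only has to copy the object named in each message, roughly:

    def copy_handler(event, context):
        # SNS invokes the function once per message; the object key is the payload.
        for record in event['Records']:
            key = record['Sns']['Message']
            s3.copy({'Bucket': SRC_BUCKET, 'Key': key},
                    'backup-bucket', key)   # destination bucket is a placeholder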

Yep, that’s all the code.

Now, you may be asking yourself: how is the copy operation actually concurrent?
The unit of concurrency in AWS Lambda is the function invocation. For each published message, the Lambda function is invoked, which means that for multiple messages published in parallel, an equivalent number of invocations will be made for the Lambda function. According to AWS, for event sources that are not stream-based (such as SNS), the expected number of concurrent executions can be estimated as the number of events published per second multiplied by the average function duration.

By default, this is limited to 100 concurrent executions, but can be raised on request.

The execution time for the above code was 2 minutes 40 seconds. This is higher than the pure Python approach, partly because the invocations were throttled by AWS.

I hope you enjoyed reading this article, and if you are an AWS or Python user, hopefully this example will be useful for your own projects.

Note – I gave this as a talk at PyUnconf ’16 in Hamburg, you can see the slides at https://speakerdeck.com/alcy/exploring-concurrency-in-python-and-aws.

About the Author:

Mohit Chawla is a systems engineer, living in Hamburg. He has contributed to open source projects over the last seven years, and has a few projects of his own. Apart from systems engineering, he has a strong interest in data visualization.


server-free pubsub ( and nearly code-free )

02. December 2016

Author: Ed Anderson

Editors: Evan Mouzakitis, Brian O’Rourke

Intro

This article will introduce you to creating serverless PubSub microservices by building a simple Slack based word counting service.

Lambda Overview

These PubSub microservices are AWS Lambda based. Lambda is a service that does not require you to manage servers in order to run code. The high level overview is that you define events ( called triggers ) that will cause a packaging of your code ( called a function ) to be invoked. Inside your package ( aka function ), a specific function within a file ( called a handler ) will be called.

If you’re feeling a bit confused by overloaded terminology, you are not alone. For now, here’s the short list:

Lambda term | Common Name | Description
Trigger | AWS Service | Component that invokes Lambda
Function | Software package | Group of files needed to run code (includes libraries)
Handler | file.function in your package | The filename/function name to execute


There are many different types of triggers ( S3, API Gateway, Kinesis streams, and more). See this page for a complete list. Lambdas run in the context of a specific IAM Role. This means that, in addition to features provided by your language of choice ( python, nodejs, java, scala ), you can call from your Lambda to other AWS Services ( like DynamoDB ).

Intro to the PubSub Microservices

These microservices, once built, will count words typed into Slack. The services are:

  1. The first service splits up the user-input into individual words and:
    • increments the counter for each word
    • supplies a response to the user showing the current count of any seen words
    • triggers functions 2 and 3 which execute concurrently
  2. The second service also splits up the user-input into individual words and:
    • adds a count of 10 to each of those words
  3. The third service logs the input it receives.

While you might not have a specific need for a word counter, the concepts demonstrated here can be applied elsewhere. For example, you may have a project where you need to run several things in series, or perhaps you have a single event that needs to trigger concurrent workflows.

For example:

  • Concurrent workflows triggered by a single event:
    • New user joins org, and needs accounts created in several systems
    • Website user is interested in a specific topic, and you want to curate additional content to present to the user
    • There is a software outage, and you need to update several systems ( statuspage, nagios, etc ) at the same time
    • Website clicks need to be tracked in a system used by Operations, and a different system used by Analytics
  • Serial workflows triggered by a single event:
    • New user needs a Google account created, then that Google account needs to be given permission to access another system integrated with Google auth.
    • A new version of software needs to be packaged, then deployed, then activated
    • Cup is inserted into a coffee machine, then the coffee machine dispenses coffee into the cup


  • The API Gateway ( trigger ) will call a Lambda Function that will split whatever text it is given into specific words
    • Upsert a key in a DynamoDB table with the number 1
    • Drop a message on a SNS Topic
  • The SNS Topic ( trigger ) will have two lambda functions attached to it that will
    • Upsert the same keys in the dynamodb with the number 10
    • Log a message to CloudWatchLogs
Visualization of the different microservices comprising the Slack-based word counter


Example code for AWS Advent near-code-free PubSub. Technologies used:

  • Slack ( outgoing webhooks )
  • API Gateway
  • IAM
  • SNS
  • Lambda
  • DynamoDB

Pub/Sub is teh.best.evar* ( *for some values of best )

I came into the world of computing by way of The Operations Path. The Publish-Subscribe Pattern has always been near and dear to my ❤️.

There are a few things about PubSub that I really appreciate as an “infrastructure person”.

  1. Scalability. In terms of the transport layer ( usually a message bus of some kind ), the ability to scale is separate from the publishers and the consumers. In this wonderful thing which is AWS, we as infrastructure admins can get out of this aspect of the business of running PubSub entirely.
  2. Loose Coupling. In the happy path, publishers don’t know anything about what subscribers are doing with the messages they publish. There’s admittedly a little hand-waving here, and folks new to PubSub ( and sometimes those that are experienced ) get rude surprises as messages mutate over time.
  3. Asynchronous. This is not necessarily inherent in the PubSub pattern, but it’s the most common implementation that I’ve seen. There’s quite a lot of pressure that can be absent from Dev Teams, Operations Teams, or DevOps Teams when there is no expectation from the business that systems will retain single millisecond response times.
  4. New Cloud Ways. Once upon a time, we needed to queue messages in PubSub systems ( and you might still have a need for that feature ), but with Lambda, we can also invoke consumers on demand as messages pass through our system. We don’t necessarily have to keep things in the queue at all. Message appears, processing code runs, everybody’s happy.

Yo dawg, I heard you like ️☁️

One of the biggest benefits that we can enjoy from being hosted with AWS is not having to manage stuff. Running your own message bus might be something that separates your business from your competition, but it might also be undifferentiated heavy lifting.

IMO, if AWS can and will handle scaling issues for you ( to say nothing of only paying for the transactions that you use ), then it might be the right choice to let them take care of that.

I would also like to point out that running these things without servers isn’t quite the same thing as running them in a traditional setup. I ended up redoing this implementation a few times as I kept finding the rough edges of running things serverless. All were ultimately addressable, but I wanted to keep the complexity of this down somewhat.

WELCOME TO THE FUTURE, FRIENDS

TL;DR GIMMIE SOME EXAMPLES

CloudFormation is pretty well covered by AWS Advent already, so we’ll configure this little ditty via the AWS console.

TO THE BATCODE CAVE!

Set up the first Lambda, which will be linked to an outgoing webhook in Slack

Set up the DynamoDB table

👇 You can follow the steps below, or view this video 👉 Video to DynamoDB Create

  1. Console
  2. DynamoDB
  3. Create Table
    1. Table Name table
    2. Primary Key word
    3. Create

Set up the First Lambda

This Lambda accepts the input from a Slack outgoing webhook, splits the input into separate words, and adds a count of one to each word. It also returns a JSON response body to the outgoing webhook, which displays a message in Slack.

If the Lambda is triggered with the input awsadvent some words, this Lambda will create the following three keys in dynamodb, and give each the value of one.

  • awsadvent = 1
  • some = 1
  • words = 1
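
A hypothetical sketch of such a handler (not the actual app.py from the repo), assuming boto3, the DYNAMO_TABLE environment variable from the steps below, and an API Gateway proxy-style event:

    import json
    import os
    import urlparse   # Python 2.7, matching the runtime used below

    import boto3

    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table(os.environ.get('DYNAMO_TABLE', 'table'))

    def handler(event, context):
        # Slack outgoing webhooks POST form-encoded data; the exact event shape
        # depends on how API Gateway is configured, so this parsing is an assumption.
        params = urlparse.parse_qs(event.get('body') or '')
        text = params.get('text', [''])[0]

        counts = {}
        for word in text.split():
            # Atomic upsert: create the item if missing, otherwise add one.
            result = table.update_item(
                Key={'word': word},
                UpdateExpression='ADD wordcount :one',
                ExpressionAttributeValues={':one': 1},
                ReturnValues='UPDATED_NEW')
            counts[word] = int(result['Attributes']['wordcount'])

        reply = ', '.join('%s: %d' % (w, c) for w, c in counts.items())
        return {'statusCode': 200,
                'headers': {'Content-Type': 'application/json'},
                'body': json.dumps({'text': 'Current counts: %s' % reply})}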

👇 You can follow the steps below, or view this video 👉 Video to Create the first Lambda

  1. Make the first Lambda, which accepts slack outgoing webook input, and saves that in DynamoDB
    1. Console
    2. Lambda
    3. Get Started Now
    4. Select Blueprint
      1. Blank Function
    5. Configure Triggers
      1. Click in the empty box
      2. Choose API Gateway
    6. API Name
      1. aws_advent ( This will be the /PATH of your API Call )
    7. Security
      1. Open
    8. Name
      1. aws_advent
    9. Runtime
      1. Python 2.7
    10. Code Entry Type
      1. Inline
      2. It’s included as app.py in this repo. There are more Lambda Packaging Examples here
    11. Environment Variables
      1. DYNAMO_TABLE = table
    12. Handler
      1. app.handler
    13. Role
      1. Create new role from template(s)
      2. Name
        1. aws_advent_lambda_dynamo
    14. Policy Templates
      1. Simple Microservice permissions
    15. Triggers
      1. API Gateway
      2. save the URL

Link it to your favorite slack

👇 You can follow the steps below, or view this video 👉 Video for setting up the Slack outgoing webhook

  1. Set up an outgoing webhook in your favorite Slack team.
  2. Manage
  3. Search
  4. outgoing webhooks
  5. Channel ( optional )
  6. Trigger words
    1. awsadvent
    2. URLs
  7.  Your API Gateway Endpoint on the Lambda from above
  8. Customize Name
  9.  awsadvent-bot
  10. Go to slack
    1. Join the room
    2. Say the trigger word
    3. You should see 👉 something like this

☝️☝️ CONGRATS YOU JUST DID CHATOPS ☝️☝️


OK, now we want to do the awesome PubSub stuff.

Make the SNS Topic

We’re using an SNS Topic as a broker. The producer ( the aws_advent Lambda ) publishes messages to the SNS Topic. Two other Lambdas will be consumers of the SNS Topic, and they’ll get triggered as new messages come into the Topic.

👇 You can follow the steps below, or view this video 👉 Video for setting up the SNS Topic

  1. Console
  2. SNS
  3. New Topic
  4. Name awsadvent
  5. Note the topic ARN

Add additional permissions to the first Lambda

This permission will allow the first Lambda to talk to the SNS Topic. You also need to set an environment variable on the aws_advent Lambda so that it can talk to the SNS Topic.

👇 You can follow the steps below, or view this video 👉 Adding additional IAM Permissions to the aws_lambda role

  1. Give additional IAM permissions on the role for the first lambda
    1. Console
    2. IAM
    3. Roles aws_advent_lambda_dynamo
      1. Permissions
      2. Inline Policies
      3. click here
      4. Policy Name
      5. aws_advent_lambda_dynamo_snspublish

Add the SNS Topic ARN to the aws_advent Lambda

👇 You can follow the steps below, or view this video 👉 Adding a new environment variable to the lambda

There’s a conditional in the aws_advent Lambda that will publish to an SNS topic if the SNS_TOPIC_ARN environment variable is set. Set it, and watch more PubSub magic happen.
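
In sketch form, that conditional might look something like this inside the first Lambda (the function and variable names here are assumptions):

    import os
    import boto3

    sns = boto3.client('sns')

    def maybe_publish(text):
        # Only publish when the topic has been configured.
        topic_arn = os.environ.get('SNS_TOPIC_ARN')
        if topic_arn:
            sns.publish(TopicArn=topic_arn, Message=text)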

  1. Add the SNS_TOPIC_ARN environment variable to the aws_advent lambda
    1. Console
    2. LAMBDA
    3. aws_advent
    4. Scroll down
    5. SNS_TOPIC_ARN
      1. The SNS Topic ARN from above.

Create a consumer Lambda: aws_advent_sns_multiplier

This microservice increments the values collected by the aws_advent Lambda. In a real world application, I would probably not take the approach of having a second Lambda function update values in a database that are originally input by another Lambda function. It’s useful here to show how work can be done outside of the Request->Response flow for a request. A less contrived example might be that this Lambda checks for words with high counts, to build a leaderboard of words.

This Lambda function will subscribe to the SNS Topic, and it is triggered when a message is delivered to the SNS Topic. In the real world, this Lambda might do something like copy data to a secondary database that internal users can query without impacting the user experience.
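
A hypothetical sketch of such a consumer (not the actual sns_multiplier.py), assuming the same table and attribute names as the first Lambda:

    import os
    import boto3

    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table(os.environ.get('DYNAMO_TABLE', 'table'))

    def handler(event, context):
        # SNS delivers the published message inside the event's Records.
        for record in event['Records']:
            for word in record['Sns']['Message'].split():
                table.update_item(
                    Key={'word': word},
                    UpdateExpression='ADD wordcount :ten',
                    ExpressionAttributeValues={':ten': 10})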

👇 You can follow the steps below, or view this video 👉 Creating the sns_multiplier lambda

  1. Console
  2. lambda
  3. Create a Lambda function
  4. Select Blueprint
    1. search sns
    2. sns-message python2.7 runtime
  5. Configure Triggers
    1. SNS topic
      1. awsadvent
      2. click enable trigger
  6. Name
    1. sns_multiplier
  7. Runtime
    1. Python 2.7
  8. Code Entry Type
    1. Inline
      1. It’s included as sns_multiplier.py in this repo.
  9. Handler
    1. sns_multiplier.handler
  10. Role
    1. Create new role from template(s)
  11. Policy Templates
    1. Simple Microservice permissions
  12. Next
  13. Create Function

Go back to slack and test it out.

Now that you have the most interesting parts hooked up together, test it out!

What we’d expect to happen is pictured here 👉 everything working

👇 Writeup is below, or view this video 👉 Watch it work

  • The first time we sent a message, the count of the number of times the words are seen is one. This is provided by our first Lambda
  • The second time we sent a message, the count of the number of times the words are seen is twelve. This is a combination of our first and second Lambdas working together.
    1. The first invocation set the count to current(0) + one, and passed the words off to the SNS topic. The value of each word in the database was set to 1.
    2. After SNS received the message, it ran the sns_multiplier Lambda, which added ten to the value of each word: current(1) + 10. The value of each word in the database was set to 11.
    3. The second invocation set the count of each word to current(11) + 1. The value of each word in the database was set to 12.

️️💯💯💯 Now you’re doing pubsub microservices 💯💯💯

Setup the logger Lambda as well

The output of this Lambda will be viewable in the CloudWatch Logs console, and it’s only there to show that we could do something else ( anything else, even ) with this microservice implementation.
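
A sketch of such a logger (not the actual sns_logger.py) can be as short as this, since anything printed from a Lambda function ends up in its CloudWatch log stream:

    from __future__ import print_function

    def handler(event, context):
        for record in event['Records']:
            print(record['Sns']['Message'])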

  1. Console
  2. Lambda
  3. Create a Lambda function
  4. Select Blueprint
    1. search sns
    2. sns-message python2.7 runtime
  5. Configure Triggers
    1. SNS topic
      1. awsadvent
      2. click enable trigger
  6. Name
    1. sns_logger
  7. Runtime
    1. Python 2.7
  8. Code Entry Type
    1. Inline
      1. It’s included as sns_logger.py in this repo.
  9. Handler
    1. sns_logger.handler
  10. Role
    1. Create new role from template(s)
  11. Policy Templates
    1. Simple Microservice permissions
  12. Next
  13. Create Function

In conclusion

PubSub is an awesome model for some types of work, and in AWS with Lambda we can work inside this model relatively simply. Plenty of real-world work depends on the PubSub model.

You might translate this project to things that you do need to do like software deployment, user account management, building leaderboards, etc.

AWS + Lambda == the happy path

It’s ok to lean on AWS for the heavy lifting. As our word counter becomes more popular, we probably won’t have to do anything at all to scale with traffic. Having our code execute on a request-driven basis is a big win from my point of view. “Serverless” computing is a very interesting development in cloud computing. Look for ways to experiment with it; there are plenty of benefits to it ( other than novelty ).

Some benefits you can enjoy via Serverless PubSub in AWS:

  1. Scaling the publishers. Since this used API Gateway to terminate user requests to a Lambda function:
    1. You don’t have idle resources burning money, waiting for traffic
    2. You don’t have to scale because traffic has increased or decreased
  2. Scaling the bus / interconnection. SNS did the following for you:
    1. Scaled to accommodate the volume of traffic we send to it
    2. Provided HA for the bus
    3. Pay-per-transaction. You don’t have to pay for idle resources!
  3. Scaling the consumers. Having lambda functions that trigger on a message being delivered to SNS:
    1. Scaled the lambda invocations to the volume of traffic.
    2. Provides some sense of HA

Lambda and the API Gateway are works in progress.

Lambda is a new technology. If you use it, you will find some rough edges.

The API Gateway is a new technology. If you use it, you will find some rough edges.

Don’t let that dissuade you from trying them out!

I’m open for further discussion on these topics. Find me on twitter @edyesed

About the Author:

Ed Anderson has been working with the internet since the days of gopher and lynx. Ed has worked in healthcare, regional telecom, failed startups, multinational shipping conglomerates, and is currently working at RealSelf.com.

Ed is into dadops,  devops, and chat bots.

Writing in the third person is not Ed’s gift. He’s much more comfortable poking the private cloud bear,  destroying ec2 instances, and writing lambda functions be they use case appropriate or not.

He can be found on Twitter at @edyesed.

About the Editors:

Evan Mouzakitis is a Research Engineer at Datadog. He is passionate about solving problems and helping others. He has written about monitoring many popular technologies, including Lambda, OpenStack, Hadoop, and Kafka.

Brian O’Rourke is the co-founder of RedisGreen, a highly available and highly instrumented Redis service. He has more than a decade of experience building and scaling systems and happy teams, and has been an active AWS user since S3 was a baby.


Deploy your AWS Infrastructure Continuously

01. December 2016

Author: Michael Wittig

Continuously integrating and deploying your source code is the new standard in many successful internet companies. But what about your infrastructure? Can you deploy a change to your infrastructure in an automated way? Can you run automated tests on your infrastructure to ensure that a change has no unintended side effects? In this post I will show you how you can apply the same processes to your AWS infrastructure that you apply to your source code. You will learn how the AWS services CloudFormation, CodePipeline and Lambda can be combined to continuously deploy infrastructure.

Precondition

You may think: “Source code is text files, but my infrastructure is different. I don’t have a source file for my infrastructure.” Infrastructure as Code as defined by Martin Fowler is a concept that is helping bring software development practices to infrastructure practices.

Infrastructure as code is the approach to defining computing and network infrastructure through source code that can then be treated just like any software system.
– Martin Fowler

AWS CloudFormation is one implementation of Infrastructure as Code. CloudFormation is a high quality and free service offered by AWS. To understand CloudFormation you need to know about templates and stacks. The template is the source code, a textual representation of your infrastructure. The stack is the actual running infrastructure described by the template. So a CloudFormation template is exactly what we need, a plain text file. The CloudFormation service interprets the template and turns it into a running infrastructure.

Now, our infrastructure is defined by a text file which is exactly what we need to apply the same processes to it that we have for source code.

The Pipeline

The pipeline to build and deploy is a sequence of steps that are necessary to ship changes to your users, starting with a change in the code repository and ending in your production environment. The following figure shows a Pipeline that runs inside AWS CodePipeline, the AWS CD service.

AWS CodePipeline - Deploying infrastructure continuously

Whenever a git push is made to a repository hosted on GitHub the pipeline starts to run by fetching the current version of the repository. After that, the pipeline creates or updates itself because the pipeline definition itself is also treated as source code. After that, the up-to-date pipeline creates or updates the test environment. After this step, infrastructure in the test environment looks exactly as it was defined in the template. This is also a good place to deploy the application to the test environment. I’m using Elastic Beanstalk to host the demo application. Now it’s time to check if the infrastructure is still in a good shape. We want to make sure that everything runs as it is defined in the tests. The tests may check if a certain port is reachable, if a certain user can login via SSH, if a certain port is NOT reachable, and so on, and so forth. If the tests are successful, the production environment is adapted to the new template and the new application version is deployed.

Implementation

From Source to Deploy Pipeline

CodePipeline has native support for GitHub, CloudFormation, Elastic Beanstalk, and Lambda, so I can use all of these services and tie them together using CodePipeline. You can find the full source code and detailed setup instructions in this GitHub repository: michaelwittig/automation-for-the-people


The following template snippet shows an excerpt of the full pipeline description. Here you see how the pipeline can be configured to check out the GitHub repository and create/update itself:
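
A hypothetical, heavily trimmed sketch of such a pipeline resource (role, bucket, token, and path references are placeholders, and the real template in the repository differs):

    Pipeline:
      Type: 'AWS::CodePipeline::Pipeline'
      Properties:
        RoleArn: !GetAtt PipelineRole.Arn
        ArtifactStore:
          Type: S3
          Location: !Ref ArtifactStoreBucket
        Stages:
        - Name: Source
          Actions:
          - Name: FetchSource
            ActionTypeId:
              Category: Source
              Owner: ThirdParty
              Provider: GitHub
              Version: '1'
            Configuration:
              Owner: michaelwittig
              Repo: automation-for-the-people
              Branch: master
              OAuthToken: !Ref GitHubOAuthToken
            OutputArtifacts:
            - Name: Source
        - Name: PipelineUpdate
          Actions:
          - Name: UpdatePipeline
            ActionTypeId:
              Category: Deploy
              Owner: AWS
              Provider: CloudFormation
              Version: '1'
            Configuration:
              ActionMode: CREATE_UPDATE
              StackName: !Ref 'AWS::StackName'
              TemplatePath: 'Source::deploy/pipeline.yaml'
              Capabilities: CAPABILITY_IAM
              RoleArn: !GetAtt CloudFormationRole.Arn
            InputArtifacts:
            - Name: Source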


Summary

Infrastructure as Code enables you to apply the same CI & CD processes to infrastructure that you already know from software development. On AWS, you can use CloudFormation to turn a text representation of your infrastructure into a running environment stack. CodePipeline can be used to orchestrate the deployment process and you can implement custom logic, such as infrastructure tests, in a programming language that you can run on AWS Lambda. Finally you can treat your infrastructure as code and deploy each commit with confidence into production.

About the Author

Michael Wittig is the author of Amazon Web Services in Action (Manning) and writes frequently about AWS on cloudonaut.io. He helps his clients to gain value from Amazon Web Services. As a software engineer he develops cloud-native real-time web and mobile applications. He migrated the complete IT infrastructure of the first bank in Germany to AWS. He has expertise in distributed system development and architecture, with experience in algorithmic trading and real-time analytics.

welcome to aws advent 2016

05. October 2016

We’re pleased to announce that AWS Advent is returning.

What is the AWS Advent event? Many technology platforms have started a yearly tradition for the month of December revealing an article per day written and edited by volunteers in the style of an advent calendar, a special calendar used to count the days in anticipation of Christmas starting on December 1. The AWS Advent event explores everything around the Amazon Web Services platform.

For examples of past AWS Advent articles and topics, please explore the rest of this site.

There are a large number of AWS services, and many that have never been covered on AWS Advent in previous years. We’re looking for articles aimed at all audience levels, from beginners to experts in AWS. Introductory, security, architecture, and design patterns with any of the AWS services are welcome topics.

Interested in being part of AWS Advent 2016? 

Process for submission acceptance

  • Interesting title
  • Fresh point of view, unique, timely topic
  • Points relevant and interesting to the topic
  • Scope of the topic matches the intended audience
  • Availability to pair with editor and other volunteers to polish up submission

People who have volunteered to evaluate submissions will review them without identifying information about the authors, so they can focus on evaluating the content. AWS Advent editors Brandon Burton and Jennifer Davis will evaluate the program for diversity. We will pair folks up with available volunteers to do technical and copy editing.

Process for volunteer acceptance

  • Availability!

Important Dates

  • Blind submission review begins – October 24, 2016
  • Authors and other volunteers rolling submissions start – October 26, 2016
  • Submissions accepted until advent calendar complete.
  • Rough drafts due – 12:00am November 21, 2016
  • Final drafts due – 12:00am November 30, 2016

Please be aware that we are working on a code of conduct for participants of this event. To start, we are borrowing from the Chef Community Guidelines:

  • Be welcoming, inclusive, friendly, and patient.
  • Be considerate.
  • Be respectful.
  • Be professional.
  • Be careful in the words that you choose.
  • When we disagree, let’s all work together to understand why.


Thank you, and we look forward to a great AWS Advent in 2016!

Jennifer Davis, @sigje

Brandon Burton, @solarce


AWS Advent 2014 is a wrap!


AWS Advent 2014: Repeatable Infrastructure with CloudFormation and YAML

Ted Timmons is a long-term devops nerd and works for Stanson Health, a healthcare startup with a fully remote engineering team.

One key goal of a successful devops process – and successful usage of AWS – is to create automated, repeatable processes. It may be acceptable to spin up EC2 instances by hand in the early stage of a project, but it’s important to convert this from a manual experiment to a fully described system before the project reaches production.

There are several great tools to describe the configuration of a single instance- Ansible, Chef, Puppet, Salt- but these tools aren’t well-suited for describing the configuration of an entire system. This is where Amazon’s CloudFormation comes in.

CloudFormation was launched in 2011. It’s fairly daunting to get started with, errors in CloudFormation templates are typically not caught until late in the process, and since it is fed by JSON files it’s easy to make mistakes. Proper JSON is unwieldy (stray commas, unmatched closing blocks), but it’s fairly easy to write YAML and convert it to JSON.

EC2-VPC launch template

Let’s start with a simple CloudFormation template to create an EC2 instance. In this example many things are hardcoded, like the instance type and AMI. This cuts down on the complexity of the example. Still, it’s a nontrivial example that creates a VPC and other resources. The only prerequisite for this example is to create a keypair in the US-West-2 region called “advent2014”.
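
A trimmed-down sketch of such a template (not the full example from the article’s repository; the AMI ID is a placeholder):

    {
      "AWSTemplateFormatVersion": "2010-09-09",
      "Resources": {
        "VPC": {
          "Type": "AWS::EC2::VPC",
          "Properties": { "CidrBlock": "10.0.0.0/16" }
        },
        "Subnet": {
          "Type": "AWS::EC2::Subnet",
          "Properties": {
            "VpcId": { "Ref": "VPC" },
            "CidrBlock": "10.0.0.0/24"
          }
        },
        "Instance": {
          "Type": "AWS::EC2::Instance",
          "Properties": {
            "InstanceType": "t2.micro",
            "ImageId": "ami-xxxxxxxx",
            "KeyName": "advent2014",
            "SubnetId": { "Ref": "Subnet" }
          }
        }
      },
      "Outputs": {
        "InstancePrivateIp": {
          "Value": { "Fn::GetAtt": ["Instance", "PrivateIp"] }
        }
      }
    }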

As you look at this template, notice both the quirks of CloudFormation (especially “Ref” and “Fn::GetAtt”) and the quirks of JSON. Even with some indentation the brackets are complex, and correct comma placement is difficult while editing a template.

JSON to YAML

Next, let’s convert this JSON example to YAML. There’s a quick converter in this article’s repository, with python and pip installed, the only other dependency should be to install PyYAML with pip.

Since JSON doesn’t maintain position of hashes/dicts, the output order may vary. Here’s what it looks like immediately after conversion:
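
Roughly what the conversion of the JSON sketch above produces (keys come out alphabetically ordered because the converter just re-dumps a plain dict):

    AWSTemplateFormatVersion: '2010-09-09'
    Outputs:
      InstancePrivateIp:
        Value:
          Fn::GetAtt:
          - Instance
          - PrivateIp
    Resources:
      Instance:
        Properties:
          ImageId: ami-xxxxxxxx
          InstanceType: t2.micro
          KeyName: advent2014
          SubnetId:
            Ref: Subnet
        Type: AWS::EC2::Instance
      Subnet:
        Properties:
          CidrBlock: 10.0.0.0/24
          VpcId:
            Ref: VPC
        Type: AWS::EC2::Subnet
      VPC:
        Properties:
          CidrBlock: 10.0.0.0/16
        Type: AWS::EC2::VPC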

Only a small amount of reformatting is needed to make this file pleasant: I removed unnecessary quotes, combined some lines, and moved the ‘Type’ line to the top of each resource.

YAML to JSON to CloudFormation

It’s fairly easy to see the advantages of YAML in this case- it has a massive reduction in brackets and quotes and no need for commas. However, we need to convert this back to JSON for CloudFormation to use. Again, the converter is in this article’s repository.

That’s it!

Ansible assembly

If you would like to use Ansible to prepare and publish to CloudFormation, my company shared an Ansible module to compile YAML into a single JSON template. The shared version of the script is entirely undocumented, but it compiles a full directory structure of YAML template snippets into a template. This significantly increases readability. Just place cloudformation_assemble in your library/ folder and use it like any other module.

If there’s interest, I’ll help to document and polish this module so it can be submitted to Ansible. Just fork and send a pull request.

Links


AWS Advent 2014: CloudFormation woes: Keep calm and use Ansible

Today’s post on using Ansible to help you get the most out of CloudFormation comes to us from Soenke Ruempler, who’s helping keep things running smoothly at Jimdo.

No more outdated information, a single source of truth. Describing almost everything as code, isn’t this one of the DevOps dreams? Recent developments have brought this dream even closer. In the era of APIs, tools like Terraform and Ansible have emerged which are able to codify the creation and maintenance of entire “organizational ecosystems”.

This blog post is a brief description of the steps we have taken to come closer to this goal at my employer Jimdo. Before we begin looking at particular implementations, let’s take the helicopter view and have a look at the current state and the problems with it.

Current state

We began to move to AWS in 2011 and have been using CloudFormation from the beginning. While we currently describe almost everything in CloudFormation, there are some legacy pieces which were just “clicked” through the AWS console. In order to have some primitive auditing and documentation for those, we usually document all “clicked” settings with a Jenkins job, which runs Cucumber scenarios that do a live inspection of the settings (by querying the AWS APIs with a read-only user).

While this setup might not look that bad and has a basic level of codification, there are several drawbacks, especially with CloudFormation itself, which we are going to have a look at now.

Problems with the current state

Existing AWS resources cannot be managed by CloudFormation

Maybe you have experienced this same issue: You start off with some new technology or provider and initially use the UI to play around. And suddenly, those clicked spikes are in production. At least this is the story of how we came to AWS at Jimdo 😉

So you might say: “OK, then let’s rebuild the clicked resources into a CloudFormation stack.” Well, the problem is that we didn’t describe basic components like VPC and Subnets as CloudFormation stacks in the first place, and as other production setups rely on those resources, we cannot change this as easily anymore.

Not all AWS features are immediately available in CloudFormation

Here is another issue: The usual AWS feature release process is that a component team releases a new feature (e.g. ElastiCache replica groups), but the CloudFormation part is missing (the CloudFormation team at AWS is a separate team with its own roadmap). And since CloudFormation isn’t open source, we cannot add the missing functionality by ourselves.

So, in order to use those “Non-CloudFormation” features, we used to click the setup as a workaround, and then again document the settings with Cucumber.

But the click-and-document-with-cucumber approach seems to have some drawbacks:

  • It’s not an enforced policy to document, so colleagues might miss the documentation step or see no value in it
  • It might be incomplete as not all clicked settings are documented
  • It encourages a “clicking culture”, which is the exact opposite of what we want to achieve

So we need something which could be extended as a CloudFormation stack with resources that we couldn’t (yet) express in CloudFormation. And we need them to be grouped together semantically, as code.

Post processors for CloudFormation stacks

Some resources require post-processing in order to be fully ready. Imagine the creation of an RDS MySQL database with CloudFormation. The physical database was created by CloudFormation, but what about databases, users, and passwords? This cannot be done with CloudFormation, so we need to work around this as well.

Our current approaches vary from manual steps documented in a wiki to a combination of Puppet and hiera-aws: Puppet – running on some admin node – retrieves RDS instance endpoints by tags and then iterates over them and executes shell scripts. This is a form of post-processing entirely decoupled from the CloudFormation stack, both in terms of time (hourly Puppet run) and in terms of “location” (it’s in another repository). A very complicated way just for the sake of automation.

Inconvenient toolset

Currently we use the AWS CLI tools in a plain way. Some coworkers use the old tools, some use the new ones. And I guess there are even folks with their own wrappers / bash aliases.

A “good” example is the missing feature of changing tags of CloudFormation stacks after creation. So if you forgot to do this in the first place, you’d need to recreate the entire stack! The CLI tools do not automatically add tags to stacks, so this is easily forgotten and should be automated. As a result we need to think of a wrapper around CloudFormation which automates those situations.

Hardcoded / copy and pasted data

The idea of “single source information” or “single source of truth” is to never have a representation of data saved in more than one location. In the database world, it’s called “database normalization”. This is a very common pattern which should be followed unless you have an excellent excuse.

But if you don’t know better, you are under time pressure, or your tooling is still immature, it’s hard to keep the data single-sourced. This usually leads to copying and pasting or hardcoding data.

Examples regarding AWS are usually resource IDs like Subnet IDs, Security Groups or – in our case – our main VPC ID.

While this may not be an issue at first, it will come back to you in the future, e.g. if you want to rollout your stacks in another AWS region, perform disaster recovery, or you have to grep for hardcoded data in several codebases when doing refactorings, etc.

So we needed something to access information of other CloudFormation stacks and/or otherwise created resources (from the so called “clicked infrastructure”) without ever referencing IDs, Security Groups, etc. directly.

Possible solutions

Now we have a good picture of what our current problems are and we can actually look for solutions!

My research resulted in 3 possible tools: Ansible, Terraform and Salt.

As of writing this, Ansible seems to be the only currently available tool which can deal with existing CloudFormation stacks out of the box, and it also seems to meet the other criteria at first glance, so I decided to move on with it.

Spiking the solution with Ansible

Describing an existing CloudFormation stack as Ansible Playbook

One of the mentioned problems is the inconvenient CloudFormation CLI tooling: to create or update a stack, you have to synthesize at least the stack name, template file name, and parameters into a long aws cloudformation create-stack (or update-stack) call with its --stack-name, --template-body and --parameters options, which is no fun and error-prone.

With Ansible, we can describe a new or existing CloudFormation stack with a few lines as an Ansible Playbook, here one example:
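
A minimal sketch of what such a playbook might look like, using the stock cloudformation module (stack name, template path and parameters are placeholders):

    - hosts: localhost
      connection: local
      gather_facts: no
      tasks:
      - name: converge the application CloudFormation stack
        cloudformation:
          stack_name: my-application
          state: present
          region: eu-west-1
          template: templates/my-application.json
          template_parameters:
            InstanceType: m3.medium
          tags:
            Environment: production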

Creating and updating (converging) the CloudFormation stack then becomes a single ansible-playbook run against this playbook.

Awesome! We finally have great tooling! The YAML syntax is machine and human readable and our single source of truth from now on.

Extending an existing CloudFormation stack with Ansible

As for added power, it should be easier to implement AWS functionality that’s currently missing from CloudFormation as an Ansible module than a CloudFormation external resource […] and performing other out of band tasks, letting your ticketing system know about a new stack for example, is a lot easier to integrate into Ansible than trying to wrap the cli tools manually.

— Dean Wilson

The above example stack uses the AWS ElastiCache feature of Redis replica groups, which unfortunately isn’t currently supported by CloudFormation. We could only describe the main ElastiCache cluster in CloudFormation. As a workaround, we used to click this missing piece and documented it with Cucumber as explained above.

A short look at the Ansible documentation reveals there is currently no support for ElastiCache replica groups in Ansible either. But quick research shows we have the possibility to extend Ansible with custom modules.

So I started spiking my own Ansible module to handle ElastiCache replica groups, inspired by the existing “elasticache” module. This involved the following steps:

  1. Put the module under “library/”, e.g. elasticache_replication_group.py (I published the unfinished skeleton as a Gist for reference)
  2. Add an output to the existing CloudFormation stack which is creating the ElastiCache cluster, in order to return the ID(s) of the cache cluster(s): we need them to create the read replica group(s). Register the output of the cloudformation Ansible task.
  3. Extend the playbook to create the ElastiCache replica group by reusing the output of the cloudformation task (a sketch of steps 2 and 3 follows below).
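
A combined sketch of those two steps might look roughly like this; the parameter names of the custom elasticache_replication_group module are hypothetical, and the stack output is assumed to be called CacheClusterId:

    - name: converge the ElastiCache CloudFormation stack
      cloudformation:
        stack_name: elasticache-example
        state: present
        region: eu-west-1
        template: templates/elasticache.json
      register: elasticache_stack

    # The cloudformation module returns the stack outputs, so the cache
    # cluster ID can be fed straight into the custom module.
    - name: create the read replica group for the cache cluster
      elasticache_replication_group:
        name: example-replicas
        cache_cluster_id: "{{ elasticache_stack.stack_outputs.CacheClusterId }}"
        state: present
        region: eu-west-1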

Pretty awesome: Ansible works as a glue language while staying very readable. Actually it’s possible to read through the playbook and have an idea what’s going on.

Another great thing is that we can even extend the core functionality of Ansible without any friction (such as waiting for upstream to accept a commit, or building and deploying new packages), which should increase the tool’s acceptance across coworkers even more.

This topic touches another use-case: The possibility to “chain” CloudFormation stacks with Ansible: Reusing Outputs from Stacks as parameters for other stacks. This is especially useful to split big monolithic stacks into smaller ones which as a result can be managed and reused independently (separation of concerns).

Last but not least, it’s now easy to extend the Ansible playbook with post processing tasks (remember the RDS/Database example above).

Describing existing AWS resources as a “Stack”

As mentioned above, one issue with CloudFormation is that there is no way to import existing infrastructure into a stack. Luckily, Ansible supports most of the AWS functionality, so we can create a playbook to express existing infrastructure as code.

To discover the possibilities, I converted a fraction of our current production VPC/subnet setup into an Ansible playbook:
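
A hypothetical, trimmed-down sketch of such a playbook, using the ec2_vpc module that was available at the time (CIDRs and tags are placeholders, not our actual production values):

    - hosts: localhost
      connection: local
      gather_facts: no
      vars:
        aws_region: eu-west-1
      tasks:
      - name: ensure the main VPC and its subnets exist
        ec2_vpc:
          state: present
          region: "{{ aws_region }}"
          cidr_block: 10.0.0.0/16
          resource_tags:
            Name: main
          internet_gateway: yes
          subnets:
          - cidr: 10.0.1.0/24
            az: "{{ aws_region }}a"
            resource_tags:
              Name: public-a
          - cidr: 10.0.2.0/24
            az: "{{ aws_region }}b"
            resource_tags:
              Name: public-b
        register: vpc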

As you can see, there is not even a hardcoded VPC ID! Ansible identifies the VPC by a Tag-CIDR tuple, which meets our initial requirement of “no hardcoded data”.

To stress this, I changed the aws_region variable to another AWS region, and it was possible to create the basic VPC setup in another region, which is another sign for a successful single-source-of-truth.

Single source information

Now we want to reuse the information of the VPC which we just brought “under control” in the last example. Why should we do this? Well, in order to be fully automated (which is our goal), we cannot afford any hardcoded information.

Let’s start with the VPC ID, which should be one of the most requested IDs. Getting it is relatively easy because we can just extract it from the ec2_vpc module output and assign it as a variable with the set_fact Ansible module:
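
A minimal sketch, assuming the result of the ec2_vpc task was registered as vpc:

    - name: remember the VPC ID for later tasks
      set_fact:
        vpc_id: "{{ vpc.vpc_id }}"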

OK, but we also need to reuse the subnet information – and to avoid hardcoding, we need to address them without using subnet IDs. As we tagged the subnets above, we could use the tuple (name-tag, Availability zone) to identify and group them.

With the awesome help from the #ansible IRC channel folks, I could make it work to extract one subnet by ID and tag from the registered output.

While this satisfies the single source requirement, it doesn’t seem to scale very well with a bunch of subnets. Imagine you’d have to do this for each subnet (we already have more than 50 at Jimdo).

After some research I found out that it’s possible to add custom filters to Ansible that allow to manipulate data with Python code:
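
For example, a small filter plugin dropped into filter_plugins/ can reshape the registered subnet list into something easier to address. This is a hypothetical sketch; the attribute names follow the ec2_vpc output format:

    # filter_plugins/subnets.py -- a hypothetical sketch of such a filter.
    # It turns the subnet list returned by ec2_vpc into a dict keyed by the
    # subnet's Name tag, so playbooks can look subnets up by name.

    def subnets_by_name(subnets):
        return dict((s['resource_tags'].get('Name'), s) for s in subnets)

    class FilterModule(object):
        def filters(self):
            return {'subnets_by_name': subnets_by_name}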

We can now assign the subnets for later usage like this in Ansible:
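
For instance, something along these lines (again assuming the registered vpc result and the hypothetical filter above):

    - set_fact:
        subnets_by_name: "{{ vpc.subnets | subnets_by_name }}"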

This is a great way to prepare the subnets for later usage, e.g. in iterations, to create RDS or ElastiCache subnet groups. Actually, almost everything in a VPC needs subnet information.

Those examples should be enough for now to give us confidence that Ansible is a great tool which fits our needs.

Takeaways

As of writing this, Ansible and CloudFormation seem to be a perfect fit for me. The combination turns out to be a solid solution to the following problems:

  • Single source of information / no hardcoded data
  • Combining documentation and “Infrastructure as Code”
  • Powerful wrapper around basic AWS CLI tooling
  • Inception point for other orchestration software (e. g. CloudFormation)
  • Works with existing AWS resources
  • Easy to extend (Modules, Filters, etc: DSL weaknesses can be worked around by hooking in python code)

Next steps / Vision

After spiking the solution, I could imagine the following next steps for us:

  • Write playbooks for all existing stacks and generalize concepts by extracting common concepts (e.g. common tags)
  • Transform all the tests in Cucumber to Ansible playbooks in order to have a single source
  • Remove hardcoded IDs from existing CloudFormation stacks by parameterizing them via Ansible.
  • Remove AWS Console (write) access to our Production AWS account in order to enforce the “Infrastructure as Code” paradigm
  • Bring more clicked infrastructure / ecosystem under IaC-control by writing more Ansible modules (e.g. GitHub Teams and Users, Fastly services, Heroku Apps, Pingdom checks)
  • Spinning up the VPC including some services in another region in order to prove we are fully single-sourced (e. g. no hardcoded IDs) and automated.
  • Trying out Ansible Tower for:
    • Regular convergence runs in order to avoid configuration drift and maybe even revert clicked settings (similar to “Simian army” approach)
    • A “single source of Infrastructure updates”
  • Practices like Game Days to actually test Disaster recovery scenarios

I hope this blog post has brought some new thoughts and inspirations to the readers. Happy holidays!

Resources


AWS Advent 2014 – Using Terraform to build infrastructure on AWS

Today’s post on using Terraform to build infrastructure on AWS comes from Justin Downing.

Introduction

Building interconnected resources on AWS can be challenging. A simple web application can have a load balancer, application servers, DNS records, and a security group. While a sysadmin can launch and manage these resources from the AWS web console or a CLI tool like fog, doing so can be time consuming and tedious considering all the metadata you have to shuffle amongst the other resources.

An elegant solution to this problem has been solved by the fine folks at Hashicorp: Terraform. This tool aims to take the concept of “infrastructure as code” and add the missing pieces that other provisioning tools like fog miss, namely the glue to interconnect your resources. For anyone with a background in software configuration management (Chef, Puppet), then using Terraform should be a natural fit for describing and configuring infrastructure resources.

Terraform can be used with several different providers including AWS, GCE, and Digital Ocean. We will be discussing provisioning resources on AWS. You can read more about the built-in AWS provider here.

Installation

Terraform is written in Go and distributed as a package of binaries. You can download the appropriate package from the website. If you are using OSX and homebrew, you can simply brew install terraform to get everything installed and set up.

Configuration

Now that you have Terraform installed, let’s build some infrastructure! Terraform configuration files are text files that resemble JSON, but are more readable and can include comments. These files should end in .tf (more details on configuration are available here). Rather than invent an example to use Terraform with AWS, I’m going to step through the example published by Hashicorp.

NOTE: I am assuming here that you have AWS keys capable of creating/terminating resources. Also, it would help if you have the AWS CLI installed and configured, as Terraform will use those credentials to interact with AWS. The example below is using AWS region us-west-2.

Let’s use the AWS Two-Tier example to build an ELB and an EC2 instance.

To get started, we initialized a new directory with the example, then created a new keypair and saved the private key to our directory. You will note the files with the .tf extension: these are the configuration files used to describe the resources we want to build. As the names indicate, one is the main configuration, one contains the variables used, and one describes the desired output. When you build this configuration, Terraform will combine all .tf files in the current directory to create the resource graph.

Make a Plan

I encourage you to review the configuration details in main.tf, variables.tf, and outputs.tf. With the help of comments and descriptions, it’s very easy to learn how different resources are intended to work together. You can also run terraform plan to see how Terraform intends to build the resources you declared.

This also doubles as a linter by checking the validity of your configuration files. For example, if I comment out the instance_type in main.tf, we receive an error.

Variables

You will note that some pieces of the configuration are parameterized. This is very useful when sharing your Terraform plans, committing them to source control, or protecting sensitive data like access keys. By using variables and setting defaults for some, you allow for better portability when you share your Terraform plan with other members of your team. If you define a variable that does not have a default value, Terraform will require that you provide a value before proceeding. You can either (a) provide the values on the command line or (b) write them to a terraform.tfvars file. This file acts like a “secrets” file with a key/value pair on each line. For example:
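
A hypothetical terraform.tfvars along those lines (the variable names are placeholders and depend on what variables.tf declares):

    # terraform.tfvars -- keep this file out of source control
    key_name = "terraform-example"
    key_path = "/path/to/terraform-example.pem"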

Due to the sensitive information included in this file, it is recommended that you include terraform.tfvars in your source control ignore list (eg: echo terraform.tfvars >> .gitignore) if you want to share your plan.

Build Your Infrastructure

Now, we can build the resources by running terraform apply.

The output of that command (not reproduced here) shows that Terraform did a few things for us:

  1. Created a ‘terraform-example’ security group allowing SSH and HTTP access
  2. Created an EC2 instance from the Ubuntu 12.04 AMI
  3. Created an ELB instance and used the EC2 instance as its backend
  4. Printed the ELB public DNS address in the Outputs section
  5. Saved the state of your infrastructure in a terraform.tfstate file

You should be able to open the ELB public address in a web browser and see “Welcome to Nginx!” (note: this may take a minute or two after initialization in order for the ELB health check to pass).

The terraform.tfstate file is very important as it tracks the status of your resources. As such, if you are sharing your configurations, it is recommended that you include this file in source control. This way, after initializing some resources, another member of your team will not try and re-initialize those same resources. In fact, she can see the status of the resources with terraform show. In the event the state has not been kept up-to-date, you can use terraform refresh to update the state file.

And…that’s it! With a few descriptive text files, Terraform is able to build cooperative resources on AWS in a matter of minutes. You no longer need complicated wrappers around existing AWS libraries/tools to orchestrate the creation or destruction of resources. When you are finished, you can simply run terraform destroy to remove all the resources described in your .tf configuration files.

Conclusion

With Terraform, building infrastructure resources is as simple as describing them in text. Of course, there is a lot more you can do with this tool, including managing DNS records and configuring Mailgun. You can even mix these providers together in a single plan (eg: EC2 instances, DNSimple records, Atlas metadata) and Terraform will manage it all! Check out the documentation and examples for the details.

Terraform Docs: https://terraform.io/docs/index.html
Terraform Examples: https://github.com/hashicorp/terraform/tree/master/examples