Welcome to AWS Advent 2017

09. October 2017

We’re pleased to announce that AWS Advent is returning.

What is the AWS Advent event? Many technology communities have adopted a yearly tradition of publishing one article per day throughout December, written and edited by volunteers, in the style of an advent calendar, the special calendar used to count the days in anticipation of Christmas starting on December 1. The AWS Advent event explores everything around the Amazon Web Services platform.

Examples of past AWS articles:

Please explore the rest of this site for more examples of past topics.

There are a large number of AWS services, and many have never been covered on AWS Advent in previous years. We’re looking for articles at every audience level, from beginners to experts in AWS. Introductory, security, architecture, and design-pattern topics covering any of the AWS services are welcome.

Interested in being part of AWS Advent 2017? 

Process for submission acceptance

  • Interesting title
  • Fresh point of view, unique, timely topic
  • Points relevant and interesting to the topic
  • Scope of the topic matches the intended audience
  • Availability to pair with an editor and other volunteers to polish up your submission

The volunteers who evaluate submissions will review them without identifying information about the authors, so they can focus on the content itself. AWS Advent curators Brandon Burton and Jennifer Davis will evaluate the program for diversity. We will pair authors with available volunteers for technical and copy editing.

Process for volunteer acceptance

  • Availability!

Important Dates

  • Authors and other volunteers rolling acceptances start – October 23, 2017
  • Submissions accepted until the advent calendar is complete.
  • Rough drafts due – 12:00am November 22, 2017
  • Final drafts due – 12:00am November 29, 2017

Expectations

  • Be welcoming, inclusive, friendly, and patient.
  • Be considerate.
  • Be respectful.
  • Be professional.
  • Be careful in the words that you choose.
  • When we disagree, let’s all work together to understand why.

 

Thank you, and we look forward to a great AWS Advent in 2017!

Jennifer Davis, @sigje

Brandon Burton, @solarce


When the Angry CFO Comes Calling: AWS Cost Control

24. December 2016

Author: Corey Quinn
Editor: Jesse Davis

Controlling costs in AWS is a deceptively complex topic – as anyone who’s ever gone over an AWS billing statement is sadly aware. Individual cost items in Amazon’s cloud environments seem so trivial – 13¢ an hour for an EC2 instance, 5¢ a month for a few files in an S3 bucket – until, before you realize it, you’re spending tens of thousands of dollars on your AWS infrastructure and your CFO is turning fascinating shades of purple. It’s hard to concentrate on your work over the screaming, so let’s take a look at fixing that.

There are three tiers of cost control to consider with respect to AWS.

First Tier

The first and simplest tier is to look at your utilization. Intelligent use of Reserved Instances, ensuring that you’re sizing your instances appropriately, validating that you’re aware of what’s running in your environment – all of these can unlock significant savings at scale, and there are a number of good ways to expose this data. Cloudability, CloudDyn, CloudCheckr, and other services expose this information, as does Amazon’s own Trusted Advisor – if you’ve opted to pay for either AWS’s Business or Enterprise support tiers. Along this axis, Amazon also offers significant discounting once you’re in a position where signing an Enterprise Agreement makes sense.

Beware: here be dragons! Reserved Instances come in both 1 and 3 year variants – and the latter is almost always inappropriate. By locking in pricing for specific instance types, you’re opting out of three years of AWS price reductions – as well as generational improvements in instances. If Amazon releases an instance class that’s more appropriate for your workload eight months from your purchase of a 3 year RI, you get twenty-eight months of “sunk cost” before a wholesale migration to the new class becomes viable. As a rule of thumb, unless your accounting practices force you into a three year RI model, it’s best to pass them up; the opportunity cost doesn’t justify the (marginal) savings you get over one year reservations.

Second Tier

This is all well and good, but it only takes you so far. The second tier of cost control includes taking a deeper dive into how you’re using AWS’s services, while controlling for your business case. If you have a development environment that’s only used during the day, programmatically stopping it at night and starting it again the following morning can cut your costs almost in half– without upsetting the engineers, testers, and business units who rely on that environment.
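As a rough sketch of what that scheduling can look like, here is a small Lambda-style handler using boto3 that stops every running instance carrying a hypothetical Environment=dev tag, plus a twin function that a second scheduled rule would call each morning. The tag name and region are assumptions.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # region is an assumption


def _dev_instance_ids(state):
    """Find instances tagged Environment=dev that are in the given state."""
    reservations = ec2.describe_instances(
        Filters=[
            {"Name": "tag:Environment", "Values": ["dev"]},        # hypothetical tag
            {"Name": "instance-state-name", "Values": [state]},
        ]
    )["Reservations"]
    return [i["InstanceId"] for r in reservations for i in r["Instances"]]


def stop_dev_instances(event=None, context=None):
    """Run from a scheduled CloudWatch Events rule each evening."""
    ids = _dev_instance_ids("running")
    if ids:
        ec2.stop_instances(InstanceIds=ids)
    return ids


def start_dev_instances(event=None, context=None):
    """Run from a second scheduled rule each morning."""
    ids = _dev_instance_ids("stopped")
    if ids:
        ec2.start_instances(InstanceIds=ids)
    return ids
```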

Another example of this is intelligent use of Spot Instances or Spot Fleets. This requires a bit of a deep dive into your environment to determine a few things, including what your workload requirements are, how you’ve structured your applications to respond to instances joining or leaving your environment at uncontrolled times, and the amount of engineering effort required to get into a place where this approach will work for you. That said, if you’re able to leverage Spot fleets, it unlocks the potential for massive cost savings– north of 70% is not uncommon.
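A minimal Spot Fleet request with boto3 might look like the sketch below; the AMI, fleet role ARN, bid price, and instance types are placeholders, and a real request would also need subnet and security group details for your environment.

```python
import boto3

ec2 = boto3.client("ec2")

response = ec2.request_spot_fleet(
    SpotFleetRequestConfig={
        # Role that allows the Spot Fleet service to bid and launch on your behalf.
        "IamFleetRole": "arn:aws:iam::123456789012:role/spot-fleet-role",  # placeholder
        "TargetCapacity": 4,
        "SpotPrice": "0.05",                 # maximum price per instance hour
        "AllocationStrategy": "lowestPrice",
        # Offer several instance types so the fleet can fill capacity cheaply.
        "LaunchSpecifications": [
            {"ImageId": "ami-0123456789abcdef0", "InstanceType": "m4.large"},
            {"ImageId": "ami-0123456789abcdef0", "InstanceType": "m4.xlarge"},
        ],
    }
)
print(response["SpotFleetRequestId"])
```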

Third Tier

The third tier of cost control requires digging into the nature of how your application interacts with AWS resources. This is highly site specific, and requires an in-depth awareness of both your application and how AWS works. Jumping at “Aurora looks awesome for this use case!” without paying attention to your IOPS can result in a surprise bill for tens of thousands of dollars per month – a most unwelcome surprise for most companies! Understanding not only how AWS works on the fringes, but also what your application is doing, becomes important.

Depending upon where you’re starting from, reducing your annual AWS bill by more than half is feasible. Amazon offers many opportunities to save money; your application architecture invariably offers many more. By tweaking these together, you can realize the kind of savings that both you and your CFO’s rising blood pressure can enjoy.

About the Author

Principal at The Quinn Advisory Group, Corey Quinn has a history as an engineering manager, public speaker, and advocate for cloud strategies which speak to company culture. He specializes in helping companies control and optimize their AWS cloud footprint without disrupting the engineers using it. He lives in San Francisco with his wife, two dogs, and as of May 2017 his first child.

 


Securing Machine access in AWS

23. December 2016

Authors: Richard Ortenberg and Aren Sandersen

Hosting your infrastructure in AWS can provide numerous operational benefits, but can also result in weakened security if you’re not careful. AWS uses a shared responsibility model in which Amazon and its customers are jointly responsible for securing their cloud infrastructure. Even with Amazon’s protections, the number of attack vectors in a poorly secured cloud system is practically too high to count: password lists get dumped, private SSH keys get checked in to GitHub, former employees reuse old credentials, current employees fall victim to spear-phishing, and so on. The most critical first steps that an organization can take towards better security in AWS are putting its infrastructure in a VPN or behind a bastion host and improving its user host access system.

The Edge

A bastion host (or jump box) is a specific host that provides the only means of access to the rest of your hosts. A VPN, on the other hand, lets your computer into the remote network, allowing direct access to hosts. Both a VPN and bastion host have their strengths and weaknesses, but the main value they provide is funnelling all access through a single point. Using this point of entry (or “edge”) to gain access to your production systems is an important security measure. If your endpoints are not behind an edge and are directly accessible on the internet, you’ll have multiple systems to patch in case of a zero-day and each server must be individually “hardened” against common attacks. With a VPN or bastion, your main concern is only hardening and securing the edge.

If you prefer to use a bastion host, Amazon provides an example of how to set one up: https://aws.amazon.com/blogs/security/how-to-record-ssh-sessions-established-through-a-bastion-host/

 

If you’d rather run a VPN, here are just a few of the more popular options:

  • Run the open-source version of OpenVPN which is available in many Linux distributions.
  • Use a prebuilt OpenVPN Access Server (AS) in the AWS Marketplace. This requires a small license fee but set up and configuration are much easier.
  • Use the Foxpass VPN in AWS Marketplace.

Two Factor Authentication

One of the most critical security measures you can implement next is to configure two-factor authentication (2FA) on your VPN or bastion host. Two-factor authentication requires that users enter a code or click a button on a device in their possession to verify a login attempt, making unauthorized access difficult.

Many two-factor systems use a smartphone based service like Duo or Google Authenticator. Third party devices like RSA keys and Yubikeys are also quite common. Even if a user’s password or SSH keys are compromised, it is much harder for an attacker to also gain access to a user’s physical device or phone. Additionally, these physical devices cannot be stolen remotely, reducing that attack vector by multiple orders of magnitude.

For 2FA, bastion hosts can use a PAM plugin, which both Duo and Google Authenticator provide. If you’re using a VPN, most have built-in support for two-factor authentication.

User Host Access

Finally, you need to make sure that your servers are correctly secured behind the edge. A newly-instantiated EC2 server is configured with a single user (usually ‘ec2-user’ or ‘ubuntu’) and a single public SSH key. If multiple people need to access the server, however, then you need a better solution than sharing the private key amongst the team. Sharing a private key is akin to sharing a password to the most important parts of your infrastructure.

Instead, each user should generate their own SSH key pair, keeping the private half on their machine and installing the public half on servers which they need access to.

From easy to more complex here are three mechanisms to improve user access:

  • Add everyone’s public keys to the /home/ec2-user/.ssh/authorized_keys file. Now each person’s access can be revoked independently of the other users.
  • Create several role accounts (e.g. ‘rwuser’ and ‘rouser’, with read/write and read-only permissions, respectively) and install users’ public keys into each role’s authorized_keys file as appropriate.
  • Create individual user accounts on each host. Now you have the ability to manage permissions separately for each user.

Best practice is to use either infrastructure automation tools (e.g. Chef, Puppet, Ansible, Salt) or an LDAP-based system (e.g. Foxpass) to create and manage the above-mentioned accounts, keys, and permissions.

Summary

There are many benefits to hosting your infrastructure in AWS. Don’t just depend on Amazon or other third parties to protect your infrastructure. Set up a VPN or bastion, patch your vulnerable systems as soon as possible, turn on 2FA, and implement a user access strategy that is more complex than just sharing a password or an SSH key.

About the Authors:

Richard Ortenberg is currently a software developer at Foxpass. He is a former member of the CloudFormation team at AWS.

Aren Sandersen is the founder of Foxpass. Previously he has run engineering, operations and IT teams at Pinterest, Bebo, and Oodle.

 


Getting Started with CodeDeploy

22. December 2016

Author: Craig Bruce
Editors: Alfredo Cambera, Chris Castle

Introduction

When running a web site, you need a way to deploy your application code in a repeatable, reliable, and scalable fashion.

CodeDeploy is part of the AWS Developer Tools family which includes CodeCommit, CodePipeline, and the AWS Command Line Interface. CodeCommit is for managed git repositories, CodePipeline is a service to help you build, test and automate, while the AWS Command Line Interface is your best friend for accessing the API in an interactive fashion. They do integrate with each other; more on that later.

Concepts

Let’s start with the CodeDeploy specific terminology:

  • Application – A unique identifier to tie your deployment group, revisions and deployments together.
  • Deployment Group – The instances you wish to deploy to.
  • Revision – Your application code that you wish to deploy.
  • Deployment – Deploy a specific revision to a specific deployment group.
  • CodeDeploy service – The managed service from AWS which oversees everything.
  • CodeDeploy agent – The agent you install on your instances for them to check in with the CodeDeploy service.

Getting up and running is straightforward. Install the CodeDeploy agent onto your EC2 instances and then head to the AWS Management Console to create an application and a deployment group. You associate your deployment group with an Auto Scaling group or EC2 tag(s). One of the new features is that on-premise instances are supported as well now. As these resources are outside of AWS’ view, you need to register them with the CodeDeploy service before it can deploy to them. Your resources must have access to the public AWS API endpoints to communicate with the CodeDeploy service. This offers some really interesting options for hybrid deployment (deploying to both your EC2 and on-premise resources) – not many AWS services support any non-AWS resources. CodeDeploy is now aware of the instances which belong to a deployment group and whenever you request a deployment it will update them all.

Your revision is essentially a compressed file of your source code with one extra file, appspec.yml, which the CodeDeploy agent will use to help unpack your files and optionally run any lifecycle event hooks you may have specified. Let’s say you need to tidy up some files before a deployment. For a Python web application you may want to remove those pesky *.pyc files. Define a lifecycle event hook to delete those files before you unpack your new code.

Upload your revision to S3 (or you can provide a commit ID from a GitHub repository, although not CodeCommit – more on this later), provide a deployment group and CodeDeploy is away. Great job, your web application code has now been deployed to your instances.
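If you would rather script that step than click through the console, a deployment can be created with boto3 along these lines; the application, deployment group, bucket, and key names are placeholders.

```python
import boto3

codedeploy = boto3.client("codedeploy")

response = codedeploy.create_deployment(
    applicationName="my-web-app",            # placeholder names throughout
    deploymentGroupName="production",
    revision={
        "revisionType": "S3",
        "s3Location": {
            "bucket": "my-deployment-bucket",
            "key": "releases/my-web-app-1.2.3.zip",
            "bundleType": "zip",
        },
    },
    description="Deploy release 1.2.3",
)
print(response["deploymentId"])
```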

Managing deployments

As is becoming increasingly common, most AWS services are best when used with other AWS services; in this case, CloudWatch offers some new options using CloudWatch Alarms and CloudWatch Events. CloudWatch Alarms can be used to stop deployments. Let’s say the CPU utilization on your instances is over 75%: this can trigger an alarm, and CodeDeploy will stop any deployments on those instances. The deployment status will update to Stopped. This prevents deployments when there is an increased chance of a deployment problem.

Also new is adding triggers to your deployment groups, which is powered by CloudWatch Events. An event could be “deployment succeeded”, at which point a message is sent to an SNS topic. A Lambda function subscribed to that topic could then send a Success! message to your deployment channel in Slack/HipChat. There are various events you can use – deployment start, stop, and failure – as well as individual instance states like start, failure, or success. Be aware of noisy notifications though; you probably don’t want to know about every instance in every deployment. Plus, just like AWS, you can be throttled by posting too many messages to Slack/HipChat in a short period.

Deployments do not always go smoothly, and if there is a problem the quickest way to restore service is to revert to the last known good revision, typically the previous one. Rollbacks have now been added and can be triggered in two ways: first, by rolling back if the new deployment fails; second, by rolling back if a CloudWatch Alarm is triggered. For example, if CPU usage is over 90% for 5 minutes after a deployment, automatically roll the deployment back. In either case you want to know a rollback action was performed – handy that your deployment groups have notifications now.
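All three of these features – alarms, triggers, and automatic rollback – can also be attached to an existing deployment group programmatically. The boto3 sketch below shows the general shape; the application, group, alarm, and SNS topic names are placeholders.

```python
import boto3

codedeploy = boto3.client("codedeploy")

codedeploy.update_deployment_group(
    applicationName="my-web-app",                 # placeholder names throughout
    currentDeploymentGroupName="production",
    # Stop deployments while this CloudWatch alarm is in the ALARM state.
    alarmConfiguration={
        "enabled": True,
        "alarms": [{"name": "prod-high-cpu"}],
    },
    # Publish deployment events to an SNS topic (which a Slack/HipChat bot can consume).
    triggerConfigurations=[
        {
            "triggerName": "deployment-events",
            "triggerTargetArn": "arn:aws:sns:us-east-1:123456789012:deployments",
            "triggerEvents": ["DeploymentSuccess", "DeploymentFailure", "DeploymentStop"],
        }
    ],
    # Roll back automatically on failure or when the alarm fires.
    autoRollbackConfiguration={
        "enabled": True,
        "events": ["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_ALARM"],
    },
)
```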

Integration with CodePipeline

Currently CodeCommit is not a supported entry point for CodeDeploy; you provide your revision via an object in S3 or a commit ID in a GitHub repository. You can, however, use CodeCommit as the source action for CodePipeline; behind the scenes it drops the revision in S3 for you before passing it on to CodeDeploy. So you can build a pipeline in CodePipeline that uses CodeCommit and CodeDeploy actions. Now that you have a pipeline, you can add further actions as well, such as integration with your CI/CD system.

Conclusion

CodeDeploy is a straightforward service to set up and a valuable tool in your DevOps toolbox. Recent updates make it easier to get notified about the status of your deployments, avoid deployments when alarms are triggered, and enable automatic rollback if there is an issue with a new deployment. Best of all, use of CodeDeploy for EC2 instances is free; you only pay for your revisions in S3 (so not very much at all). If you were undecided about CodeDeploy, try it today!

Notes

Read the excellent CodeDeploy documentation to learn about all the fine details. Three features were highlighted in this post; learn more about them:

If you are new to CodeDeploy then follow the Getting Started guide to setup your IAM access and issue your first deployment.

About the Author

Dr. Craig Bruce is a Scientific Software Development Manager at OpenEye Scientific. He is responsible for the DevOps group working on Orion, a cloud-native platform for early-stage drug discovery which is enabling chemists to design new medicines. Orion is optimized for AWS and Craig is an AWS Certified DevOps Engineer.

About the Editors

Alfredo Cambera is a Venezuelan outdoorsman, passionate about DevOps, AWS, automation, Data Visualization, Python and open source technologies. He works as Senior Operations Engineer for a company that offers Mobile Engagement Solutions around the globe.

Chris Castle is a Delivery Manager within Accenture’s Technology Architecture practice. During his tenure, he has spent time with major financial services and media companies. He is currently involved in the creation of a compute request and deployment platform to enable migration of his client’s internal applications to AWS.


Paginating AWS API Results using the Boto3 Python SDK

21. December 2016

Author: Doug Ireton

Boto3 is Amazon’s officially supported AWS SDK for Python. It’s the de facto way to interact with AWS via Python.

If you’ve used Boto3 to query AWS resources, you may have run into limits on how many resources a query to the specified AWS API will return, generally 50 or 100 results, although S3 will return up to 1000 results. The AWS APIs return “pages” of results. If you are trying to retrieve more than one “page” of results you will need to use a paginator to issue multiple API requests on your behalf.

Introduction

Boto3 provides Paginators to automatically issue multiple API requests to retrieve all the results (e.g. on an API call to EC2.DescribeInstances). Paginators are straightforward to use, but not all Boto3 services provide paginator support. For those services you’ll need to write your own paginator in Python.

In this post, I’ll show you how to retrieve all query results for Boto3 services which provide Pagination support, and I’ll show you how to write a custom paginator for services which don’t provide built-in pagination support.

Built-In Paginators

Most services in the Boto3 SDK provide Paginators. See S3 Paginators for example.

Once you determine you need to paginate your results, you’ll need to call the get_paginator() method.

How do I know I need a Paginator?

If you suspect you aren’t getting all the results from your Boto3 API call, there are a couple of ways to check. You can look in the AWS console (e.g. number of Running Instances), or run a query via the aws command-line interface.

Here’s an example of querying an S3 bucket via the AWS command-line. Boto3 will return the first 1000 S3 objects from the bucket, but since there are a total of 1002 objects, you’ll need to paginate.

Counting results using the AWS CLI

Here’s a boto3 example which, by default, will return the first 1000 objects from a given S3 bucket.
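A minimal version of that call looks like this (the bucket name is a placeholder):

```python
import boto3

s3 = boto3.client("s3")

# list_objects_v2 returns at most 1000 keys per call.
resp = s3.list_objects_v2(Bucket="my-example-bucket")

print("Keys returned:", resp["KeyCount"])   # 1000
print("Truncated:", resp["IsTruncated"])    # True -- there are more results to fetch
```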

Determining if the results are truncated

The S3 response dictionary provides some helpful properties, like IsTruncated, KeyCount, and MaxKeys which tell you if the results were truncated. If resp['IsTruncated'] is True, you know you’ll need to use a Paginator to return all the results.

Using Boto3’s Built-In Paginators

The Boto3 documentation provides a good overview of how to use the built-in paginators, so I won’t repeat it here.

If a given service has Paginators built-in, they are documented in the Paginators section of the service docs, e.g. AutoScaling and EC2.

Determine if a method can be paginated

You can also verify if the boto3 service provides Paginators via the client.can_paginate() method.


So, that’s it for built-in paginators. In this section I showed you how to determine if your API results are being truncated, pointed you to Boto3’s excellent documentation on Paginators, and showed you how to use the can_paginate() method to verify if a given service method supports pagination.

If the Boto3 service you are using provides paginators, you should use them. They are tested and well documented. In the next section, I’ll show you how to write your own paginator.

How to Write Your Own Paginator

Some Boto3 services, such as AWS Config don’t provide paginators. For these services, you will have to write your own paginator code in Python to retrieve all the query results. In this section, I’ll show you how to write your own paginator.

You Might Need To Write Your Own Paginator If…

Some Boto3 SDK services aren’t as built-out as S3 or EC2. For example, the AWS Config service doesn’t provide paginators. The first clue is that the Boto3 AWS ConfigService docs don’t have a “Paginators” section.

The can_paginate Method

You can also ask the individual service client’s can_paginate method if it supports paginating. For example, here’s how to do that for the AWS config client. In the example below, we determine that the config service doesn’t support paginating for the get_compliance_details_by_config_rule method.
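A quick sketch of that check (the result for the config client was False at the time of writing):

```python
import boto3

s3 = boto3.client("s3")
config = boto3.client("config")

print(s3.can_paginate("list_objects_v2"))
# True -- S3 provides a built-in paginator for this operation

print(config.can_paginate("get_compliance_details_by_config_rule"))
# False (at the time of writing) -- no built-in paginator, so we roll our own
```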

Operation Not Pageable Error

If you try to paginate a method without a built-in paginator, you will get an error similar to this:
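The exact wording varies by botocore version, but it will be roughly:

```python
config.get_paginator("get_compliance_details_by_config_rule")
# botocore.exceptions.OperationNotPageableError:
#   Operation cannot be paginated: get_compliance_details_by_config_rule
```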

If you get an error like this, it’s time to roll up your sleeves and write your own paginator.

Writing a Paginator

Writing a paginator is fairly straightforward. When you call the AWS service API, it will return up to the maximum number of results, plus a long hex-string token (next_token) if there are more results.

Approach

To create a paginator for this, you make calls to the service API in a loop until next_token is empty, collecting the results from each loop iteration in a list. At the end of the loop, you will have all the results in the list.

In the example code below, I’m calling the AWS Config service to get a list of resources (e.g. EC2 instances), which are not compliant with the required-tags Config rule.

As you read the example code below, it might help to read the Boto3 SDK docs for the get_compliance_details_by_config_rule method, especially the “Response Syntax” section.

Example Paginator
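Here is a sketch of such a paginator, built from the method and field names described below; error handling is omitted for brevity.

```python
import boto3


def get_resources_from(compliance_details):
    """Pull the current batch of resources and the next_token out of one API response."""
    results = compliance_details.get("EvaluationResults", [])
    current_batch = [
        r["EvaluationResultIdentifier"]["EvaluationResultQualifier"]["ResourceId"]
        for r in results
    ]
    next_token = compliance_details.get("NextToken", "")
    return current_batch, next_token


def main():
    config = boto3.client("config")
    next_token = ""        # empty token means "start from the first page"
    resources = []         # accumulates every page of results

    while True:
        kwargs = {
            "ConfigRuleName": "required-tags",
            "ComplianceTypes": ["NON_COMPLIANT"],
            "Limit": 100,
        }
        if next_token:
            kwargs["NextToken"] = next_token   # our "claim check" for the next page

        compliance_details = config.get_compliance_details_by_config_rule(**kwargs)
        current_batch, next_token = get_resources_from(compliance_details)
        resources.extend(current_batch)

        if not next_token:                     # no token means we just got the last page
            break

    print("%d non-compliant resources" % len(resources))
    print(resources)


if __name__ == "__main__":
    main()
```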

Example Paginator – main() Method

In the example above, the main() method creates the config client and initializes the next_token variable. The resources list will hold the final results set.

The while loop is the heart of the paginating code. In each loop iteration, we call the get_compliance_details_by_config_rule method, passing next_token as a parameter. Again, next_token is a long hex string returned by the given AWS service API method. It’s our “claim check” for the next set of results.

Next, we extract the current_batch of AWS resources and the next_token string from the compliance_details dictionary returned by our API call.

Example Paginator – get_resources_from() Helper Method

The get_resources_from(compliance_details) helper method parses the compliance_details dictionary. It returns our current batch (up to 100 results) of resources and our next_token “claim check” so we can get the next page of results from config.get_compliance_details_by_config_rule().

I hope the example is helpful in writing your own custom paginator.


In this section on writing your own paginators I showed you a Boto3 documentation example of a service without built-in Paginator support. I discussed the can_paginate method and showed you the error you get if you call it on a method which doesn’t support pagination. Finally, I discussed an approach for writing a custom paginator in Python and showed a concrete example of a custom paginator which passes the NextToken “claim check” string to fetch the next page of results.

Summary

In this post, I covered Paginating AWS API responses with the Boto3 SDK. Like most APIs (Twitter, GitHub, Atlassian, etc) AWS paginates API responses over a set limit, generally 50 or 100 resources. Knowing how to paginate results is crucial when dealing with large AWS accounts which may contain thousands of resources.

I hope this post has taught you a bit about paginators and how to get all your results from the AWS APIs.

About the Author

Doug Ireton is a Sr. DevOps engineer at 1Strategy, an AWS Consulting Partner specializing in Amazon Web Services (AWS). He has 23 years experience in IT, working at Microsoft, Washington Mutual Bank, and Nordstrom in diverse roles from testing, Windows Server engineer, developer, and Chef engineer, helping app and platform teams manage thousands of servers via automation.


Alexa is checking your list

20. December 2016

Author: Matthew Williams
Editors: Benjamin Marsteau, Scott Francis

Recently I made a kitchen upgrade: I bought an Amazon Dot. Alexa, the voice assistant inside the intelligent puck, now plays a key role in the preparation of meals every day. With both hands full, I can say “Alexa, start a 40-minute timer” and not have to worry about burning the casserole. However, there is a bigger problem coming up that I feel it might also help me out with. It is the gift-giving season, and I have been known to get the wrong things. Wouldn’t it be great if I could have Alexa remind me what I need to get for each person on my list? Well, that simple idea took me down a path that has consumed me for a little too long. And as long as I built it, I figured I would share it with you.

Architecting a Solution

Now it is important to remember that I am a technologist and therefore I am going to go way beyond what’s necessary. [ “anything worth doing is worth overdoing.” — anon. ] Rather than just building the Alexa side of things, I decided to create the entire ecosystem. My wife and I are the first in our families to add Alexa to our household, so that means I need a website for my friends and family to add what they want. And of course, that website needs to talk to a backend server with a REST API to collect the lists into a database. And then Alexa needs to use that same API to read off my lists.

OK, so spin up an EC2 instance and build away, right? I did say I am a technologist, right? That means I have to use the shiniest tools to get the job done. Otherwise, it would just be too easy.

My plan is to use a combination of AWS Lambda to serve the logic of the application, the API Gateway to host the REST endpoints, DynamoDB for saving the data, and another Lambda to respond to Alexa’s queries.

The Plan of Attack

Based on my needs, I think I came up with the ideal plan of attack. I would tackle the problems in the following order:

  1. Build the Backend – The backend includes the logic, API, and database.
    1. Build a Database to Store the Items
    2. Lambda Function to Add an Item
    3. Lambda Function to Delete an Item
    4. Lambda Function to List All Items
    5. Configure the API Gateway
  2. Build the User Interface – The frontend can be simple: show a list, and let folks add and remove from that list.
  3. Get Alexa Talking to the Service – That is why we are here, right?

There are some technologies used that you should understand before beginning. You do not have to know everything about Lambda or the API Gateway or DynamoDB, but let’s go over a few of the essentials.

Lambda Essentials

The purpose of Lambda is to run the functions you write. Configuration is pretty minimal, and you only get charged for the time your functions run (you get a lot of free time). You can do everything from the web console, but after setting up a few functions, you will want another way. See this page for more about AWS Lambda.

API Gateway Essentials

The API Gateway is a service to make it easier to maintain and secure your APIs. Even if I get super popular, I probably won’t get charged much here as it is $3.50 per million API calls. See this page for more about the Amazon API Gateway.

DynamoDB Essentials

DynamoDB is a simple (and super fast) NoSQL database. My application has simple needs, and I am going to need a lot more friends before I reach the 25 GB and 200 million requests per month that are on the free plan. See this page for more about Amazon DynamoDB.
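The backend in this post is written in Node.js via the Serverless framework, but the shape of one of its Lambda functions – the “list all items” one backed by DynamoDB – is easy to sketch in Python; the table name and key layout here are assumptions.

```python
import json

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("gift-items")   # hypothetical table name


def create_response(status_code, body):
    """Shape an API Gateway (Lambda proxy) response. The CORS header is required
    so the browser-based frontend can call the API from another origin."""
    return {
        "statusCode": status_code,
        "headers": {"Access-Control-Allow-Origin": "*"},
        "body": json.dumps(body, default=str),
    }


def list_items(event, context):
    """Return every gift item in the table."""
    result = table.scan()
    return create_response(200, result.get("Items", []))
```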

Serverless Framework

Sure I can go to each service’s console page and configure them, but I find it a lot easier to have it automated and in source control. There are many choices in this category including the Serverless framework, Apex, Node Lambda, and many others. They all share similar features so you should review them to see which fits your needs best. I used the Serverless framework for my implementation.

Alexa Skills

When you get your Amazon Echo or Dot home, you interact with Alexa, the voice assistant. The things that she does are Alexa Skills. To build a skill you need to define a list of phrases to recognize, what actions they correspond to, and write the code that performs those actions.

Let’s Start Building

There are three main components that need to be built here: API, Web, and Skill. I chose a different workflow for each of them. The API uses the Serverless framework to define the CloudFormation template, Lambda Functions, IAM Roles, and API Gateway configuration. The Webpage uses a Gulp workflow to compile and preview the site. And the Alexa skill uses a Yeoman generator. Each workflow has its benefits and it was exciting to use each.

If you would like to follow along, you can clone the GitHub repo: https://github.com/DataDog/AWS-Advent-Alexa-Skill-on-Lambda.

Building the Server

The process I went through was:

  1. Install Serverless Framework (npm i -g serverless)
  2. Create the first function (sls create -n <service name> -t aws-nodejs). The top-level concept in Serverless is that of a service. You create a service, then all the Lambda functions, CloudFormation templates, and IAM roles defined in the serverless.yaml file support that service.
  3. Add the resources needed to a CloudFormation template in the serverless.yaml file. For example:
    alexa_1
    Refer to the CloudFormation docs and the Serverless Resources docs for more about this section.
  4. Add the IAM Role statements to allow your Lambda access to everything needed. For example:
    alexa_2
  5. Add the Lambda functions you want to use in this service. For example:
    alexa_3
    The events section lists the triggers that can kick off this function. http means to use the API Gateway. I spent a little time in the API Gateway console and got confused, but these four lines in the serverless.yaml file were all I needed.
  6. Install serverless-webpack npm and add it to the YAML file:
    alexa_4
    This configuration tells Serverless to use webpack to bundle all your npm modules together in the right way. And if you want to use ECMAScript 2015, this will run Babel to convert your code back down to a JavaScript version that Lambda can use. You will have to set up your webpack.config.js and .babelrc files to get everything working.
  7. Write the functions. For the function I mentioned earlier, I added the following to my items.js file:
    alexa_5
    This function sets the table name in my DynamoDB and then grabs all the rows. No matter what the result is, a response is formatted using this createResponse function:
    alexa_6
    Notice the header. Without this, Cross Origin Resource Sharing will not work. You will get nothing but 502 errors when you try to consume the API.
  8. Deploy the Service:

    Now I use 99designs’ aws-vault to store my AWS access keys rather than adding them to an rc file that could accidentally find its way up to GitHub. So the command I use is:

    If everything works, it creates the DynamoDB table, configures the API Gateway APIs, and sets up the Lambdas. All I have to do is try them out from a new application or using a tool like Paw or Postman. Then rinse and repeat until everything works.

Building the Frontend

alexa_7

Remember, I am a technologist, not an artist. It works, but I will not be winning any design awards. It is a webpage with a simple table on it that loads up some JavaScript to show my DynamoDB table:

alexa_8

Have I raised the technologist card enough times yet? Well, because of that I need to keep to the new stuff even with the JavaScript features I am using. That means I am writing the code in ECMAScript 2015, so I need to use Babel to convert it to something usable in most browsers. I used Gulp for this stage to keep rebuilding the files and reloading my browser with each change.

Building the Alexa Skill

Now that we have everything else working, it is time to build the Alexa Skill. Again, Amazon has a console for this which I used for the initial configuration on the Lambda that backs the skill. But then I switched over to using Matt Kruse’s Alexa App framework. What I found especially cool about his framework was that it works with his alexa-app-server so I can test out the skill locally without having to deploy to Amazon.

For this one I went back to the pre-ECMAScript 2015 syntax but I hope that doesn’t mean I lose technologist status in your eyes.

Here is a quick look at a simple Alexa response to read out the gift list:

alexa_9

Summary

And now we have an end-to-end solution for working with your gift lists. We built the beginnings of an API to work with gift lists. Then we added a web frontend to allow users to add to the list. And then we added an Alexa skill to read the list while both hands are on a hot pan. Is this overkill? Maybe. Could I have stuck with a pen and scrap of paper? Well, I guess one could do that. But what kind of technologist would I be then?

About the Author

Matt Williams is the DevOps Evangelist at Datadog. He is passionate about the power of monitoring and metrics to make large-scale systems stable and manageable. So he tours the country speaking and writing about monitoring with Datadog. When he’s not on the road, he’s coding. You can find Matt on Twitter at @Technovangelist.

About the Editors

Benjamin Marsteau is a System administrator | Ops | Dad | and tries to give back to the community as much as it gives him.


AWS network security monitoring with FlowLogs

19. December 2016

Author: Lennart Koopmann
Editors: Zoltán Lajos Kis

Regardless of whether you are running servers in AWS or in your own data center, you need a high level of protection against intrusions. No matter how strictly your security groups and local iptables are configured, there is always the chance that a determined attacker will make it past these barriers and move laterally within your network. In this post, I will walk through how to protect your AWS network with FlowLogs: from the implementation and collection of FlowLogs in CloudWatch to the analysis of the data with Graylog, a log management system, you will be fully equipped to monitor your environment.

Introduction

As Rob Joyce, Chief of TAO at the NSA, discussed in his talk at USENIX Enigma 2016, it’s critical to know your own network: what is connecting where, which ports are open, and what the usual connection patterns are.

Fortunately AWS has the FlowLogs feature, which allows you to get a copy of raw network connection logs with a significant amount of metadata. This feature can be compared to Netflow capable routers, firewalls, and switches in classic, on-premise data centers.

FlowLogs are available for every AWS entity that uses Elastic Network Interfaces. The most important services that do this are EC2, ELB, ECS and RDS.

What information do FlowLogs include?

Let’s look at an example message:
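Here is a representative record, assembled from the field-by-field breakdown that follows:

```
2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK
```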

This message tells us that the following network connection was observed:

  • 2 – The VPC flow log version is 2.
  • 123456789010 – The AWS account id was 123456789010.
  • eni-abc123de – The recording network interface was eni-abc123de. (ENI is Elastic Network Interface.)
  • 172.31.16.139:20641 and 172.31.16.21:22 – 172.31.16.139:20641 attempted to connect to 172.31.16.21:22.
  • 6 – The IANA protocol number used was 6 (TCP).
  • 20 and 4249 – 4249 bytes were exchanged over 20 packets.
  • 1418530010 – The start of the capture window in Unix seconds was 12/14/2014 at 4:06 am (UTC). (A capture window is a duration of time over which AWS aggregates flow records before publishing the logs. The published logs will have a more accurate timestamp as metadata later.)
  • 1418530070 – The end of the capture window in Unix seconds was 12/14/2014 at 4:07 am (UTC).
  • ACCEPT – The recorded traffic was accepted. (If the recorded traffic was refused, it would say “REJECT”.)
  • OK – All data was logged normally during the capture window. This could also be NODATA if there were no observed connections, or SKIPDATA if some connections were recorded but not logged for internal capacity reasons or errors.

Note that if your network interface has multiple IP addresses and traffic is sent to a secondary private IP address, the log will show the primary private IP address.

By storing this data and making it searchable, we will be able to answer several security related questions and get a definitive overview of our network.

How does the FlowLogs feature work?

FlowLogs must be enabled per network interface or VPC (Amazon Virtual Private Cloud) wide. You can enable it for a specific network interface by browsing to a network interface in your EC2 (Amazon Elastic Compute Cloud) console and clicking “Create Flow Log” in the Flow Logs tab. A VPC allows you to get a private network to place your EC2 instances into. In addition, all EC2 instances automatically receive a primary ENI so you do not need to fiddle with setting up ENIs.

Enabling FlowLogs for a whole VPC or subnet works similarly by browsing to the details page of a VPC or subnet and selecting “Create Flow Log” from the Flow Logs tab.

AWS will always write FlowLogs to a CloudWatch Log Group. This means that you can instantly browse your logs through the CloudWatch console and confirm that the configuration worked. (Allow 10-15 minutes to complete the first capture window as FlowLogs do not capture real-time log streams, but have a few minutes’ delay.)

How to collect and analyze FlowLogs

Now that you have the FlowLogs in CloudWatch, you will notice that the vast amount of data makes it difficult to extract intelligence from it. You will need an additional tool to further aggregate and present the data.

Luckily, there are two ways to access CloudWatch logs. You can either use the CloudWatch API directly or forward incoming data to a Kinesis stream.

In this post, I’ll be using Graylog as log management tool to further analyze the FlowLogs data simply because this is the tool I have the most experience with. Graylog is an open-source tool that you can download and run on your own without relying on any third-party. You should be able to use other tools like the ELK stack or Splunk, too. Choose your favorite!

The AWS plugin for Graylog has a direct integration with FlowLogs through Kinesis that only needs a few runtime configuration parameters. There are also official Graylog AWS machine images (AMIs) to get you started quickly.

FlowLogs in Graylog will look like this:

Example analysis and use-cases

Now let’s view a few example searches and analysis that you can run with this.

Typically, you would browse through the data and explore. It would not take long until you find an out-of-place connection pattern that should not be there.

Example 1: Find internal services that have direct connections from the outside

Imagine you are running web services that should not be accessible from the outside directly, but only through an ELB load balancer.

Run the following query to find out if there are direct connections that are bypassing the ELBs:

In a perfect setup, this search would return no results. However, if it does return results, you should check your security groups and make sure that there is no direct traffic from the outside allowed.

We can also dig deeper into the addresses that connected directly to see who owns them and where they are located:

Example 2: Data flow from databases

Databases should only deliver data back to applications that have a legitimate need for that data. If data is flowing to any other destination, this can be an indication of a data breach or an attacker preparing to exfiltrate data from within your networks.

This simple query below will show you if any data was flowing from a RDS instance to a location outside of your own AWS networks:

This hopefully does not return a result, but let’s still investigate. We can follow where the data is flowing to by drilling deeper into the dst_addr field from a result that catches internal connections.

As you see, all destination addresses have a legitimate need for receiving data from RDS. This of course does not mean that you are completely safe, but it does rule out several attack vectors.

Example 3: Detect known C&C channels

If something in your networks is infected with malware, there is a high chance that it will communicate back with C&C (Command & Control) servers. Luckily, this communication cannot be hidden on the low level we are monitoring so we will be able to detect it.

The Graylog Threat Intelligence plugin can compare recorded IP addresses against lists of known threats. A simple query to find this traffic would look like this:

Note that these lists are fairly accurate, but never 100% complete. A hit tells you that something might be wrong, but an empty result does not guarantee that there are no issues.

For an even higher hit rate, you can collect DNS traffic and match the requested hostnames against known threat sources using Graylog.

Use-cases outside of security

The collected data is also incredibly helpful in non-security related use-cases. For example, you can run a query like this to find out where your load balancers (ELBs) are making requests to:

Looking from the other side, you could see which ELBs a particular EC2 instance is answering to:

Next steps

CloudTrail

You can send CloudTrail events into Graylog and correlate recorded IP addresses with FlowLog activity. This will allow you to follow what a potential attacker or suspicious actor has performed at your perimeter or even inside your network.

Dashboards

With the immense amount of data and information coming in every second, it is important to have measures in place that will help you keep an overview and not miss any suspicious activity.

Dashboards are a great way to incorporate operational awareness without having to perform manual searches and analysis. Every minute you invest in good dashboards will save you time in the future.

Alerts

Alerts are a helpful tool for monitoring your environment. For example, Graylog can automatically trigger an email or Slack message the moment a login from outside of your trusted network occurs. Then, you can immediately investigate the activity in Graylog.

Conclusion

Monitoring and analyzing your FlowLogs is vital for staying protected against intrusions. By combining the ease of AWS CloudWatch with the flexibility of Graylog, you can dive deeper in your data and spot anomalies.

About the Author

Lennart Koopmann is the founder of Graylog and started the project in 2010. He has a strong software development background and is also experienced in network and information security.

About the Editors

Zoltán Lajos Kis joined Ericsson in 2007 to work with scalable peer-to-peer and self organizing networks. Since then he has worked with various telecom core products and then on Software Defined Networks. Currently his focus is on cloud infrastructure and big data services.


Taming and controlling storage volumes

18. December 2016

Author: James Andrews

Introduction

It’s easy to use scripts in AWS to generate a lot of EC2 virtual machines quickly. Along with the creation of EC2 instances, persistent block storage volumes via AWS EBS may be created and attached. These EC2 instances might be used briefly and then discarded but by default the EBS volumes are not destroyed.

Continuous Integration

Continuous Integration (CI) is now a standard technique in software development. When doing CI it’s usually a good idea to build at least some of the infrastructure used in what are called ‘pipelines’ from scratch. Fortunately, AWS is great at this and has a lot of methods that can be used.

When CI pipelines are mentioned, Docker is often in the same sentence although Docker and other containerized environments are not universally adopted. Businesses are still using software controlled infrastructure with separate virtual machines for each host. So I’ll be looking at this type of environment which uses EC2 as separate virtual machines.

A typical CI run will generate some infrastructure such as an EC2 virtual machine, load some programs onto it, run some tests and then tear the EC2 down afterwards. It’s during this teardown process that care needs to be taken on AWS to avoid leaving orphaned EBS volumes behind.

AWS allows the use of multiple environments – as many as you’d like to pay for. This allows different teams or projects to run their own CI as independent infrastructure. The benefit is that testing on multiple, different streams of development is possible. Unfortunately multiple environments can also increase the problem of remnant EBS volumes that are not correctly removed after use.

Building CI Infrastructure

I’ve adapted a couple of methods for making CI infrastructure.

1. Cloudformation + Troposphere

AWS CloudFormation is a great tool but one drawback is the use of JSON (or now YAML) configuration files to drive it.  These are somewhat human readable but are excessively long and tend to contain “unrolled” repeated phrases if several similar EC2 instances are being made.

To mitigate this problem I have started using Troposphere, which is a python module to generate CloudFormation templates.   I should point out that similar CloudFormation template making libraries are available in other languages, including Go and ruby.

It may seem like more work to use a program to make what is essentially another program (the template), but by abstracting the in-house rules for making an EC2 instance just how you need it, template development is quicker and the generator programs are easier to understand than the templates.

Here is an example troposphere program:
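The full program is longer than this, but a trimmed sketch of the idea looks like the following; the AMI id, instance type, and tags are placeholders.

```python
#!/usr/bin/env python
"""Generate a CloudFormation template for a small fleet of CI workers."""
from troposphere import Tags, Template, ec2


def doInstance(template, name):
    # Each CI worker is a small instance built from the same (placeholder) AMI.
    instance = ec2.Instance(
        name,
        ImageId="ami-0123456789abcdef0",
        InstanceType="t2.micro",
        Tags=Tags(Name=name, Project="ci"),
    )
    template.add_resource(instance)
    return instance


def main():
    template = Template()
    template.add_description("CI worker instances")
    for i in range(3):
        doInstance(template, "CiWorker%d" % i)
    print(template.to_json())


if __name__ == "__main__":
    main()
```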

When the program is run it prints a json template to stdout.

Once the templates are made they can either be deployed by the console or AWS cli.

2. Boto and AWS cli

There is also a Python module to directly call the AWS API.  This module is called boto.  By using it, EC2 instances can be created directly when the boto program runs. Again, similar libraries are available in other languages.  With the troposphere generator there is an intermediate step, but boto programs directly make the AWS infrastructure items.

If you have used the AWS CLI, boto is very similar but allows all the inputs and outputs to be manipulated in Python.  Typical boto programs are shorter than AWS CLI shell programs and contain fewer API calls, because it is much easier to get the required output data in Python than in shell or the AWS CLI.

Monitoring EBS volumes

If you are managing an environment where it is difficult to apply controls to scripts then there can be a buildup of “dead volumes” that are not attached to any live instance.  Some of these might be of historical interest but mostly they are just sitting there accruing charges.

Volumes that should be kept should be tagged appropriately so that they can be charged to the correct project or deleted when their time is past.  However, when scripts are written quickly to get the job done, often this is not the case.

This is exactly the situation that our team found itself in following an often frantic migration of a legacy environment to AWS.  To audit the volumes, I wrote this boto script.  It finds all volumes that are not attached to an EC2 and which have no tags.  This script is running as a cronjob on a management node in our infrastructure.
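A boto3 sketch of that audit – find every volume that is unattached (“available”) and carries no tags – looks roughly like this; the region is an assumption, and the real script would likely mail a report rather than just print.

```python
import boto3


def find_dead_volumes(region="eu-west-1"):
    """Report EBS volumes that are unattached and carry no tags."""
    ec2 = boto3.client("ec2", region_name=region)
    paginator = ec2.get_paginator("describe_volumes")
    dead = []
    # Volumes in the 'available' state are not attached to any instance.
    pages = paginator.paginate(
        Filters=[{"Name": "status", "Values": ["available"]}]
    )
    for page in pages:
        for volume in page["Volumes"]:
            if not volume.get("Tags"):
                dead.append(volume)
    return dead


if __name__ == "__main__":
    for v in find_dead_volumes():
        print(v["VolumeId"], v["Size"], "GiB", v["CreateTime"])
```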

3. AWS Config

A third option is to use AWS Config together with tags. Tagging can be an important technique in a complex AWS environment being managed by a team.  Tags attach metadata to almost any type of AWS object, including the EBS volumes we are looking at in this article.  The tags can be used to help apply the correct policy processes to volumes to prevent wasteful use of storage.

Establishing this sort of policy is (you might have guessed) something that Amazon have thought of already.  They provide a great tool unsurprisingly named the “AWS Configuration Management tool” for monitoring, tagging, and applying other types of policy rules.

The AWS Config management tool provides a framework for monitoring compliance to various rules you can set.  It also will run in response to system events (like the termination of an EC2) or on a timer like cron.

It has a built-in rule for finding objects that have no tags (REQUIRED_TAGS; see http://docs.aws.amazon.com/config/latest/developerguide/evaluate-config_use-managed-rules.html).  This rule doesn’t do quite the same as my script but is a viable alternative.  If you need slightly different rules from the predefined ones, AWS Config has a system for adding custom rules using Lambda programs, which could alternatively be used.

Stopping the problem at source

If you are writing new scripts now and want to ensure that volumes are deleted on termination, the best thing to do is to add the options for this directly in your system automation scripts.  EBS volumes attached to EC2 instances have a setting for “delete on termination”.

The default for volume creation with CloudFormation and the API is not to delete on termination.  I was surprised to find that deleting a CloudFormation stack (which contained default settings) did not remove the EBS volumes it made.

A good way to overcome this is to set the delete on termination flag.

In the AWS shell CLI, get the instance ID and add this command after a “run-instances”.
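If you are already scripting the launch in Python, the boto3 equivalent is a single modify_instance_attribute call; the instance id and device name below are placeholders (the root device name depends on the AMI).

```python
import boto3

ec2 = boto3.client("ec2")

instance_id = "i-0123456789abcdef0"   # placeholder: use the id from run_instances

ec2.modify_instance_attribute(
    InstanceId=instance_id,
    BlockDeviceMappings=[{
        "DeviceName": "/dev/xvda",              # typical root device for Amazon Linux
        "Ebs": {"DeleteOnTermination": True},
    }],
)
```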

In CloudFormation, the “Type”: “AWS::EC2::Instance” sections make volumes from the AMI automatically. To set delete on termination, add a block like this in the Properties section.

To do the same thing in troposphere, make an ec2.Instance object and use the BlockDeviceMappings attribute.

This attribute could be added to the example troposphere program in the “doInstance” method at line 22 of the file.
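A sketch of what that attribute looks like on a troposphere ec2.Instance (AMI id, instance type, and device name are placeholders):

```python
from troposphere import ec2

# Map the root device and ask AWS to delete it when the instance terminates.
instance = ec2.Instance(
    "CiWorker",
    ImageId="ami-0123456789abcdef0",
    InstanceType="t2.micro",
    BlockDeviceMappings=[
        ec2.BlockDeviceMapping(
            DeviceName="/dev/xvda",             # depends on the AMI
            Ebs=ec2.EBSBlockDevice(DeleteOnTermination=True),
        )
    ],
)
```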

Summary

The problem of uncontrolled EBS volume use can be brought under control by the use of tagging and using a script or AWS Config to scan for unused volumes.

Unattached, unused EBS volumes can be prevented in a software controlled infrastructure by setting the “DeleteOnTermination” flag – this does not happen by default in CloudFormation.

About the Author

James Andrews has been working for 25 years as a Systems Administrator. On the weekend he rides his bike around Wales.


Session management for Web-Applications on AWS Cloud

17. December 2016

Author: Nitheesh Poojary
Editors: Vinny Carpenter, Justin Alan Ryan

Introduction

When a user browses web pages in a given browser, a user session is created by the server, and the session ID is managed internally for the duration of that user’s web session. For example, when a user views three pages and logs out, this is termed one web session. The HTTP protocol is stateless, so the server and the browser need a way of storing the identity of each user session. Each HTTP request-response between the client and application happens on a separate TCP connection, and each request to the web server is executed independently, without any knowledge of previous requests. Hence it is very important to store and manage the web session of a user.

Below is a typical example of a session scenario in a real-world web application:

  1. The user’s browser sends the first request to the server.
  2. The server checks whether the browser has passed along the browser cookie that contained session information.
  3. If the server does not ‘know’ the client
    1. The server creates a new unique identifier and puts it in a Map (roughly), as a key, whose value is the newly created Session. It also sends a cookie response containing the unique identifier.
    2. The browser stores the session cookie containing the unique identifier and uses it for each subsequent request to identify itself uniquely with the web server.
  4. If the server already knows the client – the server obtains the session data corresponding to the unique identifier found in the session cookie and serves the requested page. The data is typically stored in server memory and looked up on each request via the session key.
Example: In most shopping cart applications, when a user adds an item to their cart and continues browsing the site, a session is created and stored on the server side.

On-Premise Web Server Architecture

  1. Load Balancing: Load Balancing automatically distributes the incoming traffic across multiple instances attached to the load balancer.
  2. Application/WebTier: It controls application functionality by performing detailed processing such as calculation or making decisions. This layer typically communicates with all other parts of the architecture such as a database, caching, queue, web service calls, etc. The process data will be provided back to presentation layer i.e. web server which serves the web pages.
  3. Database-Tier: This is the persistent layer, which will have RDBMS servers where information is stored and retrieved. The information is then passed back to the logic tier for processing and then eventually back to the user. In some older architectures, sessions were handled at the Database Tier.

Alternatives solution on AWS for session management

Sticky Session

The solution described above works fine when we are running the application on a single server. Current business scenarios demand high scalability and availability of the hosted solution, but this approach limits horizontal scaling for large-scale systems. The AWS cloud platform has all the required infrastructure and network components designed for horizontal scaling across multiple zones to ensure high availability on demand, make the most efficient use of resources, and minimize cost. Application deployment or migration on AWS requires proactive thinking, especially in the case of application session management.

Considering our scenario mentioned above, the developer enables server-side sessions. When our application needs to scale up (horizontal scaling) from one to many servers, we deploy the application servers behind a load balancer. By default, the Elastic Load Balancer routes each user’s request across the application instances using a round-robin algorithm. We have to ensure that the load balancer sends all requests from a single user to the same server where the session was created. In this scenario, the ELB sticky session feature (also known as session affinity) comes in handy, as it does NOT require any code changes within the application. When we enable sticky sessions in ELB, the ELB keeps track of which server it has routed each user’s past requests to and keeps sending that user’s requests to the same server.

ELB supports two ways of managing the stickiness duration: either by specifying the duration explicitly, or by having the stickiness expire along with the application server’s own session cookie.
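As an illustration, here is a rough boto3 sketch of configuring both stickiness modes on a Classic ELB; the load balancer name, policy names and cookie name are placeholder assumptions.

# Sketch: enabling sticky sessions on a Classic ELB with boto3.
import boto3

elb = boto3.client("elb", region_name="eu-west-1")

# Duration-based stickiness: ELB issues its own cookie with an explicit expiry
elb.create_lb_cookie_stickiness_policy(
    LoadBalancerName="my-web-elb",
    PolicyName="duration-stickiness",
    CookieExpirationPeriod=3600,  # seconds
)

# Application-controlled stickiness: follow the app server's session cookie
elb.create_app_cookie_stickiness_policy(
    LoadBalancerName="my-web-elb",
    PolicyName="app-cookie-stickiness",
    CookieName="JSESSIONID",
)

# Attach one of the policies to the HTTP listener
elb.set_load_balancer_policies_of_listener(
    LoadBalancerName="my-web-elb",
    LoadBalancerPort=80,
    PolicyNames=["duration-stickiness"],
)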

Web Application Architecture on AWS

Challenges with Sticky Session

  1. Scaling down and instance failure: This problem arises when the load balancer is forced to redirect users to a different server because one of the servers fails its health checks. By design, ELB does not route requests to unhealthy servers, so all session data held on the unhealthy server is lost. Those users are abruptly logged out of the application and asked to log in again, which makes for an unhappy user. When scaling up (adding more servers), ELB maintains the stickiness of existing sessions; only new connections are forwarded to the newly-added servers.
  2. ELB round-robin algorithm: ELB uses a round-robin algorithm to distribute load, sending it fairly evenly to all servers. If a server becomes unresponsive for any reason, ELB detects this and begins redirecting its traffic to a different server. That server will be up, so the user experiences a glitch rather than an outage.
  3. Requests from the same IP: One way of associating sessions with a user is through the source IP address of their requests. If multiple users sit behind the same NAT, all of their requests end up routed to the same server.

Best Practices

Because the sticky-session approach to session management runs into these challenges as concurrent usage grows, we can use some of the technologies highlighted below to create a more scalable and more manageable architecture.

  1. Session storage in an RDBMS with read replicas: A common solution to this problem is to set up a dedicated session-state store backed by a database. The web session is created and written to the RDS master database, and subsequent session reads are served from the read replicas. JBoss and Tomcat have built-in mechanisms for keeping sessions in a dedicated database such as MySQL, so sessions from the application layer are synchronized in the centralized master database. This approach is not recommended for applications with heavy traffic that need high scalability: it requires high-performance SSD storage with dedicated IOPS, and it brings drawbacks such as database licensing, growth management, and failover / high-availability mechanisms.
  2. NoSQL (DynamoDB): The main challenges of using an RDBMS for session storage are its administration workload and scalability. Amazon DynamoDB is a NoSQL database that can handle massive concurrent reads and writes. From the DynamoDB console you configure the reads and writes per second you need, and DynamoDB provisions the required infrastructure behind the scenes, so scalability and administration are taken care of by the service itself. Internally, all data items are stored on Solid State Drives (SSDs) and automatically replicated across three Availability Zones in a Region to provide built-in high availability and data durability. DynamoDB also provides SDKs and session-state extensions for a variety of languages, including Java, .NET, PHP, and Ruby. A minimal session-store sketch follows this list.
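As promised above, here is a minimal sketch of a DynamoDB-backed session store using boto3; it assumes a table named web-sessions with a string hash key session_id already exists, and the expiry handling is deliberately simplistic.

# Sketch: storing and retrieving web sessions in DynamoDB.
import time
import uuid
import boto3

table = boto3.resource("dynamodb", region_name="eu-west-1").Table("web-sessions")

def create_session(user_id):
    """Write a new session item; returns the session ID to put in the cookie."""
    session_id = uuid.uuid4().hex
    table.put_item(Item={
        "session_id": session_id,
        "user_id": user_id,
        "cart": [],
        "expires_at": int(time.time()) + 3600,  # expiry checked by the application
    })
    return session_id

def load_session(session_id):
    """Read the session item back on each request."""
    return table.get_item(Key={"session_id": session_id}).get("Item")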

References

The following links can be used for handling session management in Java and PHP applications:

 


Limiting your Attack Surface in the AWS cloud

15. December 2016 2016 0

Author: Andrew Langhorn
Editors: Dean Wilson

The public cloud provides many benefits to organizations both large and small: you pay only for what you use, you can fine-tune and tailor your infrastructure to suit your needs, and change these configurations easily on-the-fly, and you can bank on a reliable and scalable infrastructure underpinning it all that’s managed on your behalf. Increasingly, organizations are moving to using public cloud, and in many cases, the death knell for the corporate data centre is beginning to sound.
In this post, we’ll discuss how you can architect your applications in the AWS cloud to be as secure as possible, and look at some often-underused features and little-known systems that can help you do so. We’ll also consider the importance of building security into your application infrastructure from the start.

Build everything in a VPC

A VPC, or Virtual Private Cloud, is a virtual, logically-separate section of the AWS cloud in which you define the architecture of your network as if it were a physical one, with consideration for how you connect to and from other networks.

This gives you an extraordinary amount of control over how your packets flow, both inside your network and outside of it. Many organizations treat their AWS VPCs as extensions of on-premise data centers or other public clouds. Given the advantages, you’ll want to at least look at housing your infrastructure in a VPC from the get-go.

AWS accounts created since 2013 have had to make use of a VPC; even if you’re not defining your own, AWS will spin up all of your VPC-supported infrastructure inside a ‘default’ VPC assigned when you create your account. But this default VPC isn’t great: ideally, you want VPCs that correspond with how you’ll use your account, perhaps per environment, per application, or for use in a blue/green scenario.

Therefore, don’t just make use of a VPC: define your VPCs yourself. They’re fairly straightforward to create; for instance, the aws_vpc resource in Hashicorp’s Terraform tool needs only a few parameters to instantiate a VPC in its entirety.
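If you’re not using Terraform, a rough boto3 equivalent might look like the sketch below; the region, CIDR block and Name tag are illustrative assumptions rather than recommendations.

# Sketch: defining your own VPC with boto3 instead of relying on the default.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

# Create the VPC itself
vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16")
vpc_id = vpc["Vpc"]["VpcId"]

# Enable DNS support and hostnames so instances can resolve names inside the VPC
ec2.modify_vpc_attribute(VpcId=vpc_id, EnableDnsSupport={"Value": True})
ec2.modify_vpc_attribute(VpcId=vpc_id, EnableDnsHostnames={"Value": True})

# Tag it so it's obvious this isn't the default VPC
ec2.create_tags(Resources=[vpc_id], Tags=[{"Key": "Name", "Value": "advent-demo"}])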

Subnets

Subnets are a way of logically dividing a network you’ve defined and created, and are a mainstay of networking: they allow different parts of your network to be separated from each other, with traffic flow between subnets managed by other devices, such as firewalls. Subnets are in no way an AWS-specific thing, but they’re extremely useful.

Largely speaking, there are two types of subnet: public and private. As you might guess, a public subnet is one addressed by IP addresses that can be announced to the world, while a private subnet is one that can only be addressed by IPs in the ranges defined in RFC 1918.

Perhaps next time you build some infrastructure, consider putting everything you can in a private subnet and using your public subnet as a DMZ. I like to treat my public subnets this way, using them only for things that are throw-away, that can be rebuilt easily, and that have no direct access to sensitive data. Yes, that involves creating security group rules, updating ACLs and such, but being able to remove direct access at such a low level reinforces my belief that securing stacks in an onion-like fashion (defense in depth) is the best way to do it. An often-followed pattern is to use Elastic Load Balancers in the public subnets and EC2 Auto Scaling groups in the private ones: routing between the two subnets is handled by the Elastic Load Balancers, and egress from the private subnet can be handled by a NAT Gateway.
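Here is a minimal boto3 sketch of that public-DMZ / private-subnet split; the VPC ID, CIDR blocks and Availability Zone are placeholder assumptions.

# Sketch: one public (DMZ) subnet and one private subnet in an existing VPC.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")
vpc_id = "vpc-0123456789abcdef0"  # assumed to exist already

# Public subnet: throw-away, internet-facing pieces (e.g. ELBs, NAT Gateway)
public = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.0.0/24",
                           AvailabilityZone="eu-west-1a")

# Private subnet: application servers, databases, anything touching sensitive data
private = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.1.0/24",
                            AvailabilityZone="eu-west-1a")

# Only the public subnet hands out public IPs on launch
ec2.modify_subnet_attribute(
    SubnetId=public["Subnet"]["SubnetId"],
    MapPublicIpOnLaunch={"Value": True},
)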

Route tables

Inside VPCs, you can use route tables to control the path packets take from source to destination, much as you can on almost any other internet-connected device; routing tables in AWS are no different to those outside AWS. One thing they’re very useful for, and we’ll come back to this later, is routing traffic to S3 over a private interface, or enforcing separation of concerns at a low, IP-based level, helping you meet compliance and regulatory requirements.

Inside a VPC, you can also define a Flow Log, which captures details about the packets traveling across your network and dumps them into CloudWatch Logs for you to scrutinize later, or streams them to S3, Redshift, the Elasticsearch Service or elsewhere using a service such as Kinesis Firehose.
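Enabling a Flow Log is a single API call; the boto3 sketch below assumes a pre-existing CloudWatch Logs group and an IAM role that Flow Logs can use to write to it, and all identifiers are placeholders.

# Sketch: turning on a VPC Flow Log that delivers to CloudWatch Logs.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

ec2.create_flow_logs(
    ResourceIds=["vpc-0123456789abcdef0"],
    ResourceType="VPC",
    TrafficType="ALL",  # ACCEPT, REJECT or ALL
    LogGroupName="vpc-flow-logs",
    DeliverLogsPermissionArn="arn:aws:iam::123456789012:role/flow-logs-role",
)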

Security groups

Security groups work just like stateful ingress and egress firewalls. You define a group, add some rules for ingress traffic, and some more for egress traffic, and watch your packets flow. By default, they deny access to any ingress traffic and allow all egress traffic, which means that if you don’t set security groups up, you won’t be able to get to your AWS infrastructure.

It’s possible, and entirely valid, to create a rule to allow all traffic on all protocols both ingress and egress, but in doing so, you’re not really using security groups but working around them. They’re your friend: they can help you meet compliance regulations, satisfy your security-focused colleagues and are – largely – a mainstay and a staple of networking and running stuff on the internet. At least, by default, you can’t ignore them!

If you’re just starting out, consider using standard ports for services living in the AWS cloud. You can enable DNS resolution at a VPC level, and use load balancers as described below, to help you use the same ports for your applications across your infrastructure, helping simplify your security groups.

Note that there’s a limit on the number of security group rules you can have: the combined total of rules and groups cannot exceed 250. So, that’s one group with 250 rules, 250 groups with one rule, or any mixture in between. Use them wisely, and remember that you can attach the same group to multiple AWS resources. One nice pattern, sketched below, is to create a group with common rules – such as Bastion host SSH ingress, monitoring services ingress and so on – and attach it to multiple resources. That way, changes are applied everywhere at once, and you’re using security groups efficiently.
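A rough boto3 sketch of that pattern might look like this; the VPC ID and the bastion host’s security group ID are placeholder assumptions.

# Sketch: a reusable "common rules" security group attached to many resources.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

common = ec2.create_security_group(
    GroupName="common-ingress",
    Description="SSH from bastion and monitoring ingress",
    VpcId="vpc-0123456789abcdef0",
)

# Allow SSH only from the bastion host's security group, not from the world
ec2.authorize_security_group_ingress(
    GroupId=common["GroupId"],
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 22,
        "ToPort": 22,
        "UserIdGroupPairs": [{"GroupId": "sg-0123456789abcdef1"}],
    }],
)
# The resulting group ID can now be attached to any number of instances or ENIs.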

Network ACLs

Once your security groups are up and running and traffic is flowing smoothly, take a look at network ACLs, which complement security groups in many ways: they act at a lower level, reinforce the rules you’ve already created in your security groups, and are often used to explicitly deny traffic. Add them once you’re happy your security groups don’t need too much further tweaking!
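For example, an explicit-deny entry for a single troublesome address might look like the boto3 sketch below; the network ACL ID, rule number and CIDR block are placeholder assumptions.

# Sketch: an explicit ingress deny rule on a network ACL.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

ec2.create_network_acl_entry(
    NetworkAclId="acl-0123456789abcdef0",
    RuleNumber=90,            # evaluated before higher-numbered allow rules
    Protocol="-1",            # all protocols
    RuleAction="deny",
    Egress=False,             # this is an ingress rule
    CidrBlock="203.0.113.7/32",
)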

Soaking up TCP traffic with Elastic Load Balancers and AWS Shield

Elastic Load Balancers are useful for, as the name suggests, balancing traffic across multiple application pools. However, they’re also fairly good at scaling upward, hence the ‘elastic’ in their name. We can harness that elasticity to provide a good solid barrier between the internet and our applications, but also to bridge public and private subnets.

Since you can restrict traffic on both the internal side (facing your compute infrastructure) and the external side (facing away from it), Elastic Load Balancers not only allow traffic to bridge subnets but also act as a barrier, shielding your private subnets against TCP floods.

This year, AWS announced Shield, a managed DDoS protection offering, which is enabled by default for all customers. A paid-for offering, AWS Shield Advanced, offers support for integrating with your Elastic Load Balancers, CloudFront distributions and Route 53 record sets, as well as a consulting function, the DDoS Response Team, and protection for your AWS bill against traffic spikes causing you additional cost.

Connecting to services over private networks

If you’ve managed to create a service entirely within a private subnet, then the last thing you really want to do is to have to connect over public networks to get access to certain data, especially if you’re in a regulated environment, or care about the security of your data (which you really should do!).

Thankfully, AWS provides two ways of accessing your data over private networks. Some services, such as Amazon RDS and Amazon ElastiCache, insert an A record into DNS under an Amazon-managed zone, populated with an available IP address from your private subnet. So, whilst the DNS record itself is published in a public zone, it is only really useful if you’re already inside the subnet connected to the Amazon-managed service; anyone else who tries to connect to that address will either be unable to, or will end up at a system on their own network with the same address!

Another, newer, way of connecting to a service from a private address is to use a VPC Endpoint, where Amazon establishes a route to a public service – currently, only S3 is supported – from within your private subnet and amends your route table appropriately. Traffic then reaches S3 entirely via your private subnet, effectively bringing the edges of S3 and your subnet together so that S3 appears inside your network.
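Creating the endpoint is straightforward; the boto3 sketch below assumes placeholder VPC and route table IDs and the eu-west-1 S3 service name.

# Sketch: a gateway VPC Endpoint for S3, wired into an existing route table.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

ec2.create_vpc_endpoint(
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.eu-west-1.s3",
    RouteTableIds=["rtb-0123456789abcdef0"],  # gets a route to S3 added
)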

STS: the Security Token Service

The AWS Security Token Service (STS) works with Identity and Access Management (IAM) to let you request temporary IAM credentials for users who authenticate using federated identity services (see below) or for users defined directly in IAM itself. I like to use the STS GetFederationToken API call with federated users, since they can authenticate against my existing on-premise service and receive temporary IAM credentials directly from AWS in a self-service fashion.
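A minimal sketch of that GetFederationToken call might look like the following; the user name, duration and inline policy are assumptions chosen for illustration, and the call is made with an IAM user’s long-term credentials picked up from the environment.

# Sketch: requesting temporary credentials for a federated user via STS.
import json
import boto3

sts = boto3.client("sts")

response = sts.get_federation_token(
    Name="jane.doe",        # the federated user's name from your own directory
    DurationSeconds=3600,   # keep this well below the 12-hour default
    Policy=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:ListBucket", "s3:GetObject"],
            "Resource": ["arn:aws:s3:::example-bucket",
                         "arn:aws:s3:::example-bucket/*"],
        }],
    }),
)
credentials = response["Credentials"]  # AccessKeyId, SecretAccessKey, SessionToken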

By default, AWS enables STS in all available regions, which means potential attackers can request credentials from endpoints you may never use. It’s safer to turn STS on only in the specific regions where you need it, since that way you’re scoping your attack surface solely to regions you know you rely upon. You can turn STS region endpoints on and off, with the exception of the US East region, in the IAM console under the Account Settings tab; consider disabling STS in any region you’re not using.

Federating AWS Identity and Access Management using SAML or OIDC

Many organizations already have some pre-existing authentication database for authenticating employees trying to connect to their email inboxes, to expenses systems, and a whole host of other internal systems. There are typically policies and procedures around access control already in place, often involving onboarding and offboarding processes, so when a colleague joins, leaves or changes role in an organization, the authentication database and related permissions are adequately updated.

You can federate authentication systems which use SAML or OpenID Connect (OIDC) to IAM, allowing authentication of your users to occur locally against existing systems. This works well with products such as Active Directory (through Active Directory Federation Services) and Google Apps for Work, but I’ve also heard about Oracle WebLogic, Auth0, Shibboleth, Okta, Salesforce, Apache Altu, and modules for Nginx and Apache being used.

That way, when a colleague joins, as long as they’ve been granted the relevant permissions in your authentication service, they can assume the IAM roles you’ve defined in the AWS console. And when they leave, their access to AWS is revoked as soon as you remove their federated identity account. Unfortunately, there is a caveat: a generated STS token can’t be revoked, so even if the identity account has been removed, an existing STS token remains valid until it expires. To work around this, it’s good practice to enforce a low expiration time, since the default of twelve hours is quite high.
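For example, if you’re exchanging SAML assertions for credentials yourself, you can keep the lifetime short when calling AssumeRoleWithSAML; the role and provider ARNs below are placeholder assumptions, and the assertion is read from an assumed local file containing the base64-encoded SAML response from your identity provider.

# Sketch: trading a SAML assertion for deliberately short-lived credentials.
import boto3

sts = boto3.client("sts")

# Assumed input: the base64-encoded SAML response captured from the IdP
assertion = open("saml_response.b64").read()

creds = sts.assume_role_with_saml(
    RoleArn="arn:aws:iam::123456789012:role/developers",
    PrincipalArn="arn:aws:iam::123456789012:saml-provider/corporate-idp",
    SAMLAssertion=assertion,
    DurationSeconds=900,  # enforce a short lifetime, since tokens can't be revoked
)["Credentials"]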

Credentials provider chain

Whilst you may use a username and passphrase to get access to the AWS Management Console, the vast majority of programmatic access is going to authenticate with AWS using an IAM access key and secret key pair. You generate these on a per-user basis in the IAM console, with a maximum of two available at any one time. However, it’s how you use them that’s really the crux of the matter.

The concept of the credentials provider chain exists to help services calling an AWS API through one of the many language-specific SDKs work out where to look for IAM credentials, and in what order to use them.

The AWS SDKs look for IAM credentials in the following order:

  • through environment variables
  • through JVM system properties
  • on disk, at ~/.aws/credentials
  • from the AWS EC2 Container Service
  • or from the EC2 metadata service

I’m never a massive fan of hard-coding credentials on disk, so I prefer that keys are either handled transparently through the metadata service (you can use IAM roles and instance profiles to provide keys to instances, wrapped using STS) when an EC2 instance makes a call needing authentication, or passed in using environment variables. Regardless, properly setting IAM policies is important: if your application only needs to put files into S3 and read from ElastiCache, then only let it do that!
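As a sketch of both points: the client below carries no keys at all and relies on the provider chain, and the inline role policy grants only the single S3 action the hypothetical application needs. The role, policy and bucket names are placeholder assumptions.

# Sketch: no hard-coded keys, plus a tightly scoped role policy.
import json
import boto3

# This client picks up credentials from the provider chain (env vars,
# ~/.aws/credentials, container or instance metadata); nothing is hard-coded.
s3 = boto3.client("s3")
s3.put_object(Bucket="example-uploads", Key="report.csv", Body=b"col1,col2\n")

# Scope the instance role to exactly what the application needs, and no more.
iam = boto3.client("iam")
iam.put_role_policy(
    RoleName="app-instance-role",
    PolicyName="put-uploads-only",
    PolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::example-uploads/*",
        }],
    }),
)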

Multi-factor authentication

IAM allows you to enforce the use of multi-factor authentication (MFA), or two-factor authentication as it’s often known elsewhere. It’s generally good practice to use this on all of your accounts – especially your root account, since that holds special privileges that IAM accounts don’t get by default, such as access to your billing information.

It’s generally recommended that you enable MFA on your root account, create another IAM user for getting access to the Management Console and APIs, and then create your AWS infrastructure using these IAM users. In essence, you should get out of the habit of using the root account as quickly as possible after enforcing MFA on it.

In many organisations, access to the root account is not something you want to tie to one named user, but when setting up MFA you need to provide two consecutive codes from an MFA device to enable it (this is how AWS checks that your MFA device is set up correctly and in sync). The QR code provided contains the secret visible via the link below it; that secret can be stored in a password vault or a physical safe, where others can use it to re-generate the QR code if required. Scanning the QR code also gives you a URL you can use on some devices to trigger opening an app like Google Authenticator. You can ask AWS to disable MFA on your root account at any time, per the documentation.
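For IAM users (as opposed to the root account, which is set up through the console), the same two-code dance can be scripted; the user name and the one-time codes below are placeholder assumptions.

# Sketch: creating a virtual MFA device and enabling it for an IAM user.
import boto3

iam = boto3.client("iam")

device = iam.create_virtual_mfa_device(VirtualMFADeviceName="alice-mfa")
serial = device["VirtualMFADevice"]["SerialNumber"]
# device["VirtualMFADevice"]["Base32StringSeed"] is the secret behind the QR
# code; store it somewhere safe (password vault, physical safe) if you need to
# re-generate the QR code later.

iam.enable_mfa_device(
    UserName="alice",
    SerialNumber=serial,
    AuthenticationCode1="123456",  # two consecutive codes from the device,
    AuthenticationCode2="654321",  # proving it's set up correctly and in sync
)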

Conclusion

Hopefully, you’re already doing – or at least thinking about – some of the ideas above for the AWS infrastructure in your organization; if you’re not, then start thinking about them. Whilst some of the services mentioned above, such as Shield and IAM, offer security as part of their core offering, others – like using Elastic Load Balancers to soak up TCP traffic, using network ACLs to explicitly deny traffic, or treating public subnets as DMZs – are often overlooked because they’re a little less obvious.

Hopefully, the tips above can help you create a more secure stack in future.

About the Author

Andrew Langhorn is a senior consultant at ThoughtWorks. He works with clients large and small on all sorts of infrastructure, security and performance problems. Previously, he was up to no good helping build, manage and operate the infrastructure behind GOV.UK, the simpler, clearer and faster way to access UK Government services and information. He lives in Manchester, England, with his beloved gin collection, blogs at ajlanghorn.com, and is a firm believer that mince pies aren’t to be eaten before December 1st.

About the Editors

Dean Wilson (@unixdaemon) is a professional FOSS Sysadmin, occasional coder and very occasional blogger at www.unixdaemon.net. He is currently working as a web operations engineer for Government Digital Service in the UK.