AWS network security monitoring with FlowLogs

19. December 2016

Author: Lennart Koopmann
Editors: Zoltán Lajos Kis

Regardless of whether you are running servers in AWS or your own data center, you need a high level of protection against intrusions. No matter how strictly your security groups and local iptables are configured, there is always the chance that a determined attacker will make it past these barriers and move laterally within your network. In this post, I will walk through how to protect your AWS network with FlowLogs: from implementing and collecting FlowLogs in CloudWatch to analyzing the data with Graylog, a log management system, you will be fully equipped to monitor your environment.

Introduction

As Rob Joyce, Chief of TAO at the NSA, discussed in his talk at USENIX Enigma 2016, it’s critical to know your own network: what is connecting where, which ports are open, and what the usual connection patterns are.

Fortunately, AWS has the FlowLogs feature, which gives you a copy of raw network connection logs with a significant amount of metadata. The feature is comparable to NetFlow-capable routers, firewalls, and switches in classic, on-premise data centers.

FlowLogs are available for every AWS entity that uses Elastic Network Interfaces. The most important services that do this are EC2, ELB, ECS and RDS.

What information do FlowLogs include?

Let’s look at an example message:
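A representative record, reconstructed from the fields explained below, looks like this:

```
2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK
```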

This message tells us that the following network connection was observed:

  • 2 – The VPC flow log version is 2
  • 123456789010 – The AWS account id was 123456789010
  • eni-abc123de – The recording network interface was eni-abc123de (ENI is Elastic Network Interface)
  • 172.31.16.139:20641 and 172.31.16.21:22 – 172.31.16.139:20641 attempted to connect to 172.31.16.21:22
  • 6 – The IANA protocol number used was 6 (TCP)
  • 20 and 4249 – 4249 bytes were exchanged over 20 packets
  • 1418530010 – The start of the capture window in Unix seconds was 12/14/2014 at 4:06 am (UTC). (A capture window is a duration of time over which AWS aggregates flows before publishing the logs. The published logs carry a more accurate timestamp as metadata later.)
  • 1418530070 – The end of the capture window in Unix seconds was 12/14/2014 at 4:07 am (UTC)
  • ACCEPT – The recorded traffic was accepted. (If the recorded traffic was refused, it would say “REJECT”).
  • OK – All data was logged normally during the capture window. This could also be NODATA if there were no observed connections, or SKIPDATA if some connections were recorded but not logged due to internal capacity constraints or errors.

Note that if your network interface has multiple IP addresses and traffic is sent to a secondary private IP address, the log will show the primary private IP address.

By storing this data and making it searchable, we will be able to answer several security related questions and get a definitive overview of our network.

How does the FlowLogs feature work?

FlowLogs must be enabled per network interface or VPC (Amazon Virtual Private Cloud) wide. You can enable them for a specific network interface by browsing to it in your EC2 (Amazon Elastic Compute Cloud) console and clicking “Create Flow Log” in the Flow Logs tab. A VPC gives you a private network to place your EC2 instances into, and every EC2 instance automatically receives a primary ENI, so you do not need to fiddle with setting up ENIs.

Enabling FlowLogs for a whole VPC or subnet works similarly: browse to the details page of a VPC or subnet and select “Create Flow Log” from the Flow Logs tab.

AWS will always write FlowLogs to a CloudWatch Log Group. This means that you can instantly browse your logs through the CloudWatch console and confirm that the configuration worked. (Allow 10-15 minutes to complete the first capture window as FlowLogs do not capture real-time log streams, but have a few minutes’ delay.)

How to collect and analyze FlowLogs

Now that you have the FlowLogs in CloudWatch, you will notice that the vast amount of data makes it difficult to extract intelligence from it. You will need an additional tool to further aggregate and present the data.

Luckily, there are two ways to access CloudWatch logs. You can either use the CloudWatch API directly or forward incoming data to a Kinesis stream.

In this post, I’ll be using Graylog as the log management tool to further analyze the FlowLogs data, simply because this is the tool I have the most experience with. Graylog is an open-source tool that you can download and run on your own without relying on any third party. You should be able to use other tools like the ELK stack or Splunk, too. Choose your favorite!

The AWS plugin for Graylog has a direct integration with FlowLogs through Kinesis that only needs a few runtime configuration parameters. There are also official Graylog AWS machine images (AMIs) to get started quickly.

FlowLogs in Graylog will look like this:

Example analysis and use-cases

Now let’s look at a few example searches and analyses that you can run with this.

Typically, you would browse through the data and explore. It would not take long until you find an out-of-place connection pattern that should not be there.

Example 1: Find internal services that have direct connections from the outside

Imagine you are running web services that should not be accessible from the outside directly, but only through an ELB load balancer.

Run the following query to find out if there are direct connections that are bypassing the ELBs:
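An illustrative query; the port and address range are assumptions about your setup (here the web service listens on port 80 and the VPC uses 172.31.0.0/16), so adjust them to match your environment:

```
dst_port:80 AND NOT src_addr:172.31.*
```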

In a perfect setup, this search would return no results. However, if it does return results, you should check your security groups and make sure that there is no direct traffic from the outside allowed.

We can also dig deeper into the addresses that connected directly to see who owns them and where they are located:

Example 2: Data flow from databases

Databases should only deliver data back to applications that have a legitimate need for that data. If data is flowing to any other destination, this can be an indication of a data breach or an attacker preparing to exfiltrate data from within your networks.

This simple query below will show you if any data was flowing from a RDS instance to a location outside of your own AWS networks:
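A sketch of that query, where 172.31.20.11 stands in for the private address of your RDS instance and 172.31.0.0/16 for your VPC range:

```
src_addr:172.31.20.11 AND NOT dst_addr:172.31.*
```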

This hopefully does not return a result, but let’s still investigate. We can follow where the data is flowing to by drilling deeper into the dst_addr field from a result that catches internal connections.

As you see, all destination addresses have a legitimate need for receiving data from RDS. This of course does not mean that you are completely safe, but it does rule out several attack vectors.

Example 3: Detect known C&C channels

If something in your networks is infected with malware, there is a high chance that it will communicate back with C&C (Command & Control) servers. Luckily, this communication cannot be hidden on the low level we are monitoring so we will be able to detect it.

The Graylog Threat Intelligence plugin can compare recorded IP addresses against lists of known threats. A simple query to find this traffic would look like this:
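The exact field names depend on how the threat intelligence lookups are wired into your processing pipeline, so treat this as a sketch rather than a copy-paste query:

```
src_addr_threat_indicated:true OR dst_addr_threat_indicated:true
```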

Note that these lists are fairly accurate, but never 100% complete. A hit tells you that something might be wrong, but an empty result does not guarantee that there are no issues.

For an even higher hit rate, you can collect DNS traffic and match the requested hostnames against known threat sources using Graylog.

Use-cases outside of security

The collected data is also incredibly helpful in non-security related use-cases. For example, you can run a query like this to find out where your load balancers (ELBs) are making requests to:
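A sketch of such a query; here 172.31.5.23 stands in for one of your ELB’s private addresses (look it up from the ELB’s network interfaces), and you would then aggregate the results on the dst_addr field:

```
src_addr:172.31.5.23 AND action:ACCEPT
```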

Looking from the other side, you could see which ELBs a particular EC2 instance is answering to:
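Again a sketch with placeholder values: search for traffic arriving at the instance and aggregate on src_addr to see which load balancer addresses are sending it requests:

```
dst_addr:172.31.16.139 AND dst_port:80
```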

Next steps

CloudTrail

You can send CloudTrail events into Graylog and correlate recorded IP addresses with FlowLog activity. This will allow you to follow what a potential attacker or suspicious actor has performed at your perimeter or even inside your network.

Dashboards

With the immense amount of data and information coming in every second, it is important to have measures in place that will help you keep an overview and not miss any suspicious activity.

Dashboards are a great way to incorporate operational awareness without having to perform manual searches and analysis. Every minute you invest in good dashboards will save you time in the future.

Alerts

Alerts are a helpful tool for monitoring your environment. For example, Graylog can automatically trigger an email or Slack message the moment a login from outside of your trusted network occurs. Then, you can immediately investigate the activity in Graylog.

Conclusion

Monitoring and analyzing your FlowLogs is vital for staying protected against intrusions. By combining the ease of AWS CloudWatch with the flexibility of Graylog, you can dive deeper into your data and spot anomalies.

About the Author

Lennart Koopmann is the founder of Graylog and started the project in 2010. He has a strong software development background and is also experienced in network and information security.

About the Editors

Zoltán Lajos Kis joined Ericsson in 2007 to work with scalable peer-to-peer and self organizing networks. Since then he has worked with various telecom core products and then on Software Defined Networks. Currently his focus is on cloud infrastructure and big data services.


Taming and controlling storage volumes

18. December 2016

Author: James Andrews

Introduction

It’s easy to use scripts in AWS to generate a lot of EC2 virtual machines quickly. Along with the creation of EC2 instances, persistent block storage volumes via AWS EBS may be created and attached. These EC2 instances might be used briefly and then discarded but by default the EBS volumes are not destroyed.

Continuous Integration

Continuous Integration (CI) is now a standard technique in software development. When doing CI it’s usually a good idea to build at least some of the infrastructure used in what are called ‘pipelines’ from scratch. Fortunately, AWS is great at this and has a lot of methods that can be used.

When CI pipelines are mentioned, Docker is often in the same sentence although Docker and other containerized environments are not universally adopted. Businesses are still using software controlled infrastructure with separate virtual machines for each host. So I’ll be looking at this type of environment which uses EC2 as separate virtual machines.

A typical CI run will generate some infrastructure such as an EC2 virtual machine, load some programs onto it, run some tests, and then tear the EC2 instance down afterwards. It’s during this teardown process that care needs to be taken on AWS to avoid leaving attached EBS volumes behind.

AWS allows the use of multiple environments – as many as you’d like to pay for. This allows different teams or projects to run their own CI as independent infrastructure. The benefit is that testing on multiple, different streams of development is possible. Unfortunately multiple environments can also increase the problem of remnant EBS volumes that are not correctly removed after use.

Building CI Infrastructure

I’ve adapted a couple of methods for making CI infrastructure.

1. Cloudformation + Troposphere

AWS CloudFormation is a great tool, but one drawback is the use of JSON (or now YAML) configuration files to drive it. These are somewhat human readable but are excessively long and tend to contain “unrolled” repeated phrases if several similar EC2 instances are being made.

To mitigate this problem I have started using Troposphere, a Python module that generates CloudFormation templates. I should point out that similar CloudFormation template-making libraries are available in other languages, including Go and Ruby.

It may seem like more work to use a program to make what is essentially another program (the template), but by abstracting the in-house rules for making an EC2 instance just how you need it, template development is quicker and the generator programs are easier to understand than the templates.

Here is an example troposphere program:
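A minimal sketch of such a generator (the AMI ID, instance type, and tag values are placeholders):

```python
from troposphere import Output, Ref, Tags, Template
import troposphere.ec2 as ec2


def doInstance(template, name):
    # Add one EC2 instance plus an output exposing its ID.
    instance = ec2.Instance(
        name,
        ImageId="ami-0123456789abcdef0",  # placeholder AMI
        InstanceType="t2.micro",
        Tags=Tags(Project="ci-demo"),
    )
    template.add_resource(instance)
    template.add_output(Output(name + "Id", Value=Ref(instance)))


t = Template()
for instance_name in ["CiWeb", "CiDb"]:
    doInstance(t, instance_name)

print(t.to_json())
```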

When the program is run it prints a JSON template to stdout.

Once the templates are made they can be deployed either from the console or with the AWS CLI.
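For example, assuming the generated template was written to template.json, a stack can be created from the command line (the stack name is a placeholder):

```
aws cloudformation create-stack --stack-name ci-stack --template-body file://template.json
```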

2. Boto and AWS cli

There is also a Python module to call the AWS API directly. This module is called boto. By using it, EC2 instances can be created directly when the boto program runs. Again, similar libraries are available in other languages. With the Troposphere generator there is an intermediate step, whereas boto programs make the AWS infrastructure items directly.

If you have used the AWS CLI, boto is very similar but allows all the inputs and outputs to be manipulated in Python. Typical boto programs are shorter than AWS CLI shell programs and contain fewer API calls. This is because it is much easier to get the required output data in Python than in shell with the AWS CLI.

Monitoring EBS volumes

If you are managing an environment where it is difficult to apply controls to scripts then there can be a buildup of “dead volumes” that are not attached to any live instance.  Some of these might be of historical interest but mostly they are just sitting there accruing charges.

Volumes that should be kept should be tagged appropriately so that they can be charged to the correct project or deleted when their time is past.  However, when scripts are written quickly to get the job done, often this is not the case.

This is exactly the situation that our team found ourselves in following an often frantic migration of a legacy environment to AWS. To audit the volumes, I wrote a boto script that finds all volumes which are not attached to an EC2 instance and which have no tags. It runs as a cronjob on a management node in our infrastructure.
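A boto3 sketch of that logic (find every volume with no attachments and no tags) looks like this:

```python
import boto3

ec2 = boto3.resource("ec2")

# Report every EBS volume that is not attached to an instance and has no tags.
for volume in ec2.volumes.all():
    if not volume.attachments and not volume.tags:
        print("Unattached, untagged volume: {0} ({1} GiB, created {2})".format(
            volume.id, volume.size, volume.create_time))
```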

3. AWS Config

A third option is to use AWS tags. This can be  an important technique in a complex AWS environment being managed by a team.  Tags attach metadata to almost any type of AWS object, including the EBS volumes we are looking at in this article.  The tags can be used to help apply the correct policy processes to volumes to prevent wasteful use of storage.

Establishing this sort of policy is (you might have guessed) something that Amazon has thought of already. They provide a great tool for this, unsurprisingly named AWS Config, for monitoring, tagging, and applying other types of policy rules.

The AWS Config management tool provides a framework for monitoring compliance with various rules you can set. It will also run in response to system events (like the termination of an EC2 instance) or on a timer, like cron.

It has a built-in rule for finding objects that have no tags (REQUIRED_TAGS, see http://docs.aws.amazon.com/config/latest/developerguide/evaluate-config_use-managed-rules.html). This rule doesn’t do quite the same as my script but is a viable alternative. If you need slightly different rules from the predefined ones, AWS Config has a system for adding custom rules using Lambda functions.

Stopping the problem at source

If you are writing new scripts now, the best way to ensure volumes are cleaned up is to set the relevant options in the automation scripts themselves. EBS volumes attached to EC2 instances have a “delete on termination” setting.

Volumes created with CloudFormation or the API do not default to delete on termination. I was surprised to find that deleting a CloudFormation stack (which contained default settings) did not remove the EBS volumes it made.

A good way to overcome this is to set the delete on termination flag.

In the AWS CLI, get the instance ID and add this command after a “run-instances” call.
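A sketch of that call; the instance ID and device name are placeholders, and the device name must match the root device your AMI uses:

```
aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 \
  --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"DeleteOnTermination":true}}]'
```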

In CloudFormation, "Type": "AWS::EC2::Instance" sections create volumes from the AMI automatically. To set delete on termination, add a block like this in the Properties section.
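For example (again, the device name is a placeholder that must match the AMI's root device):

```json
"BlockDeviceMappings": [
  {
    "DeviceName": "/dev/sda1",
    "Ebs": { "DeleteOnTermination": true }
  }
]
```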

To do the same thing in Troposphere, make an ec2.Instance object and use the BlockDeviceMappings attribute.
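Applied to the sketch above, that would look roughly like this:

```python
instance.BlockDeviceMappings = [
    ec2.BlockDeviceMapping(
        DeviceName="/dev/sda1",  # placeholder; must match the AMI's root device
        Ebs=ec2.EBSBlockDevice(DeleteOnTermination=True),
    )
]
```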

This attribute could be added to the example troposphere program in the “doInstance” method at line 22 of the file.

Summary

The problem of uncontrolled EBS volume use can be brought under control by the use of tagging and using a script or AWS Config to scan for unused volumes.

Unattached, unused EBS volumes can be prevented in a software controlled infrastructure by setting the “DeleteOnTermination” flag – this does not happen by default in CloudFormation.

About the Author

James Andrews has been working for 25 years as a Systems Administrator. On the weekend he rides his bike around Wales.


Session management for Web-Applications on AWS Cloud

17. December 2016

Author: Nitheesh Poojary
Editors: Vinny Carpenter, Justin Alan Ryan

Introduction

When a user browses web pages, the server creates a user session and manages the session ID internally for the duration of that user’s visit. For example, when a user views three pages and then logs out, that is termed one web session. The HTTP protocol is stateless, so the server and the browser need a way of storing the identity of each user session. Each HTTP request-response between the client and application happens on a separate TCP connection, and each request to the web server is executed independently, without any knowledge of previous requests. Hence it is very important to store and manage the web session of a user.

Below is a typical example of a session scenario in a real-world web application:

  1. The user’s browser sends the first request to the server.
  2. The server checks whether the browser has passed along the browser cookie that contained session information.
  3. If the server does not ‘know’ the client
    1. The server creates a new unique identifier and puts it in a Map (roughly), as a key, whose value is the newly created Session. It also sends a cookie response containing the unique identifier.
    2. The browser stores the session cookie containing the unique identifier and uses it for each subsequent request to identify itself uniquely with the web server.
  4. If the server already knows the client – the server obtains the session data corresponding to the unique identifier found in the session cookie and serves the requested page. The data is typically stored in server memory and looked up on each request via the session key.
Example: In most shopping cart applications, when a user adds an item to their cart and continues browsing the site, a session is created and stored on the server side.

On-Premise Web Server Architecture

  1. Load Balancing: Load Balancing automatically distributes the incoming traffic across multiple instances attached to the load balancer.
  2. Application/Web Tier: This controls application functionality by performing detailed processing such as calculations and making decisions. This layer typically communicates with all other parts of the architecture, such as databases, caching, queues, web service calls, etc. The processed data is provided back to the presentation layer, i.e. the web server, which serves the web pages.
  3. Database Tier: This is the persistence layer, with RDBMS servers where information is stored and retrieved. The information is then passed back to the logic tier for processing and eventually back to the user. In some older architectures, sessions were handled at the database tier.

Alternatives solution on AWS for session management

Sticky Session

The solution described above works fine when we are running the application on a single server. Modern business scenarios demand high scalability and availability of the hosted solution, but this approach limits horizontal scaling for large-scale systems. The AWS cloud platform has all the required infrastructure and network components designed for horizontal scaling across multiple Availability Zones, ensuring high availability on demand while making efficient use of resources and minimizing cost. Deploying or migrating an application to AWS therefore requires proactive thinking, especially about application session management.

Considering our scenario mentioned above, the developer enables server-side sessions. When our application needs to scale up (horizontal scaling) from one to many servers, we deploy the application servers behind a load balancer. By default, the Elastic Load Balancer routes each request to a registered application instance using a round robin algorithm. We have to ensure that the load balancer sends all requests from a single user to the same server where the session was created. This is where the ELB sticky session feature (also known as session affinity) comes in handy, as it does NOT require any code changes within the application. When we enable sticky sessions in ELB, the ELB keeps track of which server it has routed a user’s past requests to and keeps sending that user’s requests to the same server.

ELB supports two ways of managing the stickiness duration: either by specifying the duration explicitly, or by indicating that the stickiness should expire when the application server’s session cookie expires.

Web Application Architecture on AWS

Challenges with Sticky Session

  1. Scaling instances down: This problem appears when a load balancer is forced to redirect users to a different server because one of the servers fails health checks. ELB by design does not route requests to unhealthy servers, so all session data associated with the unhealthy server is lost. Users are abruptly logged out of the application and asked to log in again, leading to user dissatisfaction. When scaling up (adding more servers), ELB maintains stickiness of existing sessions; only new connections are forwarded to the newly added servers.
  2. ELB Round-Robin Algorithm: ELB uses a round robin algorithm to distribute the load, sending traffic fairly evenly to all servers. If a server becomes unresponsive for some reason, ELB detects this and begins to redirect its traffic to a different server. The remaining application servers stay up, so users experience a glitch rather than an outage.
  3. Requests from the same IP: ELB associates sessions with a user through the IP address of the requests. If multiple users are passing through a NAT, all of their requests can be routed to the same server.

Best Practices

As the sticky session approach to session management struggles under high concurrency and heavy usage, we can also use some of the technologies highlighted below to create a more scalable and manageable architecture.

  1. Session storing using RDBMS with Read Replicas: A common solution to this problem is to set up a dedicated session-state store backed by a database. The web session is created and written to the RDS master database, and subsequent session reads are done from the Read Replicas. JBoss and Tomcat have built-in mechanisms to handle sessions from a dedicated database such as MySQL. Web sessions from the application layer are synchronized in the centralized master database. This approach is not recommended for applications that have heavy traffic and require high scalability: it needs high-performance SSD storage with dedicated IOPS, and it has drawbacks such as database licensing, growth management, and failover/high-availability mechanisms.
  2. NoSQL – DynamoDB: The challenges faced while using an RDBMS for session storage are its administration workload and scalability. AWS DynamoDB is a NoSQL database that can handle massive concurrent reads and writes. Using the DynamoDB console one can configure reads/writes per second, and DynamoDB will provision the required infrastructure at the backend, so scalability and administration needs are taken care of by the service itself. Internally, all data items are stored on Solid State Drives (SSDs) and are automatically replicated across three Availability Zones in a Region to provide built-in high availability and data durability. DynamoDB also provides SDKs and session-state extensions for a variety of languages such as Java, .NET, PHP, Ruby, etc. (see the sketch below).
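To illustrate the idea (a rough sketch, not a production session handler; the table name and attribute names are assumptions), storing and fetching a session item in DynamoDB with boto3 could look like this:

```python
import time
import uuid

import boto3

# Assumed table "web-sessions" with partition key "session_id".
table = boto3.resource("dynamodb").Table("web-sessions")


def create_session(user_id, ttl_seconds=3600):
    # Store a new session item keyed by a random session ID.
    session_id = str(uuid.uuid4())
    table.put_item(Item={
        "session_id": session_id,
        "user_id": user_id,
        "expires_at": int(time.time()) + ttl_seconds,
    })
    return session_id


def load_session(session_id):
    # Fetch the session item and reject it if it has expired.
    item = table.get_item(Key={"session_id": session_id}).get("Item")
    if item and item["expires_at"] > time.time():
        return item
    return None
```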

References

The following links can be used for handling session management in Java and PHP applications:

 


Limiting your Attack Surface in the AWS cloud

15. December 2016

Author: Andrew Langhorn
Editors: Dean Wilson

The public cloud provides many benefits to organizations both large and small: you pay only for what you use, you can fine-tune and tailor your infrastructure to suit your needs and change those configurations easily on the fly, and you can bank on a reliable and scalable infrastructure underpinning it all that’s managed on your behalf. Increasingly, organizations are moving to the public cloud, and in many cases the death knell for the corporate data centre is beginning to sound.
In this post, we will discuss how you can architect your applications in the AWS cloud to be as secure as possible, and look at some often-underused features and little-known systems that can help you do this. We’ll also consider the importance of building security into your application infrastructure from the start.

Build everything in a VPC

A VPC, or Virtual Private Cloud, is a virtual, logically-separate section of the AWS cloud in which you define the architecture of your network as if it were a physical one, with consideration for how you connect to and from other networks.

This gives you an extraordinary amount of control over how your packets flow, both inside your network and outside of it. Many organizations treat their AWS VPCs as extensions of on-premise data centers, or other public clouds. To this end, given the number of advantages, you’ll definitely want to be at least looking at housing your infrastructure in a VPC from the get-go.

AWS accounts created since 2013 have had to make use of a VPC, even if you’re not defining your own; AWS will spin all of your VPC-supported infrastructure up inside a ‘default’ VPC which you’re assigned when you create your account. But, this default VPC isn’t great — ideally, you want VPCs which correspond with how you’ll use your account: maybe per-environment, per-application or for use in a blue/green scenario.

Therefore, not only make use of a VPC, but define your VPCs yourself. They’re fairly straightforward to set up; for instance, the aws_vpc resource in HashiCorp’s Terraform tool requires only a few parameters to instantiate a VPC in its entirety.
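A minimal sketch; the CIDR block and name tag are placeholders:

```hcl
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true

  tags {
    Name = "main-vpc"
  }
}
```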

Subnets

Subnets are a way of logically dividing a network you’ve defined and created, and are a mainstay of networking. They allow different parts of your network to be separated from each other, with permissions for traffic flow between subnets managed by other devices, such as firewalls. Subnets are in no way an AWS-specific thing; regardless, they’re extremely useful.

Largely speaking, there are two types of subnet: public and private. As you might guess, a public subnet is one addressed by IP addresses that can be announced to the world, and a private subnet is one that can only be addressed by IPs defined in RFC 1918.

Perhaps next time you build some infrastructure, consider having everything you can in a private subnet, and using your public subnet as a DMZ. I like to treat my public subnets in this way, and use them only for things that are throw-away, that can be re-built easily, and which don’t have any direct access to sensitive data. Yes, that involves creating security group rules, updating ACLs and such, but the ability to remove direct access at such a deep level reinforces my belief that securing stacks in an onion-like fashion (defense-in-depth) is the best way to do it. An often-followed pattern is to use Elastic Load Balancers in the public subnets, and EC2 Auto Scaling Groups in the private ones. Routing between the two subnets is handled by the Elastic Load Balancers, and egress from the private subnet can be handled by a NAT Gateway.

Route tables

Inside VPCs, you can use route tables to control the path packets take over IP from source to destination, much as you can on almost any other internet-connected device. Routing tables in AWS are no different from those outside AWS. One thing they’re very useful for, and we’ll come back to this later, is routing traffic to S3 over a private interface, or enforcing separation of concerns at a low, IP-based level, helping you meet compliance and regulation requirements.

Inside a VPC, you can define a Flow Log, which captures details about the packets traveling across your network and dumps them into CloudWatch Logs for you to scrutinize at a later date, or to stream to S3, Redshift, the Elasticsearch Service or elsewhere using a service such as Kinesis Firehose.

Security groups

Security groups work just like stateful ingress and egress firewalls. You define a group, add some rules for ingress traffic, and some more for egress traffic, and watch your packets flow. By default, they deny access to any ingress traffic and allow all egress traffic, which means that if you don’t set security groups up, you won’t be able to get to your AWS infrastructure.

It’s possible, and entirely valid, to create a rule to allow all traffic on all protocols both ingress and egress, but in doing so, you’re not really using security groups but working around them. They’re your friend: they can help you meet compliance regulations, satisfy your security-focused colleagues and are – largely – a mainstay and a staple of networking and running stuff on the internet. At least, by default, you can’t ignore them!

If you’re just starting out, consider using standard ports for services living in the AWS cloud. You can enable DNS resolution at a VPC level, and use load balancers as described below, to help you use the same ports for your applications across your infrastructure, helping simplify your security groups.

Note that there’s a limit on the number of security group rules you can have – per network interface, the number of groups multiplied by the number of rules per group cannot exceed 250. So that could be one group with 250 rules, five groups with 50 rules each, or any mixture within the limit. Use them wisely, and remember that you can attach the same group to multiple AWS resources. One nice pattern is to create a group with common rules – such as Bastion host SSH ingress, monitoring services ingress etc. – and attach it to multiple resources. That way, changes are applied quickly, and you’re using security groups efficiently.
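A sketch of that pattern in Terraform; the bastion address is a placeholder, and the group would be attached alongside each service's own group:

```hcl
resource "aws_security_group" "common" {
  name   = "common"
  vpc_id = "${aws_vpc.main.id}"

  # SSH from the bastion host only (placeholder address)
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["10.0.1.10/32"]
  }

  # Allow all egress
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```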

Network ACLs

Once you’ve got your security groups up and running, and traffic’s flowing smoothly, take a look at network ACLs, which work in many ways as a complement to security groups: they act at a lower-level, but reinforce the rules you’ve previously created in your security groups, and are often used to explicitly deny traffic. Take a look at adding them when you’re happy your security groups don’t need too much further tweaking!

Soaking up TCP traffic with Elastic Load Balancers and AWS Shield

Elastic Load Balancers are useful for, as the name suggests, balancing traffic across multiple application pools. However, they’re also fairly good at scaling upward, hence the ‘elastic’ in their name. We can harness that elasticity to provide a good solid barrier between the internet and our applications, but also to bridge public and private subnets.

Since you can restrict traffic on both the internal side (facing your compute infrastructure) and the external side (facing away from it), Elastic Load Balancers allow traffic to bridge subnets while also acting as a barrier to shield your private subnets against TCP floods.

This year, AWS announced Shield, a managed DDoS protection offering, which is enabled by default for all customers. A paid-for offering, AWS Shield Advanced, offers support for integrating with your Elastic Load Balancers, CloudFront distributions and Route 53 record sets, as well as a consulting function, the DDoS Response Team, and protection for your AWS bill against traffic spikes causing you additional cost.

Connecting to services over private networks

If you’ve managed to create a service entirely within a private subnet, then the last thing you really want to do is to have to connect over public networks to get access to certain data, especially if you’re in a regulated environment, or care about the security of your data (which you really should do!).

Thankfully, AWS provides two ways of accessing your data over private networks. Some services, such as Amazon RDS and Amazon ElastiCache, allow the A record they insert into DNS under an Amazon-managed zone to be populated with an available IP address in your private subnet. That way, whilst your DNS record is in the open, the A record is only really useful if you’re already inside the subnet that is connected to the Amazon-managed service. The record is published in a public zone, but anyone else who tries to connect to the address will either be unable to, or will reach a system on their own network at the same address!

Another, newer, way of connecting to a service from a private address is to use a VPC Endpoint, where Amazon establishes a route to a public service – currently, only S3 is supported – from within your private subnet, and amends your route table appropriately. This means traffic reaches S3 entirely over your private subnet, effectively extending the borders of S3 and your subnet toward each other so that S3 appears inside your subnet.

STS: the Security Token Service

The AWS Security Token Service works with Identity and Access Management (IAM) to allow you to request temporary IAM credentials for users who authenticate using federated identity services (see below) or for users defined directly in IAM itself. I like to use the STS GetFederationToken API call with federated users, since they can authenticate with my existing on-premise service, and receive temporary IAM credentials directly from AWS in a self-service fashion.

By default, AWS enables STS in all available regions, which broadens the surface a potential attacker could use to request credentials. It’s safer to turn STS on only in the specific regions where you need it, since that way you scope your attack surface solely to regions you know you rely upon. You can turn STS region endpoints on and off, with the exception of the US East region, in the IAM console under the Account Settings tab; consider disabling STS in any region you aren’t using.

Federating AWS Identity and Access Management using SAML or OIDC

Many organizations already have some pre-existing authentication database for authenticating employees trying to connect to their email inboxes, to expenses systems, and a whole host of other internal systems. There are typically policies and procedures around access control already in place, often involving onboarding and offboarding processes, so when a colleague joins, leaves or changes role in an organization, the authentication database and related permissions are adequately updated.

You can federate authentication systems which use SAML or OpenID Connect (OIDC) to IAM, allowing authentication of your users to occur locally against existing systems. This works well with products such as Active Directory (through Active Directory Federation Services) and Google Apps for Work, but I’ve also heard about Oracle WebLogic, Auth0, Shibboleth, Okta, Salesforce, Apache Altu, and modules for Nginx and Apache being used.

That way, when a colleague joins, as long as they’ve been granted the relevant permissions in your authentication service, they have access to assume the IAM roles you define in the AWS console. And, when they leave, their access is revoked from AWS as soon as you remove their federated identity account. Unfortunately, there is a caveat: a generated STS token can’t be revoked, so even if the identity account has been removed, the token remains valid until it expires. To work around this, another good practice is to enforce a low expiration time, since the default of twelve hours is quite high.

Credentials provider chain

Whilst you may use a username and passphrase to get access to the AWS Management Console, the vast majority of programmatic access is going to authenticate with AWS using an IAM access key and secret key pair. You generate these on a per-user basis in the IAM console, with a maximum of two available at any one time. However, it’s how you use them that’s really the crux of the matter.

The concept of the credentials provider chain exists to help services calling an AWS API through one of the many language-specific SDKs work out where to look for IAM credentials, and in what order to use them.

The AWS SDKs look for IAM credentials in the following order:

  • through environment variables
  • through JVM system properties
  • on disk, at ~/.aws/credentials
  • from the AWS EC2 Container Service
  • or from the EC2 metadata service

I’m never a massive fan of hard-coding credentials on disk, so I prefer that keys are either handled transparently through the metadata service (you can use IAM roles and instance profiles to provide keys to instances, wrapped using STS) when an EC2 instance makes a call needing authentication, or passed in using environment variables. Regardless, properly setting IAM policies is important: if your application only needs to put files into S3 and read from ElastiCache, then only let it do that!
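As an illustration, a policy scoped to nothing more than putting objects into a single bucket might look like this (the bucket name is a placeholder):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-app-uploads/*"
    }
  ]
}
```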

Multi-factor authentication

IAM allows you to enforce the use of multi-factor authentication (MFA), or two-factor authentication as it’s often known elsewhere. It’s generally good practice to use this on all of your accounts – especially your root account, since that holds special privileges that IAM accounts don’t get by default, such as access to your billing information.

It’s generally recommended that you enable MFA on your root account, create another IAM user for getting access to the Management Console and APIs, and then create your AWS infrastructure using these IAM users. In essence, you should get out of the habit of using the root account as quickly as possible after enforcing MFA on it.

In many organisations, access to the root account is not something you want to tie down to one named user, but when setting up MFA, you need to provide two codes from an MFA device to enable it (since this is how AWS checks that your MFA device has been set up correctly and is in sync). The QR code provided contains the secret visible using the link below it, and this secret can be stored in a password vault or a physical safe, where others can use it to re-generate the QR code, if required. Scanning the QR code will also give you a URL which you can use on some devices to trigger opening of an app like Google Authenticator. You can request AWS disables MFA on your root account at any time, per the documentation.

Conclusion

Hopefully, you’re already doing – or at least thinking – about some of the ideas above for use in the AWS infrastructure in your organization, but if you’re not, then start thinking about them. Whilst some of the services mentioned above, such as Shield and IAM, offer security as part of their core offering, others – like using Elastic Load Balancers to soak up TCP traffic, using Network ACLs to explicitly deny traffic, or thinking about your architecture by considering public subnets as DMZs – are often overlooked as they’re a little less obvious.

Hopefully, the tips above can help you create a more secure stack in future.

About the Author

Andrew Langhorn is a senior consultant at ThoughtWorks. He works with clients large and small on all sorts of infrastructure, security and performance problems. Previously, he was up to no good helping build, manage and operate the infrastructure behind GOV.UK, the simpler, clearer and faster way to access UK Government services and information. He lives in Manchester, England, with his beloved gin collection, blogs at ajlanghorn.com, and is a firm believer that mince pies aren’t to be eaten before December 1st.

About the Editors

Dean Wilson (@unixdaemon) is a professional FOSS Sysadmin, occasional coder and very occasional blogger at www.unixdaemon.net. He is currently working as a web operations engineer for Government Digital Service in the UK.


Protecting AWS Credentials

15. December 2016

Author: Brian Nuszkowski
Editors: Chris Henry, Chris Castle

AWS provides its users with the opportunity to leverage its vast offering of advanced data center resources via a tightly integrated API. While the goal is to provide easy access to these resources, we must do so with security in mind. Peak efficiency via automation is the pinnacle of our industry, and at the core of our automation and operation efforts lie the ‘keys’ to the kingdom. Actually, I really do mean keys: access keys. As an AWS Administrator or Power User, we’ve probably all used them, and there is probably at least one (valid!) forgotten copy somewhere in your home directory. Unauthorized access to AWS resources via compromised access keys usually occurs via:

  • Accidental commit to version control systems
  • Machine compromise
  • Unintentional sharing/capturing during live demonstrations or recordings

Without the proper controls, if your credentials are obtained by an unauthorized party, they can be used by anyone with internet access. So, we’ll work to transform how we look at our access keys, by treating them less as secrets that we guard with great care, and more like disposable items. We’ll do that by embracing Multi-factor Authentication (MFA, but also referred to as Two Factor Authentication or 2FA).

In this scenario, we’re looking to protect IAM users who are members of the Administrators IAM group. We’ll do this by:

  1. Enabling MFA for IAM Users
  2. Authoring and applying an MFA Enforced IAM Policy
  3. Leveraging the Security Token Service to create MFA enabled credentials

1. Enable MFA for applicable IAM Users

This can be done by adding a Multi-Factor Authentication Device in each user’s Security Credentials section. I prefer Duo Mobile, but any TOTP application will work. Your MFA device will be uniquely identified by its ARN, which will look something like: arn:aws:iam::1234567889902:mfa/hugo

2. Apply an MFA Enforced Policy
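A policy of roughly the following shape accomplishes this, using the aws:MultiFactorAuthPresent condition key; adapt the actions and resources to your needs:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "Bool": { "aws:MultiFactorAuthPresent": "true" }
      }
    }
  ]
}
```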

Create a Managed or Inline policy using the JSON above and attach it to the IAM User or Group whose credentials you wish to protect. The policy allows all actions against any resource if the request’s credentials are labeled as having successfully performed MFA.

3. Create MFA Enabled Credentials via the Security Token Service

Now that you’re enforcing MFA for API requests via Step 2, your existing access keys are no longer primarily used for making requests. Instead, you’ll use these keys in combination with your MFA passcode to create a new set of temporary credentials that are issued via the Security Token Service.

The idea now is to keep your temporary, privileged credentials valid only for as long as you need them, e.g. the life of an administrative task or action. I recommend creating credentials with a valid duration of one hour or less. Shrinking the timeframe for which your credentials are valid limits the risk of their exposure. Credentials that provide administrative-level privileges on Friday from 10am to 11am aren’t very useful to an attacker on Friday evening.

To create temporary credentials, you reference the current Time Based One Time Passcode (TOTP) in your MFA application and perform either of the following operations:


3a. Use a credential helper tool such as aws-mfa to fetch and manage your AWS credentials file
3b. If you’re an aws cli user, you can run:

aws sts get-session-token --duration-seconds 3600 --serial-number <ARN of your MFA Device> --token-code 783462 and using its output, manually update your AWS credentials file or environment variables.

3c. Write your own application that interfaces with STS using one of AWS’s SDKs!

AWS Console and MFA

Implementing MFA for console usage is a much simpler process. By performing Step 1, the console automatically prompts for your MFA passcode upon login. Awesome, and easy!

Service Accounts

There are scenarios where temporary credentials do not fit the workload, such as long-running tasks. Having to renew credentials every 60 minutes for long-running or recurring automated processes seems highly counterintuitive. In this case, it’s best to create what I like to call an IAM Service Account. An IAM Service Account is just a normal IAM User, but it’s used by an application or process instead of a human being. Because the service account won’t use MFA, you’ll want to reduce the risk associated with its credentials in the event of their exposure. You do this by combining a least-privilege policy, meaning only give access to what’s absolutely necessary, with additional controls, such as source IP address restrictions.

An example Service Account IAM policy that only allows EC2 instance termination from an allowed IP address range:
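A sketch of such a policy; the source range 203.0.113.0/24 is a placeholder:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ec2:TerminateInstances",
      "Resource": "*",
      "Condition": {
        "IpAddress": { "aws:SourceIp": "203.0.113.0/24" }
      }
    }
  ]
}
```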

MFA Protection on Identity Providers and Federation

While AWS offers MFA Protection for Cross-Account Delegation, this only applies to requests originating from an AWS account. AWS does not have visibility into the MFA status of external identity providers (IdP). If your organization uses an external Identity Provider to broker access to AWS, either via SAML or a custom federation solution, it is advised that you implement a MFA solution, such as Duo, in your IdP’s authentication workflow.

Stay safe, have fun, and keep building!

About the Author

Brian Nuszkowski (@nuszkowski) is a Software Engineer at Uber’s Advanced Technologies Center. He is on the organizing committee for DevOpsDays Detroit and has spoken at several conferences throughout North America such as DevOps Days Austin, Pittsburgh, Toronto, and O’Reilly’s Velocity conference in New York.

About the Editors

Chris Henry is a technologist who has devoted his professional life to building technology and teams that create truly useful products. He believes in leveraging the right technologies to keep systems up and safe, and teams productive. Previously, Chris led technology teams at Behance, the leading online platform for creatives to showcase and discover creative work, and later at Adobe, which acquired Behance in 2012. A large part of his time has been spent continually examining and improving systems and processes with the goal of providing the best experience to end users. He’s currently building IssueVoter.org, a simple way to send Congress opinions about current legislature and track their results. He occasionally blogs at http://chr.ishenry.com about tech, travel, and his cat.

Chris Castle is a Delivery Manager within Accenture’s Technology Architecture practice. During his tenure, he has spent time with major financial services and media companies. He is currently involved in the creation of a compute request and deployment platform to enable migration of his client’s internal applications to AWS.


Serverless everything: One-button serverless deployment pipeline for a serverless app

14. December 2016

Author: Soenke Ruempler
Editors: Ryan S. Brown

Update: Since AWS recently released CodeBuild, things got much simpler. Please also read my follow-up post AWS CodeBuild: The missing link for deployment pipelines in AWS.

Infrastructure as Code is the new default: With tools like Ansible, Terraform, CloudFormation, and others it is getting more and more common. A multitude of services and tools can be orchestrated with code. The main advantages of automation are reproducibility, fewer human errors, and exact documentation of the steps involved.

With infrastructure expressed as code, it’s not a stretch to also want to codify deployment pipelines. Luckily, AWS has its own service for that, named CodePipeline, which in turn can be fully codified and automated by CloudFormation (“Pipelines as Code”).

This article will show you how to create a deploy pipeline for a serverless app with a “one-button” CloudFormation template. The more concrete goals are:

  • Fully serverless: neither the pipeline nor the app itself involves server, VM or container setup/management (and yes, there are still servers, just not managed by us).
  • Demonstrate a fully automated deployment pipeline blueprint with AWS CodePipeline for a serverless app consisting of a sample backend powered by the Serverless framework and a sample frontend powered by “create-react-app”.
  • Provide a one-button quick start for creating deployment pipelines for serverless apps within minutes. Nothing should be run from a developer machine, not even an “inception script”.
  • Show that it is possible to lower complexity by leveraging AWS components so you don’t need to configure/click third party providers (e.g. TravisCi/CircleCi) as pipeline steps.

We will start with a repository consisting of a typical small web application with a front end and a back end. The deployment pipeline described in this article makes some assumptions about the project layout (see the sample project):

  • a frontend/ folder with a package.json which will produce a build into build/ when npm run build is called by the pipeline.
  • a backend/ folder with a serverless.yml. The pipeline will call the serverless deploy (the Serverless framework). It should have at least one http event so that the Serverless framework creates a service endpoint which can then be used in the frontend to call the APIs.

For a start, you can just clone or copy the sample project into your own GitHub account.

As soon as you have your project ready, we can continue to create a deployment pipeline with CloudFormation.

The actual CloudFormation template we will use here to create the deployment pipeline does not reside in the project repository. This allows us to develop/evolve the pipeline and the pipeline code and the projects using the pipeline independent from each other. It is published to an S3 bucket so we can build a one-click launch button. The launch button will direct users to the CloudFormation console with the URL to the template prefilled:

Launch Stack

After you click on the link (you need to be logged in to the AWS Console) and click “Next” to confirm that you want to use the predefined template, some CloudFormation stack parameters have to be specified:

CloudFormation stack parameters

First you need to specify the GitHub Owner/Repository of the project (the one you copied earlier), a branch (usually master) and a GitHub Oauth Token as described in the CodePipeline documentation.

The other parameters specify where to find the Lambda function source code for the deployment steps; we can live with the defaults for now, stuff for another blog post. (Update: the Lambda functions became obsolete with the move to AWS CodeBuild, and so did the template parameters regarding the Lambda source code location.)

The next step of the CloudFormation stack setup allows you to specify advanced settings like tags, notifications and so on. We can leave these as-is as well.

On the last assistant page you need to acknowledge that CloudFormation will create IAM roles on your behalf:

CloudFormation IAM confirmation

The IAM roles are needed to give the Lambda functions the right permissions to run and to log to CloudWatch. Once you press the “Create” button, CloudFormation will create the following AWS resources:

  • An S3 Bucket containing the website assets with website hosting enabled.
  • A deployment pipeline (AWS CodePipeline) consisting of the following steps:
    • Checks out the source code from GitHub and saves it as an artifact.
    • Back end deployment: A Lambda function build step which takes the source artifact, installs and calls the Serverless framework.
    • Front end deployment: Another Lambda function build step which takes the source artifact, runs npm build and deploys the build to the Website S3 bucket

(Update: in the meantime, I replaced the Lambda functions with AWS CodeBuild).

No servers harmed so far, and also no workstations: No error-prone installation steps in READMEs to be followed, no curl | sudo bash or other awkward setup instructions. Also no hardcoded AWS access key pairs anywhere!

A platform team in an organization could provide several of these types of templates for particular use cases, then development teams could get going just by clicking the link.

Ok, back to our example: Once the CloudFormation stack creation is fully finished, the created CodePipeline is going to run for the first time. On the AWS console:

CodePipeline running

As soon as the initial pipeline run is finished:

  • the back end CloudFormation stack has been created by the Serverless framework, depending on what you defined in the backend/serverless.yml configuration file.
  • the front end has been built and put into the website bucket.

To find out the URL of our website hosted in S3, open the resources of the CloudFormation stack and expand the outputs. The WebsiteUrl output will show the actual URL:

CloudFormation Stack output

Click on the URL link and view the website:

Deployed sample website

Voila! We are up and running!

As you might have seen in the picture above, there is some JSON output: it’s actually the result of an HTTP call the front end made against the back end, the hello function, which simply echoes back the Lambda event object.

Let’s dig a bit deeper into this detail as it shows the integration of front end and back end: to pass the ServiceEndpoint URL to the front end build step, the back end build step exports all CloudFormation Outputs of the Serverless-created stack as a CodePipeline build artifact, which the front end build step in turn passes to npm build (in our case via a React-specific environment variable). This is how the API call looks in React:
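Roughly, the call looks like this; the exact environment variable name is an assumption (create-react-app only exposes variables prefixed with REACT_APP_ at build time):

```javascript
// Sketch: the pipeline injects the backend's ServiceEndpoint into the build
// as a create-react-app environment variable (variable name is an assumption).
const endpoint = process.env.REACT_APP_SERVICE_ENDPOINT;

fetch(`${endpoint}/hello`)
  .then(response => response.json())
  .then(event => console.log('Lambda event echoed by the backend:', event));
```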

This Cross-site request actually works, because we specified CORS to be on in the serverless.yml:
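The relevant part of backend/serverless.yml looks roughly like this; the handler and path names are assumptions:

```yaml
functions:
  hello:
    handler: handler.hello
    events:
      - http:
          path: hello
          method: get
          cors: true
```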

Here is a high-level overview of the created CloudFormation stack:

Overview of the CloudFormation Stack

With the serverless pipeline and serverless project running, change something in your project, commit it and view the change propagated through the pipeline!

Additional thoughts:

I want to set up my own S3 bucket with my own CloudFormation templates/blueprints!

In case you don’t trust me as a template provider, or you want to change the one-button CloudFormation template, you can of course host your own S3 bucket. Doing so is beyond the scope of this article, but you can start by looking at my CloudFormation template repo.

I want to have testing/staging in the pipeline!

The sample pipeline does not have any testing or staging steps. You can add more steps to the pipeline, e.g. another Lambda step that calls npm test on your source code.

I need a database/cache/whatever for my app!

No problem, just add additional resources to the serverless.yml configuration file.

Summary

In this blog post I demonstrated a CloudFormation template which bootstraps a serverless deployment pipeline with AWS CodePipeline. This enables rapid application development and deployment, as development teams can use the template in a “one-button” fashion for their projects.

We have deployed a sample project with a deployment pipeline with a front and back end.

AWS gives us all the lego bricks we need to create such pipelines in an automated, codified and (almost) maintenance-free way.

Known issues / caveats

  • I am describing an experiment / working prototype here. Don’t expect high quality, battle tested code (esp. the JavaScript parts 🙂 ). It’s more about the architectural concept. Issues and Pull requests to the mentioned projects are welcome 🙂 (Update: luckily I could delete all the JS code with  the move to AWS CodeBuild)
  • All deployment steps currently run with full access roles (AdministratorAccess) which is a bad practice but it was not the focus of this article.
  • The website could also be backed by a CloudFront CDN with HTTPS and a custom domain.
  • Beware of the 5 minute execution limit in Lambda functions (e.g. more complex serverless.yml setups might take longer; this could be worked around by sourcing the resource creation out to a CloudFormation pipeline step; Michael Wittig has blogged about that). (Update: this point became invalid with the move to AWS CodeBuild)
  • The build steps are currently not optimized, e.g. installing npm/serverless every time is not necessary. It could use an artifact from an earlier CodePipeline step (Update: this point became invalid with the move to AWS CodeBuild)
  • The CloudFormation stack created by the Serverless framework is currently suffixed with “dev”, because that’s their default environment. The prefix should be omitted or made configurable.

Acknowledgements

Special thanks goes to the folks at Stelligent.

First, for their open source work on serverless deploy pipelines with Lambda, especially the "dromedary-serverless" project. I adapted much of the Lambda code from it.

Second for their “one-button” concept which influenced this article a lot.

About the Author

Along with 18 years of web software development and web operations experience, Soenke Ruempler is an expert in AWS technologies (6 years of experience in development and operations), and in moving on-premises/legacy systems to the Cloud without service interruptions.

His special interests and fields of knowledge are Cloud/AWS, Serverless architectures, Systems Thinking, Toyota Kata (Kaizen), Lean Software Development and Operations, High Performance/Reliability Organizations, Chaos Engineering.

You can find him on Twitter, Github and occasionally blogging on ruempler.eu.

About the Editors

Ryan Brown is a Sr. Software Engineer at Ansible (by Red Hat) and contributor to the Serverless Framework. He’s all about using the best tool for the job, and finds simplicity and automation are a winning combo for running in AWS.


Modular cfn-init Configsets with SparkleFormation

13. December 2016

Author: Michael F. Weinberg
Editors: Andreas Heumaier

This post lays out a modular, programmatic pattern for using CloudFormation Configsets in SparkleFormation codebases. This technique may be beneficial to:

  • Current SparkleFormation users looking to streamline EC2 instance provisioning
  • Current CloudFormation users looking to manage code instead of JSON/YAML files
  • Other AWS users needing an Infrastructure as Code solution for EC2 instance provisioning

Configsets are a CloudFormation-specific EC2 feature that allows you to configure a set of instructions for cfn-init to run upon instance creation. Configsets group collections of specialized resources, providing a simple solution for basic system setup and configuration. An instance can use one or many Configsets, which are executed in a predictable order.

Because cfn-init is triggered on the instance itself, it is an excellent solution for Autoscaling Group instance provisioning, a scenario where external provisioners cannot easily discover underlying instances, or respond to scaling events.

SparkleFormation is a powerful Ruby library for composing CloudFormation templates, as well as orchestration templates for other cloud providers.

The Pattern

Many CloudFormation examples include a set of cfn-init instructions in the instance Metadata using the config key. This is an effective way to configure instances for a single template, but in an infrastructure codebase, doing this for each service template is repetitious and introduces the potential for divergent approaches to the same problem in different templates. If no config key is provided, cfn-init will automatically attempt to run a default Configset. Configsets in CloudFormation templates are represented as an array. This pattern leverages Ruby's concat method to construct a default Configset in SparkleFormation's compilation step. This allows us to use Configsets to manage the instance Metadata in a modular fashion.

To start, any Instance or Launch Config resources should include an empty array as the default Configset in their metadata, like so:
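A minimal SparkleFormation sketch of that starting point; the resource name, the properties and the _camel_keys_set helper call are assumptions rather than the original code:

```ruby
# Sketch: a Launch Config whose cfn-init metadata starts with an empty default Configset.
resources.app_launch_config do
  type 'AWS::AutoScaling::LaunchConfiguration'
  metadata('AWS::CloudFormation::Init') do
    _camel_keys_set(:auto_disable)   # keep cfn-init keys like configSets from being camel-cased
    configSets do |sets|
      sets.default []                # empty array that registry entries will concat onto
    end
  end
  properties do
    image_id ref!(:ami_id)
    instance_type 't2.micro'
  end
end
```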

Additionally, the Instance or Launch Config UserData should run the cfn-init command. A best practice is to place this in a SparkleFormation registry entry. A barebones example:
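A sketch of such a registry entry; the registry name, the hard-coded resource name, and the pseudo-parameter helpers are assumptions:

```ruby
# Sketch: UserData that runs cfn-init against this stack and resource, and therefore
# against whatever ended up in the default Configset.
SfnRegistry.register(:cfn_init_user_data) do
  user_data base64!(
    join!(
      "#!/bin/bash -xe\n",
      '/opt/aws/bin/cfn-init -v',
      ' --stack ', stack_name!,
      ' --resource AppLaunchConfig',   # placeholder: logical name of the resource above
      ' --region ', region!,
      "\n"
    )
  )
end
```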

With the above code, cfn-init will run the empty default Configset. Using modular registry entries, we can expand this Configset to meet our needs. Each registry file should add the defined configuration to the default Configset, like this:
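A sketch of one such registry entry; the registry and config names and the package are placeholders:

```ruby
# Sketch: a registry entry that appends its own config block to the default Configset.
SfnRegistry.register(:install_nginx) do
  metadata('AWS::CloudFormation::Init') do
    _camel_keys_set(:auto_disable)
    configSets do |sets|
      sets.default.concat(['install_nginx'])   # extend, never overwrite, the default set
    end
    install_nginx do
      packages.apt.nginx []                    # [] tells cfn-init to install the latest version
    end
  end
end
```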

A registry entry can also include more than one config block:
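A sketch of a registry entry contributing two config blocks, both appended to the default Configset; all names are placeholders:

```ruby
# Sketch: one registry entry, two config blocks.
SfnRegistry.register(:baseline) do
  metadata('AWS::CloudFormation::Init') do
    _camel_keys_set(:auto_disable)
    configSets do |sets|
      sets.default.concat(['base_packages', 'base_users'])
    end
    base_packages do
      packages.apt.htop []
    end
    base_users do
      commands('01_add_deploy_user') do
        command 'useradd -m deploy || true'
      end
    end
  end
end
```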

Calling these registry entries in the template will add them to the default Configset in the order they are called:
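A sketch of what the resource looks like with the registry entries included; the ordering comments show how the default Configset grows as each entry is called:

```ruby
# Sketch: registry calls extend the default Configset in the order they appear.
resources.app_launch_config do
  type 'AWS::AutoScaling::LaunchConfiguration'
  metadata('AWS::CloudFormation::Init') do
    _camel_keys_set(:auto_disable)
    configSets do |sets|
      sets.default []
    end
  end
  registry!(:install_nginx)          # default Configset is now ['install_nginx']
  registry!(:baseline)               # ... then ['install_nginx', 'base_packages', 'base_users']
  properties do
    registry!(:cfn_init_user_data)   # renders the UserData that invokes cfn-init
  end
end
```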

Note that other approaches to extending the array will also work:

sets.default += [ 'key_to_add' ], sets.default.push('key_to_add'), sets.default << 'key_to_add', etc.

Use Cases

Extending the default Configset rather than setting the config key directly makes it easy to build out cfn-init instructions in a flexible, modular fashion. Modular Configsets, in turn, create opportunities for better Infrastructure as Code workflows. Some examples:

Development Instances

This cfn-init pattern is not a substitute for full-fledged configuration management solutions (Chef, Puppet, Ansible, Salt, etc.), but for experimental or development instances cfn-init can provide just enough configuration management without the increased overhead or complexity of a full CM tool.

I use the Chef users cookbook to manage users across my AWS infrastructure. Consequently, I very rarely make use of AWS EC2 keypairs, but I do need a solution to access an instance without Chef. My preferred solution is to use cfn-init to fetch my public keys from GitHub and add them to the default ubuntu (or ec2-user) user. The registry entry for this:
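A sketch of that registry entry, pulling the public keys from GitHub's <user>.keys endpoint; the parameter name and the home directory path are assumptions:

```ruby
# Sketch: fetch <github_user>.keys and append them to the default user's authorized_keys.
SfnRegistry.register(:github_user_keys) do
  metadata('AWS::CloudFormation::Init') do
    _camel_keys_set(:auto_disable)
    configSets do |sets|
      sets.default.concat(['github_user_keys'])
    end
    github_user_keys do
      commands('01_fetch_github_keys') do
        command join!(
          'curl -sf https://github.com/', ref!(:github_user),
          '.keys >> /home/ubuntu/.ssh/authorized_keys'
        )
      end
    end
  end
end
```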

In the template, I just set a github_user parameter and include the registry, and I get access to an instance in any region without needing to do any key setup or configuration management.

This could also be paired with a configuration management registry entry, so that the GitHub user setup is limited to development:
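A sketch of that compile-time switch; the :chef_client registry entry is hypothetical, standing in for full configuration management:

```ruby
# Sketch: decide at compile time which registry entries (and therefore Configsets) to include.
if ENV['development'] == 'true'
  registry!(:github_user_keys)
else
  registry!(:chef_client)
end
```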

Compiling this with the environment variable development=true will include the GitHub Configset; in any other case it will run the full configuration management.

In addition to being a handy shortcut, this approach is useful for on-boarding other users/teams to an Infrastructure codebase and workflow. Even with no additional automation in place, it encourages system provisioning using a code-based workflow, and provides a groundwork to layer additional automation on top of.

Incremental Automation Adoption

Extending the development example, a modular Configset pattern is helpful for incrementally introducing automation. Attempting to introduce automation and configuration management to an infrastructure that is actively being architected can be very frustrating: each new component requires not just understanding the component and its initial configuration, but also determining how best to automate and abstract that into code. This can lead to expedient, compromise implementations that add to technical debt, as they aren't flexible enough to support emergent needs.

An incremental approach can mitigate these issues, while maintaining a focus on code and automation. Well understood components are fully automated, while some emergent features are initially implemented with a mixture of automation and manual experimentation. For example, an engineer approaching a new service might perform some baseline user setup and package installation via an infrastructure codebase, but configure the service manually while determining the ideal configuration. Once that configuration matures, the automation resources necessary to achieve it are included in the codebase.

CloudFormation Configsets are effective options for package installation and are also good for fetching private assets from S3 buckets. An engineer might use a Configset to set up her user on a development instance, along with the baseline package dependencies and a tarball of private assets. By working with the infrastructure codebase from the outset, she has the advantage of knowing that any related AWS components are provisioned and configured as they would be in a production environment, so she can iterate directly on service configuration. As the service matures, the Configset instructions that handled user and package installation may be replaced by more sophisticated configuration management tooling, but this is a simple one-line change in the template.

Organization Wide Defaults

In organizations where multiple engineers or teams contribute discrete application components in the same infrastructure, adopting standard approaches across the organization is very helpful. Standardization often hinges on common libraries that are easy to include across a variety of contexts. The default Configset pattern makes it easy to share registry entries across an organization, whether in a shared repository or internally published gems. Once an organizational pattern is codified in a registry entry, including it is a single line in the template.

This is especially useful in organizations where certain infrastructure-wide responsibilities are owned by a subset of engineers (e.g. Security or SRE teams). These groups can publish a gem (SparklePack) containing a universal configuration covering their concerns that the wider group of engineers can include by default, essentially offering these in an Infrastructure as a Service model. Monitoring, Security, and Service Discovery are all good examples of the type of universal concerns that can be solved this way.

Conclusion

cfn-init Configsets can be a powerful tool for Infrastructure as Code workflows, especially when used in a modular, programmatic approach. The default Configset pattern in SparkleFormation provides an easy-to-implement, consistent approach to managing Configsets across an organization, either within a single codebase or vendored in as gems/SparklePacks. Teams looking to increase the flexibility of their AWS instance provisioning should consider this pattern, and a programmatic tool such as SparkleFormation.

For working examples, please check out this repo.

About the Author

Michael F. Weinberg is an Infrastructure & Automation specialist, with a strong interest in cocktails and jukeboxes. He currently works at Hired as a Systems Engineer. His open source projects live at http://github.com/reverseskate.


Providing Static IPs for Non-Trivial Architectures

12. December 2016

Author: Oli Wood
Editors: Seth Thomas, Scott Francis

An interesting problem landed on my desk a month ago that seemed trivial to begin with, but once we started digging into the problem it turned out to be more complex than we thought.  A small set of our clients needed to restrict outgoing traffic from their network to a whitelist of IP addresses.  This meant providing a finite set of IPs which we could use to provide a route into our data collection funnel.

Traditionally this has not been too difficult, but once you take into account the ephemeral nature of cloud infrastructures and the business requirements for high availability and horizontal scaling (within reason) it gets more complex.

We also needed to take into account that our backend system (api.example.com) is deployed in a blue/green manner (with traffic being switched by DNS), and that we didn’t want to incur any additional management overhead with the new system.  For more on Blue/Green see http://martinfowler.com/bliki/BlueGreenDeployment.html.

Where we ended up looks complex but is actually several small systems glued together.  Let’s describe the final setup and then dig into each section.

The Destination

A simplified version of the final solution.

 

The View from the Outside World

Our clients can address our system by two routes:

  • api.example.com – our previous public endpoint.  This is routed by Route 53 to either api-blue.example.com or api-green.example.com
  • static.example.com – our new address which will always resolve to a finite set of IP addresses (we chose 4).  This will eventually route through to the same blue or green backend.

The previous infrastructure

api-blue.example.com is an autoscaling group deployed (as part of a wider system) inside its own VPC. When we blue/green deploy, an entire new VPC is created (this is something we're considering revisiting). It is fronted by an ELB. Given the nature of ELBs, the IP addresses behind this endpoint will change over time, which is why we started down this road.

The proxying infrastructure

static.example.com is a completely separate VPC which houses 4 autoscaling groups, each set to a minimum size of 1 and a maximum size of 1. The EC2 instances are assigned an EIP on boot (more on this later) and have HAProxy 1.6 installed. HAProxy is set up to provide two things:

  • A TCP proxy endpoint on port 443
  • A healthcheck endpoint on port 9000

The DNS configuration

The new DNS entry for static.example.com is configured so that it only returns IP addresses for up to 4 of the EIPs, based on the results of their healthcheck (as provided by HAProxy).

How we got there

The DNS setup

static.example.com is based on a set of four Health Checks which form a Traffic Policy that creates the Policy Record (which is the equivalent of your normal DNS entry).

Steps to create Health Checks:

  1. Log into the AWS Console
  2. Head to Route 53
  3. Head to Health Checks
  4. Create new Health Check
    1. What to monitor => Endpoint
    2. Specify endpoint by => IP Address
    3. Protocol => HTTP
    4. IP Address => [Your EIP]
    5. Host name => Ignore
    6. Port => 9001
    7. Path => /health

Repeat four times.  Watch until they all go green.
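If you would rather script this step, the same health check can be created with the AWS CLI; a sketch with placeholder values (the IP address is from the documentation range):

```bash
# Sketch: one of the four health checks created via the CLI instead of the console.
aws route53 create-health-check \
  --caller-reference proxy-eip-1 \
  --health-check-config IPAddress=203.0.113.10,Port=9001,Type=HTTP,ResourcePath=/health
```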

Steps to create Traffic Policy:

  1. Route 53
  2. Traffic Policies
  3. Create Traffic Policy
    1. Policy name => something sensible
    2. Version description => something sensible

This opens up the GUI editor

  1. Choose DNS type A: IP address
  2. Connect to => Weighted Rule
  3. Add 2 more Weights
  4. On each choose “Evaluate target health” and then one of your Health Checks
  5. Make sure the Weights are all set the same (I chose 10)
  6. For each click “Connect to” => New Endpoint
    1. Type => Value
    2. Value => EIP address
The traffic policy in the GUI

Adding the Policy record

  1. Route 53
  2. Policy Record
  3. Create new Policy Record
    1. Traffic policy => Your new policy created above
    2. Version => it’ll probably be version 1 because you just created it
    3. Hosted zone => choose the domain you’re already managing in AWS
    4. Policy record => add static.example.com equivalent
    5. TTL => we chose 60 seconds

And there you go, static.example.com will route traffic to your four EIPs, but only if they are available.

The Autoscaling groups

The big question you’re probably wondering here is “why did they create four separate Autoscaling groups?  Why not just use one?”  It’s a fair question, and our choice might not be right for you, but the reasoning is that we didn’t want to build something else to manage which EIPs were assigned to each of the 4 instances.  By using 4 separate Autoscaling groups we can use 4 separate Launch Configurations, and then use the EC2 tags to manage how an instance knows which EIP to attach.

The key things here are:

  • Each of the Autoscaling Groups is defined separately in our CloudFormation stack
  • Each of the Autoscaling Groups has its own Launch Configuration
  • We place two Autoscaling Groups in each of our Availability Zones
  • We place two Autoscaling Groups in each Public Subnet
  • Tags on the Autoscaling Group are set with “PropagateAtLaunch: true” so that the instances they launch end up with the EIP reference on them
  • Each of the four Launch Configurations includes the same UserData script (Base64 encoded in our CloudFormation template)
  • The LaunchConfiguration includes an IAM Role giving enough permissions to be able to tag the instance

The UserData script
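The original script isn't reproduced here; a sketch of the idea, assuming the EIP allocation ID reaches the instance as a tag named EipAllocationId (propagated from the Autoscaling Group):

```bash
#!/bin/bash
# Sketch: discover which EIP this instance should own (via its propagated tag) and attach it.
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
REGION=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone | sed 's/.$//')

# Tag key is an assumption; it is set on the Autoscaling Group with "PropagateAtLaunch: true".
ALLOCATION_ID=$(aws ec2 describe-tags --region "$REGION" \
  --filters "Name=resource-id,Values=$INSTANCE_ID" "Name=key,Values=EipAllocationId" \
  --query 'Tags[0].Value' --output text)

aws ec2 associate-address --region "$REGION" \
  --instance-id "$INSTANCE_ID" \
  --allocation-id "$ALLOCATION_ID" \
  --allow-reassociation
```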

The IAM Role statement
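A sketch of the kind of policy statement attached to that instance role; the exact action list is an assumption based on what the UserData script above needs:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeTags",
        "ec2:DescribeAddresses",
        "ec2:AssociateAddress",
        "ec2:CreateTags"
      ],
      "Resource": "*"
    }
  ]
}
```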

The EC2 instances

We chose c4.xlarge instances to provide a good amount of network throughput.  Because HAProxy is running in TCP mode we struggle to monitor the traffic levels and so we’re using CloudWatch to alert on very high or low Network Output from the four instances.

The EC2 instances themselves are launched from a custom AMI which includes very little except a version of HAProxy (thanks to ITV for https://github.com/ITV/rpm-haproxy).  We’re using this fork because it supplies the slightly newer HAProxy version 1.6.4.

Unusually for us, we’ve baked the config for HAProxy into the AMI.  I suspect this is a decision we will revisit at a later date, and have the config pulled from S3 at boot time instead.

HAProxy is set to start on boot.  Something we shall probably add at a later date is to have the Autoscaling Group use the same healthcheck endpoint that HAProxy provides to Route 53 to determine instance health.  This way we’ll replace an instance that comes up but does not provide a healthy HAProxy for some reason.

The HAProxy setup

HAProxy is a fabulously flexible beast and we had a lot of options on what to do here.  We did however wish to keep it as simple as possible.  With that in mind, we opted to not offload SSL at this point but to act as a passthrough proxy direct to our existing architecture.

Before we dive into the config, however, it’s worth mentioning our choice of backend URL.  We opted to route back to api.example.com because this means that when we blue/green deploy our existing setup we don’t need to make any changes to our HAProxy setup.  By using its own health check mechanism and “resolvers” entry we can make sure that the IP addresses that it is routing to (the new ELB) aren’t more than a few seconds out of date.  This loopback took us a while to figure out and is (again) something we might revisit in the future.

Here are the important bits of the config file:

The resolver

Makes use of AWS’s internal DNS service.  This has to be used in conjunction with a health check on the backend server.
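A sketch of the resolvers section; the timeouts are illustrative, and 169.254.169.253 is the Amazon-provided DNS address reachable from any VPC:

```
resolvers mydns
  nameserver awsdns 169.254.169.253:53
  resolve_retries 3
  timeout retry   1s
  hold valid      10s
```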

The front end listener

Super simple.  This would be more complex if you wanted to route traffic from different source addresses to different backends using SNI (see http://blog.haproxy.com/2012/04/13/enhanced-ssl-load-balancing-with-server-name-indication-sni-tls-extension/).
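A sketch of that frontend section, passing TCP straight through on 443 (names are placeholders):

```
frontend https_in
  bind *:443
  mode tcp
  option tcplog
  default_backend api_backend
```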

The backend listener

The key thing here is the inclusion of the resolver (mydns, as defined above) alongside the server health check. It’s the combination of the two which causes HAProxy to re-evaluate the DNS entry.
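A sketch of the backend section; the names are placeholders, and the check plus resolvers keywords together keep the ELB’s address fresh:

```
backend api_backend
  mode tcp
  server api api.example.com:443 check resolvers mydns resolve-prefer ipv4
```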

The outwards facing health check

This will return a 200 if everything is OK, a 503 if the backend is down, and a connection failure if HAProxy itself is down. This will correctly inform the Route 53 health checks, and if needed Route 53 will not include that IP address.
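A sketch of that health check listener, shown on port 9000 as described earlier; whichever port you use must match the one targeted by the Route 53 health checks:

```
listen health_check
  bind *:9000
  mode http
  monitor-uri /health
  acl backend_down nbsrv(api_backend) lt 1
  monitor fail if backend_down
```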

What we did to test it

We ran through various scenarios to check how the system coped:

  • Deleting one of the proxy instances and seeing it vanish from the group returned from static.example.com
  • Doing a blue/green deployment and seeing HAProxy update its backend endpoint
  • Blocking access to one AZ with a tweak to the Security Group, to simulate the AZ becoming unavailable
  • Forcing 10 times our normal load through using Vegeta
  • Running a soak test at sensible traffic levels over several hours (also with Vegeta)

The end result

While this is only providing 4 EC2 instances which proxy traffic, it’s a pattern which could be scaled out very easily, with each section bringing another piece of the resilience pie to the table.

  • Route 53 does a great job of only including EIPs that are associated with healthy instances
  • The Autoscaling Groups make sure that our proxy instances will bounce back if something nasty happens to them
  • UserData and Tags provide a neat way for the instances to self-manage the allocation of EIPs
  • HAProxy provides both transparent routing and health checks.
  • Route 53 works really well for Blue/Greening our traffic to our existing infrastructure.

It’s not perfect (I imagine we’ll have issues with some clients caching DNS records for far too long at some point), and I’ll wager we’ll end up tuning some of the timeouts and HAProxy config at some point in the future, but for now it’s out there and happily providing an endpoint for our customers (and not taking up any of our time).  We’ve tested how to deploy updates (deploy a new CloudFormation stack and let the new instance “steal” the EIPs) successfully too.

About the Author:

Oli Wood has been deploying systems into AWS since 2010 in businesses ranging from 2-person startups to multi-million dollar enterprises. Prior to that he mostly battled with deploying them onto other service providers, cutting his teeth in a version control and deployment team on a Large Government Project back in the mid 2000s.

Inside of work he spends time, train tickets and shoe leather helping teams across the business benefit from DevOps mentality.

Outside of work he can mostly be found writing about food on https://www.omnomfrickinnom.com/ and documenting the perils of poor posture at work at http://goodcoderbadposture.com/

Online he’s @coldclimate

About the Editors:

Scott Francis has been designing, building and operating Internet-scale infrastructures for the better part of 20 years. He likes BSD, Perl, AWS, security, cryptography and coffee. He’s a good guy to know in a zombie apocalypse. Find him online at  https://linkedin.com/in/darkuncle and https://twitter.com/darkuncle.


L4 vs L7 Showdown

10. December 2016

Author: Atif Siddiqui

Editors: Vinny Carpenter, Brian O’Rourke

Objective

This article will explain the role and types of load balancers before delving into them through the prism of Amazon Web Services (AWS). The post wraps up with a lab exercise on AWS load balancer migration.

Introduction

A load balancer is a device that, in its simplest form, acts as a funnel for traffic before redistributing it. This is achieved by playing the role of a reverse proxy server (RPS). While a load balancer can be a hardware device or a software component, this article will focus on Software Defined Networking (SDN) load balancers.

Load Balancer dictating traffic distribution

OSI 101

The Open Systems Interconnection (OSI) model is a conceptual illustration of networking. It shows how each layer serves the one above it. When discussing load balancers, the transport and application layers hold our interest.

Open Systems Interconnection model – high level

There are two types of load balancers.

1. A Layer 4 load balancer works at the transport layer. This confines the routing criteria to IP addresses and ports, as only the packet header is inspected, without reviewing its contents.

2. A Layer 7 load balancer works at the application layer. It has higher intelligence because it can inspect packet contents, as it understands protocols such as HTTP, HTTPS and WebSockets. This gives it the ability to perform advanced routing.

Open Systems Interconnection model – close up [1]

AWS Perspective

Elastic Load Balancer (ELB) is one of the cornerstones of designing resilient applications. A walk down memory lane shows that its beta release happened back in May 2009. Being a Layer 4 (L4) load balancer, ELB makes routing decisions without inspecting the contents of the packet.

The abstraction and simplicity of use remain its core strengths: provisioning can be done with one click of a button. On the flip side, one feature that is conspicuously missing is support for Server Name Indication (SNI). While wildcard and SAN certificates are supported, hopefully support for multiple certificates is around the corner.

As a new offering in this space, AWS recently came out with a Layer 7 load balancer aptly named Application Load Balancer (ALB). This was announced in August this year with availability across all AWS commercial regions. Along with this announcement, the original load balancer was rebranded as Classic Load Balancer.

Building blocks of an AWS application load balancer

AWS has also introduced the target group as new nomenclature. A target group is used to register EC2 instance(s) mapped to port number(s). A target group is linked to the ALB via a listener, which in turn can have rule(s) associated with it, as sketched after the figure below.

Register/de-register instance for Target group
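As a sketch of how those pieces fit together with the aws elbv2 CLI (the names, ARNs and instance ID below are placeholders):

```bash
# Create a target group, register an instance, and forward traffic to it from an ALB listener.
aws elbv2 create-target-group \
  --name my-targets --protocol HTTP --port 80 --vpc-id vpc-0123456789abcdef0

aws elbv2 register-targets \
  --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-targets/abc123 \
  --targets Id=i-0123456789abcdef0

aws elbv2 create-listener \
  --load-balancer-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/app/my-alb/abc123 \
  --protocol HTTP --port 80 \
  --default-actions Type=forward,TargetGroupArn=arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-targets/abc123
```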

Some other noteworthy aspects about ALB are:

1. ALB supports HTTP and WebSockets.

2. While the AWS CLI for Classic Load Balancer is aws elb, for Application Load Balancer it is aws elbv2.

3. ALB allows routing via path matching only, with a ceiling of 10 URL-based rules.

4. Like Classic, pre-warming for ALB is recommended in preparation for a major traffic spike.

5. ALB's hourly rate is 10% lower than ELB's.

6. CloudFormation supports ALB though, interestingly, it is referred to as ElasticLoadBalancingV2.

Migration Guide: ELB -> ALB

While an ELB cannot be converted to an ALB, migration is supported [2]. AWS recommends a Python script [3] available on GitHub. The following exercise was done on an Amazon AMI to test such a migration. Each command is preceded by a comment indicating its purpose. It is assumed that the reader already has the AWS CLI installed, and their credentials set up to be able to manipulate AWS objects from the command line.

— grab migration utility [4]
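A sketch of the download step, pulling the script published at reference [4]:

```bash
curl -O https://raw.githubusercontent.com/aws/elastic-load-balancing-tools/master/copy_classic_load_balancer.py
```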

— verify existing ELB name via cli
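A sketch of that check; the region is a placeholder:

```bash
# List Classic Load Balancer names in the region to confirm the name to migrate
aws elb describe-load-balancers --region us-east-1 \
  --query 'LoadBalancerDescriptions[].LoadBalancerName'
```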

— Conduct a dry run of the load balancer migration (I specified the incorrect region the first time around). As the Python script needs boto3, a prerequisite step is to run pip install boto3
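A sketch of those two commands; the load balancer name and region are placeholders, and the script's exact flags may differ from this assumption:

```bash
# boto3 is a prerequisite for the migration script
pip install boto3

# Dry run of the migration, printing what would be created
python copy_classic_load_balancer.py --name my-classic-elb --region us-east-1 --debug
```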

— create application load balancer
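A sketch of the actual creation step, under the same assumptions about the script's flags:

```bash
# Create the Application Load Balancer and register the existing instances with the new target groups
python copy_classic_load_balancer.py --name my-classic-elb --region us-east-1 --register-targets
```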

Target group ARNs:

Considerations:

1. If your Classic load balancer is attached to an Auto Scaling group, attach the target groups to the Auto Scaling group.

2. All HTTPS listeners use the predefined security policy.

3. To use Amazon EC2 Container Service (Amazon ECS), register your containers as targets.

On November 22, the product team published [5] a new ALB feature for request tracing. This will provide the ability to trace through individual requests. I can’t wait to play with it.

References

  1. https://mplsnet.files.wordpress.com/2014/06/osi-model.gif
  2. http://docs.aws.amazon.com/elasticloadbalancing/latest/userguide/migrate-to-application-load-balancer.html
  3. https://github.com/aws/elastic-load-balancing-tools
  4. https://raw.githubusercontent.com/aws/elastic-load-balancing-tools/master/copy_classic_load_balancer.py
  5. https://aws.amazon.com/blogs/aws/application-performance-percentiles-and-request-tracing-for-aws-application-load-balancer/

 

About the Author:

Atif Siddiqui is a certified AWS Solutions Architect. He works as an Architect at General Electric (GE) in the enterprise applications space. His responsibilities encompass critical applications with a global footprint, where he brings solutions and infrastructure expertise to his customers. He is also an avid blogger on GE's internal social media.

About the Editors:

Brian O’Rourke is the co-founder of RedisGreen, a highly available and highly instrumented Redis service. He has more than a decade of experience building and scaling systems and happy teams, and has been an active AWS user since S3 was a baby.