Deploy a Secure Static Site with AWS & Terraform

14. December 2018

There are many uses for static websites. A static site is the simplest form of website, though every website ultimately consists of delivering HTML, CSS and other resources to a browser. With a static website, the initial page content is delivered the same to every user, regardless of how they’ve interacted with your site previously. There’s no database, authentication or anything else involved in sending the site to the user – just a straight HTTPS connection and some text content. This content can benefit from caching on servers closer to its users for faster delivery; it will generally also be lower cost, as the servers delivering this content do not themselves need to interpret scripting languages or make database connections on behalf of the application.

The static website now has another use: as the delivery mechanism for highly interactive in-browser applications built on JavaScript frameworks (such as React, Vue or Angular), which manage client interaction, maintain local data and talk to the web service via small but often frequent API calls. These systems decouple front-end applications from back-end services and allow those back-ends to be written in multiple languages or as small siloed applications, often called microservices. Microservices may take advantage of modern back-end technologies such as containers (via Docker and/or Kubernetes) and “serverless” providers like AWS Lambda.

People deploying static sites fall into these two very different categories – for one, the site is the whole of their business; for the other, the static site is a minor part supporting the API. However, both categories share similar requirements. In this article we explore deploying a static site with the following attributes:

  • Must work at the root domain of a business, e.g., example.com
  • Must redirect from the common (but unnecessary) www. subdomain to the root domain
  • Must be served via HTTPS (and upgrade HTTP to HTTPS)
  • Must support “pretty” canonical URLs – e.g., example.com/about-us rather than example.com/about-us.html
  • Must not cost anything when not being accessed (except for domain name costs)

AWS Service Offerings

We achieve these requirements through use of the following AWS services:

  • S3
  • CloudFront
  • ACM (AWS Certificate Manager)
  • Route53
  • Lambda

This may seem like quite a lot of services to host a simple static website; let’s review and summarise why each item is being used:

  • S3 – object storage; allows you to put files in the cloud. Other AWS users or AWS services may be permitted access to these files. They can be made public. S3 supports website hosting, but only via HTTP. For HTTPS you need…
  • CloudFront – content delivery system; can sit in front of an S3 bucket or a website served via any other domain (doesn’t need to be on AWS) and deliver files from servers close to users, caching them if allowed. Allows you to import HTTPS certificates managed by…
  • ACM – generates and stores certificates (you can also upload your own). Will automatically renew certificates which it generates. For generating certificates, your domain must be validated via adding custom CNAME records. This can be done automatically in…
  • Route53 – AWS nameservers and DNS service. R53 replaces your domain provider’s nameservers (at the cost of $0.50 per month per domain) and allows both traditional DNS records (A, CNAME, MX, TXT, etc.) and “alias” records which map to a specific other AWS service – such as S3 websites or CloudFront distributions. Thus an A record on your root domain can link directly to CloudFront, and the CNAMEs to validate your ACM certificate can also be automatically provisioned
  • Lambda – functions as a service. Lambda lets you run custom code in response to events, which can come directly or from a variety of other AWS services. Crucially, you can attach a Lambda function to CloudFront (via Lambda@Edge), manipulating requests or responses as they’re received from or sent to your users. This is how we’ll make our URLs look nice – see the sketch just below
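
Purely as an illustration (the function in the accompanying repository may well differ), a Lambda@Edge origin-request handler that maps “pretty” URLs to .html objects in the S3 origin could look something like this Node.js sketch:

'use strict';

// Illustrative Lambda@Edge origin-request handler: rewrite "pretty" URLs
// (e.g. /about-us) to the matching .html object in the S3 origin.
exports.handler = (event, context, callback) => {
  const request = event.Records[0].cf.request;
  const uri = request.uri;
  if (uri.endsWith('/')) {
    request.uri = uri + 'index.html'; // /blog/ -> /blog/index.html
  } else if (!uri.includes('.')) {
    request.uri = uri + '.html'; // /about-us -> /about-us.html
  }
  callback(null, request);
};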

Hopefully, that gives you some understanding of the services – you could cut out CloudFront and ACM if you didn’t care about HTTPS, but there’s a worldwide push for HTTPS adoption to provide improved security for users, and browsers such as Chrome now mark pages not served via HTTPS as “insecure” as part of that push.

All this is well and good, but whilst AWS is powerful, its console leaves much to be desired, and setting up one site can take some time – replicating it for multiple sites is as much an exercise in memory and box-ticking as it is in technical prowess. What we need is a way to do this once – or even better, have somebody else do it once – and then replicate it as many times as we need.

Enter Terraform from HashiCorp

One of the most powerful parts of AWS isn’t obvious when you first start using the console to manage your resources. AWS has a hugely powerful API that drives pretty much everything. It’s key to much of their automation, to the entirety of their security model, and to tools like Terraform.

Terraform from HashiCorp is an “Infrastructure-as-Code” (IaC) tool. It lets you define resources on a variety of cloud providers and then run commands to:

  • Check the current state of your environment
  • Make required changes such that your actual environment matches the code you’ve written

In code form, Terraform uses blocks of code called resources:

resource "aws_s3_bucket" "some-internal-reference" {
  bucket = "my-bucket-name"
}

Each resource can include variables (documented on the provider’s website), and these can be text, numbers, true/false, lists (of the above) or maps (basically like subresources with their variables).
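
For instance, here is a purely illustrative resource showing a few of those value types (the values are invented for illustration and don’t come from the article’s module):

resource "aws_s3_bucket" "example" {
  bucket        = "my-company-example-bucket"   # text
  force_destroy = false                         # true/false
  tags = {                                      # map
    Project     = "static-site"
    Environment = "production"
  }
}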

Terraform is distributed as pre-built binaries (it’s also open source, written in Go, so you can build it yourself) that you can run simply by downloading them and making them executable. To work with AWS, you need to define a “provider”, which is formatted similarly to a resource:

provider "aws" {
}

To call any AWS API (via the command line, Terraform or a language of your choice) you’ll need to generate an access key and secret key for the account you’d like to use. That’s beyond the scope of this article, but given that you should avoid hardcoding those credentials into Terraform, and given that you’d be very well served by having the AWS CLI available anyway, skip over to the AWS CLI setup instructions and set this up with the correct keys before continuing.

(NB: for this step you’re best provisioning an account with admin rights, or at least full access to IAM, S3, Route53, CloudFront, ACM & Lambda. However, don’t be tempted to create access keys for your root account – AWS recommends against this.)
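
Once configured, the provider block can reference a named profile rather than embedding keys. A minimal sketch (the profile name and region here are illustrative; the example file in the repository may differ):

provider "aws" {
  region  = "eu-west-1"      # your preferred AWS region
  profile = "my-profile"     # named profile from ~/.aws/credentials
}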

Now that you’ve got your system set up to use AWS programmatically, installed Terraform and been introduced to the basics of its syntax it’s a good time to look at our code on GitHub.

Clone the repository above; you’ll see we have one file in the root (main.tf.example) and then a directory called modules. One of the best parts of Terraform is modules and how they behave. Modules allow a user to define a specific set of infrastructure whose parts may relate directly to each other or simply interact by being on the same account. These modules can define variables, allowing some aspects (names, domains, tags) to be customised, whilst other items that are necessary for the module to function (like a certain configuration of a CloudFront distribution) stay fixed.

To start off, run bash ./setup, which will copy the example file to main.tf and also ensure your local Terraform installation has the correct providers (AWS and file archiving) as well as setting up the modules. In main.tf you’ll then see a suggested set-up using three modules. Of course, you’d be free to remove main.tf entirely and use each module in its own right, but for this tutorial it helps to have a complete picture.

At the top of the main.tf file three variables are defined which you’ll need to fill in correctly (an illustrative sketch follows this list):

  1. The first is the domain you wish to use – it can be your root domain (example.com) or any sort of subdomain (my-site.example.com).
  2. Second, you’ll need the Zone ID associated with your domain on Route 53. Each Route 53 domain gets a zone ID which relates to AWS’ internal domain mapping system. To find your Zone ID visit the Route53 Hosted Zones page whilst signed in to your AWS account and check the right-hand column next to the root domain you’re interested in using for your static site.
  3. Finally, choose a region; if you already use AWS you may have a preferred region, otherwise choose one from the AWS list nearest to you. As a note, it’s generally best to avoid us-east-1 where possible, as on balance it tends to have more issues arise due to its centrality in various AWS services.
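
As a rough sketch (the actual variable names are whatever main.tf defines – these are illustrative), the top of the file will look something like:

variable "domain" {
  default = "example.com"            # your root domain or subdomain
}

variable "zone_id" {
  default = "Z1234567890ABC"         # the Route 53 Hosted Zone ID for that domain
}

variable "region" {
  default = "eu-west-1"              # your preferred AWS region
}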

Now for the fun part. Run terraform plan – if your AWS CLI environment is set up, the plan should execute and show the creation of a whole list of resources – S3 buckets, CloudFront distributions, a number of DNS records and even some new IAM roles & policies. If this step fails entirely, check that the provider entity in main.tf is using the right profile name based on your ~/.aws/credentials file.

Once the plan has run and told you it’s creating resources (it shouldn’t say updating or destroying at this point), you’re ready to go. Run terraform apply – this basically does another plan, but at the end, you can type yes to have Terraform create the resources. This can take a while as Terraform has to call various AWS APIs and some are quicker than others – DNS records can be slightly slower, and ACM generation may wait until it’s verified DNS before returning a positive response. Be patient and eventually it will inform you that it’s finished, or tell you if there have been problems applying.

If the plan or apply options have problems you may need to change some of your variables based on the following possible issues:

  • Names of S3 buckets should be globally unique – so if anyone in the world has a bucket with the name you want, you can’t have it. A good system is to prefix buckets with your company name or suffix them with random characters. By default, the system names your buckets for you, but you can override this.
  • You shouldn’t have an A record for your root or www. domain already in Route53.
  • You shouldn’t have an ACM certificate for your root domain already.

It’s safe (in the case of this code at least) to re-run Terraform if problems have occurred and you’ve tried to fix them – it will only modify or remove resources it has already created, so other resources on the account are safe.

Go into the AWS console and browse S3, CloudFront and Route53 and you should see your various resources created. You can also view the Lambda function and ACM certificate, but be aware that for the former you’ll need to be in the specific region you chose to run in, and for the latter you must select us-east-1 (N. Virginia), as certificates used by CloudFront have to be issued in that region.

What now?

It’s time to deploy a website. This is the easy part – you can use the S3 console to drag and drop files (remember to use the website bucket and not the logs or www-redirect buckets), use awscli to upload them yourself (via aws s3 cp or aws s3 sync), or run the example bash script provided in the repo, which takes one argument: a directory of all the files you want to upload. Be aware – any files uploaded to your bucket will immediately be public on the internet if somebody knows the URL!
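
For example, an awscli upload might look like this (the local directory and bucket name are placeholders – use the website bucket Terraform created for you):

aws s3 sync ./my-site s3://my-website-bucket --delete
# --delete removes objects from the bucket that no longer exist locally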

If you don’t have a website, check the “example-website” directory – running the bash script above without any arguments will deploy this for you. Once you’ve deployed something, visit your domain and, all being well, you should see your site. CloudFront distributions take a variable amount of time to set up, so in some cases it might be 15 or so minutes before the site works as expected.

Note also that CloudFront is set to cache files for 5 minutes; even a hard refresh won’t reload resource files like CSS or JavaScript, as CloudFront won’t go and fetch them again from your bucket for 5 minutes after first fetching them. During development you may wish to turn this off – you can do this in the CloudFront console by setting the TTL values to 0. Once you’re ready to go live, run terraform apply again and it will reconfigure CloudFront to the recommended settings.

Summary

With a minimal amount of work we now have a framework that can deploy a secure static site to any domain we choose in a matter of minutes. We could use this to rapidly deploy websites for marketing clients, publish a blog generated with a static site builder like Jekyll, or use it as the basis for a serverless web application with a React front end delivered to the client and a back end provided by AWS Lambda, accessed via AWS API Gateway or the (newly released) Lambda support in Application Load Balancers.

About the Author

Mike has been working in web application development for 10 years, including 3 years managing a development team for a property tech startup and before that 4 years building a real time application for managing operations at skydiving centres, as well as some time freelancing. He uses Terraform to manage all the AWS infrastructure for his current work and has dabbled in other custom AWS tools such as an improvement to the CloudWatch logging agent and a deployment tool for S3. You can find him on Twitter @m1ke and GitHub.

About the Editor

Jennifer Davis is a Senior Cloud Advocate at Microsoft. Jennifer is the coauthor of Effective DevOps. Previously, she was a principal site reliability engineer at RealSelf, developed cookbooks to simplify building and managing infrastructure at Chef, and built reliable service platforms at Yahoo. She is a core organizer of devopsdays and organizes the Silicon Valley event. She is the founder of CoffeeOps. She has spoken and written about DevOps, Operations, Monitoring, and Automation.


A Hybrid of One: Building a Private Cloud in AWS

12. December 2018

Introduction

Adoption of the cloud is becoming more and more popular for all types of businesses. When you’re starting, you have a blank canvas to work from – there’s no existing blueprint or guide. But what if you’re not in that position? What if you’ve already got an established security policy in place, or you’re working in a regulated industry that sets limits of what’s appropriate or acceptable for your company’s IT infrastructure?

Being able to leverage the elasticity of the public cloud is one of its biggest – if not the biggest – advantages over a traditional corporate IT environment. Building a private cloud takes time, money and a significant amount of up-front investment. This investment might not be acceptable to your organisation, or might never generate returns…

But what if we can build a “private” cloud using public cloud services?

The Virtual Private Cloud

If you’ve looked at AWS, you’ll be familiar with the concept of a “VPC” – A “Virtual Private Cloud”, the first resource you’ll create in your AWS account (if you don’t use the default VPC created in every region when your account is created, that is!). It’s private in the sense that it’s your little bubble, to do with as you please. You control it, nurture it and manage it (hopefully with automation tools!). But private doesn’t mean isolated, and this does not fit the definition of a “private cloud.”

If you misconfigure your AWS environment, you can accidentally expose your environment to the public Internet, and an intruder may be able to use this as a stepping-stone into the rest of your network.

In this article, we’re going to look at the building blocks of your own “private” cloud in the AWS environment. We’ll cover isolating your VPC from the public internet, controlling what data enters and, crucially, leaves your cloud, as well as ensuring that your users can get the best out of their new shiny cloud.

Connecting to your “private” Cloud

AWS is most commonly accessed over the Internet. You publish ‘services’ to be consumed by your users. This is how many people think of AWS – a Load balancer with a couple of web servers, some databases and perhaps a bit of email or workflow.

In the “private” world, it’s unlikely you’ll want to provide direct access to your services over the Internet. You need to guarantee the integrity and security of your data. To maximise your use of the new environment you want to make sure it’s as close to your users and the rest of your infrastructure as possible.

AWS has two private connectivity methods you can use for this: DirectConnect and AWS managed VPN.

Both technologies allow you to “extend” your network into AWS. When you create your VPC, you allocate an IP range (one that doesn’t clash with your internal network), and you can then establish a site-to-site connection to your new VPC. Any instance or service you spin up in your VPC is accessed directly from your internal network, using its private IP address. It’s just as if a new datacenter appeared on your network. Remember, you can still configure your VPC with an Internet Gateway and allocate Public IP addresses (or Elastic IPs) to your instances, which would then give them both an Internet IP and an IP on your internal network – you probably don’t want to do this!

The AWS managed VPN service allows you to establish a VPN over the Internet between your network(s) and AWS. You’re limited by the speed of your internet connection. Also, you’re accessing your cloud environment over the Internet, with all the variable performance and latency that entails.

The diagram below shows an example of how AWS Managed VPN connectivity interfaces with your network.

AWS DirectConnect allows you to establish a private circuit with AWS (like a traditional “leased line”). Your network traffic never touches the Internet or any other uncontrolled public network. You can directly connect to AWS’ routers at one of their shared facilities, or you can use a third-party service to provide the physical connectivity. The right option depends on your connectivity requirements: directly connecting to AWS means you can own the service end-to-end, but using a third party allows you greater flexibility in how you design the resiliency and the connection speed you want to AWS (DirectConnect offers physical 1GbE or 10GbE connectivity options, but you might want something in between, which is where a third party can really help).

The diagram below shows an example of how you can architect DirectConnect connectivity between your corporate datacenter and the AWS cloud. DirectConnect also allows you to connect directly to Amazon services over your private connection, if required. This ensures that no traffic traverses the public Internet when you’re accessing AWS hosted services (such as API endpoints, S3, etc.). DirectConnect also allows you to access services across different regions, so you could have your primary infrastructure in eu-west-1 and your DR infrastructure in eu-west-2, and use the same DirectConnect to access both regions.

(Diagram: DirectConnect overview)

Both connectivity options offer the same native approach to access control you’re familiar with. Network ACLs (NACLs) and Security Groups function exactly as before – you can reference your internal network IP addresses/CIDR ranges as normal and control service access by IP and port. There’s no NAT in place between your network and AWS; it’s just like another datacenter on your network.

Pro Tip: You probably want to delete your default VPCs. By default, AWS services will launch into the default VPC for a specific region, and this comes configured with the standard AWS template of ‘public/private’ subnets and internet gateways. Deleting the default VPCs and associated security groups makes it slightly harder for someone to spin up a service in the wrong place accidentally.

Workload Segregation

You’re not restricted to a single AWS VPC (by default, you’re able to create 5 per region, but this limit can be increased by contacting AWS support). VPCs make it very easy to isolate services – services you might not want to be accessed directly from your corporate network. You can build a ‘DMZ-like’ structure in your “private” cloud environment.

One good example of this is in the diagram below – you have a “landing zone” VPC where you host services that should be accessible directly from your corporate network (allowing you to create a bastion host environment), and you run your workloads elsewhere – isolated from your internal corporate network. In the example below, we also show an ‘external’ VPC – allowing us to access Internet-based services, as well as providing a secure inbound zone where we can accept incoming connectivity if required (essentially, this is a DMZ network, and can be used for both inbound and outbound traffic).

Through the use of VPC Peering, you can ensure that your workload VPCs can be reached from your inbound-gateway VPC, but as VPCs do not support transitive networking configurations by default, you cannot connect from the internal network directly to your workload VPC.

(Diagram: multi-VPC peering and PrivateLink)

Shared Services

Once your connectivity between your corporate network and AWS is established, you’ll want to deploy some services. Sure, spinning up an EC2 instance and connecting to it is easy, but what if you need to connect to an authentication service such as LDAP or Active Directory? Do you need to route your access via an on-premise web proxy server? Or, what if you want to publish services to the rest of your AWS environment or your corporate network but keep them isolated in your DMZ VPC?

Enter AWS PrivateLink: Launched at re:Invent in 2017, it allows you to “publish” a Network Load Balancer to other VPCs or other AWS Accounts without needing to establish VPC peering. It’s commonly used to expose specific services or to supply MarketPlace services (“SaaS” offerings) without needing to provide any more connectivity over and above precisely what your service requires.

We’re going to offer an example here of using PrivateLink to expose access to an AWS-hosted web proxy server to our isolated inbound and workload VPCs. This gives you the ability to keep sensitive services isolated from the rest of your network but still provide essential functionality. AWS prohibits transitive routing between VPCs (i.e., you cannot route from VPC A to VPC C via a shared VPC B), but PrivateLink allows you to work around this limitation for individual services (basically, anything you can “hide” behind a Network Load Balancer).

Assuming we’ve created the network architecture as per the diagram above, we need to create our Network Load Balancer first. NLBs are the only load balancer type supported by PrivateLink at present.

(Screenshot: creating the Network Load Balancer)

Once this is complete, we can then create our ‘Endpoint Service,’ which is in the VPC section of the console:

(Screenshot: creating the Endpoint Service)

Once the Endpoint Service is created, take note of the Endpoint Service Name – you’ll need this to create the actual endpoints in your VPCs.

(Screenshot: Endpoint Service details)

The Endpoint Service Name is unique across all VPC endpoints in a specific region. This means you can share this with other accounts, which are then able to discover your endpoint service. By default, you need to accept all requests to your endpoint manually, but this can be disabled (you probably don’t want this, though!). You can also whitelist specific account IDs that are allowed to create a PrivateLink connection to your endpoint.

Once your Endpoint Service is created, you then need to expose this into your VPCs. This is done from the ‘Endpoints’ configuration screen under VPCs in the AWS console. Validate your endpoint service name and select the VPC required – simple!
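
If you’d rather script this than click through the console, the equivalent AWS CLI calls look roughly like the following sketch (all names, IDs and ARNs here are placeholders):

# Create the Network Load Balancer in the VPC hosting the proxy service
aws elbv2 create-load-balancer --name proxy-nlb --type network \
  --scheme internal --subnets subnet-0123abcd subnet-0456efgh

# Publish it as an Endpoint Service (PrivateLink)
aws ec2 create-vpc-endpoint-service-configuration \
  --network-load-balancer-arns arn:aws:elasticloadbalancing:eu-west-1:111122223333:loadbalancer/net/proxy-nlb/abc123 \
  --acceptance-required

# Create an interface endpoint to that service from a consuming VPC
aws ec2 create-vpc-endpoint --vpc-id vpc-0a1b2c3d --vpc-endpoint-type Interface \
  --service-name com.amazonaws.vpce.eu-west-1.vpce-svc-0123456789abcdef0 \
  --subnet-ids subnet-0aaa1111 --security-group-ids sg-0bbb2222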

(Screenshot: endpoint details, including the endpoint’s DNS names)

 

The endpoint is given one or more DNS names, which you can then use to reference your VPC endpoint. The name will resolve to an IP address in your VPC (via an Elastic Network Interface), but traffic to this endpoint will be routed directly across the Amazon network to the Network Load Balancer.

What’s in a Name?

Typically, one of the biggest hurdles with connecting between your internal network and AWS is the ability to route DNS queries correctly. DNS is key to many Amazon services, and Amazon Provided DNS (now Route53 Resolver) contains a significant amount of behind-the-scenes intelligence, such as allowing you to reach the correct Availability Zone target for your ALB or EFS mount point.

Hot off the press is the launch of Route53 Resolver, which removes the need to create your own DNS infrastructure to route requests between your AWS network and your internal network, while allowing you to continue to leverage the intelligence built into the Amazon DNS service. Previously, you would need to build your own DNS forwarder on an EC2 instance to route queries to your corporate network. This meant that, from the AWS perspective, all your DNS requests originated from a single server in a specific AZ (which might be different from the AZ of the client system), and so you’d end up getting the endpoint in a different AZ for your service. With a service such as EFS, this could result in increased latency and a high cross-AZ data transfer bill.

In this way, the Route53 resolver automatically picks the correct mount point target based on the location of your client system.

Pro Tip: If you’re using a lot of standardised endpoint services (such as proxy servers), using a common DNS name which can be used across VPCs is a real time-saver. This requires you to create a Route53 internal zone for each VPC (such as workload.example.com, inbound.example.com) and update the VPC DHCP Option Set to hand out this domain name via DHCP to your instances. This then allows you to create a record in each zone with a CNAME to the endpoint service, for example:
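
A sketch of such records (the endpoint DNS names below are placeholders for the names shown on each endpoint’s details page):

; in the zone associated with the workload VPC
proxy  CNAME  vpce-0123456789abcdef0-abcd1234.vpce-svc-0123456789abcdef0.eu-west-1.vpce.amazonaws.com.

; in the zone associated with the inbound VPC
proxy  CNAME  vpce-0fedcba9876543210-dcba4321.vpce-svc-0123456789abcdef0.eu-west-1.vpce.amazonaws.com.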

From an instance in our workload VPC:
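
(An illustrative lookup – the short name expands via the VPC’s DHCP-supplied search domain:)

$ getent hosts proxy
# returns the private IP of the PrivateLink endpoint ENI inside the workload VPC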

And the same commands from an instance in our inbound VPC:
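
(The same illustrative lookup:)

$ getent hosts proxy
# the same name now returns the endpoint ENI address local to the inbound VPC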

In the example above, we could use our configuration management system to set the http_proxy environment variable to ‘proxy.privatelink:3128’ and avoid having per-VPC-specific logic configured. Neat!

Closing Notes

There are still AWS services that expect to have Internet access available from your VPC by default. One example of this is AWS Fargate – the Amazon-hosted and managed container deployment solution. However, Amazon is constantly migrating more and more services to PrivateLink, meaning this restriction is slowly going away.

A full list of currently available VPC endpoint services is available in the VPC Endpoint documentation. AWS provided VPC Endpoints also give you the option to update DNS to return the VPC endpoint IPs when you resolve the relevant AWS endpoint service name (i.e. ec2.eu-west-1.amazonaws.com -> vpce-123-abc.ec2.eu-west-1.amazonaws.com -> 10.10.0.123) so you do not have to make any changes to your applications in order to use the Amazon provided endpoints.

About the Author

Jon is a freelance cloud devoperative buzzword-hater, currently governing the clouds for a financial investment company in London, helping them expand their research activities into “the cloud.”

Before branching out into the big bad world of corporate consulting, Jon spent five years at Red Hat, focusing on the financial services sector as a Technical Account Manager, and then as an on-site consultant.

When he’s not yelling at the cloud, Jon is a trustee of the charity Service By Emergency Rider Volunteers – Surrey & South London, the “Blood Runners,” who provide free out-of-hours transport services to the UK National Health Service. He is also guardian to two small dogs and a flock of chickens.

Feel free to shout at him on Twitter, or send an old-fashioned email.

About the Editor

Jennifer Davis is a Senior Cloud Advocate at Microsoft. Jennifer is the coauthor of Effective DevOps. Previously, she was a principal site reliability engineer at RealSelf, developed cookbooks to simplify building and managing infrastructure at Chef, and built reliable service platforms at Yahoo. She is a core organizer of devopsdays and organizes the Silicon Valley event. She is the founder of CoffeeOps. She has spoken and written about DevOps, Operations, Monitoring, and Automation.


Providing Static IPs for Non-Trivial Architectures

12. December 2016

Author: Oli Wood
Editors: Seth Thomas, Scott Francis

An interesting problem landed on my desk a month ago that seemed trivial to begin with, but once we started digging into the problem it turned out to be more complex than we thought.  A small set of our clients needed to restrict outgoing traffic from their network to a whitelist of IP addresses.  This meant providing a finite set of IPs which we could use to provide a route into our data collection funnel.

Traditionally this has not been too difficult, but once you take into account the ephemeral nature of cloud infrastructures and the business requirements for high availability and horizontal scaling (within reason) it gets more complex.

We also needed to take into account that our backend system (api.example.com) is deployed in a blue/green manner (with traffic being switched by DNS), and that we didn’t want to incur any additional management overhead with the new system.  For more on Blue/Green see http://martinfowler.com/bliki/BlueGreenDeployment.html.

Where we ended up looks complex but is actually several small systems glued together.  Let’s describe the final setup and then dig into each section.

The Destination

A simplified version of the final solution.

 

The View from the Outside World

Our clients can address our system by two routes:

  • api.example.com – our previous public endpoint.  This is routed by Route 53 to either api-blue.example.com or api-green.example.com
  • static.example.com – our new address which will always resolve to a finite set of IP addresses (we chose 4).  This will eventually route through to the same blue or green backend.

The previous infrastructure

api-blue.example.com is an autoscaling group deployed (as part of a wider system) inside its own VPC.  When we blue/green deploy, an entire new VPC is created (this is something we’re considering revisiting).  It is fronted by an ELB.  Given the nature of ELBs, the IP addresses behind this endpoint will change over time, which is why we started down this road.

The proxying infrastructure

static.example.com is a completely separate VPC which houses 4 autoscaling groups, each set to a minimum size of 1 and a maximum size of 1.  The EC2 instances are assigned an EIP on boot (more on this later) and have HAProxy 1.6 installed.  HAProxy is set up to provide two things:

  • A TCP proxy endpoint on port 443
  • A healthcheck endpoint of port 9000

The DNS configuration

The new DNS entry for static.example.com is configured so that it only returns IP addresses for up to 4 of the EIPs, based on the results of their healthcheck (as provided by HAProxy).

How we got there

The DNS setup

static.example.com is based on a set of four Health Checks which form a Traffic Policy that creates the Policy Record (which is the equivalent of your normal DNS entry).

Steps to create Health Checks:

  1. Log into the AWS Console
  2. Head to Route 53
  3. Head to Health Checks
  4. Create new Health Check
    1. What to monitor => Endpoint
    2. Specify endpoint by => IP Address
    3. Protocol => HTTP
    4. IP Address => [Your EIP]
    5. Host name => Ignore
    6. Port => 9001
    7. Path => /health

Repeat four times.  Watch until they all go green.

Steps to create Traffic Policy:

  1. Route 53
  2. Traffic Policies
  3. Create Traffic Policy
    1. Policy name => something sensible
    2. Version description => something sensible

This opens up the GUI editor

  1. Choose DNS type A: IP address
  2. Connect to => Weighted Rule
  3. Add 2 more Weights
  4. On each choose “Evaluate target health” and then one of your Health Checks
  5. Make sure the Weights are all set the same (I chose 10)
  6. For each click “Connect to” => New Endpoint
    1. Type => Value
    2. Value => EIP address
The traffic policy in the GUI

Adding the Policy record

  1. Route 53
  2. Policy Record
  3. Create new Policy Record
    1. Traffic policy => Your new policy created above
    2. Version => it’ll probably be version 1 because you just created it
    3. Hosted zone => choose the domain you’re already managing in AWS
    4. Policy record => add static.example.com equivalent
    5. TTL => we chose 60 seconds

And there you go, static.example.com will route traffic to your four EIPs, but only if they are available.

The Autoscaling groups

The big question you’re probably wondering here is “why did they create four separate Autoscaling groups?  Why not just use one?”  It’s a fair question, and our choice might not be right for you, but the reasoning is that we didn’t want to build something else to manage which EIPs were assigned to each of the 4 instances.  By using 4 separate Autoscaling groups we can use 4 separate Launch Configurations, and then use EC2 tags to let each instance know which EIP to attach when it launches.

The key things here are…

  • Each of the Autoscaling Groups is defined separately in our CloudFormation stack
  • Each of the Autoscaling Groups has its own Launch Configuration
  • We place two Autoscaling Groups in each of our Availability Zones
  • We place two Autoscaling Groups in each Public Subnet
  • Tags on the Autoscaling Group are set with “PropagateAtLaunch: true” so that the instances they launch end up with the EIP reference on them
  • Each of the four Launch Configurations includes the same UserData script (Base64 encoded in our CloudFormation template)
  • The LaunchConfiguration includes an IAM Role giving enough permissions to be able to tag the instance

The UserData script
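
A minimal sketch of such a script, assuming the EIP allocation ID is propagated to the instance via a tag named EipAllocationId (an illustrative name):

#!/bin/bash
# Illustrative user-data sketch: look up this instance's EIP allocation ID from
# its tags (propagated from the Autoscaling Group) and associate the EIP.
set -euo pipefail

INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
AZ=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)
REGION=${AZ%?}   # strip the trailing AZ letter to get the region

ALLOCATION_ID=$(aws ec2 describe-tags --region "$REGION" \
  --filters "Name=resource-id,Values=$INSTANCE_ID" "Name=key,Values=EipAllocationId" \
  --query 'Tags[0].Value' --output text)

aws ec2 associate-address --region "$REGION" \
  --instance-id "$INSTANCE_ID" --allocation-id "$ALLOCATION_ID"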

The IAM Role statement
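
Again only a sketch, pairing with the illustrative user-data above – the role needs little more than permission to read tags, tag the instance and associate an address:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeTags",
        "ec2:CreateTags",
        "ec2:DescribeAddresses",
        "ec2:AssociateAddress"
      ],
      "Resource": "*"
    }
  ]
}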

The EC2 instances

We chose c4.xlarge instances to provide a good amount of network throughput.  Because HAProxy is running in TCP mode we struggle to monitor the traffic levels and so we’re using CloudWatch to alert on very high or low Network Output from the four instances.

The EC2 instances themselves are launched from a custom AMI which includes very little except a version of HAProxy (thanks to ITV for https://github.com/ITV/rpm-haproxy).  We’re using this fork because it supplies the slightly newer HAProxy version 1.6.4.

Unusually for us, we’ve baked the config for HAProxy into the AMI.  I suspect this is a decision we will revisit at a later date, and have the config pulled from S3 at boot time instead.

HAProxy is set to start on boot.  Something we shall probably add at a later date is to have the Autoscaling Group use the same healthcheck endpoint that HAProxy provides to Route 53 to determine instance health. That way we’d launch a replacement if an instance comes up but does not provide a healthy HAProxy for some reason.

The HAProxy setup

HAProxy is a fabulously flexible beast and we had a lot of options on what to do here.  We did, however, wish to keep it as simple as possible.  With that in mind, we opted not to offload SSL at this point but to act as a passthrough proxy direct to our existing architecture.

Before we dive into the config, however, it’s worth mentioning our choice of backend URL.  We opted to route back to api.example.com because this means that when we blue/green deploy our existing setup we don’t need to make any changes to our HAProxy setup.  By using its own health check mechanism and “resolvers” entry we can make sure that the IP addresses that it is routing to (the new ELB) aren’t more than a few seconds out of date.  This loopback took us a while to figure out and is (again) something we might revisit in the future.

Here are the important bits of the config file:

The resolver

Makes use of AWS’s internal DNS service.  This has to be used in conjunction with a health check on the backend server
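
A minimal sketch of what this section might look like (the nameserver address assumes the VPC DNS server at the base of the VPC CIDR plus two, e.g. 10.0.0.2):

resolvers mydns
  nameserver awsdns 10.0.0.2:53
  resolve_retries 3
  timeout retry   1s
  hold valid      10s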

The front end listener

Super simple.  This would be more complex if you wanted to route traffic from different source addresses to different backends using SNI (see http://blog.haproxy.com/2012/04/13/enhanced-ssl-load-balancing-with-server-name-indication-sni-tls-extension/).
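
A sketch of the simple case (names are illustrative):

frontend https_in
  bind *:443
  mode tcp
  option tcplog
  default_backend api_backend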

The backend listener

The key things here are the inclusion of the resolver (mydns, as defined above) and the health check. It’s the combination of the two which causes HAProxy to re-evaluate the DNS entry.
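
Something along these lines (backend and server names are illustrative):

backend api_backend
  mode tcp
  server api api.example.com:443 check resolvers mydns resolve-prefer ipv4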

The outwards facing health check

This will return a 200 if everything is ok, 503 if the backend is down, and will return a connection failure if HAProxy is down. This will correctly inform the Route 53 health checks and if needed R53 will not include the IP address.
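
A sketch, assuming the health listener matches the Route 53 health check configured earlier (port 9001, path /health):

frontend health
  bind *:9001
  mode http
  monitor-uri /health
  acl api_down nbsrv(api_backend) lt 1
  monitor fail if api_down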

What we did to test it

We ran through various scenarios to check how the system coped:

  • Deleting one of the proxy instances and seeing it vanish from the group returned from static.example.com
  • Doing a blue/green deployment and seeing HAProxy update its backend endpoint
  • Blocking access to one AZ with a tweak to the Security Group to simulate the AZ becoming unavailable
  • Forcing in 10 times our normal load using Vegeta
  • Running a soak test at sensible traffic levels over several hours (also with Vegeta)

The end result

While this is only providing 4 EC2 instances which proxy traffic, it’s a pattern which could be scaled out very easily, with each section bringing another piece of the resilience pie to the table.

  • Route 53 does a great job of only including EIPs that are associated with healthy instances
  • The Autoscaling Groups make sure that our proxy instances will bounce back if something nasty happens to them
  • UserData and Tags provide a neat way for the instances to self-manage the allocation of EIPs
  • HAProxy provides both transparent routing and health checks.
  • Route 53 works really well for Blue/Greening our traffic to our existing infrastructure.

It’s not perfect (I imagine we’ll have issues with some clients caching DNS records for far too long at some point), and I’ll wager we’ll end up tuning some of the timeouts and HAProxy config at some point in the future, but for now it’s out there and happily providing an endpoint for our customers (and not taking up any of our time).  We’ve also successfully tested how to deploy updates (deploy a new CloudFormation stack and let the new instances “steal” the EIPs).

About the Author:

Oli Wood has been deploying systems into AWS since 2010 in businesses ranging from 2 people startups to multi-million dollar enterprises. Previous to that he mostly battled with deploying them onto other service providers, cutting his teeth in a version control and deployment team on a Large Government Project back in the mid 2000s.

Inside of work he spends time, train tickets and shoe leather helping teams across the business benefit from DevOps mentality.

Outside of work he can mostly be found writing about food on https://www.omnomfrickinnom.com/ and documenting the perils of poor posture at work at http://goodcoderbadposture.com/

Online he’s @coldclimate

About the Editors:

Scott Francis has been designing, building and operating Internet-scale infrastructures for the better part of 20 years. He likes BSD, Perl, AWS, security, cryptography and coffee. He’s a good guy to know in a zombie apocalypse. Find him online at  https://linkedin.com/in/darkuncle and https://twitter.com/darkuncle.