Providing Static IPs for Non-Trivial Architectures

12. December 2016

Author: Oli Wood
Editors: Seth Thomas, Scott Francis

An interesting problem landed on my desk a month ago.  It seemed trivial to begin with, but once we started digging in it turned out to be more complex than we thought.  A small set of our clients needed to restrict outgoing traffic from their network to a whitelist of IP addresses.  This meant providing a finite set of IPs that would act as a route into our data collection funnel.

Traditionally this has not been too difficult, but once you take into account the ephemeral nature of cloud infrastructures and the business requirements for high availability and horizontal scaling (within reason) it gets more complex.

We also needed to take into account that our backend system (api.example.com) is deployed in a blue/green manner (with traffic being switched by DNS), and that we didn’t want to incur any additional management overhead with the new system.  For more on Blue/Green see http://martinfowler.com/bliki/BlueGreenDeployment.html.

Where we ended up looks complex but is actually several small systems glued together.  Let’s describe the final setup and then dig into each section.

The Destination

A simplified version of the final solution.

The View from the Outside World

Our clients can address our system by two routes:

  • api.example.com – our previous public endpoint.  This is routed by Route 53 to either api-blue.example.com or api-green.example.com
  • static.example.com – our new address which will always resolve to a finite set of IP addresses (we chose 4).  This will eventually route through to the same blue or green backend.

The previous infrastructure

api-blue.example.com is an autoscaling group deployed (as part of a wider system) inside its own VPC.  When we blue/green deploy, an entire new VPC is created (this is something we’re considering revisiting).  It is fronted by an ELB.  Given the nature of ELBs, its IP addresses will change over time, which is why we started down this road.

The proxying infrastructure

static.example.com is a completely separate VPC which houses 4 autoscaling groups set to a minimum size of 1 and a maximum size of 1.  The EC2 instances are assigned an EIP on boot (more on this later) and have HAProxy 1.6 installed.  HAProxy is set up to provide two things:

  • A TCP proxy endpoint on port 443
  • A healthcheck endpoint on port 9000

The DNS configuration

The new DNS entry for static.example.com is configured so that it only returns IP addresses for up to 4 of the EIPs, based on the results of their healthcheck (as provided by HAProxy).

How we got there

The DNS setup

static.example.com is built from a set of four Health Checks, which feed a Traffic Policy, which in turn creates the Policy Record (the equivalent of your normal DNS entry).

Steps to create Health Checks:

  1. Log into the AWS Console
  2. Head to Route 53
  3. Head to Health Checks
  4. Create new Health Check
    1. What to monitor => Endpoint
    2. Specify endpoint by => IP Address
    3. Protocol => HTTP
    4. IP Address => [Your EIP]
    5. Host name => Ignore
    6. Port => 9001
    7. Path => /health

Repeat four times.  Watch until they all go green.

Steps to create Traffic Policy:

  1. Route 53
  2. Traffic Policies
  3. Create Traffic Policy
    1. Policy name => something sensible
    2. Version description => something sensible

This opens up the GUI editor

  1. Choose DNS type A: IP address
  2. Connect to => Weighted Rule
  3. Add 2 more Weights
  4. On each choose “Evaluate target health” and then one of your Health Checks
  5. Make sure the Weights are all set the same (I chose 10)
  6. For each click “Connect to” => New Endpoint
    1. Type => Value
    2. Value => EIP address
The traffic policy in the GUI

Adding the Policy record

  1. Route 53
  2. Policy Record
  3. Create new Policy Record
    1. Traffic policy => Your new policy created above
    2. Version => it’ll probably be version 1 because you just created it
    3. Hosted zone => choose the domain you’re already managing in AWS
    4. Policy record => add static.example.com equivalent
    5. TTL => we chose 60 seconds

And there you go, static.example.com will route traffic to your four EIPs, but only if they are available.

The Autoscaling groups

The big question you’re probably wondering here is “why did they create four separate Autoscaling groups?  Why not just use one?”  It’s a fair question, and our choice might not be right for you, but the reasoning is that we didn’t want to build something else to manage which EIPs were assigned to each of the 4 instances.  By using 4 separate Autoscaling groups we can use 4 separate Launch Configurations, and then use EC2 tags so that each instance knows which EIP to attach on boot.

The key things here are…

  • Each of the Autoscaling Groups is defined separately in our CloudFormation stack
  • Each of the Autoscaling Groups has its own Launch Configuration
  • We place two Autoscaling Groups in each of our Availability Zones
  • We place two Autoscaling Groups in each Public Subnet
  • Tags on the Autoscaling Group are set with “PropagateAtLaunch: true” so that the instances they launch end up with the EIP reference on them
  • Each of the four Launch Configurations includes the same UserData script (Base64 encoded in our CloudFormation template)
  • The LaunchConfiguration includes an IAM Role giving enough permissions to be able to tag the instance

The UserData script
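
A minimal sketch of the idea, assuming the AWS CLI is available on the AMI and that the propagated tag holding the allocation ID is called EipAllocationId (both the tag name and the region below are placeholders, not the exact script we run):

    #!/bin/bash
    # Sketch: read the EIP allocation ID from this instance's tags and attach it.
    set -e

    REGION=eu-west-1   # placeholder: use your region
    INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)

    # "EipAllocationId" is a hypothetical tag name propagated from the ASG
    ALLOCATION_ID=$(aws ec2 describe-tags --region "$REGION" \
      --filters "Name=resource-id,Values=$INSTANCE_ID" \
                "Name=key,Values=EipAllocationId" \
      --query 'Tags[0].Value' --output text)

    aws ec2 associate-address --region "$REGION" \
      --instance-id "$INSTANCE_ID" --allocation-id "$ALLOCATION_ID"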

The IAM Role statement
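
A plausible sketch of the statement, granting just enough EC2 permissions for the instance to read its tags, claim its EIP and manage tags (trim this to whatever your UserData actually does):

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "ec2:DescribeTags",
            "ec2:DescribeAddresses",
            "ec2:AssociateAddress",
            "ec2:CreateTags"
          ],
          "Resource": "*"
        }
      ]
    }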

The EC2 instances

We chose c4.xlarge instances to provide a good amount of network throughput.  Because HAProxy is running in TCP mode we struggle to monitor the traffic levels, so we’re using CloudWatch to alert on very high or low network output (the NetworkOut metric) from the four instances.
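
As a rough illustration only, such an alarm could be created with the AWS CLI along these lines (instance ID, thresholds and the SNS topic ARN are placeholders):

    # Alarm if NetworkOut from one proxy instance drops unusually low
    aws cloudwatch put-metric-alarm \
      --alarm-name static-proxy-1-networkout-low \
      --namespace AWS/EC2 \
      --metric-name NetworkOut \
      --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
      --statistic Sum \
      --period 300 \
      --evaluation-periods 3 \
      --threshold 1000000 \
      --comparison-operator LessThanThreshold \
      --alarm-actions arn:aws:sns:eu-west-1:123456789012:ops-alerts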

The EC2 instances themselves are launched from a custom AMI which includes very little except a version of HAProxy (thanks to ITV for https://github.com/ITV/rpm-haproxy).  We’re using this fork because it supplies the slightly newer HAProxy version 1.6.4.

Unusually for us, we’ve baked the config for HAProxy into the AMI.  I suspect this is a decision we will revisit at a later date, pulling the config from S3 at boot time instead.

HAProxy is set to start on boot.  Something we shall probably add at a later date is to have the Autoscaling Group use the same healthcheck endpoint that HAProxy provides to Route 53 to determine instance health.  That way we’ll launch a replacement if an instance comes up but does not provide a healthy HAProxy for some reason.

The HAProxy setup

HAProxy is a fabulously flexible beast and we had a lot of options on what to do here.  We did however wish to keep it as simple as possible.  With that in mind, we opted to not offload SSL at this point but to act as a passthrough proxy direct to our existing architecture.

Before we dive into the config, however, it’s worth mentioning our choice of backend URL.  We opted to route back to api.example.com because this means that when we blue/green deploy our existing setup we don’t need to make any changes to our HAProxy setup.  By using its own health check mechanism and “resolvers” entry we can make sure that the IP addresses that it is routing to (the new ELB) aren’t more than a few seconds out of date.  This loopback took us a while to figure out and is (again) something we might revisit in the future.

Here are the important bits of the config file:

The resolver

Makes use of AWS’s internal DNS service.  This has to be used in conjunction with a health check on the backend server.
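
A sketch of such a resolvers section (the nameserver address is the VPC DNS resolver, which sits at your VPC network address plus two, so 10.0.0.2 is only an example; the timings are illustrative):

    resolvers mydns
        # AWS VPC DNS resolver: VPC network address + 2 (placeholder shown)
        nameserver awsdns 10.0.0.2:53
        resolve_retries 3
        timeout retry   1s
        hold valid      10s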

The front end listener

Super simple.  This would be more complex if you wanted to route traffic from different source addresses to different backends using SNI (see http://blog.haproxy.com/2012/04/13/enhanced-ssl-load-balancing-with-server-name-indication-sni-tls-extension/).
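
A sketch of the listener (frontend and backend names are placeholders):

    frontend https_in
        bind *:443
        mode tcp
        default_backend api_backend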

The backend listener

The key things here are the inclusion of the resolver (mydns, as defined above) and the health check on the server line.  It’s the combination of the two which causes HAProxy to re-evaluate the DNS entry.
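
A sketch of the backend (server and backend names are placeholders):

    backend api_backend
        mode tcp
        # "check" plus "resolvers" is what makes HAProxy keep re-resolving the name
        server api api.example.com:443 check resolvers mydns resolve-prefer ipv4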

The outwards facing health check

This will return a 200 if everything is ok, a 503 if the backend is down, and a connection failure if HAProxy itself is down.  This correctly informs the Route 53 health checks, and if needed Route 53 will stop including that IP address in the answer.
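
A sketch of that health check listener, assuming the port 9000 mentioned earlier (whichever port you bind here is the one the Route 53 health checks must point at):

    listen health_check
        bind *:9000
        mode http
        # 200 on /health while at least one backend server is up, 503 otherwise
        monitor-uri /health
        monitor fail if { nbsrv(api_backend) lt 1 }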

What we did to test it

We ran through various scenarios to check how the system coped:

  • Deleting one of the proxy instances and seeing it vanish from the set of IPs returned for static.example.com
  • Doing a blue/green deployment and seeing HAProxy update its backend endpoint
  • Blocking access to one AZ with a tweak to the Security Group to simulate the AZ becoming unavailable
  • Pushing 10 times our normal load through using Vegeta
  • Running a soak test at sensible traffic levels over several hours (also with Vegeta)

The end result

While this is only providing 4 EC2 instances which proxy traffic, it’s a pattern which could be scaled out very easily, with each section bringing another piece of the resilience pie to the table.

  • Route 53 does a great job of only including EIPs that are associated with healthy instances
  • The Autoscaling Groups make sure that our proxy instances will bounce back if something nasty happens to them
  • UserData and Tags provide a neat way for the instances to self-manage the allocation of EIPs
  • HAProxy provides both transparent routing and health checks
  • Route 53 works really well for Blue/Greening our traffic to our existing infrastructure

It’s not perfect (I imagine we’ll have issues with some clients caching DNS records for far too long at some point), and I’ll wager we’ll end up tuning some of the timeouts and HAProxy config in the future, but for now it’s out there, happily providing an endpoint for our customers (and not taking up any of our time).  We’ve also successfully tested how to deploy updates (deploy a new CloudFormation stack and let the new instances “steal” the EIPs).

About the Author:

Oli Wood has been deploying systems into AWS since 2010 in businesses ranging from 2-person startups to multi-million dollar enterprises. Prior to that he mostly battled with deploying them onto other service providers, cutting his teeth in a version control and deployment team on a Large Government Project back in the mid 2000s.

Inside of work he spends time, train tickets and shoe leather helping teams across the business benefit from DevOps mentality.

Outside of work he can mostly be found writing about food on https://www.omnomfrickinnom.com/ and documenting the perils of poor posture at work at http://goodcoderbadposture.com/

Online he’s @coldclimate

About the Editors:

Scott Francis has been designing, building and operating Internet-scale infrastructures for the better part of 20 years. He likes BSD, Perl, AWS, security, cryptography and coffee. He’s a good guy to know in a zombie apocalypse. Find him online at  https://linkedin.com/in/darkuncle and https://twitter.com/darkuncle.


Building custom AMIs with Packer

08. December 2016

Author: Andreas Rütten
Editors: Steve Button, Joe Nuspl

Amazon machine images (AMIs) are the basis of every EC2 instance launched. They contain the root volume and thereby define what operating system or application will run on the instance.

There are two types of AMIs:

  • public AMIs are provided by vendors, communities or individuals. They are available on the AWS Marketplace and can be paid or free.
  • private AMIs belong to a specific AWS account. They can be shared with other AWS accounts by granting launch permissions. Usually they are either a copy of a public AMI or created by the account owner.

There are several reasons to create your own AMI:

  • predefine a template of the software that runs on your instances. This provides a major advantage for autoscaling environments. Since most, if not all, of the system configuration has already been done, there is no need to run extensive provisioning steps on boot. This drastically reduces the amount of time from instance start to service ready.
  • provide a base AMI for further usage by others. This can be used to ensure a specific baseline across your entire organization.

What is Packer

Packer is software from the HashiCorp universe, like Vagrant, Terraform or Consul. From a single source, you can create machine and container images for multiple platforms.

For that, Packer has the concepts of builders and provisioners.

Builders exist for major cloud providers (like Amazon EC2, Azure or Google), for container environments (like Docker) and for classical virtualization environments (like QEMU, VirtualBox or VMware). They manage the build environment and perform the actual image creation.

Provisioners, on the other hand, are responsible for installing and configuring all components that will be part of the final image. They can be simple shell commands or fully featured configuration management systems like Puppet, Chef or Ansible.

How to use Packer for building AWS EC2 AMIs

The heart of every Packer build is the template, a JSON file which defines the various steps of each Packer run. Let’s have a look at a very simple Packer template:
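A minimal sketch of such a template, with the region, source AMI, keys and AMI name as placeholders (the actual example files live in the repository linked at the end):

    {
      "builders": [
        {
          "type": "amazon-ebs",
          "access_key": "YOUR_ACCESS_KEY",
          "secret_key": "YOUR_SECRET_KEY",
          "region": "eu-west-1",
          "source_ami": "ami-xxxxxxxx",
          "instance_type": "t2.micro",
          "ssh_username": "ubuntu",
          "ami_name": "my-custom-ami-{{timestamp}}"
        }
      ],
      "provisioners": [
        {
          "type": "shell",
          "script": "setup_things.sh"
        }
      ]
    }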

There is just one builder and one simple provisioner. On line 4, we specify the amazon-ebs builder which means Packer will create an EBS-backed AMI by:

  • launching the source AMI
  • running the provisioner
  • stopping the instance
  • creating a snapshot
  • converting the snapshot into a new AMI
  • terminating the instance

As this all occurs under your AWS account, you need to specify your access_key and secret_key. Lines 7-9 specify the region, source AMI and instance type that will be used during the build. ssh_username specifies which user Packer will use to ssh into the build instance. This is specific to the source AMI: Ubuntu-based AMIs use ubuntu while Amazon Linux based AMIs use ec2-user.

Packer will create temporary keypairs and security groups to connect the local system to the build instance to run the provisioner. If you are running packer prior to 0.12.0 watch out for GitHub issue #4057.

The last line defines the name of the resulting AMI. We use the {{timestamp}} function of the Packer template engine which generates the current timestamp as part of the final AMI name.

The provisioner section defines one provisioner of the type shell. The local script “setup_things.sh” will be transferred to the build instance and executed. This is the easiest and most basic way to provision an instance.

A more extensive example

The requirements of a real world scenario usually need something more than just executing a simple shell script during provisioning. Let’s add some more advanced features to the simple template.

Optional sections

The first thing somebody could add is a description and a variables section to the top of our template, like:
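
For instance (the variable names and default values here are illustrative):

    "description": "Example template for building our base AMI",
    "variables": {
      "aws_access_key": "",
      "aws_secret_key": "",
      "aws_region": "eu-west-1",
      "source_ami": "ami-xxxxxxxx",
      "instance_type": "t2.micro",
      "ssh_username": "ubuntu",
      "vpc_id": "",
      "subnet_id": ""
    }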

The first one is just a simple and optional description of the template, while the second adds some useful functionality: it defines variables which can later be used in the template.  Some of them have a default value, others are optional and can be set during the Packer call.  You can check this with packer inspect:
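
The command lists the required and optional variables, builders and provisioners defined in the template (template.json stands in for whatever your template file is called):

    packer inspect template.json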

Overriding can be done like this:
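
For example, using the -var flag (the values shown are arbitrary):

    packer build -var 'aws_region=us-east-1' -var 'instance_type=m4.large' template.json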

Multiple builders

The next extension could be to define multiple builders:
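
A sketch of what such a builders section could look like (the AMI name, description and Docker base image are illustrative):

    "builders": [
      {
        "type": "amazon-ebs",
        "access_key": "{{user `aws_access_key`}}",
        "secret_key": "{{user `aws_secret_key`}}",
        "region": "{{user `aws_region`}}",
        "source_ami": "{{user `source_ami`}}",
        "instance_type": "{{user `instance_type`}}",
        "ssh_username": "{{user `ssh_username`}}",
        "vpc_id": "{{user `vpc_id`}}",
        "subnet_id": "{{user `subnet_id`}}",
        "associate_public_ip_address": true,
        "ami_name": "my-custom-ami-{{timestamp}}",
        "ami_description": "Custom base AMI built with Packer"
      },
      {
        "type": "docker",
        "image": "ubuntu:16.04",
        "pull": true,
        "commit": true
      }
    ]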

The amazon-ebs builder was extended by using some of the previously introduced variables.  It also got a bit more specific about the build environment used on the AWS side, defining a VPC and a subnet, attaching a public IP address to the build instance, and setting the description of the resulting AMI.

The second builder defines a build with docker. This is quite useful for testing the provisioning part of the template. Creating an EC2 instance and an AMI afterwards takes some time and resources while building in a local docker environment is faster.

The pull option ensures that the base docker image is pulled if it isn’t already in the local repository, while the commit option is set so that the container is committed to an image in the local repository after provisioning instead of being exported.

By default, Packer will execute all builders which have been defined. This can be useful if you want to build the same image with a different cloud provider or in different AWS regions at the same time. In our example we have a special test builder and the actual AWS builder. The following command tells Packer to use only a specific builder:
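
For example (again with template.json as a stand-in for your template file):

    packer build -only=amazon-ebs template.json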

Provisioner

Provisioners are executed sequentially during the build phase. Using the only option you can restrict the provisioner to be called only by the corresponding builder.

This is useful if you need different provisioners or different options for a provisioner. In this example both call the same script to do some general bootstrap actions. One is for the amazon-ebs builder, where we call the script with sudo, and the other is for the docker builder where we don’t need to call sudo, as being root is the default inside a docker container.
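
A sketch of that pair of provisioners (the script name bootstrap.sh is a placeholder; sudo is added via execute_command only for the EC2 build):

    "provisioners": [
      {
        "type": "shell",
        "script": "bootstrap.sh",
        "execute_command": "{{ .Vars }} sudo -E sh '{{ .Path }}'",
        "only": ["amazon-ebs"]
      },
      {
        "type": "shell",
        "script": "bootstrap.sh",
        "only": ["docker"]
      }
    ]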

The script itself is about upgrading all installed packages and installing Ansible to prepare the next provisioner:
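
A sketch of such a script, assuming an Ubuntu/Debian base image (package and repository names differ on other distributions):

    #!/bin/sh
    # Upgrade installed packages and install Ansible for the next provisioner
    apt-get update
    apt-get -y upgrade
    apt-get -y install software-properties-common
    apt-add-repository -y ppa:ansible/ansible
    apt-get update
    apt-get -y install ansible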

Now a provisioner of the type ansible-local can be used. Packer will copy the defined Ansible Playbook from the local system into the instance and then execute it locally.
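
For example (the playbook name is a placeholder):

    {
      "type": "ansible-local",
      "playbook_file": "playbook.yml"
    }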

The last one is another simple shell provisioner to do some cleanup:
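
Something along these lines (the exact cleanup steps are placeholders):

    {
      "type": "shell",
      "inline": [
        "rm -rf /tmp/packer-*"
      ]
    }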

Post-Processors

Post-processors are run right after the image is built. For example to tag the docker image in the local docker repository:
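
A sketch of such a post-processor (the repository name is a placeholder):

    "post-processors": [
      {
        "type": "docker-tag",
        "repository": "my-org/my-base-image",
        "tag": "latest",
        "only": ["docker"]
      }
    ]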

Or to trigger the next step of a CI/CD pipeline.

Pitfalls

While building AMIs with packer is quite easy in general, there are some pitfalls to be aware of.

The most common is differences between the build system and the instances which will be created from the resulting AMI. It could be anything from simple things like a different instance type to running in a different VPC. This means thinking about what can already be done at build time and what is specific to the environment where an EC2 instance is created from the built AMI. Other examples are an Apache worker-thread configuration based on the number of available CPU cores, or a VPC-specific endpoint the application communicates with, such as an S3 VPC endpoint or the CloudWatch endpoint where custom metrics are sent.

This can be addressed by running a script or any real configuration management system at first boot time.

Wrapping up

As we have seen, building an AMI with a pre-installed configuration is not that hard. Packer is an easy to use and powerful tool to do that. We have discussed the basic building blocks of a packer template and some of the more advanced options. Go ahead and check out the great Packer documentation which explains this and much more in detail.

All code examples can be found at https://github.com/aruetten/aws-advent-2016-building-amis

About the Author

Andreas Rütten is a Senior Systems Engineer at Smaato, a global real-time advertising platform for mobile publishers & app developers. He also is the UserGroup leader of the AWS UG Meetup in Hamburg.

About the Editors

Steve Button is a Linux admin geek / DevOps who likes messing around with Raspberry Pi, Ruby, Python. Love technology and hate technology.

Joe Nuspl is a Portland, OR based DevOps Kung Fu practitioner. He is a senior operations engineer at Workday. One of the DevOpsDays Portland 2016 organizers. Author of the zap chef community cookbook. Aspiring culinary chef. Occasionally he rambles on http://nvwls.github.io/ or @JoeNuspl on Twitter.