AWS Advent 2014 – High-Availability in AWS with keepalived, EBS and Elastic Network Interfaces

Today’s post on how to achieve high availability in AWS with keepalived comes to us from Julian Dunn, who’s currently helping improve things at Chef.

Introduction

By now, most everyone knows that running infrastructure in AWS is not the same as running it in a traditional data center, which gives the lie to claims that you can just “lift and shift to the cloud”. In AWS, one normally achieves “high-availability” by scaling horizontally. For example, if you have a WordPress site, you could create several identical WordPress servers, put them all behind an Elastic Load Balancer (ELB), and connect them all to the same database. That way, if one of these servers fails, the ELB will stop directing traffic to it, but your site will still be available.

But about that database – isn’t it also a single-point-of-failure? You can’t very well pull the same horizontal-redundancy trick for services that explicitly have one writer (and potentially many readers). For a database, you could probably use Amazon Relational Database Service (RDS), but suppose Amazon doesn’t have a handy highly-available Platform-as-a-Service variant for the service you need?

In this post, I’ll show you how to use that old standby, keepalived, in conjunction with Virtual Private Cloud (VPC) features, to achieve real high-availability in AWS for systems that can’t be horizontally replicated.

Kit of Parts

To create high-availability out of two (or more) systems, you need the following components:

  • A service IP (commonly referred to as a VIP, for virtual IP) that clients connect to and that can be moved between the systems
  • A block device containing data served by the currently-active system that can be detached and reattached to others, should the active one fail
  • Some kind of cluster coordination system to handle master/backup election, as well as doing all the housekeeping to move the service IP and block device to the active node.

In AWS, we’ll use:

  • Private secondary addresses on an Elastic Network Interface (ENI) as the service IP.
  • A separate Elastic Block Store (EBS) volume as the block device.
  • keepalived as the cluster coordination system.

There are a few limitations to this approach in AWS. Most important is that all instances and the block storage device must live in the same VPC subnet, which implies that they live in the same availability zone (AZ).

Just Enough keepalived for HA

Keepalived for Linux has been around for over ten years, and while it is very robust and reliable, it can be very difficult to grasp because it is designed for a variety of use cases, some very distinct from the one we are going to implement. Software design diagrams like this one do not necessarily aid in understanding how it works.

For the purposes of building an HA system, you need only know a few things about keepalived:

  • As previously mentioned, keepalived serves as a cluster coordination system between two or more peers.
  • Keepalived uses the Virtual Router Redundancy Protocol (VRRP) for assigning the service IP to the active instance. It does this by talking to the Linux netlink layer directly. Thus, don’t try to use ifconfig to examine whether the master’s interface has the VIP, as ifconfig doesn’t use netlink system calls and the VIP won’t show up! Use ip addr instead.
  • VRRP is normally run over multicast in a closed network segment. However, in a cloud environment where multicast is not permitted, we must use unicast, which implies that we need to list all peers participating in the cluster.
  • Keepalived has the ability to invoke external scripts whenever a cluster member transitions from backup to master (or vice-versa). We will use this functionality to attach and mount the EBS block device when a node becomes master (or to unmount and detach it when a node transitions from master to backup).
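As a quick way to act on the ip addr advice, here is a minimal Ruby sketch that checks whether a given VIP shows up in the output of `ip -o -4 addr show`. The helper name `vip_present?` is hypothetical, and the line format in the comment is an assumption about typical iproute2 output rather than something taken from this setup.

```ruby
# Returns true when `vip` appears as an IPv4 address in text captured from
# `ip -o -4 addr show`. It is a pure function, so it can be exercised on
# saved output without touching a live host.
def vip_present?(ip_addr_output, vip)
  ip_addr_output.each_line.any? do |line|
    # Assumed line shape: "2: eth0    inet 172.31.40.96/20 brd ... scope global eth0"
    match = line.match(/\binet\s+(\d{1,3}(?:\.\d{1,3}){3})\/\d+/)
    !match.nil? && match[1] == vip
  end
end

# Example use on a node (not run here):
#   puts vip_present?(`ip -o -4 addr show`, "172.31.36.57")
```

Matching the whole address, rather than doing a substring search, avoids a false positive when one address is a prefix of another.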

Building the HA System

We’ll spin up two identical systems in the same VPC subnet for our master and backup nodes. To avoid passing AWS access and secret keys to the systems, I’ve created an IAM instance profile & role called awsadvent-ha with a policy document to let the systems manage ENI addresses and EBS volumes:
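The original policy document is not reproduced here. As a rough sketch only, the Ruby snippet below builds a policy of the kind such a role might carry. The action list is an assumption about what a failover script needs (moving secondary private IPs and attaching or detaching volumes), and `ha_policy_document` is a hypothetical helper name. In practice you would probably narrow `Resource` rather than leave it as `*`.

```ruby
require "json"

# Hypothetical IAM policy for the HA nodes. The actions are real EC2 IAM
# actions, but which ones the role actually needs is an assumption.
def ha_policy_document
  {
    "Version" => "2012-10-17",
    "Statement" => [
      {
        "Effect" => "Allow",
        "Action" => [
          "ec2:AssignPrivateIpAddresses",
          "ec2:UnassignPrivateIpAddresses",
          "ec2:AttachVolume",
          "ec2:DetachVolume",
          "ec2:DescribeInstances",
          "ec2:DescribeNetworkInterfaces",
          "ec2:DescribeVolumes"
        ],
        "Resource" => "*"
      }
    ]
  }
end

# Example use (not run here): print JSON suitable for pasting into IAM.
#   puts JSON.pretty_generate(ha_policy_document)
```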

For this exercise I used Fedora 21 AMIs, because Fedora has a recent-enough version of keepalived with VRRP-over-unicast support:

You’ll notice that one of the security groups I’ve placed the machines into is entitled internal-icmp, which is a group I created to allow the instances to ping each other (send ICMP Echo Request and receive ICMP Echo Reply). Note that keepalived’s heartbeat itself consists of VRRP advertisements, which are IP protocol 112 rather than ICMP, so the security groups must also permit that traffic between the nodes.

We also need a separate EBS volume for the data, so let’s create one in the same AZ as the instances:

Note that the volume needs to be partitioned and formatted at some point; I don’t do that in this tutorial.

Installing and configuring keepalived

Once the two machines are up and reachable, it’s time to install and configure keepalived. SSH to them and type:

I intend to write the external failover scripts called by keepalived in Ruby, so I’m going to install that, and the fog gem that will let me communicate with the AWS API:

keepalived is configured using the /etc/keepalived/keepalived.conf file. Here’s the configuration I used for this demo:

A couple of notes about this configuration:

  • 172.31.40.96 is the current machine; 172.31.40.95 is its peer. The peer has the IPs reversed in the unicast_src_ip and unicast_peer clauses, so make sure to change this. (A configuration management system sure would help here…)
  • 172.31.36.57 is the virtual IP address which will be bound as a secondary IP address to the active master’s Elastic Network Interface. You can pick anything unused in your subnet.
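Since the two nodes differ only in which address is "self" and which is "peer", the configuration lends itself to templating. The sketch below is not the configuration used for the demo. It is a minimal Ruby/ERB illustration that renders a plausible vrrp_instance block for each node. The instance name, interface, virtual_router_id, priority, script path and script arguments are all assumptions, and the source-address directive is spelled unicast_src_ip as in the upstream keepalived.conf man page.

```ruby
require "erb"

# Hypothetical template; every value except the three addresses passed in
# below is an assumption made for illustration.
KEEPALIVED_TEMPLATE = <<~CONF
  vrrp_instance VI_1 {
      state BACKUP
      interface <%= interface %>
      virtual_router_id 51
      priority 100
      advert_int 1
      unicast_src_ip <%= self_ip %>
      unicast_peer {
          <%= peer_ip %>
      }
      virtual_ipaddress {
          <%= vip %> dev <%= interface %>
      }
      notify_master "/etc/keepalived/awsha.rb master"
      notify_backup "/etc/keepalived/awsha.rb backup"
  }
CONF

# Renders the configuration for one node. The template reads the keyword
# arguments through the method's binding.
def render_keepalived_conf(self_ip:, peer_ip:, vip:, interface: "eth0")
  ERB.new(KEEPALIVED_TEMPLATE).result(binding)
end

# Example use (not run here): one call per node, with the addresses swapped.
#   puts render_keepalived_conf(self_ip: "172.31.40.96", peer_ip: "172.31.40.95", vip: "172.31.36.57")
#   puts render_keepalived_conf(self_ip: "172.31.40.95", peer_ip: "172.31.40.96", vip: "172.31.36.57")
```

Starting both nodes in BACKUP state with equal priority leaves the election to VRRP. Whether that matches the demo's settings is not something the text confirms.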

The notify script, awsha.rb

As previously mentioned, the external script is invoked whenever a master-to-backup or backup-to-master event occurs, via the notify_backup and notify_master directives in keepalived.conf. Upon receiving an event, it will attach and mount (or unmount and detach) the EBS volume, and assign or release the ENI secondary address.
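To make the control flow concrete without reproducing the real script, here is a minimal Ruby sketch of a transition handler. The step names, their ordering, and the `node` object are all hypothetical: `node` stands for any object that wraps the EC2 API calls and local mount commands, and the state is assumed to arrive as a script argument ("master" or "backup").

```ruby
# Ordered steps for each keepalived transition. The names are hypothetical
# labels; each would map to an EC2 API call or a local command in a real script.
TRANSITION_STEPS = {
  "master" => %i[assign_vip attach_volume mount_volume],
  "backup" => %i[unmount_volume detach_volume release_vip]
}.freeze

# Runs the steps for `state` against `node`, in order, and returns the list of
# steps that were run. Raises ArgumentError for a state it does not know.
def handle_transition(state, node)
  steps = TRANSITION_STEPS.fetch(state.to_s.downcase) do
    raise ArgumentError, "unknown keepalived state: #{state.inspect}"
  end
  steps.each { |step| node.public_send(step) }
  steps
end

# Example use inside a notify script (not run here), where AwsNode is a
# hypothetical adapter class:
#   handle_transition(ARGV.fetch(0), AwsNode.new)
```

Keeping the AWS and filesystem work behind one object means the ordering can be checked with a fake, and the backup path stays an exact mirror of the master path.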

The script is too long to reproduce inline here, so I’ve included it as a separate Gist.

Note: For brevity, I’ve eliminated a lot of error-handling from the script, so it may or may not work out-of-the-box. In a real implementation, you need to check for many error conditions, such as open files on the disk volume before unmounting it, and poll the EC2 API until the volume has actually been attached or released.
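One piece of that missing error handling is waiting for the EC2 API to report that an operation has finished. A minimal polling helper might look like the sketch below. The name `wait_until` and its defaults are assumptions, and the `volume_state` call in the usage comment is a hypothetical stand-in for whatever your AWS client exposes.

```ruby
# Calls the block every `interval` seconds until it returns a truthy value or
# `timeout` seconds have elapsed. Returns the truthy value, or raises a
# RuntimeError on timeout. A monotonic clock is used so that wall-clock
# adjustments do not shorten or lengthen the wait.
def wait_until(timeout: 60, interval: 2)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "condition not met within #{timeout} seconds"
    end
    sleep interval
  end
end

# Example use (not run here); volume_state is a hypothetical helper:
#   wait_until(timeout: 120) { volume_state(volume_id) == "in-use" }
```

Raising on timeout, rather than returning nil, makes the notify script exit non-zero when the volume never attaches instead of going on to mount a device that is not there.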

Putting it all together

Start keepalived on both servers:

One of them will elect itself the master, assign the ENI secondary IP to itself, and attach and mount the block device on /mnt. You can see which is which by checking the service status:

The other machine will say that it’s transitioned to backup state:

To force a failover, stop keepalived on the current master. The backup system will detect that the master went away, and transition to master:

After a while, the backup should be reachable on the VIP, and have the disk volume mounted under /mnt.

If you now start keepalived on the old master, it should come back online as the new backup.

Wrapping Up

As we’ve seen, it’s not always possible to architect systems in AWS for horizontal redundancy. Many pieces of software, particularly those involving one writer and many readers, cannot be set up this way.

In other situations, it’s not desirable to build horizontal redundancy. One real-life example is a highly-available large shared cache system (e.g. squid or varnish) where it would be costly to rebuild terabytes of cache on instance failure. At Chef Software, we use an expanded version of the tools shown here to implement our Chef Server High-Availability solution.

Finally, I also found this presentation by an AWS solutions architect in Japan very useful in identifying what L2 and L3 networking technologies are available in AWS: http://www.slideshare.net/kentayasukawa/ip-multicast-on-ec2