Multi-region Serverless APIs: I’ve got a fever and the only cure is fewer servers

08. December 2018 2018 0

Meet SAM

Let’s talk about the hottest thing in computers for the past few years. No, not Machine Learning. No, not Kubernetes. No, not big data. Fine, one of the hottest things in computers. Right, serverless!

It’s still an emerging and quickly changing field, but I’d like to take some time to demonstrate how easy it is to make scalable and reliable multi-region APIs using just a few serverless tools and services.

It’s actually deceptively simple. Well, for a “blog post”-level application, anyway.

We’re going to be managing this application using the wonderful AWS Serverless Application Model (SAM) and SAM CLI. Far and away the easiest way I have ever used for creating and deploying serverless applications. And, in keeping with contemporary practices, it even has a cute little animal mascot.

SAM is a feature of CloudFormation that provides a handful of short-hand resources that get expanded out to their equivalent long-hand CloudFormation resources upon ChangeSet calculation. You can also drop down into regular CloudFormation whenever you need to to manage resources and configurations not covered by SAM.

The SAM CLI is a local CLI application for developing, testing, and deploying your SAM applications. It uses Docker under the hood to provide as close to a Lambda execution environment as possible and even allows you to run your APIs locally in an APIGateway-like environment. It’s pretty great, IMO.

So if you’re following along, go ahead and install Docker and the SAM CLI and we can get started.

The SAMple App

Once that’s installed, let’s generate a sample application so we can see what it’s all about. If you’re following along on the terminal, you can run sam init -n hello-sam -r nodejs8.10 to generate a sample node app called hello-sam. You can also see the output in the hello-sam-1 folder in the linked repo if you aren’t at a terminal and just want to read along.

The first thing to notice is the README.md that is full of a huge amount of information about the repo. For the sake of brevity, I’m going to leave learning the basics of SAM and the repo structure you’re looking at as a bit of an exercise for the reader. The README and linked documentation can tell you anything you need to know.

The important thing to know is that hello_world/ contains the code and template.yaml contains a special SAM-flavored CloudFormation template that controls the application. Take some time to familiarize yourself with it if you want to.

SAM Local

So what can SAM do, other than give us very short CFN templates? Well the SAM CLI can do a lot to help you in your local development process. Let’s try it out.

Step 0 is to install your npm dependencies so your function can execute:

Alright, now let’s have some fun.

There are a few local invocation commands that I won’t cover here because we’re making an API. The real magic with the CLI is that you can run your API locally with sam local start-api. This will inspect your template, identify your API schema, start a local API Gateway, and mount your functions at the correct paths. It’s by no means a perfect replica of running in production, but it actually does a surprisingly great job.

When we start the API, it will mount our function at /hello, following the path specified in the Events attribute of the resource.

Now you can go ahead and curl against the advertised port and path to execute your function.

Then on the backend, you’ll see it executing your function in Docker:

You can change your code at-will and the next invocation will pick it up. You can also attach a debugger to the process if you aren’t a “debug statement developer.”

Want to try to deploy it? I guess we might as well – it’s easy enough. The only pre-requisite is that we need an S3 bucket for our uploaded code artifact. So go ahead and make that – call it whatever you like.

Now, we’ll run a sam package. This will bundle up the code for all of your functions and upload it to S3. It’ll spit out a rendered “deployment template” that has the local CodeUris swapped out for S3 URLs.

If you check out deploy-template.yaml, you should see…very few remarkable differences. Maybe some of the properties have been re-ordered or blank lines removed. But the only real difference you should see is that the relative CodeUrl of CodeUri: hello_world/ for your function has been resolved to an S3 URL for deployment.

Now let’s go ahead and deploy it!

Lastly, let’s find the URL for our API so we can try it out:

Cool, let’s try it:

Nice work! Now that you know how SAM works, let’s make it do some real work for us.

State and Data

We’re planning on taking this multi-region by the end of this post. Deploying a multi-region application with no state or data is both easy and boring. Let’s do something interesting and add some data to our application. For the purposes of this post, let’s do something simple like storing a per-IP hit counter in DynamoDB.

We’ll go through the steps below, but if you want to jump right to done, check out the hello-sam-2 folder in this repository.

SAM offers a SimpleTable resource that creates a very simple DynamoDB table. This is technically fine for our use-case now, but we’ll need to be able to enable Table Streams in the future to go multi-region. So we’ll need to use the regular DynamoDB::Table resource:

We can use environment variables to let our functions know what our table is named instead of hard-coding it in code. Let’s add an environment variable up in the Globals section. This ensures that any functions we may add in the future automatically have access to this as well.

Change your Globals section to look like:

Lastly, we’ll need to give our existing function access to update and read items from the table. We’ll do that by setting the Policies attribute of the resource, which turns into the execution role. We’ll give the function UpdateItem and GetItem. When you’re done, the resource should look like:

Now let’s have our function start using the table. Crack open hello_world/app.js and replace the content with:

On each request, our function will read the requester’s IP from the event, increment a counter for that IP in the DynamoDB table, and return the total number of hits for that IP to the user.

This’ll probably work in production, but we want to be diligent and test it because we’re responsible, right? Normally, I’d recommend you spin up a DynamoDB-local Docker container, but to keep things simple for the purposes of this post, let’s create a “local dev” table in our AWS account called HitsTableLocal.

And now let’s update our function to use that table when we’re executing locally. We can use the AWS_SAM_LOCAL environment variable to determine if we’re running locally or not. Toss this at the top of your app.js to select that table when running locally:

Now let’s give it a shot! Fire up the app with sam local start-api and let’s do some curls.

Nice! Now let’s deploy it and try it out for real.

Not bad. Not bad at all. Now let’s take this show on the road!

Going Global

Now we’ve got an application, and it even has data that our users expect to be present. Now let’s go multi-region! There are a couple of different features that will underpin our ability to do this.

First is the API Gateway Regional Custom Domain. We need to use the same custom domain name in multiple regions, the edge-optimized custom domain won’t cut it for us since it uses CloudFront. The regional endpoint will work for us, though.

Next, we’ll hook those regional endpoints up to Route53 Latency Records in order to do closest-region routing and automatic failover.

Lastly, we need to way to synchronize our DynamoDB tables between our regions so we can keep those counters up-to-date. That’s where DynamoDB Global Tables come in to do their magic. This will keep identically-named tables in multiple regions in-sync with low latency and high accuracy. It uses DynamoDB Streams under the hood, and ‘last writer wins’ conflict resolution. Which probably isn’t perfect, but is good enough for most uses.

We’ve got a lot to get through here. I’m going to try to keep this as short and as clear as possible. If you want to jump right to the code, you can find it in the hello-sam-3 directory of the repo.

First things first, let’s add in our regional custom domain and map it to our API. Since we’re going to be using a custom domain name, we’ll need a Route53 Hosted Zone for a domain we control. I’m going to pass through the domain name and Hosted Zone Id via a CloudFormation parameter and use it below. When you deploy, you’ll need to supply your own values for these parameters.

Toss this at the top of template.yaml to define the parameters:

Now we can create our custom domain, provision a TLS certificate for it, and configure the base path mapping to add our API to the custom domain – put this in the Resources section:

That !Ref ServerlessRestApi references the implicit API Gateway that is created as part of the AWS::Serverless::Function Event object.

Next, we want to assign each regional custom domain to a specific Route53 record. This will allow us to perform latency-based routing and regional failover through the use of custom healthchecks. Let’s put in a few more resources:

The AWS::Route53::Record resource creates a DNS record and assigns it to a specific AWS region. When your users query for your record, they will get the value for the region closest to them. This record also has a AWS::Route53::HealthCheck attached to it. This healthcheck will check your regional endpoint every 30 seconds. If your regional endpoint has gone down, Route53 will stop considering that record when a user queries for your domain name.

Our Route53 Healthcheck is looking at /health on our API, so we’d better implement that if we want our service to stay up. Let’s just drop a stub healthcheck into app.js. For a real application you could perform dependency checks and stuff, but for this we’ll just return a 200:

The last piece, we, unfortunately, can’t control directly with CloudFormation; we’ll need to use regular AWS CLI commands. Since Global Tables span regions, it kind of makes sense. But before we can hook up the Global Table, each table needs to exist already.

Through the magic of Bash scripts, we can deploy to all of our regions and create the Global Table all in one go!

For a more idempotent (but more verbose) version of this script, check out hello-sam-3/deploy.sh.

Note: if you’ve never provisioned an ACM Certificate for your domain before, you may need to check your CloudFormation output for the validation CNAMEs.

And…that’s all there is to it. You have your multi-region app!

Let’s try it out

So let’s test it! How about we do this:

  1. Get a few hits in on our home region
  2. Fail the healthcheck in our home region
  3. Send a few hits to the next region Route53 chooses for us
  4. Fail back to our home region
  5. Make sure the counter continues at the number we expect

Cool, we’ve got some data. Let’s failover! The easiest way to do this is just to tell Route53 that up actually means down. Find the healthcheck id for your region using aws route53 list-health-checks and run:

Now let’s wait a minute for it to fail over and give it another shot.

Look at that, another region! And it started counting at 3. That’s awesome, our data was replicated. Okay, let’s fail back, you know the drill:

Give it a minute for the healthcheck to become healthy again and fail back. And now let’s hit the service a few more times:

Amazing. Your users will now automatically get routed to not just the nearest region, but the nearest healthy region. And all of the data is automatically replicated between all active regions with very low latency. This grants you a huge amount of redundancy, availability, and resilience to service, network, regional, or application failures.

Now not only can your app scale effortlessly through the use of serverless technologies, it can failover automatically so you don’t have to wake up in the middle of the night and find there’s nothing you can do because there’s a network issue that is out of your control – change your region and route around it.

Further Reading

I don’t want to take up too much more of your time, but here’s some further reading if you wish to dive deeper into serverless:

  • A great Medium post by Paul Johnston on Serverless Best Practices
  • SAM has configurations for safe and reliable deployment and rollback using CodeDeploy!
  • AWS built-in tools for serverless monitoring are lackluster at best, you may wish to look into external services like Dashbird or Thundra once you hit production.
  • ServerlessByDesign is a really great web app that allows you to drag, drop, and connect various serverless components to visually design and architect your application. When you’re done, you can export it to a working SAM or Serverless repository!

About the Author

Norm recently joined Chewy.com as a Cloud Engineer to help them start on their Cloud transformation. Previously, he ran the Cloud Engineering team at Cimpress. Find him on twitter @nromdotcom.

About the Editor

Jennifer Davis is a Senior Cloud Advocate at Microsoft. Jennifer is the coauthor of Effective DevOps. Previously, she was a principal site reliability engineer at RealSelf, developed cookbooks to simplify building and managing infrastructure at Chef, and built reliable service platforms at Yahoo. She is a core organizer of devopsdays and organizes the Silicon Valley event. She is the founder of CoffeeOps. She has spoken and written about DevOps, Operations, Monitoring, and Automation.