Hacking together an Alexa skill

24. December 2018

Alexa is an Amazon technology that allows users to build voice-driven applications. Amazon takes care of converting voice input to text and vice versa, provisioning the software to devices, and calling into your business logic. You use the Amazon interface to build your model and provide the logic that executes based on the text input. The combination of the model and the business logic is an Alexa skill. Alexa skills run on a variety of devices, most prominently the Amazon Echo.

I built an Alexa skill as a proof of concept for a hackathon; I had approximately six hours to build something to demo. My goals for the proof of concept were to:

  • Run without error in the simulator
  • Pull data from a remote service using an AWS lambda function
  • Take input from a user
  • Respond to that input

I was working with horse racing data because it is timely and because I had access to an API that provided interesting data. Horse races happen at a specific track on a specific date and time. Each race has a number of horses that compete.

The flow of my Alexa skill was:

  • Notify Alexa to call into the custom skill using a phrase.
  • Prompt the user to choose a track from one of N provided by me.
  • Store the value of the track for future operations.
  • Store the track name in the session.
  • Prompt the user to choose between two sets of information that might be of use: the number of races today or the date of the next featured race.
  • Return the requested information.
  • Exit the skill, which meant that Alexa was no longer listening for voice input.

The barriers to entry for creating a proof of concept are low. If you can write a python script and navigate around the AWS console, you can write an Alexa skill. However, there are some other considerations that I didn’t have to work through because this was a hackathon project; tasks such as UX design, testing, and deployment to a device would be crucial to any production project.

Jargon and Terminology

Like any other technology, Alexa has its own terminology. And there’s a lot of it.

A skill is a package of a model to convert voice to text and vice versa as well as business logic to execute against the text input. A skill is accessed by a phrase the user says, like “listen to NPR” or “talk horse racing.” This phrase is an “invocation.” The business logic is written by you, but the Alexa service handles the voice to text and text to voice conversion.

A skill is made up of one or more intents. An intent is one branch of logic and is also identified by a phrase, called an utterance. While invocations need to be globally unique, utterances only trigger after a skill is invoked, so the phrasing can overlap between different skills. An example intent utterance would be “please play ‘Fresh Air’” or “my favorite track is Arapahoe Park.” You can also use a prepackaged intent, such as one that returns search results, and tie that to an utterance. Utterances are also called samples.

Slots are placeholders within utterances. If the intent phrase is “please play ‘Fresh Air’” you can parameterize the words ‘Fresh Air’ and have that phrase converted to text and delivered to you. A slot is basically a multiple choice selection, so you can provide N values and have the text delivered to you. Each slot has a defined data type. It was unclear to me what happens when a slot is filled with a value that is not one of your N values. (I didn’t get a chance to test that.)

A session is a place for your business logic to store temporary data between invocations of intents. A session is tied both to an application and a user (more info here). A session lasts roughly as long as the user is interacting with your application; depending on application settings, that is about 30 seconds. If you need to store data for longer, connect your business logic to a durable storage solution like S3 or DynamoDB.
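
For example, here is a minimal sketch of an intent handler that stores the chosen track in the session (the handler and attribute names are hypothetical; the response format matches the sample code shown later in this article):

def remember_track(intent, session):
    # Hypothetical handler: read the TrackName slot and echo it back,
    # storing the value in sessionAttributes so later intents in this
    # session can retrieve it.
    track = intent['slots']['TrackName']['value']
    return {
        'version': '1.0',
        'sessionAttributes': {'favoriteTrack': track},
        'response': {
            'outputSpeech': {
                'type': 'PlainText',
                'text': "Saved " + track + " as your favorite track."
            },
            'shouldEndSession': False
        }
    }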

Getting started

To get started, I’d suggest using this tutorial as a foundation. If you are looking for a different flow, look at this set of tutorials and see if any of them match your problem space. All of them walk you through creating an Alexa skill using a python lambda function. It’s worth noting that you’ll have to sign up with an Amazon account for access to the Alexa console (it’s a different authentication system than AWS IAM). I’d also start out using a lambda function to eliminate a possible complication. If you use lambda, you don’t have to worry about making sure Alexa can access your https endpoint (the other place your business logic can reside).

Once you have the tutorial working in the Alexa console, you can start extending the main components of the Alexa skill: the model or the business logic.

Components

You configure the model using the Alexa console and the Alexa Skills Kit, or via the CLI or the Skills Kit API. In either case, you’re going to end up with a JSON configuration file with information about the invocation phrase, the intents and the slots. When using the console, you can also trigger a model build and test your model in the browser, as long as you have a microphone.

Here are selected portions of the JSON configuration file for the Alexa skill I created. You can see this was a proof of concept as I didn’t delete the color scheme from the tutorial and only added two tracks that the user can select as their favorite.

{
  "interactionModel": {
    "languageModel": {
      "invocationName": "talk horse racing",
      "intents": [
        {
          "name": "MyColorIsIntent",
          "slots": [
            {
              "name": "TrackName",
              "type": "TrackNameType"
            }
          ],
          "samples": [
            "my favorite track is {TrackName}"
          ]
        },
        {
          "name": "AMAZON.HelpIntent",
          "samples": []
        },
        {
          "name": "HowManyRaces",
          "slots": [],
          "samples": [
            "how many races"
          ]
        },
        {
          "name": "NextStakesRace",
          "slots": [],
          "samples": [
            "when is the stakes race",
            "when is the next stakes race"
          ]
        }
      ],
      "types": [
        {
          "name": "LIST_OF_COLORS",
          "values": [
            {}
          ]
        },
        {
          "name": "TrackNameType",
          "values": [
            {
              "name": {
                "value": "Arapahoe Park"
              }
            },
            {
              "name": {
                "value": "Tampa Bay Downs"
              }
            }
          ]
        }
      ]
    }
  }
}

The other component of the system is the business logic. This can be either an AWS Lambda function, written in any language supported by that service, or a service that responds to HTTPS requests; the latter can be useful for leveraging existing code or data that doesn’t live in AWS. If you use Lambda, you can deploy the skill just like any other Lambda function, which means you can leverage whatever lifecycle, frameworks or testing solutions you use for other Lambda functions. Using a non-Lambda solution requires a bit more work when processing a request, but it can be done.

The business logic I wrote for this was basically hacked tutorial code. The first section is the lambda handler. Below is a relevant snippet where we examine the event passed to the lambda function by the Alexa system and call the appropriate business method.

def lambda_handler(event, context):
    if event['session']['new']:
        on_session_started({'requestId': event['request']['requestId']},
                           event['session'])

    if event['request']['type'] == "LaunchRequest":
        return on_launch(event['request'], event['session'])
    elif event['request']['type'] == "IntentRequest":
        return on_intent(event['request'], event['session'])
    elif event['request']['type'] == "SessionEndedRequest":
        return on_session_ended(event['request'], event['session'])

...

on_intent is the logic dispatcher which retrieves the intent name and then calls the appropriate internal function.

def on_intent(intent_request, session):
    """ Called when the user specifies an intent for this skill """
    print("on_intent requestId=" + intent_request['requestId'] +
          ", sessionId=" + session['sessionId'])

    intent = intent_request['intent']
    intent_name = intent_request['intent']['name']

    if intent_name == "MyColorIsIntent":
        return set_color_in_session(intent, session)

...

    elif intent_name == "HowManyRaces":
        return get_how_many_races(intent, session)

...

Each business logic function can be independent and could call into different services if need be.

def get_how_many_races(intent, session):
    session_attributes = {}
    reprompt_text = None
    # Setting reprompt_text to None signifies that we do not want to reprompt
    # the user. If the user does not respond or says something that is not
    # understood, the session will end.

    if session.get('attributes', {}) and "favoriteColor" in session.get('attributes', {}):
        favorite_track = session['attributes']['favoriteColor']
        speech_output = "There are " + get_number_races(favorite_track) + \
            " races at " + favorite_track + " today. Thank you, good bye."
        should_end_session = True
    else:
        speech_output = "Please tell me your favorite track by saying, " \
                        "my favorite track is Arapahoe Park"
        should_end_session = False

    return build_response(session_attributes, build_speechlet_response(
        intent['name'], speech_output, reprompt_text, should_end_session))

build_response is directly from the sample code and creates a correctly structured response object. This response will be interpreted by Alexa and converted into speech.

def build_response(session_attributes, speechlet_response):
    return {
        'version': '1.0',
        'sessionAttributes': session_attributes,
        'response': speechlet_response
    }
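
The companion helper build_speechlet_response, called above but not shown, also comes from the sample code; it looks roughly like the following (reconstructed from the tutorial, so the card fields may differ slightly):

def build_speechlet_response(title, output, reprompt_text, should_end_session):
    # Wraps the text to speak, a card for the Alexa app, an optional
    # reprompt and the flag that tells Alexa whether to end the session.
    return {
        'outputSpeech': {
            'type': 'PlainText',
            'text': output
        },
        'card': {
            'type': 'Simple',
            'title': title,
            'content': output
        },
        'reprompt': {
            'outputSpeech': {
                'type': 'PlainText',
                'text': reprompt_text
            }
        },
        'shouldEndSession': should_end_session
    }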

Building on the firm foundation of the tutorial, you can easily add more slots and intents or change the invocation, and you can build out additional business logic to respond to the new voice input.
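
For example, the NextStakesRace intent defined in the model above needs a matching branch in on_intent plus a handler. Here is a hypothetical sketch following the same pattern as get_how_many_races; get_next_stakes_date stands in for a call to the horse racing API:

def get_next_stakes_race(intent, session):
    # Hypothetical handler; note the tutorial's 'favoriteColor' session
    # key is reused to store the track name, as in get_how_many_races.
    attributes = session.get('attributes', {})
    if 'favoriteColor' in attributes:
        track = attributes['favoriteColor']
        speech_output = "The next stakes race at " + track + " is " + \
            get_next_stakes_date(track) + ". Thank you, good bye."
        should_end_session = True
    else:
        speech_output = "Please tell me your favorite track by saying, " \
                        "my favorite track is Arapahoe Park"
        should_end_session = False
    return build_response({}, build_speechlet_response(
        intent['name'], speech_output, None, should_end_session))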

Testing

I tested my skill manually using the built-in simulator in the Alexa console. I tried other simulators, but they were not effective. At the bottom of the python tutorial mentioned above, there is a reference to echosim.io, which is an online Alexa skill simulator; I couldn’t seem to make it work.

After each model change (new or modified utterances, intents or invocations) you will need to rebuild the model (approximately 30-90 seconds, depending on the complexity of your model). Changing the business logic does not require rebuilding the model, and you can use that functionality to iterate more quickly.

I did not investigate automated testing. If I were building a production Alexa skill, I’d add a small layer of indirection so that the business logic could be easily unit tested, apart from any dependencies on Alexa objects. I’d also plan to build a CI/CD pipeline so that changes to the model or the lambda function could be tested and deployed automatically, something like what is outlined here.
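
As a sketch of what that indirection buys you: the handlers above already take plain dictionaries, so a test can drive them without any Alexa infrastructure. A hypothetical pytest-style test might look like:

def test_prompts_for_track_when_none_in_session():
    # Build the event fragments by hand and assert on the response
    # structure; no Alexa service or AWS calls are required.
    intent = {'name': 'HowManyRaces', 'slots': {}}
    session = {'sessionId': 'test-session', 'attributes': {}}
    response = get_how_many_races(intent, session)
    assert response['response']['shouldEndSession'] is False
    assert "favorite track" in response['response']['outputSpeech']['text']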

User Experience (UX)

Voice UX is very different from the UX of desktop or a mobile device. Because information transmission is slow, it’s even more important to think about voice UX for an Alexa skill than it would be if you were building a more traditional web-based app. If you are building a skill for any other purpose than exploration or proof of concept, make sure to devote some time to learning about voice UX. This webinar appears useful.

Some lessons I learned:

  • Don’t go too many levels deep in navigation. With Alexa, you can provide choice after choice for the user, but remember the last time you dealt with an interactive phone voice recognition system. Did you like it? Keep interactions short.
  • Repeat back what Alexa “heard” as this gives the user another chance to correct course.
  • Offer a help option. If I were building a production app, I’d want to get some kind of statistics on how often the help option was invoked to see if there was an oversight on my part.
  • Think about error handling using reprompts. If the skill hasn’t received input, it can reprompt and possibly get more user input.

After the simulator

A lot of testing and development can take place in the Amazon Alexa simulator. However, at some point, you’ll need to deploy to a device. Since this was a proof of concept, I didn’t do that, but there is documentation on that process here.

Conclusion

This custom Alexa skill was the result of a six-hour exploration during a company hackfest. At the end of the day, I had a demo I could run on the Alexa Simulator. Alexa is mainstream enough that it makes sense for anyone who works with timely, textual information to evaluate building a skill, especially since a prototype can be built relatively quickly. For instance, it seems to me that a newspaper should have an Alexa skill, but it doesn’t make as much sense for an e-commerce store (unless you have certain timely information and a broad audience) because complex navigation is problematic. Given the low barrier to entry, Alexa skills are worth exploring as this new UX for interacting with computers becomes more prevalent.

About the Author

Dan Moore is director of engineering at Culture Foundry. He is a developer with two decades of experience, former AWS trainer, and author of “Introduction to Amazon Machine Learning,” a video course from O’Reilly. He blogs at http://www.mooreds.com/wordpress/ . You can find him on Twitter at @mooreds.

About the Editors

Ed Anderson is the SRE Manager at RealSelf, organizer of ServerlessDays Seattle, and occasional public speaker. Find him on Twitter at @edyesed.

Jennifer Davis is a Senior Cloud Advocate at Microsoft. Jennifer is the coauthor of Effective DevOps. Previously, she was a principal site reliability engineer at RealSelf, developed cookbooks to simplify building and managing infrastructure at Chef, and built reliable service platforms at Yahoo. She is a core organizer of devopsdays and organizes the Silicon Valley event. She is the founder of CoffeeOps. She has spoken and written about DevOps, Operations, Monitoring, and Automation.


Deploy a Secure Static Site with AWS & Terraform

14. December 2018

There are many uses for static websites. A static site is the simplest form of website, though every website consists of delivering HTML, CSS and other resources to a browser. With a static website, initial page content is delivered the same to every user, regardless of how they’ve interacted with your site previously. There’s no database, authentication or anything else associated with sending the site to the user – just a straight HTTPS connection and some text content. This content can benefit from caching on servers closer to its users for faster delivery; it will generally also be lower cost, as the servers delivering this content do not themselves need to interpret scripting languages or make database connections on behalf of the application.

The static website now has another use, as there are more tools to provide highly interactive in-browser applications based on JavaScript frameworks (such as React, Vue or Angular) which manage client interaction, maintain local data and interact with the web service via small but often frequent API calls. These systems decouple front-end applications from back-end services and allow those back-ends to be written in multiple languages or as small siloed applications, often called microservices. Microservices may take advantage of modern back-end technologies such as containers (via Docker and/or Kubernetes) and “serverless” providers like AWS Lambda.

People deploying static sites fall into these two very different categories – for one the site is the whole of their business, for the other the static site is a very minor part supporting the API. However, each category of static site use still shares similar requirements. In this article we explore deploying a static site with the following attributes:

  • Must work at the root domain of a business, e.g., example.com
  • Must redirect from the common (but unnecessary) www. subdomain to the root domain
  • Must be served via HTTPS (and upgrade HTTP to HTTPS)
  • Must support “pretty” canonical URLs – e.g., example.com/about-us rather than example.com/about-us.html
  • Must not cost anything when not being accessed (except for domain name costs)

AWS Service Offerings

We achieve these requirements through use of the following AWS services:

  • S3
  • CloudFront
  • ACM (Amazon Certificate Manager)
  • Route53
  • Lambda

This may seem like quite a lot of services to host a simple static website; let’s review and summarise why each item is being used:

  • S3 – object storage; allows you to put files in the cloud. Other AWS users or AWS services may be permitted access to these files. They can be made public. S3 supports website hosting, but only via HTTP. For HTTPS you need…
  • CloudFront – content delivery system; can sit in front of an S3 bucket or a website served via any other domain (doesn’t need to be on AWS) and deliver files from servers close to users, caching them if allowed. Allows you to import HTTPS certificates managed by…
  • ACM – generates and stores certificates (you can also upload your own). Will automatically renew certificates which it generates. For generating certificates, your domain must be validated via adding custom CNAME records. This can be done automatically in…
  • Route53 – AWS nameservers and DNS service. R53 replaces your domain provider’s nameservers (at the cost of $0.50 per month per domain) and allows both traditional DNS records (A, CNAME, MX, TXT, etc.) and “alias” records which map to a specific other AWS service – such as S3 websites or CloudFront distributions. Thus an A record on your root domain can link directly to Cloudfront, and your CNAMEs to validate your ACM certificate can also be automatically provisioned
  • Lambda – functions as a service. Lambda lets you run custom code on events, which can come directly or from a variety of other AWS services. Crucially, you can put a Lambda function into CloudFront, manipulating requests or responses as they’re received from or sent to your users. This is how we’ll make our URLs look nice – see the sketch below
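
As an illustration of that last point, the pretty-URL rewrite can be done with a small Lambda@Edge function attached to CloudFront’s origin-request event. A hypothetical sketch (the function in the repository used later in this article may differ):

import re

def handler(event, context):
    # Rewrite "pretty" URIs like /about-us to the /about-us.html object
    # that actually exists in S3, before CloudFront queries the origin.
    request = event['Records'][0]['cf']['request']
    uri = request['uri']
    if uri != '/' and not re.search(r'\.\w+$', uri):
        request['uri'] = uri.rstrip('/') + '.html'
    return request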

Hopefully, that gives you some understanding of the services – you could cut out CloudFront and ACM if you didn’t care about HTTPS, but there’s a worldwide push for HTTPS adoption to provide improved security for users, and browsers including Chrome now mark pages not served via HTTPS as “insecure.”

All this is well and good, but whilst AWS is powerful, their console leaves much to be desired, and setting up one site can take some time – replicating it for multiple sites is as much an exercise in memory and box-ticking as it is in technical prowess. What we need is a way to do this once, or even better have somebody else do this once, and then replicate it as many times as we need.

Enter Terraform from HashiCorp

One of the most powerful parts of AWS isn’t obvious when you first start using the console to manage your resources: AWS has a super powerful API that drives pretty much everything. It’s key to so much of their automation, to the entirety of their security model, and to tools like Terraform.

Terraform from HashiCorp is “Infrastructure-as-Code” or IaC. It lets you define resources on a variety of cloud providers and then run commands to:

  • Check the current state of your environment
  • Make required changes such that your actual environment matches the code you’ve written

In code form, Terraform uses blocks of code called resources:

resource "aws_s3_bucket" "some-internal-reference" {
  bucket = "my-bucket-name"
}

Each resource can include variables (documented on the provider’s website), and these can be text, numbers, true/false, lists (of the above) or maps (basically subresources with their own variables).
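
For example, a hypothetical bucket resource exercising a few of those variable types:

resource "aws_s3_bucket" "example" {
  bucket        = "my-company-example-bucket"  # text
  force_destroy = true                         # true/false

  cors_rule {
    allowed_methods = ["GET", "HEAD"]          # list
    allowed_origins = ["*"]
  }

  tags = {                                     # map
    Environment = "production"
    Team        = "web"
  }
}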

Terraform is distributed as pre-built binaries (it’s also open source, written in Go so you can build it yourself) that you can run simply by downloading, making them executable and then executing them. To work with AWS, you need to define a “provider” which is formatted similarly to a resource:

provider "aws" {
}
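
In practice you’ll usually also give the provider a region and, if you’re not using your default credentials, a profile (the values here are placeholders):

provider "aws" {
  region  = "eu-west-1"   # any valid AWS region
  profile = "my-profile"  # a named profile from ~/.aws/credentials
}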

To call any AWS API (via the command line, Terraform or a language of your choice) you’ll need to generate an access key and secret key for the account you’d like to use. That’s beyond the scope of this article, but given you should avoid hardcoding those credentials into Terraform, and given you’ll be well served by having the CLI available anyway, skip over to the AWS CLI setup instructions and set this up with the correct keys before continuing.

(NB: in this step you’re best provisioning an account with admin rights, or at least full access to IAM, S3, Route53, Cloudfront, ACM & Lambda. However don’t be tempted to create access keys for your root account – AWS recommends against this)

Now that you’ve got your system set up to use AWS programmatically, installed Terraform and been introduced to the basics of its syntax, it’s a good time to look at our code on GitHub.

Clone the repository above; you’ll see we have one file in the root (main.tf.example) and then a directory called modules. One of the best parts of Terraform is modules and how they behave. Modules let one user define a specific set of infrastructure whose parts may either relate directly to each other or interact by being on the same account. These modules can define variables, allowing some aspects (names, domains, tags) to be customised, whilst other items that are necessary for the module to function (like a certain configuration of a CloudFront distribution) are fixed.

To start off, run bash ./setup, which will copy the example file to main.tf and also ensure your local Terraform installation has the correct providers (AWS and file archiving) as well as set up the modules. In main.tf you’ll then see a suggested setup using three modules. Of course, you’d be free to remove main.tf entirely and use each module in its own right, but for this tutorial it helps to have a complete picture.

At the top of the main.tf file, three variables are defined which you’ll need to fill in correctly (roughly the shape sketched after this list):

  1. The first is the domain you wish to use – it can be your root domain (example.com) or any sort of subdomain (my-site.example.com).
  2. Second, you’ll need the Zone ID associated with your domain on Route 53. Each Route 53 domain gets a zone ID which relates to AWS’ internal domain mapping system. To find your Zone ID visit the Route53 Hosted Zones page whilst signed in to your AWS account and check the right-hand column next to the root domain you’re interested in using for your static site.
  3. Finally, choose a region; if you already use AWS you may have a preferred region, otherwise choose one from the AWS list nearest to you. As a note, it’s generally best to avoid us-east-1 where possible, as it tends to have more issues arise due to its centrality in various AWS services.
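
Those three variables are roughly of this shape (names and defaults here are illustrative; check main.tf.example for the real ones):

variable "site_domain" {
  default = "example.com"     # your root domain or subdomain
}

variable "zone_id" {
  default = "ZXXXXXXXXXXXXX"  # the Route 53 hosted zone ID
}

variable "region" {
  default = "eu-west-2"       # your preferred AWS region
}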

Now for the fun part. Run terraform plan – if your AWS CLI environment is set up the plan should execute and show the creation of a whole list of resources – S3 Buckets, CloudFront distributions, a number of DNS records and even some new IAM roles & policies. If this bit fails entirely, check that the provider entity in main.tf is using the right profile name based on your ~/.aws/credentials file.

Once the plan has run and told you it’s creating resources (it shouldn’t say updating or destroying at this point), you’re ready to go. Run terraform apply – this basically does another plan, but at the end, you can type yes to have Terraform create the resources. This can take a while as Terraform has to call various AWS APIs and some are quicker than others – DNS records can be slightly slower, and ACM generation may wait until it’s verified DNS before returning a positive response. Be patient and eventually it will inform you that it’s finished, or tell you if there have been problems applying.

If the plan or apply options have problems you may need to change some of your variables based on the following possible issues:

  • Names of S3 buckets should be globally unique – so if anyone in the world has a bucket with the name you want, you can’t have it. A good system is to prefix buckets with your company name or suffix them with random characters. By default, the system names your buckets for you, but you can override this.
  • You shouldn’t have an A record for your root or www. domain already in Route53.
  • You shouldn’t have an ACM certificate for your root domain already.

It’s safe (in the case of this code at least) to re-run Terraform if problems have occurred and you’ve tried to fix them – it will only modify or remove resources it has already created, so other resources on the account are safe.

Go into the AWS console and browse S3, CloudFront and Route53, and you should see your various resources created. You can also view the Lambda function and ACM certificate, but be aware that for the former you’ll need to be in the specific region you chose to run in, and for the latter you must select us-east-1 (N. Virginia).

What now?

It’s time to deploy a website. This is the easy part – you can use the S3 console to drag and drop files (remember to use the website bucket and not the logs or www redirect buckets), use awscli to upload yourself (via aws s3 cp or aws s3 sync) or run the example bash script provided in the repo which takes one argument, a directory of all files you want to upload. Be aware – any files uploaded to your bucket will immediately be public on the internet if somebody knows the URL!

If you don’t have a website, check the “example-website” directory – running the bash script above without any arguments will deploy this for you. Once you’ve deployed something, visit your domain and all being well you should see your site. CloudFront distributions take a variable time to set up, so in some cases it might be 15 or so minutes before the site works as expected.

Note also that CloudFront is set to cache files for 5 minutes; even a hard refresh won’t reload resource files like CSS or JavaScript, as CloudFront won’t fetch them again from your bucket for 5 minutes after first fetching them. During development you may wish to turn this off – you can do this in the CloudFront console by setting the TTL values to 0. Once you’re ready to go live, run terraform apply again and it will reconfigure CloudFront to the recommended settings.
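
In Terraform terms, those settings live on the distribution’s cache behaviour. A hypothetical fragment (not a complete resource) of the kind of configuration the module applies:

resource "aws_cloudfront_distribution" "site" {
  # ... origin, certificate and other required settings omitted ...

  default_cache_behavior {
    viewer_protocol_policy = "redirect-to-https"  # upgrade HTTP to HTTPS
    min_ttl                = 0
    default_ttl            = 300                  # cache for 5 minutes
    max_ttl                = 300
  }
}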

Summary

With a minimal amount of work we now have a framework that can deploy a secure static site to any domain we choose in a matter of minutes. We could use this to deploy websites for marketing clients rapidly, publish a blog generated with a static site builder like Jekyll, or use it as the basis for a serverless web application using ReactJS delivered to the client and a back-end provided by AWS Lambda accessed via AWS API Gateway or (newly released) an AWS Application Load Balancer.

About the Author

Mike has been working in web application development for 10 years, including 3 years managing a development team for a property tech startup and before that 4 years building a real time application for managing operations at skydiving centres, as well as some time freelancing. He uses Terraform to manage all the AWS infrastructure for his current work and has dabbled in other custom AWS tools such as an improvement to the CloudWatch logging agent and a deployment tool for S3. You can find him on Twitter @m1ke and GitHub.

About the Editor

Jennifer Davis is a Senior Cloud Advocate at Microsoft. Jennifer is the coauthor of Effective DevOps. Previously, she was a principal site reliability engineer at RealSelf, developed cookbooks to simplify building and managing infrastructure at Chef, and built reliable service platforms at Yahoo. She is a core organizer of devopsdays and organizes the Silicon Valley event. She is the founder of CoffeeOps. She has spoken and written about DevOps, Operations, Monitoring, and Automation.