Code assistance for boto3, always up to date and in any IDE

21. December 2018

If you're like me and work with the boto3 sdk to automate your Ops, then you're probably familiar with this sight:

[Screenshot: the IDE offering no code completion for a boto3 client]

No code completion! It's almost as useful as coding in Notepad, isn't it? This is one of the major quirks of the boto3 sdk. Due to its dynamic nature, we don't get code completion the way we're used to with other libraries.

I used to deal with this by going back and forth with the boto3 docs, but all that context switching kept interrupting my flow and hurting my productivity. I had recently adopted Python as my primary language and was having second thoughts about whether it was the right tool to automate my AWS work. Eventually, I became sick of all the back-and-forth.

A couple of weeks ago, I thought enough was enough. I decided to solve the code completion problem so that I never have to worry about it anymore.

But before starting it, a few naysaying questions cropped up in my head:

  1. How would I find time to support all the APIs that the community and I want?
  2. Will this work be beneficial to people who don't use the same IDE as me?
  3. With 12 releases of boto3 in the last 15 days, will this become a full time job to continuously update my solution?

Thankfully, I found a lazy programmer's solution that I could put together in a weekend. I published an open source package and released it on PyPI. I announced it on reddit and within a few hours, I saw this:

[Screenshot: the reddit announcement gaining traction]

Looks like a few people are going to find this useful! 🙂

In this post I will describe botostubs, a package that gives you code completion for boto3, all methods in all APIs. It even automatically supports any new boto3 releases.

Read on to learn a couple of less-used facilities in boto3 that made this project possible. You will also learn how I automated myself out of the job of maintaining botostubs by leveraging a simple deployment pipeline on AWS that costs about $0.05 per month to run.

What’s botostubs?

botostubs is a PyPI package which you can install to get code completion for any AWS service. You install it in your Python runtime using pip, add "import botostubs" to your script along with a type hint for your boto3 clients, and you're good to go:

[Screenshot: installing botostubs, importing it and type-hinting a boto3 client]
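In code, the setup amounts to something like this (the type hint is what unlocks the completion):

import boto3
import botostubs

# Annotating the client with a botostubs type gives the IDE
# a statically defined class to offer suggestions from.
s3: botostubs.S3 = boto3.client('s3')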

Now, instead of “no suggestions”, your IDE can offer you something more useful like:

[Screenshot: the IDE suggesting the client's methods]

The parameters in boto3 are dynamic too, so what about them?

With botostubs, you can now get to know which parameters are supported and also which are required or optional:

[Screenshot: the IDE showing an operation's required and optional parameters]

Much more useful, right? No more back-and-forth with the boto3 docs, yay!

The above is from IntelliJ/PyCharm, but will this work in other IDEs?

Here are a couple of screenshots of botostubs running in Visual Studio Code:

[Screenshots: botostubs code completion working in Visual Studio Code]

Looks like it works! You should be able to use botostubs in any IDE that supports code completion from python packages.

Why is this an issue in the first place?

As I mentioned before, the boto3 sdk is dynamic, i.e. the methods and APIs don't exist as code. As the guide says,

It uses a data-driven approach to generate classes at runtime from JSON description files …

The SDK maintainers do it to be able to enhance the SDK reliably and quickly. This is great for the maintainers but terrible for us, the end users of the SDK.

Therefore, we need statically defined classes and methods. Since boto3 doesn’t work that way, we need a separate solution.

How botostubs works

At a high level, we need a way to discover all the available APIs, find out about the method signatures and package them up as classes in a module.

  1. Get a boto3 session
  2. Loop over its available clients
  3. Find out about each client’s operations
  4. Generate class signatures
  5. Dump them in a Python module

I didn’t know much about boto3 internals before so I had to do some digging on how to accomplish that. You can use what I’ve learnt here if you’re interested in building tools on top of boto3.

First, about the clients. It's easy when you already know which API you need, e.g. with S3, you write:

client = boto3.client('s3')

But in our situation, we don't know which ones are there in advance. I could have hardcoded them, but I needed a scalable and foolproof way. I found out that the way to do that is with a session's get_available_services() facility.

[Screenshot: calling get_available_services() on a boto3 session]
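A minimal sketch of steps 1 and 2:

import boto3

session = boto3.Session()
# Lists every service name that the installed boto3 can create a client for
print(session.get_available_services())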

Tip: much of what I've learnt has been through IntelliJ's debugger. Very handy, especially when dealing with dynamic code.

[Screenshot: stepping through boto3 internals in IntelliJ's debugger]

For example, to see what tricks are involved in turning the dynamic code into actual API calls to AWS, you can place a breakpoint in _make_api_call, found in boto3's client.py:

[Screenshot: a breakpoint set in _make_api_call in boto3's client.py]

Steps 1 and 2 solved. Next, I had to find out which operations are possible, in a scalable fashion. For example, the S3 API supports about 98 operations for listing objects, uploading and downloading them. Hand-coding 98 operations is no fun, so I had to get creative.

Digging deeper, I found out that each client has an internal botocore service model that had everything I was looking for. Through the service model you can find the service documentation, API version, etc.

Side note: botocore is a factored-out library that is shared with the AWS CLI. Much of what boto3 is capable of is actually powered by botocore.

In particular, we can read the available operation names. E.g. the service model for the ACM API returns:

[Screenshot: the operation names returned by the ACM service model]

Step 3 was therefore solved with:

[Screenshot: looping over the operation names of every client]
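In sketch form (the region is only there so that every client can be instantiated):

import boto3

session = boto3.Session(region_name='us-east-1')
for service in session.get_available_services():
    client = session.client(service)
    # The botocore service model knows every operation the service supports
    print(service, client.meta.service_model.operation_names)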

Next, we need to know what parameters are available for each operation. In boto parlance, they are called "input shapes". (Similarly, you can get the output shape if needed.) Digging some more in the service model source, I found out that we can get the input shape with the operation model:

[Screenshot: fetching an operation's input shape via the operation model]
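A sketch of the idea, using ACM's RequestCertificate operation as an example:

import boto3

model = boto3.client('acm', region_name='us-east-1').meta.service_model
operation = model.operation_model('RequestCertificate')

input_shape = operation.input_shape
print(input_shape.required_members)   # the required parameters
print(list(input_shape.members))      # all parameters, required and optional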

This told me the required and optional parameters, which solved the missing part of generating the method signatures. (I didn't need the method bodies since I was generating stubs.)

Then it was a matter of generating classes based on the clients and operations above and packaging them in a Python module.

For any version of boto3, I just run my script followed by the twine PyPI utility, and out comes a PyPI package that's up to date with upstream boto3. All of that took about 100 lines of Python code.
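The real botostubs source differs, but the heart of the generation step looks roughly like this (a simplified sketch; xform_name is the botocore helper that turns OperationName into operation_name):

from botocore import xform_name
import boto3

def generate_stubs():
    session = boto3.Session(region_name='us-east-1')
    classes = []
    for service in session.get_available_services():
        model = session.client(service).meta.service_model
        # e.g. 's3' -> 'S3'; strip hyphens so the class name stays valid
        lines = [f'class {service.upper().replace("-", "")}:']
        for operation in model.operation_names:
            shape = model.operation_model(operation).input_shape
            params = ['self']
            if shape:
                required = shape.required_members
                params += required
                params += [f'{name}=None' for name in shape.members
                           if name not in required]
            # Stubs only need signatures, not bodies
            lines.append(f'    def {xform_name(operation)}({", ".join(params)}): pass')
        classes.append('\n'.join(lines))
    return '\n\n'.join(classes)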

Another problem remained to be solved though: with a new boto3 release every time you change your t-shirt, I would need to regenerate and re-upload to PyPI several times a week. So, wouldn't this become a maintenance hassle for me?

The deployment pipeline

To solve this problem, I looked to AWS itself. The simplest way I found was to use their build tool and invoke it on a schedule. What I wanted was a way to get the latest boto3 version, run the script and upload the artefact to PyPI, all without my intervention.

The relevant AWS services to achieve this are CloudWatch Events (to trigger other services on a schedule), CodeBuild (a managed build service in the cloud) and SNS (for email notifications). This is what the architecture looks like on AWS:

[Diagram: the deployment pipeline architecture on AWS]

Image generated with viz-cfn

The diagram above describes the CloudFormation template used for the deployment; the template is available on GitHub along with the code.

The AWS CodeBuild project looks like this:

[Screenshots: the CodeBuild project configuration]

To keep my credentials out of source control, I also attached a service role giving CodeBuild permission to write logs and to read my PyPI username and password from the Systems Manager Parameter Store.

I also enabled the build badge feature so that I can show the build status on Github:

[Image: the build badge on GitHub]

For the intricate details, check out the buildspec.yml and the project definition.
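In essence, the buildspec boils down to something like this (a simplified sketch, not the actual file; the script name is illustrative):

version: 0.2

phases:
  install:
    commands:
      - pip install --upgrade boto3 twine   # always build against the latest boto3
  build:
    commands:
      - python main.py        # generate the stubs
      - twine upload dist/*   # credentials come from Parameter Store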

I wanted this project to be invoked on a schedule (I chose every 3 days), which I accomplished with a CloudWatch Event Rule:

[Screenshot: the CloudWatch Event Rule with a 3-day schedule]
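For reference, the same schedule can be expressed with boto3 itself (the rule name is illustrative; the CodeBuild project is attached separately as a target with put_targets):

import boto3

events = boto3.client('events', region_name='us-east-1')

# Fire every 3 days; CloudWatch Events then triggers the CodeBuild project
events.put_rule(
    Name='botostubs-build-schedule',
    ScheduleExpression='rate(3 days)')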

When the rule gets triggered, my CodeBuild project does what it needs to do: clone the git repo, generate the stubs and upload to PyPI:

[Screenshot: CodeBuild logs showing the clone, stub generation and PyPI upload]

This whole process is done in about 25 seconds. Since this is entirely hands-off, I needed some way to be kept in the loop. After the build has run, there's another CloudWatch Event which gets triggered for build events on the project. It sends a notification to SNS, which in turn sends me an email to let me know if everything went OK:

[Screenshot: the CloudWatch Event for build notifications]

The build event and notification.

[Screenshot: the SNS topic configuration]

The SNS Topic with an email subscription.

That's it! But what about my AWS bill? My estimate is that it should be around $0.05 every month, which will definitely not break the bank, so I'm pretty satisfied with it! Imagine how much it would cost to maintain a build server on your own to accomplish all of that.

What’s with the weird versioning?

You will notice botostubs versions look like this:

[Screenshot: botostubs version numbers on PyPI]

It currently follows boto3 releases in the format 0.4.x.y.z. Therefore, if botostubs is currently at 0.4.1.9.61, it offers whatever is available in boto3 version 1.9.61. I included the boto3 version in mine to make it obvious which boto3 version botostubs was generated from, but also because PyPI does not allow uploading the same version number twice.

Are people using it?

According to pypistats.org, botostubs was downloaded about 600 times in its first week, after I showed it to the reddit community. So it seems it was a much-needed tool:

[Chart: botostubs download statistics on pypistats.org]

Your turn

If this sounds like something you'll need, get started by running:

pip install botostubs

Try it out and let me know if you have any advice on how to make it better.

Credit

Huge thanks go to another project called pyboto3 for the original idea. The issues I had with it were that it was unmaintained and supported legacy Python only. I wouldn't have known that this would be possible were it not for pyboto3.

Open for contribution

botostubs is an open source project, so feel free to send your pull requests.

A couple of areas where I’ll need some help:

  • Support Python < 3.5
  • Support boto3 high-level resources (as opposed to just low-level clients)

Summary

In this article, I've shared my process for developing botostubs: examining the internals of boto3 and automating its maintenance with a deployment pipeline that handles all the grunt work. If you like it, I would appreciate it if you shared it with a fellow Python DevOps engineer: https://pypi.org/project/botostubs/

I hope you are inspired to find solutions for AWS challenges that are not straightforward and share them with the community.

If you used what you’ve learnt above to build something new, let me know, I’d love to take a look! Tweet me @jeshan25.

About the Author

Jeshan Babooa is an independent software developer from Mauritius. He is passionate about all things infra automation on AWS especially with tools like Cloudformation and Lambda. He is the guy behind LambdaTV, a youtube channel dedicated to teaching serverless on AWS. You can reach him on Twitter @jeshan25.

About the Editors

Ed Anderson is the SRE Manager at RealSelf, organizer of ServerlessDays Seattle, and occasional public speaker. Find him on Twitter at @edyesed.

Jennifer Davis is a Senior Cloud Advocate at Microsoft. Jennifer is the coauthor of Effective DevOps. Previously, she was a principal site reliability engineer at RealSelf, developed cookbooks to simplify building and managing infrastructure at Chef, and built reliable service platforms at Yahoo. She is a core organizer of devopsdays and organizes the Silicon Valley event. She is the founder of CoffeeOps. She has spoken and written about DevOps, Operations, Monitoring, and Automation.


Paginating AWS API Results using the Boto3 Python SDK

21. December 2016

Author: Doug Ireton

Boto3 is Amazon’s officially supported AWS SDK for Python. It’s the de facto way to interact with AWS via Python.

If you’ve used Boto3 to query AWS resources, you may have run into limits on how many resources a query to the specified AWS API will return, generally 50 or 100 results, although S3 will return up to 1000 results. The AWS APIs return “pages” of results. If you are trying to retrieve more than one “page” of results you will need to use a paginator to issue multiple API requests on your behalf.

Introduction

Boto3 provides Paginators to automatically issue multiple API requests to retrieve all the results (e.g. on an API call to EC2.DescribeInstances). Paginators are straightforward to use, but not all Boto3 services provide paginator support. For those services you'll need to write your own paginator in Python.

In this post, I’ll show you how to retrieve all query results for Boto3 services which provide Pagination support, and I’ll show you how to write a custom paginator for services which don’t provide built-in pagination support.

Built-In Paginators

Most services in the Boto3 SDK provide Paginators. See S3 Paginators for example.

Once you determine you need to paginate your results, you’ll need to call the get_paginator() method.
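For example, paginating an S3 object listing looks like this (bucket name illustrative):

import boto3

s3 = boto3.client('s3')
paginator = s3.get_paginator('list_objects_v2')

# paginate() issues as many API requests as needed under the hood
for page in paginator.paginate(Bucket='my-example-bucket'):
    for obj in page.get('Contents', []):
        print(obj['Key'])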

How do I know I need a Paginator?

If you suspect you aren’t getting all the results from your Boto3 API call, there are a couple of ways to check. You can look in the AWS console (e.g. number of Running Instances), or run a query via the aws command-line interface.

Here’s an example of querying an S3 bucket via the AWS command-line. Boto3 will return the first 1000 S3 objects from the bucket, but since there are a total of 1002 objects, you’ll need to paginate.

Counting results using the AWS CLI
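One way to count with the CLI (bucket name illustrative):

aws s3 ls s3://my-example-bucket --recursive --summarize | tail -2
# Total Objects: 1002
#    Total Size: ...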

Here’s a boto3 example which, by default, will return the first 1000 objects from a given S3 bucket.
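A sketch, with an illustrative bucket name:

import boto3

s3 = boto3.client('s3')

# list_objects_v2 returns at most 1000 keys per call
resp = s3.list_objects_v2(Bucket='my-example-bucket')

print(resp['KeyCount'])      # 1000
print(resp['IsTruncated'])   # True, since more results remain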

Determining if the results are truncated

The S3 response dictionary provides some helpful properties, like IsTruncated, KeyCount, and MaxKeys, which tell you if the results were truncated. If resp['IsTruncated'] is True, you know you'll need to use a Paginator to return all the results.

Using Boto3’s Built-In Paginators

The Boto3 documentation provides a good overview of how to use the built-in paginators, so I won’t repeat it here.

If a given service has Paginators built-in, they are documented in the Paginators section of the service docs, e.g. AutoScaling and EC2.

Determine if a method can be paginated

You can also verify if the boto3 service provides Paginators via the client.can_paginate() method.
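For example:

import boto3

ec2 = boto3.client('ec2')
print(ec2.can_paginate('describe_instances'))   # True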


So, that’s it for built-in paginators. In this section I showed you how to determine if your API results are being truncated, pointed you to Boto3’s excellent documentation on Paginators, and showed you how to use the can_paginate() method to verify if a given service method supports pagination.

If the Boto3 service you are using provides paginators, you should use them. They are tested and well documented. In the next section, I’ll show you how to write your own paginator.

How to Write Your Own Paginator

Some Boto3 services, such as AWS Config don’t provide paginators. For these services, you will have to write your own paginator code in Python to retrieve all the query results. In this section, I’ll show you how to write your own paginator.

You Might Need To Write Your Own Paginator If…

Some Boto3 SDK services aren’t as built-out as S3 or EC2. For example, the AWS Config service doesn’t provide paginators. The first clue is that the Boto3 AWS ConfigService docs don’t have a “Paginators” section.

The can_paginate Method

You can also ask the individual service client’s can_paginate method if it supports paginating. For example, here’s how to do that for the AWS config client. In the example below, we determine that the config service doesn’t support paginating for the get_compliance_details_by_config_rule method.
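A minimal check:

import boto3

config = boto3.client('config')
print(config.can_paginate('get_compliance_details_by_config_rule'))   # False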

Operation Not Pageable Error

If you try to paginate a method without a built-in paginator, you will get an error similar to this:
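botocore.exceptions.OperationNotPageableError: Operation cannot be paginated: get_compliance_details_by_config_rule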

If you get an error like this, it’s time to roll up your sleeves and write your own paginator.

Writing a Paginator

Writing a paginator is fairly straightforward. When you call the AWS service API, it will return up to the maximum number of results and a long hex string token, next_token, if there are more results.

Approach

To create a paginator for this, you make calls to the service API in a loop until next_token is empty, collecting the results from each loop iteration in a list. At the end of the loop, you will have all the results in the list.

In the example code below, I’m calling the AWS Config service to get a list of resources (e.g. EC2 instances), which are not compliant with the required-tags Config rule.

As you read the example code below, it might help to read the Boto3 SDK docs for the get_compliance_details_by_config_rule method, especially the "Response Syntax" section.

Example Paginator

Example Paginator – main() Method
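A sketch consistent with the description below (it relies on the get_resources_from() helper shown in the next section):

import boto3

def main():
    config = boto3.client('config')
    next_token = ''   # empty token means "start from the first page"
    resources = []    # will hold the final results set

    while True:
        # Only pass NextToken once the API has handed us one
        kwargs = {'NextToken': next_token} if next_token else {}
        compliance_details = config.get_compliance_details_by_config_rule(
            ConfigRuleName='required-tags',
            ComplianceTypes=['NON_COMPLIANT'],
            Limit=100,
            **kwargs)
        current_batch, next_token = get_resources_from(compliance_details)
        resources += current_batch
        if not next_token:   # no token means this was the last page
            break

    print(resources)

if __name__ == '__main__':
    main()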

In the example above, the main() method creates the config client and initializes the next_token variable. The resources list will hold the final results set.

The while loop is the heart of the paginating code. In each loop iteration, we call the get_compliance_details_by_config_rule method, passing next_token as a parameter. Again, next_token is a long hex string returned by the given AWS service API method. It's our "claim check" for the next set of results.

Next, we extract the current_batch of AWS resources and the next_token string from the compliance_details dictionary returned by our API call.

Example Paginator – get_resources_from() Helper Method
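A matching sketch of the helper:

def get_resources_from(compliance_details):
    """Return the current batch of resource IDs plus the NextToken, if any."""
    evaluation_results = compliance_details['EvaluationResults']
    resources = [result['EvaluationResultIdentifier']
                 ['EvaluationResultQualifier']['ResourceId']
                 for result in evaluation_results]
    # NextToken is absent from the response on the last page
    return resources, compliance_details.get('NextToken')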

The get_resources_from(compliance_details) is an extracted helper method for parsing the compliance_details dictionary. It returns our current batch (100 results) of resources and our next_token "claim check" so we can get the next page of results from config.get_compliance_details_by_config_rule().

I hope the example is helpful in writing your own custom paginator.


In this section on writing your own paginators I showed you a Boto3 documentation example of a service without built-in Paginator support. I discussed the can_paginate method and showed you the error you get if you call it on a method which doesn’t support pagination. Finally, I discussed an approach for writing a custom paginator in Python and showed a concrete example of a custom paginator which passes the NextToken “claim check” string to fetch the next page of results.

Summary

In this post, I covered paginating AWS API responses with the Boto3 SDK. Like most APIs (Twitter, GitHub, Atlassian, etc.), AWS paginates API responses over a set limit, generally 50 or 100 resources. Knowing how to paginate results is crucial when dealing with large AWS accounts which may contain thousands of resources.

I hope this post has taught you a bit about paginators and how to get all your results from the AWS APIs.

About the Author

Doug Ireton is a Sr. DevOps engineer at 1Strategy, an AWS Consulting Partner specializing in Amazon Web Services (AWS). He has 23 years of experience in IT, working at Microsoft, Washington Mutual Bank, and Nordstrom in diverse roles: testing, Windows Server engineering, development, and Chef engineering, helping app and platform teams manage thousands of servers via automation.