Auditing Bitbucket Server Data for Credentials in AWS

02. December 2018 2018 0

This article was originally published on the Sourced blog.

Introduction

Secrets management in public cloud environments continues to be a challenge for many organisations as they embrace the power of programmable infrastructure and the consumption of API-based services. All too often reputable companies will feature in the news, having fallen victim to security breaches or costly cloud resource provisioning through the accidental disclosure of passwords, API tokens or private keys.

Whilst the introduction of cloud-native services such as AWS Secrets Manager or third-party solutions like HashiCorp Vault provide more effective handling for this type of data, the nature of version control systems such as git provides a unique challenge in that the contents of old commits may contain valid secrets that could still be discovered and abused.

We’ve been engaged at a customer who has a large whole-of-business ‘self-service’ cloud platform in AWS, where deployments are driven by an infrastructure as code pipeline with code stored in git repositories hosted on a Atlassian Bitbucket server. Part of my work included identifying common, unencrypted secrets in the organisation’s git repositories and provide the business units responsible a way to easily identify and remediate these exposures.

Due to client restraints in time and resourcing, we developed a solution that leveraged our existing tooling as well as appropriate community-developed utilities to quickly and efficiently meet our customer’s requirements whilst minimising operational overhead.

In this blog post, we’ll walk through the components involved in allowing us to visualise these particular security issues and work to drive them towards zero exposure across the organisation.

Understanding Bitbucket Server

As mentioned above, our client leverages an AWS deployed instance of Atlassian Bitbucket Server to store and manage their git repositories across the group.

From the application side, the Bitbucket platform contains the following data characteristics that continues to grow every day:

  • 100Gb+ of git repository data
  • 1300+ repositories with more than 9000 branches
  • 200,000+ commits in just the master branches alone

As part of this deployment, EBS and RDS snapshots are created on a schedule to ensure that a point-in-time backup of the application is available ensuring that the service can be redeployed in the event of a failure, or to test software upgrades to the software against production-grade data.

When these snapshots are created, a tag containing a timestamp is created that allows quick identification of the most recent backup of the service by both humans and automated processes.

Auditing git repositories

When it comes to inspecting git repositories for unwanted data, one of the challenges is that it involves inspecting every commit in every branch of every repository.

Even though the Bitbucket platform provides a search capability in the web interface, it is limited in that it can only search for certain patterns in the master branch of the repository, in files smaller than a set size. In addition to this, the search API is private and is not advocated for use for non-GUI operations, further emphasised with the fact that it returns its results in HTML format.

Another challenge that we encountered was that heavy use of the code search API resulted in impaired performance on the Bitbucket server itself.

As such, we looked to the community to see what other tools might exist to help us sift through the data and identify any issues. During our search, we identified a number of different tools, each with their own capabilities and limitations.  Each of these are worthy of a mention and are detailed below:

After trying each of these tools out and understand their capabilities, we ended up selecting gitleaks for our use.

The primary reasons for its selection includes:

  • It is an open source security scanner for git repositories, actively maintained by Zach Rice;
  • Written in GO, gitleaks provides a fast individual repository scanning capability that comes with a set of pre-defined secrets identification patterns; and
  • It functions by parsing the contents of a cloned copy of a git repository on the local machine, which it then uses to examine all files and commits, returning the results in a JSON output file for later use.

The below example shows the output of gitleaks being run against a sample repository called “secretsdummy” that contains an unencrypted RSA private key file.

As you can see, gitleaks detects it in a number of commits and returns the results in JSON format to the output file /tmp/secretsdummy_leaks.json for later use.

Read the rest of this of this article on Sourced blog.