Author: Andrew Langhorn
Editors: Dean Wilson
The public cloud provides many benefits to organizations both large and small: you pay only for what you use, you can fine-tune and tailor your infrastructure to suit your needs, and change these configurations easily on-the-fly, and you can bank on a reliable and scalable infrastructure underpinning it all that’s managed on your behalf. Increasingly, organizations are moving to using public cloud, and in many cases, the death knell for the corporate data centre is beginning to sound.
In this post, we will discuss how you can architect your applications in the AWS cloud to be as secure as possible, and to look at some often-underused features and little-known systems that can help you do this. We’ll also consider the importance of building security into your application infrastructure from the start.
Build everything in a VPC
A VPC, or Virtual Private Cloud, is a virtual, logically-separate section of the AWS cloud in which you define the architecture of your network as if it were a physical one, with consideration for how you connect to and from other networks.
This gives you an extraordinary amount of control over how your packets flow, both inside your network and outside of it. Many organizations treat their AWS VPCs as extensions of on-premise data centers, or other public clouds. To this end, given the number of advantages, you’ll definitely want to be at least looking at housing your infrastructure in a VPC from the get-go.
AWS accounts created since 2013 have had to make use of a VPC, even if you’re not defining your own; AWS will spin all of your VPC-supported infrastructure up inside a ‘default’ VPC which you’re assigned when you create your account. But, this default VPC isn’t great — ideally, you want VPCs which correspond with how you’ll use your account: maybe per-environment, per-application or for use in a blue/green scenario.
Therefore, not only make use of VPC, but define your VPCs yourself. They’re fairly straightforward to use; for instance, the aws_vpc resource in Hashicorp’s Terraform tool requires only a few parameters, to quickly instantiate a VPC in entirety.
Subnets are a way of logically dividing a network you’ve defined and created, and are a mainstay of networking. They allow different bits of your network to be separated away from each other. Permissions for traffic flow between subnets is managed by other devices, such as firewalls. In no way are subnets an AWS-specific thing. Regardless, they’re extremely useful.
Largely speaking, there are two types of subnet: public and private. As you might guess, a public subnet is one that can be addressed by IP addresses which can be announced to the world as existing, and a private subnet is one that can only addressed by IPs defined in RFC1918.
Perhaps next time you build some infrastructure, consider instead having everything you can in a private subnet, and using your public subnet as a DMZ. I like to treat my public subnets in this way, and use them only for things that are throw-away, that can be re-built easily, and which don’t have any direct access to any sensitive data: yes, that involves creating security group rules, updating ACLs and such, but the ability to remove any direct access at a deep level is so fundamental to the ways in which the internet works adds to my belief that securing stacks in an onion-like fashion (defense-in-depth) is the best way to do it. An often-followed pattern is to use Elastic Load Balancers in the public subnets, and EC2 Auto-Scaling Groups in the private ones. Routing between the two subnets is handled by the Elastic Load Balancers, and routing egress from the private subnet can be handled by a NAT Gateway.
Inside VPCs, you can use route tables to control the path packets take over IP from source to destination, much as you can on almost any other internet-connected device. Routing tables in AWS are no different to those outside AWS. One thing they’re very useful for, and we’ll come back to this later, is for helping route traffic for S3 over a private interface to S3, or for enforcing separation of concerns at a low IP-based level, helping you meet compliance and regulation requirements.
Inside a VPC, you are able to define a Flow Log, which captures details about the packets traveling across your network, and dumps them in to Cloudwatch Logs for you to scrutinize at a later date, or stream to S3, Redshift, the Elasticsearch Service or elsewhere using a service such as Kinesis Firehose.
Security groups work just like stateful ingress and egress firewalls. You define a group, add some rules for ingress traffic, and some more for egress traffic, and watch your packets flow. By default, they deny access to any ingress traffic and allow all egress traffic, which means that if you don’t set security groups up, you won’t be able to get to your AWS infrastructure.
It’s possible, and entirely valid, to create a rule to allow all traffic on all protocols both ingress and egress, but in doing so, you’re not really using security groups but working around them. They’re your friend: they can help you meet compliance regulations, satisfy your security-focused colleagues and are – largely – a mainstay and a staple of networking and running stuff on the internet. At least, by default, you can’t ignore them!
If you’re just starting out, consider using standard ports for services living in the AWS cloud. You can enable DNS resolution at a VPC level, and use load balancers as described below, to help you use the same ports for your applications across your infrastructure, helping simplify your security groups.
Note that there’s a limit on the number of security group rules you can have – the combined total of rules and groups cannot exceed 250. So, that’s one group with 250 rules, 250 groups with one rule, or any mixture thereof. Use them wisely, and remember that you can attach the same group to multiple AWS resources. One nice pattern is to create a group with common rules – such as Bastion host SSH ingress, monitoring services ingress etc. – and attach it to multiple resources. That way, changes are applied quickly, and you’re using security groups efficiently.
Once you’ve got your security groups up and running, and traffic’s flowing smoothly, take a look at network ACLs, which work in many ways as a complement to security groups: they act at a lower-level, but reinforce the rules you’ve previously created in your security groups, and are often used to explicitly deny traffic. Take a look at adding them when you’re happy your security groups don’t need too much further tweaking!
Soaking up TCP traffic with Elastic Load Balancers and AWS Shield
Elastic Load Balancers are useful for, as the name suggests, balancing traffic across multiple application pools. However, they’re also fairly good at scaling upward, hence the ‘elastic’ in their name. We can harness that elasticity to provide a good solid barrier between the internet and our applications, but also to bridge public and private subnets.
Since you can restrict traffic on both internal (as in, facing your compute infrastructure) and external (facing away from it), Elastic Load Balancers both allow traffic to bridge subnets, but also act as a barrier to shield against TCP floods in to your private subnets.
This year, AWS announced Shield, a managed DDoS protection offering, which is enabled by default for all customers. A paid-for offering, AWS Shield Advanced, offers support for integrating with your Elastic Load Balancers, CloudFront distributions and Route 53 record sets, as well as a consulting function, the DDoS Response Team, and protection for your AWS bill against traffic spikes causing you additional cost.
Connecting to services over private networks
If you’ve managed to create a service entirely within a private subnet, then the last thing you really want to do is to have to connect over public networks to get access to certain data, especially if you’re in a regulated environment, or care about the security of your data (which you really should do!).
Thankfully, AWS provides two ways of accessing to your data over private networks. Some services, such as Amazon RDS and Amazon ElastiCache, allow you to have the A record they insert in to DNS under an Amazon-managed zone populated by an available IP address in your private subnet. That way, whilst your DNS record is in the open, the A record is only really useful if you’re already inside the subnet where it’s connected to the Amazon-managed service. The record is published in a public zone, but anyone else who tries to connect to the address will either be unable to, or will get to a system on their own network at the same address!
Another, newer, way of connecting to a service from a private address is to use a VPC Endpoint, where Amazon establishes a route to a public service – currently, only S3 is supported – from within your private subnet, and amends your route table appropriately. This means traffic hits S3 entirely via your private subnet, by extending the borders of S3 and your subnet close to each other, so that S3 can appear in your subnet.
STS: the Security Token Service
The AWS Security Token Service works with Identity and Access Management (IAM) to allow you to request temporary IAM credentials for users who authenticate using federated identity services (see below) or for users defined directly in IAM itself. I like to use the STS GetFederationToken API call with federated users, since they can authenticate with my existing on-premise service, and receive temporary IAM credentials directly from AWS in a self-service fashion.
By default, AWS enables STS in all available regions. Instead, it’s safer to turn STS on only when you need it in specific regions, since that way, you’re scoping your attack surface solely to regions which you know you rely upon. You can turn STS region endpoints on and off, with the exception of the US (East) region, in the IAM console under the Account Settings tab.
Federating AWS Identity and Access Management using SAML or OIDC
Many organizations already have some pre-existing authentication database for authenticating employees trying to connect to their email inboxes, to expenses systems, and a whole host of other internal systems. There are typically policies and procedures around access control already in place, often involving onboarding and offboarding processes, so when a colleague joins, leaves or changes role in an organization, the authentication database and related permissions are adequately updated.
You can federate authentication systems which use SAML or OpenID Connect (OIDC) to IAM, allowing authentication of your users to occur locally against existing systems. This works well with products such as Active Directory (through Active Directory Federation Services) and Google Apps for Work, but I’ve also heard about Oracle WebLogic, Auth0, Shibboleth, Okta, Salesforce, Apache Altu, and modules for Nginx and Apache being used.
That way, when a colleague joins, as long as they’ve been granted the relevant permissions in your authentication service, they’ve got access to assume IAM roles, which you define, in the AWS console. And, when they leave, their access is revoked from AWS as soon as you remove their federated identity account. Unfortunately, though, there exists a caveat: since a generated STS token can’t be revoked, then if the identity account has been removed, the STS token may still work, and is still valid. To work around this, another good practice is to enforce a low expiration time, since the default of twelve hours is quite high.
Credentials provider chain
Whilst you may use a username and passphrase to get access to the AWS Management Console, the vast majority of programmatic access is going to authenticate with AWS using an IAM access key and secret key pair. You generate these on a per-user basis in the IAM console, with a maximum of two available at any one time. However, it’s how you use them that’s really the crux of the matter.
The concept of the credentials provider chain exists to assist services calling an AWS API through one of the many language-specific SDKs work out where to look for IAM credentials, and in what order to use them.
The AWS SDKs look for IAM credentials in the following order:
- through environment variables
- through JVM system properties
- on disk, at ~/.aws/credentials
- from the AWS EC2 Container Service
- or from the EC2 metadata service
I’m never a massive fan of hard-coding credentials on disk, so I prefer recommending that keys are either transparently handled through the metadata service (you can use IAM roles and instance profiles to help you provide keys to instances, wrapped using STS) when an EC2 instance makes a call needing authentication, or that the keys are passed using environment variables. Regardless, properly setting IAM policies is important: if your application needs only access to put files in to S3 and read from ElastiCache, then only let it do that!
IAM allows you to enforce the use of multi-factor authentication (MFA), or two-factor authentication as it’s often known elsewhere. It’s generally good practice to use this on all of your accounts – especially your root account, since that holds special privileges that IAM accounts don’t get by default, such as access to your billing information.
It’s generally recommended that you enable MFA on your root account, create another IAM user for getting access to the Management Console and APIs, and then create your AWS infrastructure using these IAM users. In essence, you should get out of the habit of using the root account as quickly as possible after enforcing MFA on it.
In many organisations, access to the root account is not something you want to tie down to one named user, but when setting up MFA, you need to provide two codes from an MFA device to enable it (since this is how AWS checks that your MFA device has been set up correctly and is in sync). The QR code provided contains the secret visible using the link below it, and this secret can be stored in a password vault or a physical safe, where others can use it to re-generate the QR code, if required. Scanning the QR code will also give you a URL which you can use on some devices to trigger opening of an app like Google Authenticator. You can request AWS disables MFA on your root account at any time, per the documentation.
Hopefully, you’re already doing – or at least thinking – about some of the ideas above for use in the AWS infrastructure in your organization, but if you’re not, then start thinking about them. Whilst some of the services mentioned above, such as Shield and IAM, offer security as part of their core offering, others – like using Elastic Load Balancers to soak up TCP traffic, using Network ACLs to explicitly deny traffic, or thinking about your architecture by considering public subnets as DMZs – are often overlooked as they’re a little less obvious.
Hopefully, the tips above can help you create a more secure stack in future.
About the Author
Andrew Langhorn is a senior consultant at ThoughtWorks. He works with clients large and small on all sorts of infrastructure, security and performance problems. Previously, he was up to no good helping build, manage and operate the infrastructure behind GOV.UK, the simpler, clearer and faster way to access UK Government services and information. He lives in Manchester, England, with his beloved gin collection, blogs at ajlanghorn.com, and is a firm believer that mince pies aren’t to be eaten before December 1st.
About the Editors
Dean Wilson (@unixdaemon) is a professional FOSS Sysadmin, occasional coder and very occasional blogger at www.unixdaemon.net. He is currently working as a web operations engineer for Government Digital Service in the UK.