AWS and the New Enterprise WAN

15. December 2018

The public cloud’s wholesale transformation of IT includes a shift in enterprise IT requirements for the wide-area network (WAN). The viability of traditional network architectures for interconnecting hundreds or even thousands of remote offices, or branches, is rapidly decreasing as enterprises consume IT as a utility. A more agile, secure, and dynamic WAN is needed. As an industry, we have a name for this emerging networking trend: Software-Defined WAN (SD-WAN). In this article, we explore SD-WAN with a focus on integration with AWS VPC infrastructure.

Importance of Multi-Cloud

Enterprises may have applications in on-premise data centers, colocation facilities, and infrastructure-as-a-service (IaaS) provider platforms, and they need to access the information in these locations in a flexible manner. Also, SaaS (Software as a Service) is now the standard option for a wide range of enterprise applications. Yet the growth in SaaS hasn’t always been met by a growth in the infrastructure needed to cope with the resulting increase in network utilization. Older WAN technologies deployed at corporate branches are no longer sufficient for the modern SaaS-enabled workforce. As data stops flowing to and from the data center and starts flowing over the internet, congestion, packet loss, and high latencies are all too common.

The notion that companies can spread a single application across multiple public clouds has had its detractors. Some argue that the only “multi-cloud” approach will be a distribution of applications in a way that caters to the perceived strengths of a given cloud provider. For example, a company might consume AWS’s Lambda for functions-as-a-service while looking to Google Cloud Platform (GCP) for machine learning services. This approach is valid, and we see it regularly in our work; however, we also observe how the rise of Kubernetes is changing the IT roadmaps within the enterprise. We have no doubts: the future is multi-cloud.

Let’s not forget that most enterprises–unlike companies “born in the cloud”–must continue to operate infrastructure on-premise and in third-party colocation facilities. Why will private cloud deployments persist? Isn’t this passé? To answer this, let’s look at a telling quote from Amazon’s Anu Sharma, product manager for the new AWS Outposts hybrid cloud service. She acknowledges “…there are some applications that [customers] cannot move to AWS largely because of physical constraints…” She highlighted latency and its effect on moving data in and out of the cloud. Whether the private cloud is implemented as OpenStack, AWS Outposts, or Azure Stack, private cloud will remain in the picture.

Today’s WAN is Inadequate

Therefore, in an environment in which application placement is diverse, enterprises must figure out how to connect the employees in many physical locations to the tools they need to perform their job functions. The complexity involved in moving bits around these highly heterogeneous environments can be overwhelming.

Traditionally, enterprises have paid telecommunication companies premium prices for Multiprotocol Label Switching (MPLS) links or private point-to-point links to connect remote branches to centralized corporate data centers. Traffic from branch locations was carried over the private connectivity regardless of whether the bits were intended for an internal app, a SaaS application, or an Internet search engine. This added latency to network connections, as all traffic was–to use networking parlance–”backhauled” to a small number of corporate locations. Within these corporate data centers, the network was the proverbial long pole in the tent when deploying new applications such that the geographically dispersed workforce could access them.

Figure 1 depicts the connection of multiple remote branches to a centralized data center. Note that all traffic exiting the branches traverses the expensive MPLS network to reach all destinations–including Internet ones.

 

Figure 1: Traditional Enterprise WAN

As mentioned earlier, as workloads spread across on-premise and public cloud infrastructure, enterprises need more flexible, secure, and agile means for connecting branch offices, namely SD-WAN. The projects around hybrid cloud connectivity and the modernization of the WAN infrastructure will run in parallel for many years to come. Any effort to move traffic between on-premise and off-premise needs to keep the requirements of the new WAN in mind.

Before defining SD-WAN, let’s examine the underlying access mechanisms at our disposal. We are no longer limited to T1 and other leased-line services for business-grade connectivity. Access might consist of fiber-based Ethernet, business cable, fixed 4G/5G, or satellite. These access types might coexist with MPLS links or serve as a means to replace them. A given branch might have more than one path for exiting the branch. Considering internet connectivity by itself, not all types of access are alike. A larger office might have a 1 Gb/s fiber link while a kiosk in the mall might have spotty Wi-Fi.

Does this heterogeneity of WAN access and multiple entry/exit paths sound like a management nightmare? This could very well be the case if we designed and operated the new WAN in the manner of the previous generation WAN.

Enter The Software-Defined WAN

What SD-WAN provides is an abstraction layer for the WAN that simplifies the management and cost of wide-area connectivity while realizing application performance improvements.

Let’s compare and contrast the WAN abstraction and VPC as a data center abstraction. VPC constructs such as subnets, load balancers, and virtual gateways are purely ephemeral, with the ability to appear and later vanish with the stroke of an API call. But how do we abstract a WAN? Fiber and copper are tangible. We want to break the coupling of network capabilities from how the packets are delivered over the various access mechanisms. SD-WAN accomplishes this through the introduction of an overlay network that extends over various connectivity methods. The overlay network is implemented using SD-WAN appliances on either side of the connection.

 


Figure 2: SD-WAN Overlay

Is it possible to cut through the hype and describe what SD-WAN can deliver for the enterprise in terms of efficiency, cost reduction, and strategic enablement of new services? Yes, although doing so can be challenging.

To start, SD-WAN may be deployed in many different models. For example, service providers can add SD-WAN to their existing MPLS offering as a “first mile/last mile” technology. On the other hand, enterprises might want to deploy SD-WAN in a full overlay model where they assert full control over the SD-WAN solution and appliances. Even this model may be deployed in-house or as a managed service. In this post, we focus on the latter: the full overlay model.

In addition, there is much confusion as to what features an SD-WAN solution should contain at a minimum. The SD-WAN space–like any other “hot” technology area–is crowded. Engineers may find it very hard to distinguish between a true SD-WAN solution and an old WAN optimization appliance wrapped in new marketing jargon. To make things even more confusing, each enterprise approaches SD-WAN from a different problem space. For example, some might consider application performance enhancement their primary goal while others consider cost reduction the primary driver.

We believe the following should be present in any chosen SD-WAN solution:

Provides Agility

Network agility–not lower costs or better performance–is the main factor for enterprises adopting SD-WAN infrastructure, according to findings in a survey conducted by Cato Networks.

A good SD-WAN solution enables rapid branch deployments with self-provisioning. Bringing a new branch or remote location online should be easy and completed within minutes. The branch appliance, physical or virtual, should simply be connected to the LAN and WAN links serving the branch, plugged in, and turned on. No specialized IT expertise should be required on premise at the branch.

Provides Flexibility

Flexibility can be evaluated in many different contexts. Any SD-WAN solution should be end-to-end and should not put restrictions on where the data resides. The reason it has to be end-to-end is that your users are in many places. They can be in your branches, on your campuses, or on the road, connecting to your resources using client VPN software over the public Internet.

Similarly, your data and your applications are everywhere. They’re on-prem, and they are in the public cloud. The hubs for the SD-WAN deployment should be able to be a physical or virtual device in a traditional data center, a virtual device in the corporate private cloud, or a virtual device in the public cloud of choice.

In addition, a true SD-WAN solution should support different topologies. Many enterprises use a hub and spoke or a full mesh topology. Most SD-WAN solutions support these basic topologies. But one could think of many other hybrid topologies, and SD-WAN solutions should not restrict enterprises to one end of the topology spectrum. SD-WAN should provide insertion of network services whether on the branch customer premise equipment (CPE), in the public cloud, or in regional and enterprise data centers, deployed in a wide range of topologies.

In addition, these SD-WAN solutions should provide automation and business-policy abstraction to simplify complex configurations and provide flexibility in traffic routing and policy definitions.

Includes Integrated Security

One could imagine a day in which a traditional firewall device isn’t needed per branch. It is no wonder that on the long list of SD-WAN vendors we find many familiar names from the traditional firewall vendor space. We believe integrating advanced security features into SD-WAN services allows a cleaner, simpler deployment model for the branches.

Even though basic firewalling capabilities in some SD-WAN appliances might be sufficient for some enterprises, given today’s threat landscape most enterprises need and demand advanced firewall capabilities if the only appliance deployed at the branch is an SD-WAN device. Enterprises need to find a solution that delivers advanced security features without compromising desired SD-WAN functionality such as application optimization or fast fail-over.

Optimizes Application Performance

One of SD-WAN’s most desirable features is the ability to choose the best connectivity for an application based not only on performance metrics but also on business metrics such as cost. Let’s take an example of ensuring users are proximal to their applications. In our example, an enterprise has existing servers deployed in both the on-premise data center and the AWS public cloud. Let’s say you have an end user in Arlington, VA. Your on-premise data center is located in Pittsburgh while the AWS deployment region is us-east-1 in Ashburn, VA.

For applications that are housed in the Pittsburgh data center, branches have two possible paths to get to these resources. For applications that require large bandwidth and guaranteed SLAs, the MPLS path should be used. On the other hand, if cost is the primary factor for an application such as bulk file transfers, then perhaps the path through the Internet is the best option.

Similar choices exist for data housed in the AWS cloud. For branch-to-AWS VPC connectivity, we can pick between a Direct Connect (DX) connection to the VPC and pure Internet access. We should be able to optimize based on business requirements. A sensitive application might only be allowed to use the DX connection provided through the data center, while for the rest, Internet-based access might suffice. An acceptable SD-WAN solution provides IPsec-based encryption between the branches and the centralized hub location (cloud or data center) when the connection is through the Internet. The configuration of these IPsec tunnels and the routing through them should be performed by the SD-WAN controller and not manually.

 


Figure 3: Branch with multiple paths to reach applications

As described, an SD-WAN hub can reside within an AWS VPC, in effect turning the VPC into another aggregation hub for the remote sites. In AWS, an SD-WAN termination point is an appliance from the Marketplace. For an interesting in-depth look at SD-WAN appliances, check out the AWS-commissioned report by ESG Labs entitled SD-WAN Integration with Amazon Web Services.

 

There are different architectures for homing an SD-WAN appliance within an AWS environment, but we would like to explore one we call the “edge services VPC.” For the sake of simplicity, we show a single-region deployment with three VPCs.

Figure 4: The Edge Services VPC Design

There are many possible ways to terminate the edge SD-WAN connections into a public cloud. At the most basic level, SD-WAN solutions rely on an SD-WAN gateway, which performs the hub functionality. This appliance, which will be a VM when deployed in the AWS VPC, aggregates all the connections from the SD-WAN branches.

We like the idea of a separate “edge services VPC” dedicated to all the edge connectivity terminations. This VPC would terminate the SD-WAN connections. The connectivity between this VPC and other VPCs can be provided through simple VPC peering if the AWS deployment is small, or through a Transit Gateway (TGW) if a larger number of VPCs need access to the edge services VPC. Even though it is outside the scope of this article, one could imagine a yet larger deployment with an edge services VPC per region, each connected to the other VPCs within that region through a TGW.
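
To make the TGW option concrete, here is a minimal Terraform sketch of attaching both an edge services VPC and a workload VPC to a Transit Gateway. The resource names and referenced VPCs/subnets are hypothetical placeholders, not taken from any particular SD-WAN product.

# Regional Transit Gateway acting as the hub between VPCs.
resource "aws_ec2_transit_gateway" "hub" {
  description = "Regional hub connecting the edge services VPC to workload VPCs"
}

# Attach the edge services VPC, where the SD-WAN gateway appliance terminates tunnels.
resource "aws_ec2_transit_gateway_vpc_attachment" "edge_services" {
  transit_gateway_id = "${aws_ec2_transit_gateway.hub.id}"
  vpc_id             = "${aws_vpc.edge_services.id}"
  subnet_ids         = ["${aws_subnet.edge_services_attach.id}"]
}

# Attach a workload VPC so branch traffic can reach the applications it hosts.
resource "aws_ec2_transit_gateway_vpc_attachment" "workload" {
  transit_gateway_id = "${aws_ec2_transit_gateway.hub.id}"
  vpc_id             = "${aws_vpc.workload.id}"
  subnet_ids         = ["${aws_subnet.workload_attach.id}"]
}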

Conclusion

In this article, we’ve described how multi-cloud and the diversity of WAN connectivity options for enterprise branches have given rise to a flexible, agile, and secure SD-WAN. We believe that enterprise public cloud migrations–while not necessarily dependent on SD-WAN–will occur on the same timelines as the move to SD-WAN, as enterprise IT architects recognize that more intelligence is needed in the network path selection process. The details of SD-WAN vendor selection and design will vary. One thing is certain: the enterprise WAN is evolving toward a more software-centric approach to meet the needs of enterprise applications.

 

About the Authors

Amir Tabdili and Jeff Loughridge have been designing, operating, and engineering large-scale IP infrastructures since the late-1990s. In their current roles as Chief Architect and CTO of Konekti Systems, the two help clients with public cloud networking, SD-WAN, and hybrid IT architectures. You can learn more about Konekti at https://konekti.us.

About the Editor

Jennifer Davis is a Senior Cloud Advocate at Microsoft. Jennifer is the coauthor of Effective DevOps. Previously, she was a principal site reliability engineer at RealSelf, developed cookbooks to simplify building and managing infrastructure at Chef, and built reliable service platforms at Yahoo. She is a core organizer of devopsdays and organizes the Silicon Valley event. She is the founder of CoffeeOps. She has spoken and written about DevOps, Operations, Monitoring, and Automation.


A Hybrid of One: Building a Private Cloud in AWS

12. December 2018

Introduction

Adoption of the cloud is becoming more and more popular for all types of businesses. When you’re starting out, you have a blank canvas to work from – there’s no existing blueprint or guide. But what if you’re not in that position? What if you’ve already got an established security policy in place, or you’re working in a regulated industry that sets limits on what’s appropriate or acceptable for your company’s IT infrastructure?

Being able to leverage the elasticity of the public cloud is one of its biggest – if not the biggest – advantages over a traditional corporate IT environment. Building a private cloud takes time, money, and a significant amount of up-front investment. This investment might not be acceptable to your organisation, or might never generate returns…

But what if we can build a “private” cloud using public cloud services?

The Virtual Private Cloud

If you’ve looked at AWS, you’ll be familiar with the concept of a “VPC” – A “Virtual Private Cloud”, the first resource you’ll create in your AWS account (if you don’t use the default VPC created in every region when your account is created, that is!). It’s private in the sense that it’s your little bubble, to do with as you please. You control it, nurture it and manage it (hopefully with automation tools!). But private doesn’t mean isolated, and this does not fit the definition of a “private cloud.”

If you misconfigure your AWS environment, you can accidentally expose your environment to the public Internet, and an intruder may be able to use this as a stepping-stone into the rest of your network.

In this article, we’re going to look at the building blocks of your own “private” cloud in the AWS environment. We’ll cover isolating your VPC from the public internet, controlling what data enters and, crucially, leaves your cloud, as well as ensuring that your users can get the best out of their new shiny cloud.

Connecting to your “private” Cloud

AWS is most commonly accessed over the Internet. You publish ‘services’ to be consumed by your users. This is how many people think of AWS – a load balancer with a couple of web servers, some databases, and perhaps a bit of email or workflow.

In the “private” world, it’s unlikely you’ll want to provide direct access to your services over the Internet. You need to guarantee the integrity and security of your data. To maximise your use of the new environment you want to make sure it’s as close to your users and the rest of your infrastructure as possible.

AWS has two private connectivity methods you can use for this: DirectConnect and AWS managed VPN.

Both technologies allow you to “extend” your network into AWS. When you create your VPC, you allocate an IP range (one that doesn’t clash with your internal network), and you can then establish a site-to-site connection to your new VPC. Any instance or service you spin up in your VPC is accessed directly from your internal network, using its private IP address. It’s just as if a new datacenter appeared on your network. Remember, you can still configure your VPC with an Internet Gateway and allocate Public IP addresses (or Elastic IPs) to your instances, which would then give them both an Internet IP and an IP on your internal network – you probably don’t want to do this!

The AWS managed VPN service allows you to establish a VPN over the Internet between your network(s) and AWS. You’re limited by the speed of your internet connection. Also, you’re accessing your cloud environment over the Internet, with all the variable performance and latency that entails.
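
For illustration, the managed VPN pieces can be stood up with a handful of Terraform resources. This is a minimal sketch; the resource names, the on-premise endpoint address, and the static-routing assumption are placeholders to adapt to your environment.

# Virtual private gateway attached to the VPC (the "private cloud" side)
resource "aws_vpn_gateway" "vgw" {
  vpc_id = "${aws_vpc.private_cloud.id}"
}

# Represents your on-premise VPN device
resource "aws_customer_gateway" "office" {
  bgp_asn    = 65000
  ip_address = "203.0.113.10" # placeholder public IP of your on-premise endpoint
  type       = "ipsec.1"
}

# The managed site-to-site VPN connection itself
resource "aws_vpn_connection" "office" {
  vpn_gateway_id      = "${aws_vpn_gateway.vgw.id}"
  customer_gateway_id = "${aws_customer_gateway.office.id}"
  type                = "ipsec.1"
  static_routes_only  = true
}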

The diagram below shows an example of how AWS Managed VPN connectivity interfaces with your network:

AWS DirectConnect allows you to establish a private circuit with AWS (like a traditional “leased line”). Your network traffic never touches the Internet or any other uncontrolled public network. You can directly connect to AWS’ routers at one of their shared facilities, or you can use a third-party service to provide the physical connectivity. The right option depends on your connectivity requirements: directly connecting to AWS means you own the service end-to-end, but using a third party gives you greater flexibility in how you design the resiliency and the connection speed you want to AWS (DirectConnect offers physical 1GbE or 10GbE connectivity options, but you might want something in between, which is where a third party can really help).

The diagram below shows an example of how you can architect DirectConnect connectivity between your corporate datacenter and the AWS cloud. DirectConnect also allows you to connect directly to Amazon services over your private connection, if required. This ensures that no traffic traverses the public Internet when you’re accessing AWS hosted services (such as API endpoints, S3, etc.). DirectConnect also allows you to access services across different regions, so you could have your primary infrastructure in eu-west-1 and your DR infrastructure in eu-west-2, and use the same DirectConnect to access both regions.

Figure: DirectConnect overview

Both connectivity options offer the same native approach to access control you’re familiar with. Network ACLs (NACLs) and Security Groups function exactly as before – you can reference your internal network IP addresses/CIDR ranges as normal and control service access by IP and port. There’s no NAT in place between your network and AWS; it’s just like another datacenter on your network.

Pro Tip: You probably want to delete your default VPCs. By default, AWS services will launch into the default VPC for a specific region, and this comes configured with the standard AWS template of ‘public/private’ subnets and internet gateways. Deleting the default VPCs and associated security groups makes it slightly harder for someone to spin up a service in the wrong place accidentally.
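
If you want to script this, something along these lines with the AWS CLI will find the default VPCs (shown for a single region; the VPC ID is a placeholder, and the IGW, subnets, and other attachments must be removed before the delete succeeds):

# List default VPCs in the current region
aws ec2 describe-vpcs --filters Name=isDefault,Values=true --query 'Vpcs[].VpcId'

# Delete one (only succeeds once its dependencies have been removed)
aws ec2 delete-vpc --vpc-id vpc-0123456789abcdef0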

Workload Segregation

You’re not restricted to a single AWS VPC (by default, you’re able to create 5 per region, but this limit can be increased by contacting AWS support). VPCs make it very easy to isolate services – services you might not want to be accessed directly from your corporate network. You can build a ‘DMZ-like’ structure in your “private” cloud environment.

One good example of this is in the diagram below – you have a “landing zone” VPC where you host services that should be accessible directly from your corporate network (allowing you to create a bastion host environment), and you run your workloads elsewhere – isolated from your internal corporate network. In the example below, we also show an ‘external’ VPC – allowing us to access Internet-based services, as well as providing a secure inbound zone where we can accept incoming connectivity if required (essentially, this is a DMZ network, and can be used for both inbound and outbound traffic).

Through the use of VPC Peering, you can ensure that your workload VPCs can be reached from your inbound-gateway VPC, but as VPCs do not support transitive networking configurations by default, you cannot connect from the internal network directly to your workload VPC.

Figure: Multi-VPC peering and PrivateLink

Shared Services

Once your connectivity between your corporate network and AWS is established, you’ll want to deploy some services. Sure, spinning up an EC2 instance and connecting to it is easy, but what if you need to connect to an authentication service such as LDAP or Active Directory? Do you need to route your access via an on-premise web proxy server? Or, what if you want to publish services to the rest of your AWS environment or your corporate network but keep them isolated in your DMZ VPC?

Enter AWS PrivateLink: Launched at re:Invent in 2017, it allows you to “publish” a Network Load Balancer to other VPCs or other AWS Accounts without needing to establish VPC peering. It’s commonly used to expose specific services or to supply MarketPlace services (“SaaS” offerings) without needing to provide any more connectivity over and above precisely what your service requires.

We’re going to offer an example here of using PrivateLink to expose access to an AWS-hosted web proxy server to our isolated inbound and workload VPCs. This gives you the ability to keep sensitive services isolated from the rest of your network but still provide essential functionality. AWS prohibits transitive routing of network traffic across VPCs (i.e., you cannot route from VPC A to VPC C via a shared VPC B), but PrivateLink allows you to work around this limitation for individual services (basically, anything you can “hide” behind a Network Load Balancer).
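
If you prefer to codify the console steps that follow, here is a rough Terraform sketch of both sides of the PrivateLink relationship. The NLB, VPC, subnet, and security group references are hypothetical and would come from your own configuration.

# Provider side: publish the proxy's Network Load Balancer as an endpoint service
resource "aws_vpc_endpoint_service" "proxy" {
  acceptance_required        = true
  network_load_balancer_arns = ["${aws_lb.proxy_nlb.arn}"]
}

# Consumer side: create an interface endpoint for that service in the workload VPC
resource "aws_vpc_endpoint" "proxy" {
  vpc_id             = "${aws_vpc.workload.id}"
  service_name       = "${aws_vpc_endpoint_service.proxy.service_name}"
  vpc_endpoint_type  = "Interface"
  subnet_ids         = ["${aws_subnet.workload_private.id}"]
  security_group_ids = ["${aws_security_group.proxy_clients.id}"]
}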

Assuming we’ve created the network architecture as per the diagram above, we need to create our Network Load Balancer first. NLBs are the only load balancer type supported by PrivateLink at present.

Figure: Creating the Network Load Balancer

Once this is complete, we can then create our ‘Endpoint Service,’ which is in the VPC section of the console:

Figure: Creating the Endpoint Service

Once the Endpoint Service is created, take note of the Endpoint Service Name; you’ll need this to create the actual endpoints in your VPCs.

Figure: Endpoint Service details

The Endpoint Service Name is unique across all VPC endpoints in a specific region. This means you can share this with other accounts, which are then able to discover your endpoint service. By default, you need to accept all requests to your endpoint manually, but this can be disabled (you probably don’t want this, though!). You can also whitelist specific account IDs that are allowed to create a PrivateLink connection to your endpoint.

Once your Endpoint Service is created, you then need to expose this into your VPCs. This is done from the ‘Endpoints’ configuration screen under VPCs in the AWS console. Validate your endpoint service name and select the VPC required – simple!

Figure: Endpoint details

 

You can then use this DNS name to reference your VPC endpoint. It will resolve to an IP address in your VPC (via an Elastic Network Interface), but traffic to this endpoint will be routed directly across the Amazon network to the Network Load Balancer.

What’s in a Name?

Typically, one of the biggest hurdles with connecting between your internal network and AWS is the ability to route DNS queries correctly. DNS is key to many Amazon services, and Amazon Provided DNS (now Route53 Resolver) contains a significant amount of behind-the-scenes intelligence, such as allowing you to reach the correct Availability Zone target for your ALB or EFS mount point.

Hot off the press is the launch of Route53 Resolver, which removes the need to create your own DNS infrastructure to route requests between your AWS network and your internal network, while allowing you to continue to leverage the intelligence built into the Amazon DNS service. Previously, you would need to build your own DNS forwarder on an EC2 instance to route queries to your corporate network. This meant that, from the AWS perspective, all your DNS requests originated from a single server in a specific AZ (which might be different to the AZ of the client system), and so you’d end up getting the endpoint in a different availability zone for your service. With a service such as EFS, this could result in increased latency and a high cross-AZ data transfer bill.

Here’s an example of how the Route53 resolver automatically picks the correct mount point target based on the location of your client system:
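
The original screenshot isn’t reproduced here, but the effect looks roughly like this: the same EFS DNS name resolves to the mount target in whichever AZ the client sits (file system ID, region, and addresses below are invented for illustration).

# From an instance in eu-west-1a
$ dig +short fs-0123abcd.efs.eu-west-1.amazonaws.com
10.0.1.25

# The identical lookup from an instance in eu-west-1b
$ dig +short fs-0123abcd.efs.eu-west-1.amazonaws.com
10.0.2.25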

Pro Tip: If you’re using a lot of standardised endpoint services (such as proxy servers), using a common DNS name across VPCs is a real time-saver. This requires you to create a Route53 internal zone for each VPC (such as workload.example.com, inbound.example.com) and update the VPC DHCP Option Set to hand out this domain name via DHCP to your instances. This then allows you to create a record in each zone with a CNAME to the endpoint service, for example:

From an instance in our workload VPC:
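
(Illustrative output only – the record name matches the proxy.privatelink example used below, and the endpoint DNS name and address are placeholders, assuming each VPC has its own private zone with a proxy CNAME.)

$ dig +short proxy.privatelink
vpce-0abc1234def567890-abcdefgh.vpce-svc-0123456789abcdef0.eu-west-1.vpce.amazonaws.com.
10.1.0.87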

And the same commands from an instance in our inbound VPC:
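
(Again illustrative – the same name resolves, but to the endpoint in the inbound VPC’s address range.)

$ dig +short proxy.privatelink
vpce-0aa11bb22cc33dd44-abcdefgh.vpce-svc-0123456789abcdef0.eu-west-1.vpce.amazonaws.com.
10.2.0.87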

In the example above, we could use our configuration management system to set the http_proxy environment variable to ‘proxy.privatelink:3128’ and not have to configure per-VPC-specific logic. Neat!

Closing Notes

There are still AWS services that expect to have Internet access available from your VPC by default. One example of this is AWS Fargate – the Amazon-hosted and managed container deployment solution. However, Amazon is constantly migrating more and more services to PrivateLink, meaning this restriction is slowly going away.

A full list of currently available VPC endpoint services is available in the VPC Endpoint documentation. AWS provided VPC Endpoints also give you the option to update DNS to return the VPC endpoint IPs when you resolve the relevant AWS endpoint service name (i.e. ec2.eu-west-1.amazonaws.com -> vpce-123-abc.ec2.eu-west-1.amazonaws.com -> 10.10.0.123) so you do not have to make any changes to your applications in order to use the Amazon provided endpoints.

About the Author

Jon is a freelance cloud devoperative buzzword-hater, currently governing the clouds for a financial investment company in London, helping them expand their research activities into “the cloud.”

Before branching out into the big bad world of corporate consulting, Jon spent five years at Red Hat, focusing on the financial services sector as a Technical Account Manager, and then as an on-site consultant.

When he’s not yelling at the cloud, Jon is a trustee of the charity Service By Emergency Rider Volunteers – Surrey & South London, the “Blood Runners,” who provide free out-of-hours transport services to the UK National Health Service. He is also guardian to two small dogs and a flock of chickens.

Feel free to shout at him on Twitter, or send an old-fashioned email.

About the Editor

Jennifer Davis is a Senior Cloud Advocate at Microsoft. Jennifer is the coauthor of Effective DevOps. Previously, she was a principal site reliability engineer at RealSelf, developed cookbooks to simplify building and managing infrastructure at Chef, and built reliable service platforms at Yahoo. She is a core organizer of devopsdays and organizes the Silicon Valley event. She is the founder of CoffeeOps. She has spoken and written about DevOps, Operations, Monitoring, and Automation.


AWS network security monitoring with FlowLogs

19. December 2016

Author: Lennart Koopmann
Editors: Zoltán Lajos Kis

Regardless of whether you are running servers in AWS or your own data center, you need a high level of protection against intrusions. No matter how strictly your security groups and local iptables are configured, there is always the chance that a determined attacker will make it past these barriers and move laterally within your network. In this post, I will walk through how to protect your AWS network with FlowLogs. From implementation and collection of FlowLogs in CloudWatch to the analysis of the data with Graylog, a log management system, you will be fully equipped to monitor your environment.

Introduction

As Rob Joyce, Chief of TAO at the NSA, discussed in his talk at USENIX Enigma 2016, it’s critical to know your own network: what is connecting where, which ports are open, and what the usual connection patterns are.

Fortunately, AWS has the FlowLogs feature, which allows you to get a copy of raw network connection logs with a significant amount of metadata. This feature can be compared to NetFlow-capable routers, firewalls, and switches in classic, on-premise data centers.

FlowLogs are available for every AWS entity that uses Elastic Network Interfaces. The most important services that do this are EC2, ELB, ECS and RDS.

What information do FlowLogs include?

Let’s look at an example message:
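
The original screenshot of the record isn’t reproduced here; reconstructed from the fields discussed below, a version 2 record in the default format (version, account ID, interface ID, source/destination address, source/destination port, protocol, packets, bytes, window start/end, action, log status) would read:

2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK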

This message tells us that the following network connection was observed:

  • 2 – The VPC flow log version is 2
  • 123456789010 – The AWS account ID was 123456789010
  • eni-abc123de – The recording network interface was eni-abc123de (ENI is Elastic Network Interface)
  • 172.31.16.139:20641 and 172.31.16.21:22 – 172.31.16.139:20641 attempted to connect to 172.31.16.21:22
  • 6 – The IANA protocol number used was 6 (TCP)
  • 20 and 4249 – 4,249 bytes were exchanged over 20 packets
  • 1418530010 – The start of the capture window in Unix seconds was 12/14/2014 at 4:06 am (UTC). (A capture window is a duration of time over which AWS aggregates flow records before publishing the logs. The published logs will have a more accurate timestamp as metadata later.)
  • 1418530070 – The end of the capture window in Unix seconds was 12/14/2014 at 4:07 am (UTC)
  • ACCEPT – The recorded traffic was accepted. (If the recorded traffic had been refused, it would say “REJECT”.)
  • OK – All data was logged normally during the capture window. This could also be NODATA if there were no observed connections, or SKIPDATA if some connections were recorded but not logged for internal capacity reasons or errors.

Note that if your network interface has multiple IP addresses and traffic is sent to a secondary private IP address, the log will show the primary private IP address.

By storing this data and making it searchable, we will be able to answer several security related questions and get a definitive overview of our network.

How does the FlowLogs feature work?

FlowLogs must be enabled per network interface, per subnet, or VPC (Amazon Virtual Private Cloud)-wide. You can enable it for a specific network interface by browsing to a network interface in your EC2 (Amazon Elastic Compute Cloud) console and clicking “Create Flow Log” in the Flow Logs tab. A VPC allows you to get a private network to place your EC2 instances into. In addition, all EC2 instances automatically receive a primary ENI, so you do not need to fiddle with setting up ENIs.

Enabling FlowLogs for a whole VPC or subnet works similarly: browse to the details page of the VPC or subnet and select “Create Flow Log” from the Flow Logs tab.

AWS will always write FlowLogs to a CloudWatch Log Group. This means that you can instantly browse your logs through the CloudWatch console and confirm that the configuration worked. (Allow 10-15 minutes to complete the first capture window as FlowLogs do not capture real-time log streams, but have a few minutes’ delay.)
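
The same thing can also be done from the command line. This is a sketch; the resource ID, log group name, and IAM role (which needs permission to write to CloudWatch Logs) are placeholders.

aws ec2 create-flow-logs \
    --resource-type VPC \
    --resource-ids vpc-0123456789abcdef0 \
    --traffic-type ALL \
    --log-group-name my-vpc-flowlogs \
    --deliver-logs-permission-arn arn:aws:iam::123456789010:role/flowlogs-to-cloudwatch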

How to collect and analyze FlowLogs

Now that you have the FlowLogs in CloudWatch, you will notice that the vast amount of data makes it difficult to extract intelligence from it. You will need an additional tool to further aggregate and present the data.

Luckily, there are two ways to access CloudWatch logs. You can either use the CloudWatch API directly or forward incoming data to a Kinesis stream.

In this post, I’ll be using Graylog as the log management tool to further analyze the FlowLogs data, simply because this is the tool I have the most experience with. Graylog is an open-source tool that you can download and run on your own without relying on any third party. You should be able to use other tools like the ELK stack or Splunk, too. Choose your favorite!

The AWS plugin for Graylog has a direct integration with FlowLogs through Kinesis that only needs a few runtime configuration parameters. There are also official Graylog AWS machine images (AMIs) to get started quickly.

FlowLogs in Graylog will look like this:

Example analysis and use-cases

Now let’s look at a few example searches and analyses that you can run with this.

Typically, you would browse through the data and explore. It would not take long until you find an out-of-place connection pattern that should not be there.

Example 1: Find internal services that have direct connections from the outside

Imagine you are running web services that should not be accessible from the outside directly, but only through an ELB load balancer.

Run the following query to find out if there are direct connections that are bypassing the ELBs:
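
(The original query screenshot isn’t reproduced here. Below is a sketch of what such a Graylog search might look like; the field names and the address range are assumptions that depend on how your FlowLogs are ingested, so adjust them to your setup.)

dst_port:(80 OR 443) AND action:ACCEPT AND NOT src_addr:172.31.*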

In a perfect setup, this search would return no results. However, if it does return results, you should check your security groups and make sure that there is no direct traffic from the outside allowed.

We can also dig deeper into the addresses that connected directly to see who owns them and where they are located:

Example 2: Data flow from databases

Databases should only deliver data back to applications that have a legitimate need for that data. If data is flowing to any other destination, this can be an indication of a data breach or an attacker preparing to exfiltrate data from within your networks.

This simple query below will show you if any data was flowing from an RDS instance to a location outside of your own AWS networks:
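
(Again a sketch rather than the original query – here 172.31.40.12 stands in for the RDS instance’s private address and 172.31.0.0/16 for your VPC range; the field names depend on your ingestion setup.)

src_addr:172.31.40.12 AND NOT dst_addr:172.31.*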

This hopefully does not return a result, but let’s still investigate. We can follow where the data is flowing to by drilling deeper into the dst_addr field from a result that catches internal connections.

As you see, all destination addresses have a legitimate need for receiving data from RDS. This of course does not mean that you are completely safe, but it does rule out several attack vectors.

Example 3: Detect known C&C channels

If something in your networks is infected with malware, there is a high chance that it will communicate back with C&C (Command & Control) servers. Luckily, this communication cannot be hidden at the low level we are monitoring, so we will be able to detect it.

The Graylog Threat Intelligence plugin can compare recorded IP addresses against lists of known threats. A simple query to find this traffic would look like this:

Note that these lists are fairly accurate, but never 100% complete. A hit tells you that something might be wrong, but an empty result does not guarantee that there are no issues.

For an even higher hit rate, you can collect DNS traffic and match the requested hostnames against known threat sources using Graylog.

Use-cases outside of security

The collected data is also incredibly helpful in non-security related use-cases. For example, you can run a query like this to find out where your load balancers (ELBs) are making requests to:

Looking from the other side, you could see which ELBs a particular EC2 instance is answering to:

Next steps

CloudTrail

You can send CloudTrail events into Graylog and correlate recorded IP addresses with FlowLog activity. This will allow you to follow what a potential attacker or suspicious actor has performed at your perimeter or even inside your network.

Dashboards

With the immense amount of data and information coming in every second, it is important to have measures in place that will help you keep an overview and not miss any suspicious activity.

Dashboards are a great way to incorporate operational awareness without having to perform manual searches and analysis. Every minute you invest in good dashboards will save you time in the future.

Alerts

Alerts are a helpful tool for monitoring your environment. For example, Graylog can automatically trigger an email or Slack message the moment a login from outside of your trusted network occurs. Then, you can immediately investigate the activity in Graylog.

Conclusion

Monitoring and analyzing your FlowLogs is vital for staying protected against intrusions. By combining the ease of AWS CloudWatch with the flexibility of Graylog, you can dive deeper into your data and spot anomalies.

About the Author

Lennart Koopmann is the founder of Graylog and started the project in 2010. He has a strong software development background and is also experienced in network and information security.

About the Editors

Zoltán Lajos Kis joined Ericsson in 2007 to work with scalable peer-to-peer and self organizing networks. Since then he has worked with various telecom core products and then on Software Defined Networks. Currently his focus is on cloud infrastructure and big data services.


Just add Code: Fun with Terraform Modules and AWS

06. December 2016

Author: Chris Marchesi

Editors: Andrew Langhorn, Anthony Elizondo

This article is going to show you how you can use Terraform, with a little help from Packer and Chef, to deploy a fully-functional sample web application, complete with auto-scaling and load balancing, in under 50 lines of Terraform code.

You will need the sample project to follow along, so make sure you load that up before continuing with reading this article.

The Humble Configuration

Check out the code in the terraform/main.tf file.

It might be hard to believe that this mere smattering of Terraform is setting up:

  • An AWS VPC
  • 2 subnets, each in different availability zones, fully routed
  • An AWS Application Load Balancer
  • A listener for the ALB
  • An AWS Auto Scaling group
  • An ALB target group attached to the ALB
  • Configured security groups for both the ALB and backend instances

So what’s the secret?

Terraform Modules

This example is using a powerful feature of Terraform – the modules feature, providing a semantic and repeatable way to manage AWS infrastructure. The modules hide most of the complexity of setting up a full VPC behind a relatively small set of code, and an even smaller set of changes going forward (generally, to update this application, all that is needed is to update the AMI).

Note that this example is composed entirely of modules – no root module resources exist. That’s not to say that they can’t exist – and in fact one of the secondary examples demonstrates how you can use the outputs of one of the modules to add extra resources on an as-needed basis.

The example is composed of three visible modules, and one module that operates under the hood as a dependency:

  • terraform_aws_vpc, which sets up the VPC and subnets
  • terraform_aws_alb, which sets up the ALB and listener
  • terraform_aws_asg, which configures the Auto Scaling group, and ALB target group for the launched instances
  • terraform_aws_security_group, which is used by the ALB and Auto Scaling modules to set up security groups to restrict traffic flow.

These modules will be explained in detail later in the article.

How Terraform Modules Work

Terraform modules work very similarly to a basic Terraform configuration. In fact, each Terraform module is a standalone configuration in its own right and, depending on its prerequisites, can run completely on its own. A top-level Terraform configuration without any modules being used is still a module – the root module. You sometimes see this mentioned in various parts of the Terraform workflow, such as in error messages and the state file.

Module Sources and Versioning

Terraform supports a wide variety of remote sources for modules, such as simple, generic locations like HTTP, or Git, or well-known locations like GitHub, Bitbucket, or Amazon S3.

You don’t even need to put a module in a remote location. In fact, a good habit to get into is this: if you need to re-use Terraform code in a local project, put that code in a module – that way you can re-use it several times to create the same kind of resources in the same, or even better, different, environments.

Declaring a module is simple. Let’s look at the VPC module from the example:
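
(The listing from terraform/main.tf isn’t reproduced here; the sketch below shows the general shape of the declaration. The ref value and the parameter names are illustrative – check the module’s own variables for the exact names it expects.)

module "vpc" {
  source                  = "github.com/paybyphone/terraform_aws_vpc?ref=v0.1.0"
  vpc_network_address     = "10.0.0.0/24"
  public_subnet_addresses = ["10.0.0.0/25", "10.0.0.128/25"]
  project_path            = "your-org/your-project"
}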

The location of the module is specified with the source parameter. The style of the parameter dictates what kind of behaviour Terraform will undertake to fetch the module.

The rest of the options here are module parameters, which translate to variables within the module. Note that any variable that does not have a default value in the module is a required parameter, and Terraform will not start if these are not supplied.

The last item that should be mentioned is regarding versioning. Most module sources that work off of source control have a versioning parameter you can supply to get a revision or tag – with Git and GitHub sources, this is ref, which can translate to most Git references, be it a branch, or tag.

Versioning is a great way to keep things under control. You might find yourself iterating very fast on certain modules as you learn more about Terraform or your internal infrastructure design patterns change – versioning your modules ensures that you don’t need to constantly refactor otherwise stable stacks.

Module Tips and Tricks

Terraform and HCL are works in progress, and there may be some things that seem like they should make sense but don’t necessarily work 100% – yet. There are some things you might want to keep in mind when designing your modules that may reduce the complexity that ultimately gets presented to the user:

Use Data Sources

Terraform 0.7+’s data sources feature can go a long way toward reducing the amount of data that needs to go into your module.

In this project, data sources are used for things such as obtaining VPC IDs from subnets (aws_subnet) and getting the security groups assigned to an ALB (using the aws_alb_listener and aws_alb data sources chained together). This allows us to create ALBs based off of a subnet ID alone, and attach auto-scaling groups to ALBs knowing only the listener ARN that we need to attach to.
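
As a rough illustration of the pattern (the variable names here are invented, not lifted from the modules):

# Derive the VPC ID from a subnet ID passed in to the module
data "aws_subnet" "public" {
  id = "${var.subnet_id}"
}

# Chain ALB data sources: a listener ARN leads us to its load balancer,
# whose attributes (such as security groups) we can then read.
data "aws_alb_listener" "listener" {
  arn = "${var.listener_arn}"
}

data "aws_alb" "alb" {
  arn = "${data.aws_alb_listener.listener.load_balancer_arn}"
}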

Exploit Zero Values and Defaults

Terraform follows the rules of the language it was created in regarding zero values. Hence, most of the time, supplying an empty parameter is the same as supplying none at all.

This can be advantageous when designing a module to support different kinds of scenarios. For example, the alb module supports TLS via supplying a certificate ARN. Here is the variable declaration:
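
(Reconstructed sketch – the actual variable name in the module may differ.)

variable "listener_certificate_arn" {
  description = "ARN of the certificate to attach to the listener (leave empty for plain HTTP)"
  default     = ""
}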

And here it is referenced in the listener block:
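
(Again a sketch; the variable names and referenced resources are illustrative.)

resource "aws_alb_listener" "alb_listener" {
  load_balancer_arn = "${aws_alb.alb.arn}"
  port              = "${var.listener_port}"
  protocol          = "${var.listener_protocol}"
  certificate_arn   = "${var.listener_certificate_arn}"

  default_action {
    target_group_arn = "${aws_alb_target_group.alb_target_group.arn}"
    type             = "forward"
  }
}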

Now, when this module parameter is not supplied, its default value becomes an empty string, which is passed in to aws_alb_listener.alb_listener. This is, most times, exactly the same as if the parameter is not passed in at all. This allows you to not have to worry about this parameter when you just want to use HTTP on this endpoint (the default for the ALB module as a whole).

Pseudo-Conditional Logic

Terraform does not support conditional logic yet, but through creative use of count and interpolation, one can create semi-conditional logic in your resources.

Consider the fact that the terraform_aws_asg module supports the ability to attach the ASG to an ALB, but does not explicitly require it. How can you get away with that, though?

To get the answer, check one of the ALB resources in the module:
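
(A sketch of the pattern rather than the module’s exact code – it assumes enable_alb is passed as the string "true" or "false", and the other arguments are placeholders.)

resource "aws_alb_target_group" "alb_target_group" {
  count    = "${lookup(map("true", "1", "false", "0"), var.enable_alb)}"
  port     = "${var.application_port}"
  protocol = "HTTP"
  vpc_id   = "${data.aws_subnet.public.vpc_id}"
}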

Here, we make use of the map interpolation function, nested in a lookup function, to provide essentially an if/then/else control structure. This is used to control a resource’s instance count, adding an instance if var.enable_alb is true, and completely removing the resource from the graph otherwise.

This conditional logic does not necessarily need to be limited to count either. Let’s go back to the aws_alb_listener.alb_listener resource in the ALB module, looking at a different parameter:
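
(Sketch only; the map keys and variable name are assumptions.)

ssl_policy = "${lookup(map("HTTP", "", "HTTPS", "ELBSecurityPolicy-2015-05"), var.listener_protocol)}"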

Here, we are using this trick to supply the correct SSL policy to the listener if the listener protocol is not HTTP. If it is, we supply the zero value, which as mentioned before, makes it as if the value was never supplied.

Module Limitations

Terraform does have some not-necessarily-obvious limitations that you will want to keep in mind when designing both modules and Terraform code in general. Here are a couple:

Count Cannot be Computed

This is a big one that can really get you when you are writing modules. Consider the following scenario, which totally did not happen to me even though I knew of such things beforehand 😉

  • An ALB listener is created with aws_alb_listener
  • The arn of this resource is passed as an output
  • That output is used as both the ARN to attach an auto-scaling group to, and the pseudo-conditional in the ALB-related resources’ count parameter

What happens? You get this lovely message:

value of 'count' cannot be computed

Actually, it used to be worse (a strconv error was displayed instead), but luckily that changed recently.

Unfortunately, there is no nice way to work around this right now. Extra parameters need to be supplied, or you need to structure your modules in a way that avoids computed values being passed into count directives in your workflow. (This is pretty much exactly why the terraform_aws_asg module has an enable_alb parameter.)

Complex Structures and Zero Values

Complex structures are not necessarily good candidates for zero values, even though it may seem like a good idea. But by defining a complex structure in a resource, you are by nature supplying it a non-zero value, even if most of the fields you supply are empty.

Most resources don’t handle this scenario gracefully, so it’s best to avoid using complex structures in a scenario where you may be designing a module for re-use, and expect that you won’t be using the functionality defined by such a structure often.

The Application in Brief

As our focus in this article is on Terraform modules, and not on other parts of the pattern such as using Packer or Chef to build an AMI, we will only touch briefly on the non-Terraform parts of this project, so that we can focus on the Terraform code and the AWS resources that it is setting up.

The Gem

The Ruby gem in this project is a small “hello world” application running with Sinatra. This is self-contained within this project and mainly exists to give us an artifact to put on our base AMI to send to the auto-scaling group.

The server prints out the system’s hostname when fetched. This will allow us to see each node in action as we boot things up.

Packer

The built gem is loaded onto an AMI using Packer, for which the code is contained within packer/ami.json. We use chef-solo as a provisioner, which works off a self-contained cookbook named packer_payload in the cookbooks directory. This gives us a somewhat higher-level workflow than we would have with shell scripts alone, including the ability to better integration test things and also possibly support multiple build targets.

Note that the Packer configuration takes advantage of a new Packer 0.12.0 feature that allows us to fetch an AMI to use as the base right from Packer. This is the source_ami_filter directive. Before Packer 0.12.0, you would have needed to resort to a helper, such as ubuntu_ami.sh, to get the AMI for you.

The Rakefile

The Rakefile is the build runner. It has tasks for Packer (ami), Terraform (infrastructure), and Test Kitchen (kitchen). It also has prerequisite tasks to stage cookbooks (berks_cookbooks), and Terraform modules (tf_modules). It’s necessary to pre-fetch modules when they are being used in Terraform – normally this is handled by terraform get, but the tf_modules task does this for you.

It also handles some parameterization of Terraform commands, which allows us to specify when we want to perform something else other than an apply in Terraform, or use a different configuration.

All of this is in addition to standard Bundler gem tasks like build, etc. Note that install and release tasks have been explicitly disabled so that you don’t install or release the gem by mistake.

The Terraform Modules

Now that we have that out of the way, we can talk about the fun stuff!

As mentioned at the start of the article, this project has 4 different Terraform modules. Also as mentioned, one of them (the Security Group module) is hidden from the end user, as it is consumed by two of the parent modules to create security groups to work with. This exploits the fact that Terraform can, of course, nest modules within each other, allowing for any level of re-usability when designing a module layout.

The AWS VPC Module

The first module, terraform_aws_vpc, creates not only a VPC, but also public subnets as well, complete with route tables and internet gateway attachments.

We’ve already hidden a decent amount of complexity just by doing this, but as an added bonus, redundancy is baked right into the module by distributing any network addresses passed in as subnets to the module across all availability zones available in any particular region via the aws_availability_zones data source. This process does not require previous knowledge of the zones available to the account.

The module passes out pertinent information, such as the VPC ID, the ID of the default network ACL, the created subnet IDs, the availability zones for those subnets as a map, and the ID of the route table created.

The ALB Module

The second module, terraform_aws_alb allows for the creation of AWS Application Load Balancers. If all you need is the defaults, use of this module is extremely simple, creating an ALB that will answer requests on port 80. A default target group is also created that can be used if you don’t have anything else mapped, but we want to use this with our auto-scaling group.

The Auto Scaling Module

The third module, terraform_aws_asg, is arguably the most complex of the three that we see in the sample configuration, but even at that, its required options are very slim.

The beauty of this module is that, thanks to all the aforementioned logic, you can attach more than one ASG to the same ALB with different path patterns (mentioned below), or not attach it to an ALB at all! This allows this same module to be used for a number of scenarios. This is on top of the plethora of options available to you to tune, such as CPU thresholds, health check details, and session stickiness.

Another thing to note is how the AMI for the launch configuration is being fetched from within this module. We work off the tag that we used within Packer, which is supplied as a module variable. This is then searched for within the module via an aws_ami data source. This means that no code or variables need to change when the AMI is updated – the next Terraform run will pick up the most recent AMI with the tag.
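
(A sketch of what that lookup might look like – the tag key and variable name are illustrative.)

data "aws_ami" "app_ami" {
  most_recent = true
  owners      = ["self"]

  filter {
    name   = "tag:application"
    values = ["${var.image_tag_value}"]
  }
}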

Lastly, this module supports the rolling update mechanism laid out by Paul Hinze in this post oh so long ago now. When a new AMI is detected and the auto-scaling group needs to be updated, Terraform will bring up the new ASG, attach it, wait for it to have minimum capacity, and then bring down the old one.

The Security Group Module

The last module to be mentioned, terraform_aws_security_group, is not shown anywhere in our example, but is actually used by the ALB and ASG modules to create Security Groups.

Not only does it create security groups, though – it also allows for the creation of 2 kinds of ICMP allow rules. One for all ICMP, if you so choose, but more importantly, allow rules for ICMP type 3 (destination unreachable) are always created, as this is how path MTU discovery works. Without this, we might end up with unnecessarily degraded performance.

Give it a Shot

After all this talk about the internals of the project and the Terraform code, you might be eager to bring this up and see it working. Let’s do that now.

Assuming you have the project cloned and AWS credentials set appropriately, do the following:

  • Run bundle install --binstubs --path vendor/bundle to load the project’s Ruby dependencies.
  • Run bundle exec rake ami. This builds the AMI.
  • Run bundle exec rake infrastructure. This will deploy the project.

After this is done, Terraform should return an alb_hostname value to you. You can now load this up in your browser. Load it once, then wait about 1 second, then load it again! Or even better, just run the following in a prompt:

while true; do curl http://ALBHOST/; sleep 1; done

And watch the hostname change between the two hosts.

Tearing it Down

Once you are done, you can destroy the project simply by passing a TF_CMD environment variable in to rake with the destroy command:

TF_CMD=destroy bundle exec rake infrastructure

And that’s it! Note that this does not delete the AMI artifact; you will need to do that yourself.

More Fun

Finally, a few items for the road. These are things that are otherwise important to note or should prove to be helpful in realizing how powerful Terraform modules can be.

Tags

You may have noticed the modules have a project_path parameter that is filled out in the example with the path to the project in GitHub. This is something that I think is important for proper AWS resource management.

Several of our resources have machine-generated names or IDs which make them hard to track on their own. Having an easy-to-reference tag alleviates that. Having the tag reference the project that consumes the resource is even better – I don’t think it gets much clearer than that.

SSL/TLS for the ALB

Try this: create a certificate using Certificate Manager, and change the alb module to the following:
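
(Sketch only – the parameter names are illustrative, and the certificate ARN would come from your own ACM certificate.)

module "alb" {
  source                   = "github.com/paybyphone/terraform_aws_alb"
  listener_subnet_ids      = ["${module.vpc.public_subnet_ids}"]
  listener_port            = "443"
  listener_protocol        = "HTTPS"
  listener_certificate_arn = "${var.alb_certificate_arn}"
  project_path             = "${var.project_path}"
}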

Better yet, see the example here. This can be run with the following command:

And destroyed with:

You now have SSL for your ALB! Of course, you will need to point DNS to the ALB (either via external DNS, CNAME records, or Route 53 alias records – the example includes this), but it’s that easy to change the ALB into an SSL load balancer.

Adding a Second ASG

You can also use the ASG module to create two auto-scaling groups.

There is an example for the above here. Again, run it with:

And destroy it with:

You now have two auto-scaling groups, one handling requests for /foo/*, and one handling requests for /bar/*. Give it a go by reloading each URL and see the unique instances you get for each.

Acknowledgments

I would like to take a moment to thank PayByPhone for allowing me to use their existing Terraform modules as the basis for the publicly available ones at https://github.com/paybyphone. Writing this article would have been a lot more painful without them!

Also thanks to my editors, Anthony Elizondo and Andrew Langhorn, for their feedback and help with this article, and the AWS Advent Team for the chance to stand on their soapbox for my 15 minutes! 🙂

About the Author:

Chris Marchesi (@vancluever) is a Systems Engineer working out of Vancouver, BC, Canada. He currently works for PayByPhone, designing tools and patterns to help its engineers and developers work with AWS. He is also a regular contributor to the Terraform project. You can view his work at https://github.com/vancluever, and also his previous articles at https://vancluevertech.com/.

About the Editors:

Andrew Langhorn is a senior consultant at ThoughtWorks. He works with clients large and small on all sorts of infrastructure, security and performance problems. Previously, he was up to no good helping build, manage and operate the infrastructure behind GOV.UK, the simpler, clearer and faster way to access UK Government services and information. He lives in Manchester, England, with his beloved gin collection, blogs at ajlanghorn.com, and is a firm believer that mince pies aren’t to be eaten before December 1st.

Anthony Elizondo is a SRE at Adobe. He enjoys making things, breaking things, and burritos. You can find him at http://twitter.com/complexsplit


Amazon Virtual Private Cloud

04. December 2012

Amazon Virtual Private Cloud

Amazon Virtual Private Cloud (VPC) is a service which allows you to create an isolated, private network within an AWS region where you can run and use a variety of other AWS resources. You’re able to create a variety of private IP space subnets and build routes and security policies between them to fully host a multi-tier application within AWS while maintaining isolation from other AWS customers.

How do I build a VPC?

A VPC is built from a number of parts

  1. The VPC object: which you declare with a name and a broad private network space. (You can define 5 VPCs in a single region)
  2. 1 or more subnets: which are segments of the VPC IP space
  3. An Internet Gateway (IG): which connects your VPC to the public Internet
  4. NAT Instance: an EC2 instance, launched from an Amazon-provided AMI, that provides NAT services so instances in private subnets can reach the Internet
  5. Router: the router is a VPC service that performs routing between subnets using your user-defined route tables

Optionally you can setup IPSec VPN tunnels which you terminate on your hardware in a DC or home network.

VPC supports four options for its network architecture.

  1. VPC with a Public Subnet Only
  2. VPC with Public and Private Subnets
  3. VPC with Public and Private Subnets and Hardware VPN Access
  4. VPC with a Private Subnet Only and Hardware VPN Access

Further Reading

AWS services you can use inside a VPC

A number of AWS services provide you with instance based resources, and you’re able to run those resources inside your VPC. These include

ELB

ELB instances are able to function inside VPCs in two ways

  1. They are able to create interfaces inside your VPC subnets and then send traffic to EC2 instances inside your VPC
  2. An ELB instance can be created with an internal IP in a VPC subnet. This is useful for load balancing between internal tiers of your application architecture

Further Reading

EC2

All classes of EC2 instances are available to deploy inside your VPC.

Availability Zone placement of EC2 instances can be controlled by which subnet you place your EC2 instance(s) into.

Further Reading

RDS

All classes and types of RDS instances are available to deploy inside your VPC.

Further Reading

Auto Scaling

You’re able to use Auto Scaling to scale EC2 instances inside your VPC, in conjunction with ELB instances.

Further Reading

Networking inside your VPC

Your VPC is divided into a set of subnets. You control traffic between subnets and to the Internet with two required mechanisms and one optional one.

The required things are route tables and security groups.

A route table is associated with a subnet and contains routes; each route specifies a destination CIDR block and a target, which can be an instance ID, a network interface ID, or your Internet gateway.

A security group acts like a firewall and is associated with a set of EC2 instances. You define two sets of rules, based on TCP/UDP/ICMP and ports, one for ingress traffic and one for egress traffic. Security group rules are stateful.

Optionally, you can use Network ACLs to control your TCP/UDP/ICMP traffic flow at the subnet layer. Rules defined in Network ACLs are not stateful, so your rules must match up for both ingress and egress traffic of a given service (e.g. TCP 22/SSH) to function.
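
As a modern illustration of these controls (using today’s AWS CLI; the IDs and CIDR ranges are placeholders):

# Route a subnet's Internet-bound traffic through the Internet Gateway
aws ec2 create-route --route-table-id rtb-11aa22bb \
    --destination-cidr-block 0.0.0.0/0 --gateway-id igw-33cc44dd

# Security group rule: allow SSH in from the corporate address range only
aws ec2 authorize-security-group-ingress --group-id sg-55ee66ff \
    --protocol tcp --port 22 --cidr 10.20.0.0/16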

Further Reading

Some limitations of using VPCs

As with any product, VPC comes with some limitations. These include:

  • You can only create five VPCs in a single AWS region
  • You need to create a VPN tunnel or attach an Elastic IP (EIP) to get to instances, each of which has associated costs.
  • You can only create 20 subnets per VPC
  • You can only create 1 Internet Gateway per VPC

Further Reading

Cost

Your VPC(s) do not cost anything to create or run. Additionally, subnets, security groups, and network ACLs are also free.

There will be costs associated with how you choose to access your instances inside your VPC, be that a VPN solution or using Elastic IPs.

All other AWS services cost the same whether you run those instances inside a VPC or outside.

Further Reading

Summary

In summary, VPCs provide an easy way to isolate application infrastructure, while still using a variety of AWS resources. With a little additional configuration, you’re able to take advantage of the VPC service.