Thinking About EFS? Ten Reasons It May Not Be a Fit
Is “Elastic” Enough?
Before we talk about why Amazon’s Elastic File System (EFS) might not be a fit in some environments, let’s discuss why it might. EFS is very good at what it was designed to do. It was designed to be a simple, scalable, elastic file store for Linux clients running NFS 4. AWS describes several use cases for this functionality in their EFS overview, including big data analytics, web serving and content management, application test and development, media and entertainment, and database backup. If your target application is Linux-only and fits somewhere in the list of use cases, EFS may well be what you’re looking for.
On the other hand, what EFS was not designed to be, as many EFS users have discovered, is a fully-featured, multi-protocol enterprise file system. Also, it’s not cheap.
So, what are the key feature gaps in EFS? Here’s my “Top 10 (plus a few)” list:
- Support for native Windows file systems (SMB/CIFS): Amazon is quite clear that EFS support is limited to Linux clients and NFS 4. Windows clients are not supported, even if they’re running the NFS client (this isn’t Amazon’s fault; the Windows NFS client is, uh… “problematic” running NFS v4).
- Support for Active Directory (AD): This is related-but-separate from SMB/CIFS support and is particularly problematic for organizations that deploy AD for enterprise-wide authentication. Since EFS doesn’t support AD, adopting EFS means that (at least) a subset of the permissions structure needs to be duplicated into NFS-style for use with EFS. And then (and here’s the kicker), the duplicated permissions need to be kept current with any changes to the underlying AD structure.
- File system quotas: Independent of client OS-specifics, many organizations use file system quotas to manage unstructured data “sprawl” in their file systems, and EFS does not support file system quotas. So, if you deploy EFS, you’ll need another tool.
- Flexible local data replication: Making application-consistent and/or file system consistent copies of data to enable backup, test & dev, data analytics, and a host of other uses has been a mainstay of enterprise storage for something like twenty years. EFS does not offer any native snapshot or clone capability.
- Remote replication: Similarly, the capability to replicate data between geographically distributed data centers in order to provide availability in case of regional disasters (e.g., hurricanes, earthquakes, tornados, floods, etc.) has also been a critical component of enterprise business continuity planning for decades, but this functionality isn’t available in EFS either. And that’s pretty ironic, considering what ubiquitous cloud computing has done for the affordability of effective disaster recovery planning.
- User-managed encryption keys: I won’t belabor how important data security is in this piece, but one key (if you’ll pardon the pun) to security is to “trust no one.” Everyone agrees that both in-flight and at-rest data need to be encrypted but, following the “trust no one” adage, when the user controls the encryption keys, even “bad actors” with physical access to storage infrastructure can’t access secure data. Most storage solutions, including EFS, only support vendor-managed encryption keys.
- Non-disruptive, dynamic volume migrations: Think of this as “Whoops!” insurance. E.g., what if you realize that you need to change volumes after they’re already in use? You’d really like not to have to take the affected volume(s) down to make corrections but, like most cloud file systems, the only solution that EFS provides, in this case, requires provisioning new storage and running a user-managed migration. Whoops.
- Predictable performance immune from “noisy neighbors”: Amazon goes to considerable lengths to minimize the effects that multiple tenants running multiple workloads on shared infrastructure inevitably have on each other. But the simple fact is that environments based on shared resources, like EFS, are potentially subject to “noisy neighbor” scenarios.
- Hybrid Cloud: This really comes down to support for simultaneous file system access from applications running in EC2 instances and applications running on-premises. EFS does support on-premises access via Direct Connect, but the use cases Amazon discusses are focused on copying data back and forth between on-premises and AWS. That’s not simultaneous access; processing data on-premises OR in the cloud is not the same as processing data on-premises AND in the cloud.
- Multi-Cloud: The more experience that organizations gain with public cloud computing, the more important avoiding cloud provider lock-in and minimizing “wasted cloud spend” become. Given that EFS doesn’t even support simultaneous access between on-premises and AWS instances, you can feel pretty confident that it doesn’t (and never will) support data sharing between AWS and, say, Azure or Google Cloud Platform. So, even if EFS meets all your other requirements, if you ever want to simultaneously present your file system(s) to multiple public cloud environments, EFS isn’t the solution for you.
- Data availability across Virtual Private Clouds: AWS Virtual Private Clouds (VPCs) are exactly what they sound like: multiple private clouds belonging to a single organization, hosted by AWS. There are several reasons that enterprises choose to operate multiple VPCs, analogous to operating multiple on-premises private clouds, including (but in no way limited to) supporting multiple functional groups, business units, subsidiaries, etc. There are also technical requirements in some deployments, e.g., VMware Cloud on AWS, that may require multiple VPCs. If your organization deploys, or might in the future deploy, multiple VPCs and needs to share file system data between them, understand that EFS doesn’t support that.
- Flexible Storage Media: If you’re only going to support one storage medium, EFS is certainly correct to support only flash. But I already mentioned “not cheap,” right? If, on the other hand, we can match storage media to application requirements, we can optimize price/performance for each application. And if we have non-disruptive, dynamic volume migrations (see #7, above), we can always move the volumes to different media if the requirements change.
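To make the file system quotas item above a little more concrete, here is a minimal sketch of the kind of "other tool" an organization might end up writing: a script that walks directories on a mounted file system and reports which ones exceed a soft limit. This is only an illustration under stated assumptions. The function names and the limits are hypothetical, the script only reports usage (it cannot enforce anything), and it measures apparent file sizes, which may not match what a storage service actually meters or bills.

```python
import os


def directory_usage_bytes(root):
    """Sum the apparent size of every regular file under root.

    Symbolic links are skipped so that linked files are not counted twice.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # The file vanished or is unreadable; skip it and keep going.
                continue
    return total


def find_over_quota(quotas):
    """Return (path, used_bytes, limit_bytes) for each directory over its limit.

    `quotas` maps a directory path to a soft limit in bytes. Both the
    mapping and the limits are hypothetical inputs chosen by the operator.
    """
    offenders = []
    for path, limit in quotas.items():
        used = directory_usage_bytes(path)
        if used > limit:
            offenders.append((path, used, limit))
    return offenders
```

A scheduled job could call `find_over_quota` with a mapping of project directories to limits and send the resulting list to whoever owns the data. Note that a full directory walk over a large network file system can be slow, which is one reason native quota support is the preferable option.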
My final point is more speculative. With the announcement of FSx for Windows and AWS’s embrace of Samba to enable Linux connectivity to FSx for Windows, one could be forgiven for wondering how Amazon views EFS’s long-term prospects.
The bottom line is that EFS was designed for the “80” part of the old 80/20 rule; it is fit-for-purpose for recommended applications. The top-level mismatch between EFS and an industry understanding of enterprise file systems is that enterprise file systems are expected to cover ninety-nine-plus percent of enterprise use cases.
And that’s just not what EFS was designed for.
About the Author
Marc Leavitt, Senior Director of Product Marketing at Zadara (www.zadara.com), has more than twenty years of experience developing, architecting, selling, and marketing enterprise storage solutions for companies like EMC, Brocade, Western Digital, and QLogic.
Marc is a graduate of the University of California, Berkeley and currently resides in Irvine, California.