The Largest Cloud Providers Don't Need Bare Metal the Way They Need VMs
The cloud-native world largely takes its lead from the major cloud players when it comes to how to run its infrastructure. So what are the big providers doing with bare metal?
The managed Kubernetes services from AWS, GCP, Azure and others share a common pattern: their public Kubernetes offerings run mostly on virtual machines, not bare metal.
Public cloud providers, also known as hyperscalers, are running billions of containers worldwide. This sets the stage for further advances in computing and AI, built largely on a containerized infrastructure in both private and public clouds.
The decision by cloud providers to place their colossal bets on VMs shows what they really think is the best architecture to support their most important services. This choice runs counter to the prevailing wisdom from over a decade ago, which favored running containers on bare metal. But as the providers faced multi-tenant workloads, tighter security requirements and stricter SLAs, scalable and on-demand VMs became their primary building blocks.
A couple of decades ago, bare metal was seen as the way to achieve the best performance and access to computing resources. The idea persists that CPU-intensive workloads on bare metal have a server's full resources at their disposal, whereas workloads in a VM may compete for resources shared with other VMs on the same host.
But since then, complexity has grown considerably, with the need to run multiple workloads, tighter security, stricter SLAs and the shift to a cloud operating model. CIOs and architects thus have to decide whether bare metal is still the right approach. Organizations committed to bare metal infrastructure for the public and private cloud need to rethink their strategies, as the disadvantages now largely outweigh the advantages for the majority of use cases. They can also look to the cloud vendors for guidance when deciding whether to run containers on VMs or on bare metal.
Pedal On Less Metal
The concept of using bare metal for superior performance when running containerized applications gained currency over a decade ago. Then, the general feeling was that running everything on bare metal was simpler, with the biggest advantage being performance with less overhead.
The rise of cloud computing itself owes much of its adoption and success to virtualization: the ability to share a single server's resources among separate virtual machines. JJ Geewax notes in Google Cloud Platform in Action, one of the most respected and recognized resources on Google Cloud and cloud computing, that virtualization's advantages trump those of bare metal in the vast majority of cases. Google's own practice reflects this: as Geewax explains, the company uses the same VM-based infrastructure in-house to run its services, including applications such as Gmail and YouTube.
And, Geewax notes, customers who purchase one vCPU's worth of capacity on Google Compute Engine should receive the same number of computing cycles regardless of what other VMs on the same host are doing at any given time. In a nutshell, whether a workload runs on bare metal or not has, for the most part, become a non-issue.
That's because containers in the VM-based model can enforce hard resource limits for CPU, memory and disk better than bare metal can.

This approach delivers guaranteed performance and helps eliminate the noisy neighbor problem: SLAs can be backed by hard limits on the VMs on which the containers run. A bare metal environment cannot offer this sort of SLA when unvirtualized applications run side by side with other applications on a single machine.
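As an illustration of how such hard limits are expressed in practice, the sketch below builds a minimal Kubernetes Pod manifest with CPU and memory requests and limits. The names and values ("demo-app", "500m" and so on) are illustrative assumptions, not taken from any provider's configuration:

```python
# Minimal sketch: a Kubernetes Pod manifest that pins hard resource
# limits on a container. All names and values are illustrative.
import json

def pod_with_limits(name, image, cpu_limit, mem_limit):
    """Return a Pod manifest dict enforcing hard CPU/memory limits."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": name,
                "image": image,
                "resources": {
                    # Requests are what the scheduler guarantees;
                    # limits are the hard ceiling enforced at runtime.
                    "requests": {"cpu": "250m", "memory": "128Mi"},
                    "limits": {"cpu": cpu_limit, "memory": mem_limit},
                },
            }],
        },
    }

manifest = pod_with_limits("demo-app", "nginx:1.27", "500m", "256Mi")
print(json.dumps(manifest, indent=2))
```

A container that exceeds its memory limit is terminated, and one that exceeds its CPU limit is throttled, which is what makes the limits "hard" rather than advisory.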
Moreover, the bare metal server's hardware fixes the CPU and memory limits available to an application, and those limits cannot be adjusted without an additional abstraction layer. VMs, by contrast, allow capacity to be adjusted in near real time as resources are consumed, rather than forcing operators to build extra headroom into bare metal in case of resource-usage surges.
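The capacity argument can be made concrete with a toy model. All numbers below are illustrative assumptions, not measurements: bare metal is provisioned for peak demand up front, while VM capacity scales in steps to track demand, so the idle headroom differs sharply.

```python
# Toy capacity model (illustrative numbers only): compare idle headroom
# when provisioning bare metal for peak vs. scaling VMs with demand.
hourly_demand = [20, 25, 30, 80, 95, 40, 22, 18]  # vCPUs needed per hour

# Bare metal: buy for the peak; capacity stays fixed all day.
bare_metal_capacity = max(hourly_demand)
bare_metal_idle = sum(bare_metal_capacity - d for d in hourly_demand)

# VMs: scale in steps of 10 vCPUs, tracking demand each hour.
def vm_capacity(demand, step=10):
    return -(-demand // step) * step  # round up to the next step

vm_idle = sum(vm_capacity(d) - d for d in hourly_demand)

print(f"bare metal idle vCPU-hours: {bare_metal_idle}")  # 430
print(f"autoscaled VM idle vCPU-hours: {vm_idle}")       # 20
```

With these made-up numbers, the fixed bare metal fleet sits on 430 idle vCPU-hours over the day, while step-scaled VMs waste only 20: the redundancy the article describes, expressed as arithmetic.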
Benchmarks have reportedly shown containers in VMs delivering performance equal to or better than bare metal. With VMs, organizations can select the resources and compute performance their workloads require, without worrying about whatever performance edge bare metal might offer.
According to the providers’ documentation, AWS, Microsoft Azure, GCP and DigitalOcean are all built upon virtual machine (VM) technology. These services — AWS Elastic Compute Cloud (EC2), Azure Virtual Machines, GCP Compute Engine, and DigitalOcean Droplets — represent the core Infrastructure-as-a-Service (IaaS) offerings, providing scalable and on-demand VMs.
The hyperscalers differ considerably in their underlying hypervisor implementations, and each continues to invest in improving hypervisor performance for containerized environments.
AWS employs its custom-engineered Nitro System. Microsoft Azure standardizes on its proprietary Hyper-V platform, creating a seamless ecosystem for its vast enterprise customer base. GCP builds on the open source KVM hypervisor.
Notably, both AWS and GCP have made significant internal modifications to their hypervisors to optimize for their respective infrastructures.
The Hybrid Game
Bare-metal use cases do exist for high-performance computing. For example, organizations conducting advanced, data-intensive scientific experiments might require bare metal for these types of workloads.
At the same time, some licensing agreements, such as those with Oracle, are priced per CPU and may therefore require bare metal. Some U.S. government contracts include similar requirements.
At ReveCom, we recommend a hybrid approach, carefully assessing the resources your organization's specific needs require. Bare metal can be reserved for highly specific, performance-intensive use cases, while the most flexible and scalable solution for the vast majority of workloads is running containers on VMs. This strategy combines the agility of containers with the robust management and reliable abstractions that virtualization provides. It's not an either-or decision, but a hybrid approach that puts the right platform behind the right use case.
For that reason, outside of exceptional use cases, we recommend the abstractions that VMs provide, whether on a public cloud, private cloud or on-premises infrastructure. With cloud providers continuing to improve their hypervisors, running containers on VMs remains a reliable path for organizations to follow.