Choosing a Managed Kubernetes Provider
Kubernetes helps with deploying, scaling and managing containerized workloads, enabling faster deployment cycles and consistent configuration management, all while providing improved access control. Kubernetes is also a Cloud Native Computing Foundation (CNCF) project, which keeps it vendor-neutral: you can deploy it on any cloud provider.
This article will compare on-premises, or self-hosted, Kubernetes clusters to managed ones, as well as outline your options for Kubernetes in the cloud. To do this, we’ll look at ease of use and setup, custom node support, cost, release cycles, version support and more factors to consider when choosing the right managed Kubernetes provider for your needs.
Managed vs. Self-Hosted Kubernetes
Building and maintaining infrastructure requires both experienced engineers and domain experts. But not every organization can assemble such a dream team. Domain experts are rare, and almost all of them are already happily employed elsewhere.
So, when choosing between managed and self-hosted Kubernetes, here are the main points you’ll need to take into consideration.
Cost of Resource Management
Self-managed Kubernetes means you're running the Kubernetes installation either in your own data center or on virtual machines in the cloud. The machines that run your control plane carry their own cost, and you'll have to plan for high availability and disaster recovery on your own. You'll also have to set up automation to scale the nodes along with their dependencies and provision your network for increased load.
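To give a sense of the work involved, here is a minimal sketch of bootstrapping a highly available control plane with kubeadm. The load balancer endpoint, token and keys are placeholders; provisioning the load balancer, etcd backups and node automation all remain your team's responsibility:

```bash
# Initialize the first control-plane node behind a load balancer you provision.
# "lb.example.com" is a placeholder endpoint for the API servers.
sudo kubeadm init \
  --control-plane-endpoint "lb.example.com:6443" \
  --upload-certs

# Join additional control-plane nodes for HA.
# The token, CA hash and certificate key come from the init output.
sudo kubeadm join lb.example.com:6443 \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane --certificate-key <key>
```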
Manpower
As noted above, self-managed Kubernetes requires a sizable team that understands the intricacies of deploying the different Kubernetes components. Your team will need to be able to handle etcd, the control plane, nodes, the container network interface (CNI), the service mesh and smaller components like role-based access control (RBAC).
With managed Kubernetes, you don’t have to manage etcd or control planes, and many managed Kubernetes services oversee the CNI and service mesh for you. This makes it a lot easier for a small team (say, of three to five people) to handle a Kubernetes cluster than if you went with self-managed clusters.
Upgrades and Updates
Upgrading clusters is a big undertaking and can take a lot of time if you're handling it yourself. You'll need to plan for and research the changes in the next Kubernetes version, and account for any components or APIs that have been deprecated.
Managed Kubernetes, on the other hand, can be easily upgraded in one or two steps. You don’t need to take care of an etcd backup or make sure you individually upgrade control plane nodes to maintain high availability. All of this is taken care of for you by your cloud provider.
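For example, a managed control-plane upgrade is roughly a one-liner on each of the big three. The cluster names, regions and versions below are placeholders:

```bash
# GKE: upgrade the control plane of an existing cluster.
gcloud container clusters upgrade my-cluster \
  --master --cluster-version 1.29 --zone us-central1-a

# EKS: request a control-plane version update.
aws eks update-cluster-version \
  --name my-cluster --kubernetes-version 1.29

# AKS: upgrade the control plane (and, by default, the node pools too).
az aks upgrade \
  --resource-group my-rg --name my-cluster --kubernetes-version 1.29
```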
That said, relying on your provider to release a patched version of Kubernetes can delay fixes for vulnerabilities, as happened with the runC vulnerability (CVE-2019-5736) disclosed in early 2019. Meanwhile, a new Kubernetes minor version ships roughly every three to four months, so provider lag can leave you several releases behind.
Cost of Changing Cloud Providers
Moving between cloud providers costs more than money. You might also incur the ‘costs’ of decreased performance and reliability. It’s important to consider which provider will best meet your needs so you can avoid these peripheral costs. If you’re managing your infrastructure as code, there will be changes required for this migration, as well.
In-House Expertise
If you’re thinking about going with cloud-agnostic Kubernetes clusters, you may consider managing them yourself, as this will give you the flexibility to move clusters across clouds since you aren’t depending on any cloud resources except the underlying machine. You have to choose your CNI very carefully, though, as not all CNIs work directly on all clouds.
Managing your own Kubernetes will require in-house Kubernetes experts who can dig deeper into issues and find a resolution.
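To illustrate one routine chore that self-managed teams own, here is a hedged sketch of taking an etcd snapshot before control-plane maintenance. The endpoint and certificate paths are typical kubeadm defaults and vary by installation:

```bash
# Back up etcd before any control-plane maintenance (self-managed clusters only).
# Paths below are common kubeadm defaults; adjust for your installation.
ETCDCTL_API=3 etcdctl snapshot save /var/backups/etcd-snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Verify the snapshot before relying on it.
ETCDCTL_API=3 etcdctl snapshot status /var/backups/etcd-snapshot.db
```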
Given the above considerations, a managed Kubernetes cluster is preferable to a self-managed one if:
- You want to stay on one cloud
- You don’t need the latest Kubernetes releases
- You’re ready to offload patching vulnerabilities to your cloud provider
You'll find managed Kubernetes solutions easier to upgrade and highly available; most importantly, you'll have support from your cloud provider. There are many managed Kubernetes options to choose from; we'll look at the main ones below.
Primary Kubernetes Providers
- EKS: AWS launched its public Kubernetes offering, Elastic Kubernetes Service, in June 2018. AWS is known to be reliable and to offer highly performant machines.
- AKS: Microsoft Azure's Kubernetes offering, Azure Kubernetes Service, became generally available in June 2018.
- GKE: Google Kubernetes Engine was released in 2015 and is one of the most advanced Kubernetes offerings because Google works closely with the Kubernetes development community.
- OpenShift: Red Hat OpenShift manages and orchestrates containers with Kubernetes at its core. Red Hat is known for security, and OpenShift provides additional functionality on top of Kubernetes, such as built-in deployment pipelines.
- VMware Tanzu: Tanzu was introduced as a preview in August 2019 and reached general availability in 2020. It's a set of tools and products for running and managing Kubernetes clusters. It includes a user interface (UI) called Mission Control for managing clusters and supports CI/CD integration with VMware's Concourse platform.
Considerations for Choosing a Kubernetes Provider
Below are the key factors you should consider when looking into managed Kubernetes clusters.
Ease of Use
In most cases, you’ll be using kubectl from the command line instead of the GUI. Amazon EKS has an intuitive, very easy-to-use UI that gives you only the options you need. AKS and GKE, on the other hand, have a lot of options that you may not use. OpenShift and Tanzu come with a custom UI for a better developer experience.
Generally speaking, deploying from the UI is discouraged. You'll typically wrap your deployments in pipelines that deploy to Kubernetes, so a UI doesn't add much value here.
On GKE, Day-2 operations are easier to manage, and you get the unique option of Config Sync, which keeps your Kubernetes cluster synchronized with a Git repository.
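As a rough sketch, Config Sync is driven by a RootSync object pointing at your repository. The repo URL, branch and directory below are placeholders, and the field names follow the configsync.gke.io API, so check the docs for your GKE version:

```bash
# Apply a RootSync so GKE Config Sync reconciles the cluster from Git.
# Repo URL, branch and directory are placeholders.
kubectl apply -f - <<EOF
apiVersion: configsync.gke.io/v1beta1
kind: RootSync
metadata:
  name: root-sync
  namespace: config-management-system
spec:
  sourceFormat: unstructured
  git:
    repo: https://github.com/example-org/cluster-config
    branch: main
    dir: clusters/prod
    auth: none   # public repo assumed; use a token or service account otherwise
EOF
```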
Ease of Setup
All of the providers noted above are easy to set up.
Generating a kubeconfig file in GKE and getting started is especially easy. With AKS and EKS, you have to make changes in the identity and access management (IAM) solution and then add the user to the cluster. Google provides a few options for deploying different CNI plugins and a service mesh, while EKS and AKS are still maturing in this area. However, you can always deploy CNI and service mesh solutions on top of your clusters if you're willing to manage them yourself. Finally, OpenShift comes packed with its own components and uses Istio as its service mesh, and VMware Tanzu has its own CNI and service mesh.
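To illustrate the difference, here is roughly how you generate a kubeconfig on each platform. Names, regions and resource groups are placeholders, and on EKS you must additionally map IAM principals to cluster users (for example, via the aws-auth ConfigMap) before kubectl will work:

```bash
# GKE: one command fetches credentials into your kubeconfig.
gcloud container clusters get-credentials my-cluster --zone us-central1-a

# EKS: write the kubeconfig, then grant the IAM principal cluster access
# (e.g., through the aws-auth ConfigMap) as a separate step.
aws eks update-kubeconfig --name my-cluster --region us-east-1

# AKS: fetch credentials; Azure AD integration may require extra steps.
az aks get-credentials --resource-group my-rg --name my-cluster
```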
Connectivity to Other Cloud Resources
Each provider integrates well with its other cloud offerings, such as managed databases (like RDS on AWS) or storage, through an internal network of resources. AWS, for example, provides the most reliable platform, while Azure offers a lot of configuration options.
It's also worth noting that AWS uses Layer 2 networking, while AKS and GKE use Layer 3 networking. So, if you plan to use a CNI that relies on BGP for routing (Calico in BGP mode, for example), you may face issues on GKE and AKS.
Native Plugin Support
- Service Mesh
  - EKS: Has its own mesh, App Mesh, which is built around the Envoy proxy
  - AKS: Offers Open Service Mesh as a preview add-on
  - GKE: Officially supports Istio
  - OpenShift: Supports Istio by default
  - VMware Tanzu: Has its own service mesh built on its NSX platform
- CNI and Networking
  - EKS: Offers its own VPC CNI plugin. It can run in island mode with the help of security groups, using NAT for egress. All pods are connected to each other directly, just like nodes, and you can allow access to pods from inside the VPC with security groups.
  - AKS: Offers the Azure CNI and kubenet. Network security groups (NSGs) let you control the networking and separate the cluster from the external network. Unless restricted by an NSG, all pods can reach each other directly by IP, since addresses are assigned from the same VNet address space.
  - GKE: Google is collaborating with Isovalent to offer Cilium as a CNI. Pod IPs are assigned from the VPC network space. Outbound traffic passes through NAT, while internal traffic among pods and nodes can be controlled with network policies.
  - OpenShift: The OpenShift SDN is deployed by default, with out-of-the-box support for third-party CNIs like Flannel. It has its own ingress controller and controls inbound and outbound traffic with OpenShift Routes; the OpenShift router uses HAProxy to implement the routes.
  - VMware Tanzu: Comes with Antrea, which is built on Open vSwitch. You can run it in island mode, and it supports ingress through third-party controllers like NGINX. Egress can be controlled with a network policy (see the sketch after this list).
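Since several of the CNIs above enforce egress through network policies, here is a minimal, hedged example of a policy that denies all egress from a namespace except DNS. The namespace is a placeholder, and enforcement requires a CNI that supports NetworkPolicy:

```bash
# Deny all egress from the "payments" namespace except DNS lookups.
# Only takes effect if the installed CNI enforces NetworkPolicy.
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-egress
  namespace: payments
spec:
  podSelector: {}        # applies to every pod in the namespace
  policyTypes:
    - Egress
  egress:
    - ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
EOF
```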
How you visualize network and microservice flows will depend on which CNI you're using and whether you're running a service mesh; most service meshes provide this visibility. If you want to visualize these flows using your cloud provider's capabilities, you can activate a flow log service in your VPC.
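On AWS, for instance, you can enable VPC Flow Logs with a single CLI call; GCP and Azure expose equivalent flow-log settings on their subnets and NSGs. The VPC ID, log group and IAM role ARN below are placeholders:

```bash
# Enable VPC Flow Logs to CloudWatch (placeholder IDs and ARN).
aws ec2 create-flow-logs \
  --resource-type VPC \
  --resource-ids vpc-0123456789abcdef0 \
  --traffic-type ALL \
  --log-group-name my-cluster-flow-logs \
  --deliver-logs-permission-arn arn:aws:iam::123456789012:role/flow-logs-role
```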
Cost
Due to the extra features it provides, OpenShift is the most expensive option listed here. AKS and EKS are comparable in price, while GKE is generally cheaper. That's partly thanks to Google's sustained use discounts, which automatically lower a VM's effective hourly rate the longer it runs within a billing month. (For reference, the $0.10-per-hour control-plane fee charged by EKS and GKE works out to roughly $73 per cluster per month.) You can see a more detailed comparison of costs in Figure 1, below.
Custom Worker Nodes
AKS does not support custom worker nodes and has no plans to add them; GKE and OpenShift don't offer them either. In fact, GKE actively restricts such nodes: you cannot build a node image with your own tooling preinstalled, such as antivirus software, and then use it as a Kubernetes node. EKS, on the other hand, has very good support for custom worker nodes, and VMware Tanzu supports them as well.
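As a hedged sketch of what EKS's custom-node support looks like, eksctl lets you point an unmanaged node group at your own AMI. The AMI ID and names below are placeholders, and the exact schema varies by eksctl version, so check its documentation:

```bash
# Create a node group backed by a custom AMI via eksctl (placeholder IDs).
cat > cluster.yaml <<EOF
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-east-1
nodeGroups:
  - name: custom-nodes
    instanceType: m5.large
    desiredCapacity: 3
    # Your hardened image, e.g., with antivirus preinstalled. Newer eksctl
    # versions may also require overrideBootstrapCommand for custom AMIs.
    ami: ami-0123456789abcdef0
EOF
eksctl create nodegroup --config-file=cluster.yaml
```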
Ease of Upgrade
Node upgrades are automated in GKE and AKS, while you have to update the worker nodes manually in EKS. AKS also provides a manual option for node replacement. When you upgrade on GKE, you don't control how, or in what order, nodes are upgraded, so your workloads will be affected.
If a workload is stateful, this can cause issues. Manual node upgrades give you control over node replacement, so you can move your workload before taking the VM down for an upgrade.
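When you do control node replacement yourself, the usual pattern is to cordon and drain each node before taking its VM down. The node name below is a placeholder:

```bash
# Stop new pods from scheduling onto the node.
kubectl cordon ip-10-0-1-23.ec2.internal

# Evict the node's workloads gracefully, honoring PodDisruptionBudgets.
kubectl drain ip-10-0-1-23.ec2.internal \
  --ignore-daemonsets --delete-emptydir-data

# ...upgrade or replace the VM, then allow scheduling again.
kubectl uncordon ip-10-0-1-23.ec2.internal
```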
The table below provides a convenient reference for different aspects of the primary managed Kubernetes solutions.
| Parameters | EKS | AKS | GKE | OpenShift | VMware Tanzu |
| --- | --- | --- | --- | --- | --- |
| Upgrade Cycle | 3 months | 3 months | 3 months | 3 months | 3 months |
| Cost | $0.10 per cluster per hour + cost of standard machines and resources used | Cost of standard machines and resources used | $0.10 per cluster per hour + cost of standard machines and resources used | $0.17 on most options (AWS, Azure or IBM Cloud) + cost of standard machines and resources | $995 per CPU per year |
| Nodes | Can add custom nodes | Predefined node images only | Predefined node images only | Predefined node images only | Supports custom nodes |
| Continuous Integration | No | No | No | Yes | Yes (with vSphere Concourse) |
| Continuous Delivery | No | No | No | Yes | Yes (with vSphere Concourse) |
| CNI and service mesh | Minimal support | Minimal support | Natively supports CNI and service mesh | Supports service mesh | Tanzu Service Mesh |
| Log and metric collection | Present, off by default | Present, off by default | Present, off by default | EFK stack present | No built-in integration; supports Prometheus and Grafana |
| Financially backed SLA | Yes | Yes | Yes | N/A | N/A |
| Max nodes per cluster | 30 × 450 = 13,500 (max node groups × max nodes per group) | | | | |
| Max pods per node | | | | | |
| Underlying networking | Layer 2 | Layer 3 | Layer 3 | Depends on where it runs | Depends on where it runs |

Figure 1: Comparison of the top managed Kubernetes services
Which Managed Kubernetes Offering is Right for You?
Kubernetes clusters take time and manpower to set up. That might mean a few minutes in the cloud or hours in a self-hosted version of Kubernetes. Ultimately, your ability to handle massive open source projects is the biggest factor to consider when choosing managed over on-premises Kubernetes. If you choose to go it alone, you’ll need experts on your team who can handle such a large undertaking. Separately managing etcd, upgrades, high availability and reliability requires far more expertise than running a managed Kubernetes cluster.
When it comes to managed solutions, there are a number of options depending on what you can afford and what your needs are. If your budget allows and you want many features already deployed, OpenShift is the best option. VMware Tanzu is also a great choice if you're considering CI/CD, are worried about security and are already using VMware products. Note, however, that you cannot run more than 150 nodes in VMware Tanzu Kubernetes clusters. If you're looking to experiment with networking, EKS's Layer 2 networking makes it a good choice, and EKS also supports more pods per node and larger clusters. AWS also has world-class support, while GKE may offer substantial savings.
Handling self-managed Kubernetes is a huge task, and when you add demands like security and compliance, it grows dramatically harder. The cloud providers implement security with predefined constructs like security groups, firewalls and subnet and VNet segregation. For security and auditing, you can deploy tools like Gatekeeper, Falco and Wazuh.
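As a starting point, Gatekeeper can be installed with a single manifest from the project's repository. The URL below follows the project's documented install path; pin a specific release tag rather than master in production:

```bash
# Install OPA Gatekeeper from the project's published manifest.
# Pin a release tag instead of master for production use.
kubectl apply -f https://raw.githubusercontent.com/open-policy-agent/gatekeeper/master/deploy/gatekeeper.yaml
```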