How Many Kubernetes Clusters Exist Today?

As we considered our 2023 development plans, we wanted to formalize our gut instincts on the state of Kubernetes deployments into numbers backed by research. We had a lot of questions, and the biggest of them all was, ‘How many Kubernetes clusters exist today, and how many are in public clouds versus on-premises?’

One challenge of open source is that you will never know the exact numbers, and estimations are, unfortunately, more an art than a science. Even the Docker Hub pull counts for open source technologies and containers are unreliable indicators because of the ephemeral nature of the Kubernetes ecosystem; clusters and applications are torn down and redeployed all the time. For example, the most popular Kubernetes backup solution, Velero, has more than 50 million Docker pulls. Across the various versions and deployments, it is incredibly hard to estimate how many clusters account for all that activity. Despite that, we strongly believe Docker Hub could become the source of the best free market research and buying intent data in the open source community today.

One disclaimer before you read further is that this analysis may be deep for some and shallow for others. Our goal was to identify a ballpark estimate so that we could build a business plan based on some hard data and not rely on our gut instincts to qualify whether this ecosystem is still nascent or whether a sizable market exists.

Where Do We Begin?

Our starting point is to rely on services such as Shodan and Censys that quantify how many Kubernetes clusters are exposed to the internet. They probe and scan the internet to find endpoints of interest. The search is simple to use and available for free. At the time of this blog, a basic search on Shodan showed ~900,000 clusters and Censys showed ~1.25 million clusters.

We won’t get into the merits of whether exposing these endpoints to the internet is truly a bad practice. The numbers are this large because exposed or open host ports are the default configuration in EKS, AKS and GKE, among other cloud provider offerings.

How Many Kubernetes Clusters Run in the Cloud?

Interestingly, based on Censys data, almost 95% of these clusters were accounted for by the popular cloud providers worldwide. GKE has over 500,000 clusters, AWS has over 400,000 clusters and AKS registers over 130,000 clusters (which is suspiciously small for their presumed market share).

We also know this isn’t 100% of the cloud clusters, since all providers offer a variation of private clusters that lock down the control plane. It is also possible (and highly recommended) to at least lock down control plane access to a small range of IP addresses. It is anyone’s guess how many companies leverage either of these popular options. The exposed cloud clusters come to roughly 1.2 million (95% of 1.25 million). If half of all cloud clusters are locked down, we are already at ~2.5 million clusters just in the cloud. If we are conservative and estimate that only about 20% make the effort to lock down the control plane, we are talking about 1.5 million clusters in the cloud. Our estimate is that we are closer to 1.5 million than 2.5 million, given that Kubernetes security is still not the most well-understood topic.
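The scaling behind those two scenarios can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration, assuming the exposed cloud count is 95% of the ~1.25 million clusters Censys reported; the function name `estimate_total` is ours and purely illustrative.

```python
def estimate_total(exposed: float, locked_down_fraction: float) -> float:
    """Scale a count of internet-visible clusters up to a total,
    assuming `locked_down_fraction` of all clusters are hidden from scanners."""
    if not 0 <= locked_down_fraction < 1:
        raise ValueError("locked_down_fraction must be in [0, 1)")
    return exposed / (1 - locked_down_fraction)


# Assumption: ~95% of the ~1.25 million Censys results sit with cloud providers.
exposed_cloud = 1_250_000 * 0.95  # roughly 1.19 million

for fraction in (0.2, 0.5):
    total = estimate_total(exposed_cloud, fraction)
    print(f"{fraction:.0%} locked down -> ~{total / 1e6:.1f} million cloud clusters")
```

The 20% case comes out near 1.5 million. The 50% case comes out near 2.4 million, which we round up to about 2.5 million in the text.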

How Many Kubernetes Clusters Run On-Premises?

This leaves us to guess how many Kubernetes clusters are running on-premises. In the Censys data, fewer than 100,000 of the 1.25 million clusters were not accounted for by cloud providers. This makes some sense since most on-premises clusters tend to be locked down tighter, but this still felt suspiciously small.

Shodan’s data showed 74% of the 900,000 Kubernetes clusters were tagged “cloud.” This proportion seemed more credible, and one can see that they picked up real organizations such as Korea Telecom, San Diego Supercomputer Center and the University of Chicago in their report. That leaves 26%, or roughly 234,000 clusters, outside the cloud. If we assume that, say, 80% of on-premises clusters are locked down, we’d land at about 1.2 million on-premises clusters. Keep in mind that Shodan found fewer clusters overall (900,000 in Shodan versus 1.25 million from Censys’ data). If we assume the difference is systematic, then the real number could be inching closer to 1.6 million.
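For readers who want to retrace the Shodan-based arithmetic, here is a rough sketch in Python. The 80% locked-down share is our guess, not a measured value, and the final adjustment assumes Shodan undercounts relative to Censys by a constant factor; all variable names are illustrative.

```python
shodan_total = 900_000     # clusters Shodan reported
cloud_share = 0.74         # fraction Shodan tagged "cloud"
locked_down = 0.80         # assumed fraction of on-premises clusters hidden from scanners

# Clusters Shodan can see that are not tagged "cloud" (~234,000).
exposed_on_prem = shodan_total * (1 - cloud_share)

# Scale up by the share we assume is visible (~1.17 million).
on_prem_estimate = exposed_on_prem / (1 - locked_down)

# Assumption: Shodan misses clusters at the same rate everywhere,
# so we scale by the Censys/Shodan ratio (~1.6 million).
censys_total = 1_250_000
adjusted = on_prem_estimate * censys_total / shodan_total

print(f"exposed on-premises:  ~{exposed_on_prem:,.0f}")
print(f"on-premises estimate: ~{on_prem_estimate / 1e6:.2f} million")
print(f"adjusted for Censys:  ~{adjusted / 1e6:.2f} million")
```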

According to the observability platform Dynatrace’s Kubernetes adoption survey, the market share of the deployment models is split almost down the middle, slightly in favor of on-premises clusters, but the number of cloud clusters is growing faster and cloud is expected to become the majority in 2023. Dynatrace claims to make these estimates based on telemetry data from 4.1 billion pods, and that’s a sample size we can hang our hat on.

CNCF surveys show similar numbers, though the overlapping responses to some questions make them slightly harder to use. Whether a customer runs in a hybrid cloud or multi-cloud is irrelevant to the analysis here since we only care about the count and not the distribution.

So, if we assume that cloud represents about half of the clusters (observed as 45% in September 2022) and if we believe that there are about 1.5 million clusters in the cloud, we should have at least another 1.5 million to 1.8 million clusters on-premises.
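The same range can be reproduced with a quick calculation. This is a sketch under our own assumptions: a cloud count of 1.5 million and a cloud share somewhere between the observed 45% and an even 50%; `on_prem_from_share` is a hypothetical helper name.

```python
def on_prem_from_share(cloud_clusters: float, cloud_share: float) -> float:
    """Infer the on-premises count from a cloud count and the cloud's market share."""
    if not 0 < cloud_share <= 1:
        raise ValueError("cloud_share must be in (0, 1]")
    total = cloud_clusters / cloud_share
    return total - cloud_clusters


cloud_clusters = 1_500_000  # our cloud estimate from the scanner data

for share in (0.50, 0.45):
    on_prem = on_prem_from_share(cloud_clusters, share)
    print(f"cloud share {share:.0%} -> ~{on_prem / 1e6:.1f} million on-premises clusters")
```

An even split gives 1.5 million on-premises clusters, and the 45% share observed in September 2022 gives about 1.8 million.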

We were somewhat pleased that these two estimates landed in the same ballpark. So, our conservative estimate is that there are at least 3 million Kubernetes clusters worldwide. It is not a stretch that there may even be 4 million or more Kubernetes clusters at the time this blog is published (early 2023).

However, our research job isn’t done. Next, we’d love to quantify how many of these clusters run stateful applications because that helps us know how many have persistent data that a Kubernetes backup solution must protect.

Sathya Sankaran

Sathya is the founder and General Manager of the CloudCasa business within Catalogic Software, where he provides operational and strategic oversight across R&D, marketing, sales and partner alliances. Sathya was an early enthusiast of the potential for containers and cloud technologies to transform how we innovate and deliver solutions to businesses. He is responsible for Catalogic’s strategic pivot to focus on addressing Day 2 challenges in Kubernetes and cloud native ecosystems, including data protection, cyber-resilience and cloud mobility. As the COO of Catalogic Software, Sathya leads engineering, sales and alliance teams at Catalogic and was instrumental in the strategic sale of Catalogic’s copy data management portfolio to IBM Storage in 2021. Apart from work and a young family, Sathya is passionate about Cricket, F1, electric cars and world politics. Sathya holds a BE from the University of Madras, MS from Columbia University, N.Y., and an MBA in Strategy and Finance from New York University Stern School of Business.
