Why Kubernetes Utilization Is Stuck Below 40%
Kubernetes was sold on the premise that clusters would scale themselves into efficient, elastic infrastructure. The reality on most production estates looks nothing like that: average utilization sits stubbornly in the 30 to 40 percent range, and the remaining capacity idles as expensive insurance against an outage no one wants to be blamed for. Over-provisioning has quietly become the default coping mechanism, and the cost is showing up in cloud invoices that platform teams can no longer wave away.
Eli Birger, CTO and co-founder of PerfectScale by DoiT, joined Mike Vizard to dig into why that gap has proven so durable. His read is that the root cause is human before it is technical — developers are rewarded for keeping services up and punished when they fall over, so they pad requests and limits to give themselves margin. Monitoring tools, which were designed to surface failures rather than right-size workloads, do little to correct that bias.
Birger and Vizard also turn to how AI is reshaping the equation in both directions. Vibe coding and AI-generated workloads are pushing cluster costs higher as more services get spun up without anyone tuning them, while AI-driven optimization is itself becoming an entry point for fixing the problem. Birger argues the safer path is autonomous, battle-tested algorithms that operators can reason about rather than black-box models that make resource decisions no one can explain when something breaks.
Day 2 operations come up as the thread tying it all together — the work of continuous tuning, platform engineering and keep-it-simple discipline that turns a Kubernetes deployment from a sunk cost into something that pays for itself. With AI workloads accelerating the consumption curve, the pressure to close the utilization gap is only going to grow, and the platforms that figure it out first will have a real advantage on cloud spend.


