Fitting Square Kubernetes Into the Round Hole of AI-Native Apps
Kubernetes has been called many things over the past decade: The operating system of the cloud, the universal control plane, the great abstraction layer. And it’s earned that reputation. Kubernetes tamed the chaos of containers, gave us a common language for infrastructure, and became the backbone of the cloud-native movement.
But today, we’re staring at a new frontier: AI-native applications. Training massive models across GPU clusters. Running distributed inference pipelines. Serving low-latency responses at the edge. Managing data pipelines that are as critical as the compute itself. And suddenly, our trusty Kubernetes hammer looks a little less suited to the AI nail.
Which raises the question: Are we trying to fit square Kubernetes into the round hole of AI-native apps?
Kubernetes: The Control Plane That Won
Let’s give Kubernetes its due. Born at Google and open-sourced in 2014, Kubernetes was designed to schedule and orchestrate stateless microservices. It abstracts away the messy details of where a container runs, how it scales, and how it connects to other services. With its extensibility — custom resources, operators, controllers — Kubernetes has grown to orchestrate not just workloads, but the entire ecosystem around them.
That flexibility is why Kubernetes won. It’s now the de facto standard for cloud-native platforms. If you’re building microservices, you’re almost certainly running them on Kubernetes, whether on-prem, in the cloud, or through a managed service.
But AI-native workloads are not microservices. And that’s where the tension starts.
Why AI Doesn’t Fit Neatly
AI workloads stress Kubernetes in ways it was never designed for.
First, there’s hardware scheduling. Kubernetes was built around CPU and memory as its primary resources. GPUs? TPUs? Other accelerators? Those are bolted on, exposed through device plugins and awkwardly represented as extended resources. Scheduling GPU jobs efficiently — and fairly — is a whole other ballgame.
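To make the awkwardness concrete, here is a minimal sketch, using the official Python kubernetes client, of how a GPU gets requested today: as an opaque extended resource counted by a device plugin, with no native notion of topology, sharing, or fairness. The image, namespace, and GPU count below are placeholders.

```python
# Sketch of today's GPU scheduling model: the accelerator is just an opaque
# "extended resource" (nvidia.com/gpu) exposed by a device plugin.
# Assumes the official `kubernetes` Python client and a working kubeconfig;
# the image and namespace are placeholders.
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    api_version="v1",
    kind="Pod",
    metadata=client.V1ObjectMeta(name="llm-train-worker"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="trainer",
                image="example.com/llm-trainer:latest",  # placeholder image
                resources=client.V1ResourceRequirements(
                    # GPUs must be requested via limits; unlike CPU and memory,
                    # they can't be overcommitted or fractionally shared.
                    limits={"nvidia.com/gpu": "4"},
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="ml-team", body=pod)
```

The scheduler sees "4 of something called nvidia.com/gpu" and nothing more; everything smarter than counting lives outside the core system.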
Second, job types. Kubernetes thrives on stateless services and short-lived jobs. AI workloads are often long-running, stateful, and distributed across hundreds or thousands of nodes. Training an LLM isn’t the same as serving a web API.
Third, data gravity. AI workloads aren’t just about compute. They rely on massive datasets that must be shuffled, staged, and streamed. Kubernetes doesn’t natively manage that complexity.
Finally, latency sensitivity. Inference workloads can be brutally sensitive to milliseconds. The abstractions that make Kubernetes so powerful can also introduce friction that AI teams can’t afford.
The Workarounds
Of course, the industry hasn’t been standing still. Plenty of projects are working to make Kubernetes more AI-friendly.
- Kubeflow has become the go-to framework for machine learning pipelines on Kubernetes.
- Ray, via the KubeRay operator, brings distributed AI workloads into the cluster.
- Volcano focuses on batch and high-performance computing job scheduling.
- Cloud providers are all building their own AI-on-Kubernetes offerings with custom operators and GPU schedulers.
These solutions work. But too often they feel like add-ons — adapters bolted onto Kubernetes rather than capabilities it was designed around. It’s like putting a new transmission into a sedan and calling it a racecar. It’ll get around the track, but was it really built for that?
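To see what one of those adapters looks like in practice, here is a hedged sketch of submitting a distributed job through KubeRay. Kubernetes itself never knows this is a training job; it only sees a custom resource that the KubeRay operator later translates into pods. The spec is trimmed, and the image, entrypoint, and names are illustrative; check the KubeRay docs for the current RayJob schema.

```python
# Sketch: submitting a distributed job through KubeRay. To Kubernetes this is
# an opaque custom resource; the KubeRay operator does the translation into
# pods. Spec trimmed and values illustrative; consult the KubeRay docs for the
# full, current RayJob schema.
from kubernetes import client, config

config.load_kube_config()

ray_job = {
    "apiVersion": "ray.io/v1",
    "kind": "RayJob",
    "metadata": {"name": "llm-finetune"},
    "spec": {
        "entrypoint": "python train.py",  # placeholder training script
        "rayClusterSpec": {
            "headGroupSpec": {
                "rayStartParams": {},
                "template": {
                    "spec": {
                        "containers": [
                            {"name": "ray-head", "image": "rayproject/ray:latest"}
                        ]
                    }
                },
            },
            "workerGroupSpecs": [
                {
                    "groupName": "gpu-workers",
                    "replicas": 8,
                    "rayStartParams": {},
                    "template": {
                        "spec": {
                            "containers": [
                                {
                                    "name": "ray-worker",
                                    "image": "rayproject/ray:latest",
                                    "resources": {"limits": {"nvidia.com/gpu": "1"}},
                                }
                            ]
                        }
                    },
                }
            ],
        },
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="ray.io", version="v1", namespace="ml-team",
    plural="rayjobs", body=ray_job,
)
```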
What AI-Native Needs
So, what would a control plane designed for AI-native apps look like?
It would start with GPU- and accelerator-first scheduling. Not an afterthought, but core to the system.
It would integrate data pipelines as a first-class concern. Not just pods and volumes, but high-throughput streaming, sharding, and caching.
It would manage distributed training jobs natively, understanding how to orchestrate thousands of GPUs across multiple clusters with resilience.
It would optimize inference at scale — autoscaling tuned not for CPU utilization but for concurrency, latency, and model load.
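You can approximate that today, but only by wiring it up yourself. Here is a hedged sketch of an HPA that scales an inference Deployment on an assumed per-pod "inflight_requests" metric instead of CPU. The metric name and Deployment are hypothetical, and a custom-metrics adapter such as the Prometheus Adapter would have to expose that metric for this to work.

```python
# Sketch: approximating concurrency-based autoscaling with today's HPA.
# Scales an inference Deployment on an assumed per-pod "inflight_requests"
# metric instead of CPU. The metric name and Deployment are hypothetical; a
# custom-metrics adapter must actually serve the metric.
from kubernetes import client, config

config.load_kube_config()

hpa = client.V2HorizontalPodAutoscaler(
    api_version="autoscaling/v2",
    kind="HorizontalPodAutoscaler",
    metadata=client.V1ObjectMeta(name="llm-inference"),
    spec=client.V2HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V2CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="llm-inference"
        ),
        min_replicas=2,
        max_replicas=50,
        metrics=[
            client.V2MetricSpec(
                type="Pods",
                pods=client.V2PodsMetricSource(
                    metric=client.V2MetricIdentifier(name="inflight_requests"),
                    # Aim for ~8 concurrent requests per replica before scaling out.
                    target=client.V2MetricTarget(
                        type="AverageValue", average_value="8"
                    ),
                ),
            )
        ],
    ),
)

client.AutoscalingV2Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="ml-team", body=hpa
)
```

Notice what is missing: nothing here understands model load time, batching, or token latency. Those all have to be translated into a generic metric before the autoscaler can act on them.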
And it would be policy- and cost-aware, because the cloud bills for AI are already shocking. A true AI-native control plane would enforce guardrails against runaway GPU jobs before your CFO comes knocking.
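Pieces of that guardrail story exist today if you go looking. A minimal sketch: a namespace-scoped ResourceQuota that caps how many GPUs a team can request in total, so a runaway job is rejected at admission rather than discovered on the invoice. The namespace and the cap are placeholders.

```python
# Sketch: a crude cost guardrail using what Kubernetes already has. A
# namespaced ResourceQuota caps the GPUs a team can request; anything over the
# cap is rejected at admission time. Namespace and cap are placeholders.
from kubernetes import client, config

config.load_kube_config()

quota = client.V1ResourceQuota(
    metadata=client.V1ObjectMeta(name="gpu-budget"),
    spec=client.V1ResourceQuotaSpec(
        # Extended resources are quota'd with the "requests." prefix.
        hard={"requests.nvidia.com/gpu": "16"},
    ),
)

client.CoreV1Api().create_namespaced_resource_quota(namespace="ml-team", body=quota)
```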
Can Kubernetes Bend Without Breaking?
Some argue Kubernetes can and will evolve. After all, it wasn’t designed to run databases either, and yet operators and CRDs made it possible. With enough extensions, Kubernetes could become the AI control plane too.
The counterargument? Kubernetes is squarely optimized for microservices. Retrofitting it for AI may always feel unnatural — more like duct tape than design. AI-native workloads might be better served by purpose-built systems like Ray, Mosaic, or even proprietary orchestrators from cloud vendors.
My hunch? We’re going to see a hybrid future. Kubernetes will remain the control plane for enterprise infrastructure — the place where compliance, networking, and security policies live. But AI-specific orchestrators will sit alongside it, optimized for training and inference. The challenge will be integrating the two without creating even more complexity.
The Platform Engineering Angle
For platform teams, the debate isn’t academic. Their job is to hide all this complexity from developers. Whether Kubernetes evolves to handle AI workloads or we adopt new orchestrators, the key is to provide golden paths where developers don’t care what’s under the hood.
That means building internal developer platforms (IDPs) that handle GPUs, data, and AI pipelines seamlessly. Developers should be able to request a training job or an inference endpoint without needing to understand whether Kubernetes, Ray, or something else is doing the heavy lifting.
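Purely as a hypothetical illustration, here is what that golden path might look like from the developer’s side. Every name and parameter below is invented; the point is the shape of the interface, not a real platform API.

```python
# Hypothetical sketch of a golden-path platform API; every name and parameter
# here is invented for illustration. The developer asks for a training job and
# the platform decides which orchestrator actually runs it.
from dataclasses import dataclass


@dataclass
class TrainingJob:
    name: str
    image: str
    gpus: int
    dataset: str            # a dataset reference the platform knows how to stage
    backend: str = "auto"   # the platform, not the developer, picks the orchestrator


def submit_training_job(job: TrainingJob) -> str:
    """Pretend platform call: validate the request, apply cost guardrails, and
    hand the job to whichever orchestrator the platform team chose."""
    print(f"Submitting {job.name}: {job.gpus} GPUs, dataset={job.dataset}")
    return f"job/{job.name}"


# Developer experience: one call, no YAML, no knowledge of what's underneath.
handle = submit_training_job(
    TrainingJob(
        name="finetune-support-bot",
        image="example.com/trainer:latest",
        gpus=8,
        dataset="s3://datasets/support-tickets",
    )
)
```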
In that sense, the platform engineering movement may be the bridge, making Kubernetes “good enough” for AI in the enterprise by wrapping it in abstractions.
Shimmy’s Take
I’ve been around this industry long enough to see this pattern before. When Kubernetes first came out, it wasn’t a great fit for stateful apps, for databases, or for service mesh. But the ecosystem adapted. Operators, CRDs, and sidecars bent Kubernetes in directions it was never designed for.
So, can Kubernetes bend again for AI? Maybe. But here’s the difference: The AI wave is moving faster than anything we’ve seen before. We may not have the luxury of waiting for incremental ecosystem fixes. The pace of model training, the demand for GPU clusters, the need for inference at the edge — all of it is outstripping Kubernetes’ ability to evolve.
My take: Kubernetes will play a role. It’s too entrenched not to. But it may never be the perfect fit. Instead, we’ll see a new generation of AI-native control planes rise up — and the challenge will be stitching them into the Kubernetes world we already live in.
Closing Thoughts
Kubernetes deserves its crown as the universal control plane of cloud native. But AI-native workloads are a different beast. They don’t fit neatly into the Kubernetes model, no matter how many extensions we throw at it.
The future may not be about forcing square Kubernetes into the round hole of AI-native apps. Instead, it’s about figuring out where Kubernetes belongs in the AI era — and where we need new abstractions altogether.
Because the truth is, Kubernetes doesn’t have to do everything. It just has to do its job well. And if AI-native apps require something new, then maybe the most cloud-native thing we can do is embrace that evolution.