Why Kubernetes 1.33 Is a Turning Point for MLOps — and Platform Engineering
There comes a moment in every engineer’s experience when a platform matures to the point of being truly ready for production use. With Kubernetes v1.33, that moment has arrived for artificial intelligence (AI) and machine learning (ML) infrastructure.
With over 60 enhancements and the stabilization of several long-awaited features, this release marks a clear step forward, not just in technical capability but also in signaling to machine learning operations (MLOps) teams and platform engineers that Kubernetes is now a viable foundation for ML workloads.
Dynamic Resource Allocation (Beta)
Let’s start with dynamic resource allocation (DRA). For years, Kubernetes has made progress in supporting GPUs and specialized ML hardware, but without first-class support for non-CPU resources, it has been an uphill battle. You had to assemble node selectors and custom device plugins, then hope that scheduling would work.
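For context, that pre-DRA pattern typically looked something like the sketch below: a device plugin exposes an extended resource such as nvidia.com/gpu, and a node selector pins the pod to the right hardware. The accelerator label and image name here are assumptions for illustration.

```yaml
# Pre-DRA pattern: extended resource from a device plugin plus a node label.
# The "accelerator: nvidia-a100" label and the image are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: legacy-gpu-pod
spec:
  nodeSelector:
    accelerator: nvidia-a100
  containers:
  - name: trainer
    image: registry.example.com/trainer:latest
    resources:
      limits:
        nvidia.com/gpu: 1   # whole GPUs only; no sharing or finer-grained requests
```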
With DRA in beta, we now have a native mechanism to handle resource claims in a modular, flexible way. It means ML workloads that depend on high-end GPUs, TPUs or custom accelerators can be scheduled and run without brittle workarounds. That’s not just nice to have — it’s foundational for production ML.
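As a rough sketch of what that looks like in practice, a DRA request pairs a ResourceClaimTemplate with a reference from the pod spec. This uses the resource.k8s.io/v1beta1 API; the gpu.example.com device class is a hypothetical name that a DRA driver would normally install.

```yaml
# Minimal DRA sketch: request one device from a (hypothetical) gpu.example.com
# DeviceClass and bind the resulting claim to a container.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu
spec:
  spec:
    devices:
      requests:
      - name: gpu
        deviceClassName: gpu.example.com
---
apiVersion: v1
kind: Pod
metadata:
  name: dra-gpu-pod
spec:
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: single-gpu
  containers:
  - name: trainer
    image: registry.example.com/trainer:latest   # placeholder image
    resources:
      claims:
      - name: gpu   # consume the claim in this container
```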
The new alpha features layered on top of DRA, such as device taints, prioritized alternatives and partitionable devices, further push the envelope. They enable fine-grained control over hardware utilization, avoiding the ‘all or nothing’ trap with GPUs so partial resources can be allocated more efficiently.
For platform engineers, this reduces the pressure to custom-code hardware scheduling logic. Instead, they can rely on Kubernetes primitives to do it in a way that scales across teams and clusters.
Topology-Aware Routing Goes GA
Networking in Kubernetes has historically been a black box. Getting traffic routing ‘just right’ in multi-zone or multi-region clusters was often more about creative annotations and load balancer wizardry than native features.
With the trafficDistribution=PreferClose option in topology-aware routing now graduating to generally available (GA), Kubernetes finally provides a built-in way to prioritize network proximity. This minimizes cross-zone latency and cloud costs while keeping failover mechanisms intact.
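In practice this is a single field on the Service. A minimal sketch, assuming a hypothetical inference Service:

```yaml
# Sketch of a Service that prefers topologically close endpoints when available.
apiVersion: v1
kind: Service
metadata:
  name: inference
spec:
  selector:
    app: inference
  ports:
  - port: 80
    targetPort: 8080
  trafficDistribution: PreferClose   # route to same-zone endpoints when possible
```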
For organizations running AI inference at scale — where every millisecond matters — this is a turning point. For platform teams, it simplifies what used to be one of the more painful elements of multi-zone design.
Observability and Scheduling Improvements
Another key area of progress in v1.33 is observability. Features like ‘better pod status with generation and observedGeneration’ might seem minor, but they directly improve how we monitor and debug workloads. You no longer need to guess whether a status update reflects the most recent deployment — you can know.
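Conceptually, the check becomes a simple comparison on the Pod object itself. An illustrative excerpt, assuming the corresponding alpha feature gate is enabled; this is not a manifest you would apply:

```yaml
# Excerpt of a Pod as returned by the API server.
metadata:
  generation: 3            # bumped on each spec change
status:
  observedGeneration: 3    # status is current when this matches metadata.generation
```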
Similarly, the addition of matchLabelKeys and mismatchLabelKeys in pod affinity rules allows you to write more expressive placement policies without rewriting entire manifests. This is critical for rolling updates, blue-green deployments and advanced scheduling strategies in ML pipelines.
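For example, combining pod anti-affinity with matchLabelKeys on the Deployment-managed pod-template-hash label scopes the rule to a single rollout revision, so old and new replicas do not block each other during a rolling update. A sketch, with an assumed app label:

```yaml
# Sketch: spread replicas of one rollout revision across nodes without the
# previous ReplicaSet's pods interfering during a rolling update.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - topologyKey: kubernetes.io/hostname
      labelSelector:
        matchLabels:
          app: model-server        # assumed app label
      matchLabelKeys:
      - pod-template-hash          # limits the rule to pods from the same revision
```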
Since platform engineers spend a disproportionate amount of time stitching together insights from different tools and APIs, these updates streamline workflows, freeing up more time for building and less for troubleshooting.
Smoother Developer Experience
One overlooked addition in this release may be the --subresource flag in kubectl, which graduates to stable. It seems minor, but it enables developers and data scientists to interact with status, scale and other subresources directly — without jumping through hoops or writing raw API calls.
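For instance (the deployment name here is hypothetical):

```
# Read only the status subresource of a Deployment
kubectl get deployment model-server --subresource=status -o yaml

# Update replicas through the scale subresource without editing the full object
kubectl patch deployment model-server --subresource=scale --type=merge -p '{"spec":{"replicas":3}}'
```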
This matters because, as Kubernetes adoption expands beyond traditional infrastructure teams, we need to meet developers halfway. Giving them cleaner, simpler access to the parts of Kubernetes they care about helps reduce friction and improves platform adoption across the board.
Kubernetes as the MLOps Operating System
Overall, this release demonstrates that the Kubernetes community is serious about meeting the needs of ML teams. We are seeing deliberate investment in GPU support, hardware partitioning, advanced scheduling and observability.
It also validates the work that many have been doing to treat Kubernetes as the control plane for all things infrastructure — including ML pipelines. If you have been holding off on running serious AI/ML workloads on Kubernetes due to hardware integration gaps or resource management challenges, now is the time to reevaluate.
What This Means for Platform Engineering Leaders
For platform teams, this release signals they can now build infrastructure that supports diverse workloads — from web apps to GPU-hungry ML models — without needing separate platforms or brittle extensions.
We still need to be thoughtful. Just because a feature reaches beta or GA doesn’t mean it is ready for every use case. But with DRA, topology-aware routing, better scheduling rules and richer subresource access, we are looking at a fundamentally stronger platform.
If Kubernetes is already part of your infrastructure, version 1.33 offers meaningful updates that reinforce its role. For organizations still evaluating how to scale ML workloads, this release underscores Kubernetes’ growing maturity and leadership in the space.