The Efficiency Era: How Kubernetes v1.35 Finally Solves the “Restart” Headache
For the better part of a decade, Kubernetes practitioners have lived by a hard rule: Immutability.
If you want to change something, you destroy the old and create the new.
While this pattern revolutionized stateless application deployment, it has always been a thorn in the side of stateful workloads. If a PostgreSQL database needed 2GB more RAM, or a Java application needed a higher heap limit to avoid an OOMKill, the solution was always the same—restart the Pod. In production, “restarting” means failover, cache warming, latency spikes, and the potential for data inconsistency if replication isn’t perfectly synchronized.
With the release of Kubernetes v1.34 and v1.35, we are entering a new phase of the ecosystem. I call this the “Efficiency Era.” The focus has shifted from expanding the API surface area to refining the operational primitives that drive cost and stability.
The headline feature? In-Place Pod Resource Resizing.
Here is why this release changes the game for Platform Engineers and FinOps teams, and why you need to prepare your manifests for it.
Vertical Scaling Without the Disruption
One of the most requested features in Kubernetes history has finally graduated to stable. Previously, the resources section of a Pod specification was immutable: if you wanted to change requests.memory from 1Gi to 2Gi, your only option was to delete the Pod and create a replacement, at which point the Kubelet would set up fresh cgroup limits for the new container.
In v1.35, the resources field is now mutable for CPU and Memory.
How it works under the hood:
When you patch a running Pod’s resource spec, the Kubelet detects the change. Instead of sending a SIGTERM, it interacts directly with the container runtime (CRI) to update the cgroup limits on the host Linux kernel.
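Since the feature's beta in v1.33, resizes go through a dedicated resize subresource rather than a plain spec update. A minimal sketch of such a patch; the Pod and container names and the values are illustrative:

```yaml
# resize.yaml — patch body for the resize subresource: raise a container's
# memory request and limit without recreating the Pod.
# Apply with something like:
#   kubectl patch pod pg-primary --subresource resize --patch-file resize.yaml
spec:
  containers:
    - name: postgres
      resources:
        requests:
          memory: "2Gi"
        limits:
          memory: "2Gi"
```

After the Kubelet has applied the change, the actually allocated values are reported back under status.containerStatuses[].resources, so you can verify the resize landed without restarting anything.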
The Practitioner Impact:
- Databases: You can now vertically scale a primary database node during peak traffic (e.g., Black Friday) without triggering a leader election or failover.
- Java Workloads: You can dynamically adjust JVM heap availability in response to load without the “cold start” penalty of a JVM restart.
- VPA (Vertical Pod Autoscaler): Historically, VPA was dangerous in production because its recommendations triggered restarts. With in-place resizing, VPA becomes a safe, “always-on” optimizer that breathes with your application traffic, adjusting resources in real-time without disrupting a single connection.
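How a resize is applied is controlled per container through the resizePolicy field. A sketch of the common JVM pattern, where CPU changes are taken in place but memory changes restart the container (the name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: java-app                          # illustrative name
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0 # illustrative image
      resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired      # apply CPU changes in place
        - resourceName: memory
          restartPolicy: RestartContainer # JVM heap is sized at startup,
                                          # so restart just this container
      resources:
        requests:
          cpu: "1"
          memory: 1Gi
        limits:
          memory: 1Gi
```

Note that even with RestartContainer, only the container restarts: the Pod keeps its IP, volumes, and node placement, which is far cheaper than a full reschedule.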
Solving the “Thundering Herd” with Traffic Distribution
Another silent killer in large-scale clusters is cross-zone data transfer costs.
In previous versions, TopologyAwareHints were useful but often hard to control.
v1.35 refines this with the fully stable trafficDistribution field in the Service specification. This allows architects to strictly prefer routing traffic to endpoints within the same zone or node, even before considering other load-balancing heuristics.
The FinOps Angle:
Cloud providers charge significant fees for traffic that crosses availability zones (AZs). By setting trafficDistribution: PreferClose, you instruct the kube-proxy (or your CNI’s replacement) to keep packets local whenever possible.
For high-throughput applications, like the scientific software stacks I work with in HPC environments, this doesn’t just lower the cloud bill; it also significantly reduces network latency, improving the p99 response times for end users.
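In manifest form, the field sits at the top level of the Service spec. A minimal sketch, with the name, selector, and ports as illustrative placeholders:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api-backend                 # illustrative
spec:
  selector:
    app: api-backend
  ports:
    - port: 80
      targetPort: 8080
  trafficDistribution: PreferClose  # prefer endpoints topologically close
                                    # to the client (e.g. same zone)
```

PreferClose is a preference, not a hard constraint: if the local zone has no healthy endpoints, traffic falls back to cross-zone routing, so availability is not sacrificed for locality.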
Dynamic Resource Allocation (DRA) for the AI Age
As Kubernetes becomes the de facto operating system for AI/ML, the old “Device Plugin” model is showing its age. It was rigid—a GPU was either “there” or “not there.”
Kubernetes v1.35 doubles down on Dynamic Resource Allocation (DRA) with structured parameters. This is crucial for the “Efficiency Era” because it allows for resource slicing.
Instead of a Pod requesting an entire GPU, DRA allows workloads to request specific attributes of a device (e.g., “I need 4GB of VRAM on an A100”). The scheduler can now bin-pack multiple AI inference workloads onto a single physical GPU more intelligently.
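With structured parameters, the workload describes the device it needs as a ResourceClaim, and the scheduler matches that claim against attributes published by the driver. A hedged sketch only: the DeviceClass name, driver domain, and attribute names below are illustrative, and the exact fields depend on your cluster’s resource.k8s.io API version (the group went GA in v1.34):

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: inference-gpu                        # illustrative
spec:
  devices:
    requests:
      - name: gpu
        exactly:
          deviceClassName: gpu.example.com   # illustrative DeviceClass
          selectors:
            - cel:
                # CEL evaluated against the device's published capacity;
                # attribute names are defined by the device driver.
                expression: >-
                  device.capacity['gpu.example.com'].memory.compareTo(quantity('4Gi')) >= 0
```

A Pod then lists the claim under spec.resourceClaims and references it from a container’s resources.claims, letting the scheduler pick any device (or device slice) that satisfies the expression.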
For Platform Engineers maintaining inference clusters, this means higher utilization rates and less hardware sitting idle—a direct improvement to the bottom line.
Conclusion: Boring is Good
The beauty of Kubernetes v1.35 is not in flashy new CRDs or complex subsystems. Its beauty lies in its boredom. It fixes the fundamental operational inefficiencies—restarts, latency, and hardware waste—that have plagued us for years.
For the architect, this release signals a time to audit your StatefulSets.
It is time to revisit your VPA configurations. The “restart penalty” is gone, and with it, the last barrier to truly dynamic, self-healing infrastructure has fallen.