The Hidden Cost of Unevictable Workloads in Kubernetes
Kubernetes has introduced unprecedented infrastructure flexibility through dynamically scalable workloads, optimized resource utilization and improved resilience — central features that contribute to its dominance and widespread adoption in the cloud-native ecosystem. However, as clusters grow, so do the complexities of dynamic workload scheduling — a core benefit of choosing Kubernetes — particularly when dealing with a mix of production requirements.
Kubernetes clusters often run a mix of workloads to accommodate diverse application needs, balancing flexibility, performance and reliability. Organizations deploy several types of workloads based on their operational characteristics and business requirements. Some applications — such as stateless microservices, batch processing jobs and CI/CD pipelines — are inherently elastic, meaning they can be dynamically scaled up or down, restarted or rescheduled without impacting the overall system. These workloads benefit from Kubernetes’ built-in autoscaling and scheduling mechanisms, enabling efficient resource allocation.
However, other workloads require strict continuity and stability — such as databases, stateful applications, real-time analytics pipelines and critical infrastructure services. These workloads often rely on persistent storage, low-latency networking or specific security constraints that prevent them from being arbitrarily rescheduled. In many cases, compliance, performance guarantees or architectural dependencies dictate that certain workloads remain pinned to specific nodes, limiting Kubernetes’ ability to dynamically optimize placement.
The FinOps Challenge of Kubernetes Scheduling in Complex Environments
This mix of flexible and immovable workloads introduces a fundamental challenge: Maximizing resource efficiency while ensuring that critical services remain stable and performant. If not managed properly, pinned workloads can lead to inefficient bin-packing, underutilized nodes and increasing operational complexity — ultimately driving up infrastructure costs.
Rightsizing workloads is a fundamental step in optimizing Kubernetes environments. By properly bin-packing pods and freeing up underutilized nodes, teams can significantly reduce infrastructure costs. However, the effectiveness of this approach is often hindered by workloads that cannot be evicted. These pinned workloads introduce inefficiencies that accumulate over time, leading to higher cloud costs, fragmented clusters and increased operational complexity.
At the core of Kubernetes scheduling is an optimization problem: Placing workloads efficiently across available compute nodes while adhering to a range of constraints. However, production environments rarely consist of homogeneous, flexible workloads.
Instead, clusters must accommodate a mix of:
- Evictable workloads, which can be freely rescheduled based on resource availability.
- Unevictable workloads, which are critical pods that, once placed, must remain on the same node, preventing autoscalers from consolidating resources effectively.
- Affinity and anti-affinity rules, which introduce further constraints, often leading to suboptimal scheduling and fragmented resource utilization.
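As a sketch of how these constraints appear in practice, the manifest below combines a node affinity rule with pod anti-affinity. The labels (`disktype: ssd`, `app: cache`) and image are illustrative, not drawn from any specific cluster:

```yaml
# Illustrative pod spec: node affinity pins the pod to SSD-labeled nodes,
# while pod anti-affinity keeps replicas of the same app on separate nodes.
apiVersion: v1
kind: Pod
metadata:
  name: cache
  labels:
    app: cache
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values: ["ssd"]
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: cache
        topologyKey: kubernetes.io/hostname
  containers:
  - name: cache
    image: redis:7
```

Each `required…` rule is a hard constraint: the scheduler must satisfy all of them simultaneously, which is exactly how placement options narrow and fragmentation creeps in.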
Real-World Example: The Hidden Cost of Long-Running Workloads in Kubernetes
One of the most familiar challenges in Kubernetes scheduling is managing workloads that require uninterrupted execution. Unlike short-lived or stateless workloads, these jobs often have strict uptime requirements, making them particularly sensitive to scheduling inefficiencies.
Consider a batch job running on Kubernetes. This workload processes large datasets and requires several hours of uninterrupted computing time. If the job is scheduled on a node that later becomes underutilized, Kubernetes might attempt to reschedule workloads for better bin-packing. However, if the pod carries the `cluster-autoscaler.kubernetes.io/safe-to-evict: "false"` annotation, the Cluster Autoscaler cannot safely terminate and reschedule it elsewhere. The result? The node remains active — even if it is mostly idle — leading to inefficient resource usage and higher infrastructure costs.
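A minimal sketch of such a pod follows; the pod name and image are hypothetical, but the annotation itself is the standard Cluster Autoscaler mechanism:

```yaml
# Illustrative batch pod: the annotation tells the Cluster Autoscaler
# not to evict this pod, so its node cannot be scaled down mid-run.
apiVersion: v1
kind: Pod
metadata:
  name: dataset-processor
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
spec:
  restartPolicy: Never
  containers:
  - name: processor
    image: example.com/batch-processor:latest  # hypothetical image
```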
A similar issue arises with cybersecurity scanning workloads that analyze large volumes of data. These jobs may take hours to complete, and if the underlying node is drained or rescheduled mid-execution, the scan is interrupted and must start over from scratch. Not only does this waste compute resources, but it also delays security insights and increases operational costs.
For business-critical microservices, such as those powering high-conversion funnels or real-time analytics pipelines, even brief disruptions can have a direct impact on top-line revenue metrics. A service that manages checkout transactions or user authentication may enforce strict PodDisruptionBudgets (PDBs) to prevent downtime. However, if these are too strict, the constraints can lead to scheduling bottlenecks, where Kubernetes struggles to make placement decisions efficiently, resulting in cluster fragmentation and unnecessary resource waste.
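For concreteness, here is roughly what such a maximally strict budget looks like; the names are illustrative. With `maxUnavailable: 0`, no voluntary disruption of matching pods is ever permitted, so node drains and consolidation moves stall:

```yaml
# Illustrative PDB: blocks all voluntary evictions of checkout pods.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: checkout-pdb
spec:
  maxUnavailable: 0
  selector:
    matchLabels:
      app: checkout
```

A slightly looser budget (e.g. `maxUnavailable: 1`) often preserves availability guarantees while still giving the autoscaler room to repack nodes.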
This raises the question: Is your Kubernetes cluster truly optimized, or are unevictable workloads silently eroding your efficiency and inflating your cloud bill?
How Workload Constraints Impact Kubernetes Costs
In practice, as these constraints compound over time, they gradually erode the efficiency of Kubernetes bin-packing. When critical workloads remain pinned to specific nodes, autoscalers lose their ability to optimize compute utilization, leading to:
- Wasted resources
- Cluster fragmentation
- Operational overhead
Unevictable workloads and affinity rules can lead to significant inefficiencies within a Kubernetes cluster. Nodes often remain underutilized because the Cluster Autoscaler cannot freely reallocate immovable pods, wasting substantial resources. Additionally, as these workloads and their associated rules accumulate, the cluster becomes fragmented and resources are distributed inefficiently, requiring more infrastructure than would otherwise be necessary.
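The cost of pinning can be illustrated with a toy model. The greedy first-fit packer below is an assumption-laden simplification (it is not how kube-scheduler works internally), but it shows how pods that must stay on particular nodes inflate the node count relative to freely placeable pods:

```python
# Toy first-fit bin-packing sketch showing how pinned pods inflate node count.
# Capacities and pod sizes are arbitrary illustrative units.

def pack(pods, capacity=100, pinned=None):
    """Greedy first-fit. `pinned` maps pod index -> node index it must stay on.
    Returns the number of nodes used."""
    pinned = pinned or {}
    nodes = []  # remaining free capacity per node
    # Pre-create nodes holding the pinned pods.
    for idx, node in pinned.items():
        while len(nodes) <= node:
            nodes.append(capacity)
        nodes[node] -= pods[idx]
    # Place the remaining pods on the first node with room.
    for i, size in enumerate(pods):
        if i in pinned:
            continue
        for n in range(len(nodes)):
            if nodes[n] >= size:
                nodes[n] -= size
                break
        else:
            nodes.append(capacity - size)
    return len(nodes)

pods = [60, 60, 40, 40]
print(pack(pods))                             # 2: 60+40 pairs fit on 2 nodes
print(pack(pods, pinned={0: 0, 1: 1, 2: 2}))  # 3: pinned spread forces an extra node
```

The same four pods fit on two nodes when movable, but need three once the first three are pinned apart — the per-node slack that pinning leaves behind is exactly the "wasted resources" line item above.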
How It Worked Until Today
Until now, teams have relied on a combination of Kubernetes-native tools and manual workarounds to address the inefficiencies caused by unevictable workloads. The most common approaches have included:
- Over-Provisioning Compute Resources: Teams often compensate for scheduling inefficiencies by adding more nodes than necessary. While this ensures availability, it drives up cloud costs significantly.
- Manually Adjusting Affinity and Taints: Engineers attempt to fine-tune scheduling behavior by carefully configuring node affinity, taints, tolerations and node selectors. However, this is a tedious process that does not scale well as cluster workloads evolve.
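The manual taint-and-tolerate approach typically looks like the sketch below. The taint key, node label and image are hypothetical; the node would first be tainted with something like `kubectl taint nodes node-1 pinned-workloads=true:NoSchedule`:

```yaml
# Illustrative pinned pod: tolerates the (hypothetical) taint so it can land
# on the dedicated node that ordinary pods are repelled from.
apiVersion: v1
kind: Pod
metadata:
  name: pinned-db
spec:
  nodeSelector:
    workload-class: pinned   # hypothetical node label
  tolerations:
  - key: "pinned-workloads"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
  containers:
  - name: db
    image: postgres:16
```

Every such pairing of taint, toleration and selector is another hand-maintained invariant, which is why this approach stops scaling as workloads evolve.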
- Forcing Node Replacements: Some teams periodically force a rolling restart of unevictable workloads, artificially resetting their placement. This introduces unnecessary disruptions and still does not solve the fundamental inefficiencies.
- Autoscaler Workarounds: While Cluster Autoscaler and Karpenter can improve bin-packing, they struggle when unevictable workloads are in play, as they lack mechanisms to dynamically adjust their placement without forcing evictions.
These methods provide temporary relief but fail to address the core issue: Kubernetes was designed to be dynamic, yet pinned workloads introduce rigid constraints that prevent autoscalers from doing their job effectively.
As a result, organizations continue to experience increasing cloud waste, with compute resources remaining idle due to ineffective bin-packing and scaling bottlenecks. As clusters grow, these inefficiencies compound, leading to unpredictable performance and complex operational workflows, where DevOps teams continually tweak scheduling configurations to maintain efficiency. None of these is ideal, to say the least.
To address these challenges, organizations need a more intelligent approach to scheduling unevictable workloads — one that enhances Kubernetes’ default scheduling behavior without introducing unnecessary complexity. While traditional autoscalers and bin-packing strategies are effective for managing flexible workloads, production-grade Kubernetes clusters require more advanced optimizations. These include the ability to dynamically adjust scheduling decisions to accommodate long-lived workloads, maximize bin-packing efficiency even with immovable workloads, provide real-time visibility into scheduling constraints and seamlessly integrate with existing Kubernetes autoscaling and scheduling mechanisms.
Rethinking Pod Placement for Modern Kubernetes Workloads
By shifting the focus from workload rightsizing alone to intelligent placement strategies, teams can reclaim wasted capacity, improve cluster stability and achieve substantial cost savings — without disrupting critical workloads.
As Kubernetes adoption continues to grow, tackling the inefficiencies caused by unevictable workloads will be key to unlocking the next phase of cloud optimization. Organizations that embrace smarter scheduling mechanisms will not only reduce costs but also build more resilient and scalable infrastructure for the future.
KubeCon + CloudNativeCon EU 2025 is taking place in London from April 1-4. Register now.