Pod Disruption Budgets: A Field Guide to What Actually Works
PodDisruptionBudgets are simple to write, easy to misuse, and cause more “why won’t this node drain?” confusion than any other Kubernetes primitive. After tracing too many node lifecycle automation problems back to misconfigured PDBs, I’m laying out a field guide.
What a PDB Does, and What It Doesn’t
A PDB tells the Kubernetes Eviction API to refuse evictions that would violate the disruption budget. That is all it does. It’s worth saying out loud because many teams treat PDBs as availability guarantees, which they are not.
PDBs are consulted only for voluntary disruptions, meaning anything going through the Eviction API. Things that bypass the Eviction API ignore PDBs entirely:
- Direct pod deletion (kubectl delete pod)
- Node failures, where the kubelet stops reporting and the node controller deletes the pods directly
- Cluster autoscaler hard scale-downs in some configs
- Operator-driven pod deletion that doesn’t go through the Eviction API
- Force-deletion as part of stuck-pod recovery
This is one of the most misunderstood things about PDBs. People assume they’re a guarantee about pod availability. They aren’t. They’re a guarantee about a specific class of disruption: the polite ones, the ones that ask permission first. Real availability requires redundancy, spread across failure domains, plus health checks that work. PDBs are a softer guarantee on top of that foundation. If your only availability story is your PDB, you don’t have an availability story.
What Goes Wrong
A short tour of the misconfigurations I see most often.
maxUnavailable: 0 with no exceptions. The most common misconfig. It means “never tolerate any voluntary disruption, ever.” Every eviction request for the workload’s pods is refused, so a drain of any node running them never completes. Eventually someone force-deletes the pods or force-drains the node, and the team learns the hard way that the PDB never protected them from the disruptions they actually cared about, only from the polite ones.
minAvailable: 100% on a single-replica workload. Equivalent to maxUnavailable: 0, just spelled differently. Same problem, and it stays the same problem at any replica count.
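The two spellings above can be caught with a small lint. What follows is a minimal sketch, not the disruption controller’s logic: the function name is_zero_budget and the dict shape (a PDB spec parsed into a Python dict) are assumptions made for illustration, and it only handles the obvious 0, 0% and 100% cases plus an absolute minAvailable that meets or exceeds a known replica count.

```python
def is_zero_budget(spec, replicas=None):
    """Return True if a PDB spec (as a dict) can never allow a voluntary eviction.

    Simplified lint (hypothetical helper): only the obvious zero-budget
    spellings are checked, so percentage rounding does not come into play.
    """
    max_unavailable = spec.get("maxUnavailable")
    min_available = spec.get("minAvailable")

    if max_unavailable is not None:
        # Covers both the integer 0 and the string forms "0" / "0%".
        return str(max_unavailable).strip() in ("0", "0%")

    if min_available is not None:
        if str(min_available).strip() == "100%":
            return True
        # An absolute minAvailable at or above the replica count leaves no room.
        if replicas is not None and isinstance(min_available, int):
            return min_available >= replicas

    return False


print(is_zero_budget({"maxUnavailable": 0}))            # True
print(is_zero_budget({"minAvailable": "100%"}))         # True
print(is_zero_budget({"maxUnavailable": 1}))            # False
```

A check like this fits naturally in CI for manifests, where the replica count is usually known.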
PDBs on workloads that don’t need them. Many platform teams add PDBs to every workload by default. For workloads without availability requirements (dev jobs, batch processors, low-criticality services), this just creates drain friction without value. The friction isn’t free. Every blocked eviction is a queued attempt that has to retry, a lifecycle automation that has to wait, a node that stays around longer than it should.
Orphaned PDBs. The Deployment a PDB was written for gets deleted, but the PDB still exists in etcd and still gets consulted. If its selector matches some other workload’s pods, you get a phantom constraint that nobody knows about until a drain mysteriously stalls.
Overlapping PDBs. A pod that matches the selectors of two or more PDBs cannot be evicted at all. The Eviction API doesn’t pick one budget to consume against; it returns an error and refuses the eviction outright. This is documented Kubernetes behavior rather than a bug, but it surprises teams every time. The usual cause is one PDB targeting a narrow label like app: gateway and another targeting a broader label like tier: frontend, both of which select the same pods. Drains stall, force-delete becomes the usual workaround, and the PDBs end up protecting nothing for any of the workloads they were supposed to cover. The fix is selector hygiene: ensure no pod is covered by more than one PDB.
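Overlaps are easy to detect offline. The sketch below is a simplified illustration, not how the API server evaluates selectors: it only understands matchLabels (matchExpressions are ignored), and the function names and flattened dict shapes are assumptions chosen for the example.

```python
from collections import defaultdict


def selector_matches(match_labels, pod_labels):
    """True if every key/value in a matchLabels selector is present on the pod."""
    return all(pod_labels.get(k) == v for k, v in match_labels.items())


def find_overlaps(pdbs, pods):
    """Map 'namespace/pod' -> sorted PDB names, for pods selected by more than one PDB.

    pdbs: list of {"name", "namespace", "matchLabels"}   (hypothetical shape)
    pods: list of {"name", "namespace", "labels"}        (hypothetical shape)
    """
    hits = defaultdict(list)
    for pod in pods:
        for pdb in pdbs:
            if pdb["namespace"] != pod["namespace"]:
                continue  # a PDB only selects pods in its own namespace
            if selector_matches(pdb["matchLabels"], pod["labels"]):
                hits[f'{pod["namespace"]}/{pod["name"]}'].append(pdb["name"])
    return {pod: sorted(names) for pod, names in hits.items() if len(names) > 1}


pdbs = [
    {"name": "gateway-pdb", "namespace": "prod", "matchLabels": {"app": "gateway"}},
    {"name": "frontend-pdb", "namespace": "prod", "matchLabels": {"tier": "frontend"}},
]
pods = [
    {"name": "gw-1", "namespace": "prod", "labels": {"app": "gateway", "tier": "frontend"}},
    {"name": "web-1", "namespace": "prod", "labels": {"app": "web", "tier": "frontend"}},
]
print(find_overlaps(pdbs, pods))  # {'prod/gw-1': ['frontend-pdb', 'gateway-pdb']}
```

Here gw-1 carries both labels, so both budgets claim it, which is exactly the app: gateway versus tier: frontend case described above.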
Priority preemption. When a higher-priority pod needs to schedule and the cluster is full, the scheduler preempts lower-priority pods to make room. Preemption respects PDBs, but only as a best-effort constraint. If the scheduler can’t find any other lower-priority candidates to preempt, it will violate a PDB to schedule the high-priority pod. This is documented Kubernetes behavior rather than a bug. Teams running mixed-priority workloads on tight capacity find this out the hard way: a critical pod gets evicted to make room for an even-more-critical pod, and the PDB that was supposed to protect it gets quietly overridden. If your workload genuinely cannot tolerate disruption beyond your PDB, you also need cluster headroom so preemption never has to pick your pod.
PDB plus pod anti-affinity creating deadlocks. PDB says “at least one available.” Anti-affinity says “don’t co-locate replicas.” Two replicas, two nodes, drain one node. Anti-affinity says the replacement replica can’t go on the remaining node, so it sits Pending. Now the PDB is at its floor: the surviving replica can’t be evicted until the new one is up, and the new one can’t come up until a node frees up. Where compute capacity is limited and no new nodes are available, the cluster is stuck, and the symptom is a node that won’t drain for reasons that take an hour to figure out.
What Works
For most stateless workloads, the simple correct PDB is maxUnavailable: 1 or minAvailable: <replicas - 1>. Tolerate one voluntary disruption at a time. Drains proceed one pod at a time, and each replacement has to become ready before the next eviction is allowed. This is fine. It’s not fancy. It works.
For workloads with stronger availability requirements, parameterize by replica count. A 5-replica deployment with maxUnavailable: 1 allows up to 20% disruption. With maxUnavailable: 20%, you allow the same thing, but it scales naturally if the workload grows to 8 replicas, where the budget becomes 2 pods. Watch the rounding: Kubernetes rounds percentages up, so maxUnavailable: 25% on 5 replicas already allows 2 pods, not 1. The percentage form is usually cleaner than absolute numbers because the workload’s appropriate budget is a function of how big it is, not a constant.
For workloads where every replica matters (stateful systems with consensus requirements like ZooKeeper or etcd, leader-elected systems that can’t survive simultaneous restarts), pair the PDB with the workload’s own logic. The PDB enforces the budget at the API level. The workload’s coordination logic (a controller, a sidecar, a custom termination handler) ensures the budget is meaningful. The PDB is necessary but not sufficient.
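The arithmetic behind a maxUnavailable budget is worth working through once, because percentages are rounded up to whole pods. The following is a simplified model of that calculation, assuming the replica count is known and using hypothetical helper names; it is a sketch of the arithmetic, not the controller’s code.

```python
def percent_to_pods(value, replicas):
    """Resolve an int or 'N%' string to a pod count, rounding percentages up."""
    if isinstance(value, str) and value.endswith("%"):
        percent = int(value[:-1])
        return (percent * replicas + 99) // 100  # integer ceiling, no float error
    return int(value)


def disruptions_allowed(replicas, healthy, max_unavailable):
    """Evictions a maxUnavailable budget permits right now (never negative)."""
    desired_healthy = max(replicas - percent_to_pods(max_unavailable, replicas), 0)
    return max(healthy - desired_healthy, 0)


print(disruptions_allowed(5, 5, 1))      # 1
print(disruptions_allowed(5, 5, "20%"))  # 1
print(disruptions_allowed(5, 5, "25%"))  # 2  (1.25 rounds up)
print(disruptions_allowed(8, 8, "20%"))  # 2  (1.6 rounds up)
print(disruptions_allowed(5, 4, "20%"))  # 0  (one pod already down)
```

The last line is the case that matters during a drain: once one pod is unhealthy, the budget is spent until its replacement is ready.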
What Doesn’t Usually Work
Wishful PDBs. A PDB the workload itself can’t satisfy. If your replicas can’t come up fast enough to stay within maxUnavailable: 1, the PDB doesn’t help. It just blocks drains while the actual problem (slow startup, missing readiness probes, bad rollout config) goes undiagnosed.
Per-namespace PDB defaults. Admission webhooks that add a default PDB to every workload. It sounds prudent. It causes problems for batch jobs, debug pods, and transient workloads that don’t need PDBs and shouldn’t have them. Defaults that are added by webhook are harder to remove than defaults you have to opt into, and the cost of the wrong default isn’t symmetric.
PDBs as the only availability story. I’ve read postmortems where the PDB was set, the team felt protected, then a node failure took out two replicas anyway because node failures bypass PDBs. The PDB created a false sense of safety, which is worse than no safety at all because it stopped the team from building real redundancy.
Operational Practices That Help
Regular PDB audits matter more than people think. Which PDBs select zero pods? Which select pods from multiple deployments? Which would block eviction of every pod on a node? These are the questions to ask, and you can answer them with a few kubectl get queries piped through jq. A monthly cadence works. The number of orphaned and misconfigured PDBs in any nontrivial cluster will surprise you the first time.
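As one way to start such an audit, the sketch below works from the status block that each PDB already reports. It assumes the input is the parsed output of kubectl get pdb -A -o json and relies on the status.expectedPods and status.disruptionsAllowed fields; the function name and the output keys are made up for the example.

```python
def audit_pdbs(pdb_list):
    """Split PDBs into 'selects no pods' and 'currently allows no evictions'.

    pdb_list: parsed JSON from `kubectl get pdb -A -o json` (a dict with "items").
    """
    empty, blocking = [], []
    for item in pdb_list.get("items", []):
        meta = item["metadata"]
        status = item.get("status", {})
        name = f'{meta["namespace"]}/{meta["name"]}'
        if status.get("expectedPods", 0) == 0:
            empty.append(name)      # likely orphaned or mis-selected
        elif status.get("disruptionsAllowed", 0) == 0:
            blocking.append(name)   # would refuse any eviction right now
    return {"selects_no_pods": empty, "allows_no_evictions": blocking}


sample = {
    "items": [
        {"metadata": {"namespace": "dev", "name": "orphan"},
         "status": {"expectedPods": 0, "disruptionsAllowed": 0}},
        {"metadata": {"namespace": "prod", "name": "strict"},
         "status": {"expectedPods": 3, "disruptionsAllowed": 0}},
        {"metadata": {"namespace": "prod", "name": "ok"},
         "status": {"expectedPods": 3, "disruptionsAllowed": 1}},
    ]
}
print(audit_pdbs(sample))
```

A PDB can legitimately report zero allowed disruptions for a short time, for example mid-rollout, so the second list is a set of candidates to look at rather than a list of confirmed misconfigurations.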
Monitoring PDB-blocked drain attempts is the second practice worth investing in. If your node lifecycle automation tracks why a drain failed, surface PDB-blocking-drain as its own metric. Spikes in this metric are usually a sign that someone introduced a PDB misconfig, often as part of a Helm chart update or a copy-pasted manifest. The metric catches problems faster than postmortems do.
PDB behavior should be tested. A PDB is a contract between your workload and the Eviction API. Like any contract, it should be verified. Drive the workload to the edge of its budget (evict pods until the PDB’s status.disruptionsAllowed reaches 0, or scale a minAvailable: 1 workload down to one replica), then try one more eviction via the Eviction API and confirm you’re blocked. If you’re not blocked, the PDB isn’t doing what you think. Run this in CI for production-critical workloads.
The reasoning behind each disruption budget is worth documenting. Every PDB on a workload should have a comment explaining the choice. “We tolerate maxUnavailable: 1 because we have 5 replicas and load testing shows we handle 80% capacity for the duration of one pod restart.” When the comment is missing, the PDB tends to drift over time as people copy it into new workloads with different replica counts and different load profiles.
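For a test like this, the useful signal is the HTTP status the eviction request returns. The sketch below covers only the interpretation step, assuming the request itself is made elsewhere (with whatever client your CI already uses) and that you have its status code. The mapping follows the documented responses for API-initiated eviction; the function names and category labels are hypothetical.

```python
def classify_eviction(status_code):
    """Interpret the HTTP status returned by a POST to a pod's eviction subresource."""
    if 200 <= status_code < 300:
        return "evicted"            # the budget allowed the eviction
    if status_code == 429:
        return "blocked_by_budget"  # refused; usually a PDB, can also be rate limiting
    if status_code == 500:
        return "misconfigured"      # e.g. more than one PDB selects the pod
    return "other"


def pdb_blocks_as_expected(status_code):
    """A PDB test at the edge of the budget passes only if the eviction was refused."""
    return classify_eviction(status_code) == "blocked_by_budget"


print(classify_eviction(429))        # blocked_by_budget
print(pdb_blocks_as_expected(201))   # False
```

The same classification can feed a drain metric: counting blocked_by_budget and misconfigured separately makes overlapping PDBs stand out from ordinary budget exhaustion.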
What’s New or Coming
The Kubernetes community has been working on better disruption controls. A few things to watch:
Pod Disruption Conditions. Recent Kubernetes versions add structured conditions that explain why a disruption was blocked or how it happened. If you’re not surfacing these in your tooling, you should be. They turn “drain failed” into “drain failed because PDB X blocked eviction of pod Y,” which is the difference between an alert and a debugging session.
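One way to surface this in tooling is to read the condition straight off the pod object. The sketch below assumes the condition in question is the pod-level DisruptionTarget condition and that the pod has been parsed from kubectl get pod -o json into a dict; the helper name is made up. It covers the “how it happened” half and does not by itself identify which PDB refused an eviction.

```python
def disruption_reason(pod):
    """Return (reason, message) from a pod's DisruptionTarget condition, or None."""
    for cond in pod.get("status", {}).get("conditions", []):
        if cond.get("type") == "DisruptionTarget" and cond.get("status") == "True":
            return cond.get("reason"), cond.get("message")
    return None


# Illustrative pod fragment; the message text is made up for the example.
pod = {
    "status": {
        "conditions": [
            {"type": "Ready", "status": "False"},
            {"type": "DisruptionTarget", "status": "True",
             "reason": "EvictionByEvictionAPI",
             "message": "example message"},
        ]
    }
}
print(disruption_reason(pod))  # ('EvictionByEvictionAPI', 'example message')
```

Logging the reason alongside the pod name is usually enough to tell an API-initiated eviction apart from scheduler preemption or a taint-driven deletion.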
ValidatingAdmissionPolicy for PDB hygiene. A CEL policy that rejects PDBs with obvious misconfigs (maxUnavailable: 0 with no escape hatch, minAvailable higher than replica count) is a few lines of CEL. Catching these at write time beats discovering them during a drain.
Eviction API improvements. The API itself is becoming more nuanced about reasons for failure. Expect this to keep improving as the ecosystem learns what was missing.
PDBs do one thing well: assert a budget on voluntary disruption. The trouble teams have with them is usually trouble with everything around them: redundancy, anti-affinity, drain automation, monitoring. When you get those right, PDBs become unobtrusive. When you get those wrong, PDBs become the thing you blame for everything else.


