Kubernetes or Chaos: The Risks of Running AI Workloads Without Orchestration
When AI environments aren’t orchestrated, the result is GPU waste, job starvation, dependency conflicts, and runaway cloud bills. It’s like running a data center without a traffic controller—everything eventually collides.
Most organizations started adopting AI without thinking through the operational impact. They were either experimenting to see how AI could solve business problems or had a specific business problem where AI could make a difference.
“In either scenario, the focus was on using AI quickly, not thinking through the long-term impacts,” says Yasmin Rajabi, COO at CloudBolt.
The emphasis was on training models correctly and deploying them quickly rather than on managing infrastructure and orchestration efficiently – which is fine when you’re starting out, but not when you scale.
Rajabi says that because optimizing GPU utilization is not as simple as optimizing CPU utilization, organizations also end up with far more waste and idle capacity.
“Either you get expensive GPUs that are underutilized or AI workloads fighting for the same resources,” she says. “The long-term impacts include not only rising costs but also performance degradation.”
Orchestration Prevents Chaos
Without a proper orchestration platform like Kubernetes that can adapt to changing scalability needs, organizations will face not only cost challenges but also performance issues as they attempt to move beyond a proof of concept (POC).
“Kubernetes is built for this,” Rajabi says. “It provides the best platform for a declarative, automated approach to allocating and scaling resources.”
This is critical for optimizing GPU utilization based on actual demand and reducing idle resources: as workloads scale, the infrastructure has to scale efficiently alongside them to maintain reliability and limit waste.
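In practice, that declarative approach comes down to workloads stating their GPU needs up front and letting the scheduler handle placement. A minimal sketch, assuming the NVIDIA device plugin is installed so nodes advertise the nvidia.com/gpu extended resource (the pod name, image, and command are illustrative):

```yaml
# Minimal sketch: a training pod declares its GPU requirement and the
# Kubernetes scheduler places it on a node with a free GPU.
# Assumes the NVIDIA device plugin is installed; names are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: train-job                            # hypothetical name
spec:
  restartPolicy: Never
  containers:
  - name: trainer
    image: nvcr.io/nvidia/pytorch:24.01-py3  # example image
    command: ["python", "train.py"]
    resources:
      limits:
        nvidia.com/gpu: 1                    # request one whole GPU
```

Because the requirement is declared rather than provisioned by hand, the same manifest can be scaled, rescheduled, or preempted by the platform as demand changes.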
Within the Kubernetes ecosystem, tools exist for automatic GPU partitioning, which adjusts GPU allocation in real time based on pending pods, ensuring full utilization as pods are scheduled.
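For example, on MIG-capable hardware a pod can request a slice of a GPU rather than a whole device. A sketch, assuming NVIDIA’s device plugin is configured with its “mixed” MIG strategy, which exposes slice-sized resources such as nvidia.com/mig-1g.5gb alongside nvidia.com/gpu (the pod name and image are illustrative):

```yaml
# Sketch: requesting a MIG slice instead of a full GPU so several small
# inference pods can share one physical device. Assumes MIG-capable GPUs
# and the device plugin's "mixed" strategy; names are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: small-inference                      # hypothetical name
spec:
  containers:
  - name: server
    image: registry.example.com/inference-server:latest  # illustrative image
    resources:
      limits:
        nvidia.com/mig-1g.5gb: 1             # one 1g.5gb slice of a partitioned GPU
```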
With an orchestration platform like Kubernetes, workloads are dynamically assigned to the appropriate nodes while resource contention is balanced as needed.
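Placement is declarative too. A hedged sketch of one common pattern: GPU nodes are labeled and tainted so only GPU workloads land on them, and those workloads select and tolerate the GPU pool (the label key accelerator and the taint key nvidia.com/gpu are conventions, not requirements):

```yaml
# Sketch: steering a workload onto a tainted GPU node pool while keeping
# non-GPU pods off those nodes. Label and taint keys are common conventions;
# a given cluster may use different ones.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-worker                           # hypothetical name
spec:
  nodeSelector:
    accelerator: nvidia                      # only schedule onto labeled GPU nodes
  tolerations:
  - key: "nvidia.com/gpu"                    # tolerate the taint on GPU nodes
    operator: "Exists"
    effect: "NoSchedule"
  containers:
  - name: worker
    image: registry.example.com/batch-worker:latest  # illustrative image
    resources:
      limits:
        nvidia.com/gpu: 1
```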
Sharing Tools, Data
AI orchestration enables organizations to share tools and data in ways that foster innovation and communication, reducing the cost of discovering new value from AI.
James Urquhart, field CTO and technology evangelist at Kamiwaza AI, says proper AI orchestration drives multiple key outcomes to optimize AI application operations.
“The first is that inferencing is distributed across data locations,” he says. “This removes the latencies or other inefficiencies that come from either accessing data over an extended network or replicating data to locations close to the inferencing.”
A second outcome is that orchestration understands the data landscape: by building a metadata graph and ontology across existing data sources, AI orchestration can route inference to the locations where the key data for any given request resides.
It also helps understand the infrastructure landscape, Urquhart adds.
“Organizations have widely disparate existing technologies to support inferencing functions, and may add new technologies at any time in a variety of ways,” he explains. “AI orchestration should understand what technologies are available and optimize model placement and inference accordingly.”
Scalable, Sustainable AI
The Cloud Native Computing Foundation (CNCF) recently released a Kubernetes AI conformance document to establish standards and best practices for using Kubernetes to support AI workloads.
The document outlines “must-do” and “should-do” practices, with three main goals: simplifying AI adoption on Kubernetes, guaranteeing interoperability and portability for AI workloads, and providing the ecosystem with a shared foundational standard.
It allows organizations and vendors to complete a self-assessment questionnaire to see how they measure up against the standard.
“Over time, this should evolve into an automated testing framework, but it’s a great place to start,” Rajabi says.
Jim Piazza, chief AI officer at Ensono, cautions that GPU orchestration introduces a steep learning curve: managing CUDA drivers, device plugins, node labeling, and workload scheduling is non-trivial without Kubernetes expertise.
“Without centralized scheduling, usage spikes are hard to control or forecast,” he says. “Static resource allocation means queued jobs wait while other GPUs idle.”
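Centralized scheduling is one place the Kubernetes ecosystem can help with exactly that. For instance, Kueue, a Kubernetes-native job queueing project, lets a cluster define a shared GPU quota so pending jobs wait in a queue rather than holding static allocations. A rough sketch, assuming Kueue is installed and a ResourceFlavor named default-gpu exists (the queue name and quota are illustrative):

```yaml
# Sketch: a cluster-wide GPU quota with Kueue. Jobs submitted to queues
# backed by this ClusterQueue are admitted only while quota is available;
# the rest wait instead of pinning idle GPUs. Names and quota are illustrative.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-gpu-queue
spec:
  namespaceSelector: {}                      # admit workloads from any namespace
  resourceGroups:
  - coveredResources: ["nvidia.com/gpu"]
    flavors:
    - name: default-gpu                      # assumed ResourceFlavor
      resources:
      - name: "nvidia.com/gpu"
        nominalQuota: 8                      # jobs beyond this quota wait in line
```

Jobs opt in by referencing a LocalQueue (via the kueue.x-k8s.io/queue-name label), which gives operators a single point at which to forecast and control GPU usage spikes.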
He explains that proper orchestration can cut GPU waste by half, accelerate model deployment from weeks to hours, and reduce the environmental footprint of large-scale AI.
“It’s not just about efficiency,” Piazza says. “It’s about making AI sustainable, scalable, and accountable.”