Kubernetes Environments Grow in Complexity, Challenging Ops Teams
As Kubernetes environments grow more complex, operations (Ops) teams are finding it challenging to manage their clusters effectively, according to a Spectro Cloud survey. Notably, 56% of businesses now operate more than 10 Kubernetes clusters, with 80% expecting further growth.
The Spectro Cloud survey found 75% of respondents encountered issues affecting the running of their clusters, up from 66% in 2022, and four in 10 felt they lacked the necessary skills and resources to manage Kubernetes.
Many respondents reported spending excessive time on manual tasks like troubleshooting and patching, driving a strong interest in automation as the primary means to enhance operations.
While the concept of platform engineering gains traction, 82% of organizations acknowledged their operations teams still struggled to provide developers with access to tailored clusters.
Furthermore, 37% of respondents said they experienced inconsistencies between development, staging and production environments.
While one in three developers build their own clusters for deploying applications, 62% of organizations are actively using or piloting tools designed to aid app developers working with Kubernetes.
Interviewees unanimously expressed their commitment to a “container-first” approach while also acknowledging the enduring importance of virtual machines (VMs) in their businesses.
A significant 85% are migrating existing VM workloads to Kubernetes, with 86% seeking to unify containerized and VM workloads on a single infrastructure platform.
The survey also highlights the growing strategic importance of edge computing, with nearly half of the organizations surveyed actively piloting or using Kubernetes in edge computing environments.
Respondents cited AI as a key driver for edge adoption, with investment in edge computing expected to enhance business processes (50%) and enable new connected solutions (41%).
However, significant challenges exist, including security, compliance, field engineering costs and concerns regarding Day 2 operations tasks.
“Ops teams are already experiencing burnout, and as our survey found, they’re already struggling to serve their developers and maintain availability, with activities like troubleshooting and patching soaking up time across multiple diverse clusters,” said David Cope, chief revenue and marketing officer at Spectro Cloud.
He pointed out that respondents clearly looked to automation as the answer but, from his perspective, that shouldn’t be a knot of bash scripts; otherwise, automation itself becomes another thing to manage and maintain.
“Nor is the answer to lock down to a single environment or configuration—diverse workloads demand diverse infrastructure,” he added.
Beyond automation, Cope explained there is still a need to invest in skills and professional development.
“We live in a multi-cluster world now—the majority of our respondents have more than ten clusters,” he said. “If you’re still logging in to each cluster in turn to apply a patch or update a configuration, that’s where the biggest timesaving happens.”
He recommended building repeatable profiles for clusters, combined with automated declarative management and self-healing, which ensures that every cluster built from the same profile will behave the same way.
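To make the idea of profile-driven, declarative management concrete, here is a minimal Python sketch of a reconcile loop. It is an illustration only, not Spectro Cloud's implementation or any real Kubernetes API: the ClusterProfile and Cluster classes, the add-on names and the version strings are all hypothetical. The point it tries to show is that one shared profile describes the desired state, and a loop repeatedly corrects whatever has drifted.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ClusterProfile:
    """Hypothetical desired state shared by every cluster built from it."""
    kubernetes_version: str
    addons: frozenset


@dataclass
class Cluster:
    """Hypothetical stand-in for the observed state of one cluster."""
    name: str
    kubernetes_version: str
    addons: set = field(default_factory=set)


def diff(profile, cluster):
    """List the actions needed to bring a cluster in line with its profile."""
    actions = []
    if cluster.kubernetes_version != profile.kubernetes_version:
        actions.append(("set_version", profile.kubernetes_version))
    for addon in sorted(profile.addons - cluster.addons):
        actions.append(("install_addon", addon))
    for addon in sorted(cluster.addons - profile.addons):
        actions.append(("remove_addon", addon))
    return actions


def reconcile(profile, cluster):
    """Apply every pending action and return what was done."""
    actions = diff(profile, cluster)
    for kind, value in actions:
        if kind == "set_version":
            cluster.kubernetes_version = value
        elif kind == "install_addon":
            cluster.addons.add(value)
        elif kind == "remove_addon":
            cluster.addons.discard(value)
    return actions


# Illustrative values only: one profile, two clusters that have drifted from it.
profile = ClusterProfile("1.29", frozenset({"cni", "ingress", "monitoring"}))
fleet = [
    Cluster("edge-01", "1.27", {"cni"}),
    Cluster("edge-02", "1.29", {"cni", "ingress", "monitoring", "debug-shell"}),
]

for cluster in fleet:
    for action in reconcile(profile, cluster):
        print(cluster.name, action)
```

Running reconcile a second time on the same cluster returns an empty list, which is the property that lets such a loop run continuously: a cluster that already matches its profile is left alone, and one that drifts is corrected without anyone logging in to it.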
Cope said that, beyond security, the key technical and operational challenges organizations face when implementing Kubernetes at the edge stem from the inherent nature of edge deployments.
“How do you get new devices onboarded quickly, whether for a first deployment or after hardware failure, without having to send a $200k Kubernetes engineer to hundreds of sites?” he asked. “When there’s a problem, how can you troubleshoot remotely? How can you deploy a patch without the risk of bricking the device?”
He noted another area that’s of great interest is high availability (HA).
“When you’re deploying to ten thousand or more edge sites, deploying three boxes to each site to get an HA architecture—which is needed for Kubernetes’ quorum-based etcd data store—is expensive and may be impossible depending on the nature of the site,” Cope said.
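The "three boxes" figure follows from majority-quorum arithmetic: a quorum-based store keeps accepting writes only while more than half of its members are reachable. The short Python sketch below works through that arithmetic. The function names are illustrative, and the site count simply reuses the ten thousand sites from Cope's example.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with the given number of voting members."""
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """How many members can be lost while a majority still remains."""
    return members - quorum(members)


for n in (1, 2, 3, 5):
    print(f"{n} member(s): quorum={quorum(n)}, "
          f"tolerates {failures_tolerated(n)} failure(s)")

# Hardware footprint for the scenario in the quote: three boxes at each site.
sites = 10_000
print(f"{sites} sites x 3 boxes = {sites * 3} boxes")
```

Two members tolerate no more failures than one, because losing either leaves exactly half, which is not a majority. Three is therefore the smallest size that survives a single failure, and at ten thousand sites that means 30,000 boxes.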