AI Emerges as Next Major Kubernetes Challenge
Artificial intelligence (AI) emerged this week as a dominant theme of the KubeCon + CloudNativeCon Europe conference, as it became increasingly apparent that data science teams are running into the same complexity challenges that cloud-native application developers have already experienced.
Most of the generative AI platforms in use today run on Kubernetes clusters. The challenge is that the providers of those platforms have small armies of software engineers to run them, a level of expertise few other organizations can muster.
Lachlan Evenson, principal program manager for Microsoft, said it’s not yet easy enough for everybody else to build and deploy AI applications on Kubernetes clusters. In fact, AI engineering is emerging as a job function to manage workflows between data science teams and IT operations teams in much the same way DevOps engineers do for organizations that frequently deploy and update applications, he noted.
Microsoft is previewing an open source Kubernetes AI Toolchain Operator (KAITO) to help organizations achieve that goal, Evenson added.
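The idea behind KAITO is that a single custom resource pairs a model preset with the GPU instance type it should run on, and the operator handles provisioning and deployment. The sketch below is modeled on the project's public preview examples; the API version, instance type and preset name are illustrative and may change between releases.

```yaml
# Illustrative KAITO Workspace, based on the project's preview
# examples; field names and the preset shown may differ in newer
# releases.
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: workspace-falcon-7b
resource:
  instanceType: "Standard_NC12s_v3"   # GPU VM size the operator should provision
  labelSelector:
    matchLabels:
      apps: falcon-7b
inference:
  preset:
    name: "falcon-7b"                 # built-in model preset served by the operator
```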
Priyanka Sharma, executive director of the Cloud Native Computing Foundation (CNCF), added that IT organizations have successfully confronted similar challenges before by defining workflows that maintain empathy for the difficulties the various members of an IT team face.
Chuck Dubuque, senior director of product marketing for OpenShift at Red Hat, added that organizations need to apply the same principles used to define DevOps workflows to AI applications to maximize cooperation.
However, there are still technical challenges the Kubernetes Technical Oversight Committee should address.
Sudha Raghavan, senior vice president for the developer platform at Oracle Cloud Infrastructure (OCI), noted, for example, that in the longer term there need to be templates that data science teams can use without requiring intervention from an IT operations team.
IT teams should also be able to shift seamlessly to new hardware platforms as processors advance, to ensure AI inference engines are deployed on the right class of machine, she added. In addition, feedback from a generative AI platform trained to optimize Kubernetes clusters should enable IT teams to continuously improve their processes, Raghavan said.
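Today, targeting the right class of machine is typically done by hand in the workload spec itself, which is exactly the kind of toil such templates would abstract away. A minimal sketch using standard Kubernetes scheduling mechanisms follows; the container image, node label and GPU count are placeholders.

```yaml
# Minimal sketch: pinning an inference Deployment to a GPU node pool.
# The image, node label value and resource counts are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inference-server
spec:
  replicas: 2
  selector:
    matchLabels:
      app: inference-server
  template:
    metadata:
      labels:
        app: inference-server
    spec:
      nodeSelector:
        accelerator: nvidia-a100        # hypothetical label marking the GPU class
      containers:
      - name: server
        image: registry.example.com/inference-server:latest  # placeholder image
        resources:
          limits:
            nvidia.com/gpu: 1           # requires the NVIDIA device plugin on the node
```

Shifting to a new processor generation means editing selectors like these across every workload, which is why Raghavan argues the choice should eventually be automated away from operations teams.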
Arun Gupta, vice president and general manager for open ecosystem at Intel and governing board chair for the CNCF, noted, however, that before those templates can be built there needs to be a standard set of application programming interfaces (APIs) in the core Kubernetes platform. Organizations running AI workloads on Kubernetes today have extended the platform themselves to accommodate a class of workloads that Kubernetes was never designed to run.
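Those homegrown extensions typically take the form of custom resource definitions (CRDs) layered on top of the core APIs, with each organization inventing its own schema. The deliberately simplified illustration below shows the pattern; the group, kind and fields are hypothetical, not a proposed standard.

```yaml
# Hypothetical CRD illustrating how teams extend Kubernetes for AI
# workloads today; the group, kind and schema are invented for
# illustration only.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: trainingjobs.ml.example.com
spec:
  group: ml.example.com
  scope: Namespaced
  names:
    plural: trainingjobs
    singular: trainingjob
    kind: TrainingJob
  versions:
  - name: v1alpha1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              framework:
                type: string        # e.g. "pytorch"
              workers:
                type: integer       # number of training workers
              gpusPerWorker:
                type: integer       # accelerators per worker
```

Because every shop defines these resources differently, Gupta's point is that portable templates can't exist until the community standardizes the underlying APIs.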
It’s still early days for the deployment of AI workloads on Kubernetes clusters. Still, as more organizations look to operationalize AI, it’s only a matter of time before data scientists join the mix of personas that regularly combine their expertise to build and deploy enterprise-class applications. The challenge, as always, is reducing the inevitable friction created whenever different cultures collide.