Open Source KServe AI Inference Platform Becomes CNCF Project
The Cloud Native Computing Foundation (CNCF) revealed this week at the KubeCon + CloudNativeCon North America 2025 conference that KServe, an open source platform for running artificial intelligence (AI) inference engines, has been donated to the consortium.
Yuan Tang, a senior principal software engineer at Red Hat and a maintainer of KServe, said the shift will enable KServe's maintainers to work more closely with the maintainers of other open source CNCF projects that are building software that will need to integrate with the AI inference platform.
Originally developed in 2019 by Google, IBM, Bloomberg, NVIDIA and Seldon as an element of the Kubeflow project for building AI applications on a Kubernetes cluster, KServe was donated to the LF AI & Data Foundation in February 2022 and has now moved to the CNCF as an incubating project.
KServe is already embedded in Red Hat's Kubernetes-based OpenShift AI platform alongside an instance of vLLM, an inference engine originally developed by the Sky Computing Lab at the University of California, Berkeley. Paired with llm-d, a distributed inference engine that runs natively on Kubernetes, that combination enables disaggregated serving, prefix caching, intelligent scheduling and autoscaling.
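For readers unfamiliar with how KServe is used in practice, the sketch below shows, purely as an illustration, how an InferenceService resource might be created programmatically with the official Kubernetes Python client. The model name, storage location and runtime format are hypothetical assumptions, not a configuration taken from Red Hat's platform, and the exact fields will vary by cluster and KServe version.

```python
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running inside a cluster

# Hypothetical InferenceService: the name, model reference and runtime format
# below are illustrative placeholders.
inference_service = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "demo-llm", "namespace": "default"},
    "spec": {
        "predictor": {
            "model": {
                # Assumes a vLLM-backed serving runtime is registered for this model format
                "modelFormat": {"name": "huggingface"},
                "storageUri": "hf://example-org/example-model",  # placeholder model location
                "resources": {"limits": {"nvidia.com/gpu": "1"}},
            }
        }
    },
}

# InferenceService is a custom resource, so it is created through the generic
# custom-objects API rather than a typed client.
client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1beta1",
    namespace="default",
    plural="inferenceservices",
    body=inference_service,
)
```

Once the resource is created, KServe reconciles it into the underlying Kubernetes deployments, routes and autoscaling machinery, which is what makes it attractive as a common serving layer across platforms.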
The goal is to create a bridge between cloud-native and AI applications running on Kubernetes clusters as organizations build applications that go beyond simply creating a chatbot, said Tang.
Red Hat, for its part, is making the case for its Red Hat AI 3 platform, which provides access to multiple AI models.
The overall goal is to create a model-as-a-service (MaaS) environment using an AI gateway that has been integrated into a Kubernetes cluster to make it simpler to scale compute resources up and down as needed.
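For a sense of what scaling compute resources up and down looks like at the resource level, the fragment below sketches the kind of replica and concurrency settings KServe exposes on a predictor. The field names follow KServe's v1beta1 API, but the values are placeholder assumptions rather than anything Red Hat ships.

```python
# Illustrative autoscaling settings for a KServe predictor. The values below
# are placeholder assumptions; actual limits depend on the gateway and the
# capacity of the cluster.
predictor_spec = {
    "minReplicas": 1,              # keep one replica warm when traffic is low
    "maxReplicas": 8,              # cap how far the platform can fan out
    "scaleMetric": "concurrency",  # scale on in-flight requests per replica
    "scaleTarget": 4,              # target concurrent requests per replica
    "model": {
        "modelFormat": {"name": "huggingface"},  # hypothetical runtime, as in the earlier sketch
    },
}

# This dictionary would sit under spec.predictor in an InferenceService
# manifest like the one shown earlier in this article.
print(predictor_spec)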
Via an AI hub, application developers and data scientists are given access to a curated catalog of foundation AI models and extensions, including open source models such as gpt-oss from OpenAI and DeepSeek-R1, along with specialized models such as Whisper for speech-to-text and Voxtral Mini for voice-enabled agents. There is also a registry for managing the lifecycle of models, a deployment environment for configuring and monitoring all AI assets running on OpenShift AI, and support for servers based on the Model Context Protocol (MCP) developed by Anthropic. Red Hat also makes available a compression tool to run AI models more efficiently.
It’s not clear to what degree KServe might be adopted by other providers of AI platforms, but a shift is clearly occurring in which AI applications are starting to be deployed across distributed computing environments. That approach enables AI applications to scale out rather than forcing organizations to build and maintain a larger set of clusters to handle spikes in processing demand. Additionally, more AI applications are being built and deployed at the network edge, where the size of the clusters available is far more limited than in a cloud computing environment.
There are, of course, other options when it comes to deploying AI applications, but the CNCF community is clearly betting that as IT teams standardize on Kubernetes, many of the inference engines that drive AI applications in production environments will be deployed on those clusters.
The challenge now, of course, is finding and retaining IT professionals who have the expertise needed to deploy and manage AI workloads running on those Kubernetes clusters.


