Upbound Unfurls Control Plane for Managing AI Inference Workloads
Upbound today revealed it has extended an instance of the open source control plane it developed to enable IT teams to manage inference engines running artificial intelligence (AI) models.
Company CEO Bassam Tabbara said IT teams can now use Modelplane to declaratively manage AI inference engines using the same Crossplane control plane many of them already use to manage fleets of Kubernetes clusters. That capability enables those IT teams to define a topology for an AI inference engine that will then be automatically deployed on a Kubernetes cluster, much like any other application workload, said Tabbara.
As a result, IT teams will be able to deploy inference engines based on available capacity across a fleet of clusters, autoscale replicas based on load, cache and distribute model weights, and route inference requests through a unified gateway.
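As a rough illustration of what capacity-based placement across a fleet might involve, the sketch below greedily assigns each replica to the cluster with the most free GPUs. It is not Modelplane's or Crossplane's actual scheduler or API; the `Cluster` type, the `place_replicas` function and the greedy strategy are all hypothetical and chosen only for clarity.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical view of one cluster in the fleet."""
    name: str
    free_gpus: int


def place_replicas(clusters, replicas, gpus_per_replica):
    """Greedily assign each replica to the cluster with the most free GPUs.

    Returns a dict mapping cluster name to the number of replicas placed there.
    Raises RuntimeError if the fleet cannot fit all requested replicas.
    """
    # Work on a copy of the capacity figures so the inputs are not mutated.
    free = {c.name: c.free_gpus for c in clusters}
    placement = {}
    for _ in range(replicas):
        # Pick the cluster with the most remaining capacity (first one wins ties).
        target = max(free, key=free.get)
        if free[target] < gpus_per_replica:
            raise RuntimeError("not enough GPU capacity in the fleet")
        free[target] -= gpus_per_replica
        placement[target] = placement.get(target, 0) + 1
    return placement


fleet = [Cluster("cluster-a", 4), Cluster("cluster-b", 2)]
result = place_replicas(fleet, replicas=3, gpus_per_replica=2)
print(result)
```

A production control plane would also weigh load, model weight locality and request routing, which this sketch deliberately leaves out.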
The overall goal is to make it simpler for IT teams to incorporate AI inference engines into the same workflows they currently use to manage cloud-native application environments, said Tabbara.
Designed to be deployed in the cloud or in any on-premises IT environment, Modelplane is available under an Apache 2.0 license with no usage caps, token meters, or other artificial limitations. The underlying control plane is based on the open source Crossplane project that is being advanced under the auspices of the Cloud Native Computing Foundation (CNCF). More than 3,000 contributors from over 450 organizations now contribute to the Crossplane project, and more than 1,000 organizations run it in a production environment, including Nike, Autodesk and NASA Science Cloud.
It’s not clear how many IT organizations have adopted a control plane to manage distributed computing environments. Cloud service providers have been using them to manage IT environments at scale for years, but adoption of control planes thus far by internal IT teams has been uneven. However, the number of AI inference workloads might soon far exceed any other type of workload running on Kubernetes clusters. As such, the need to distribute AI inference workloads across multiple Kubernetes clusters configured with expensive graphics processing units (GPUs) and other classes of AI accelerators, which are generally not very well utilized, is likely to soon force the issue.
There is, of course, still much work to be done in terms of optimizing Kubernetes clusters to run AI inference workloads, but as IT teams assume more responsibility for managing them, there is a natural preference for deploying them on a modern platform that is already widely deployed within IT environments. In addition to being able to leverage infrastructure management expertise many IT teams already have, a Kubernetes cluster makes it simpler to scale AI inference workloads up and down as required, no matter where they are deployed.
Regardless of approach, IT teams will need to find a way to govern AI inference workloads. The challenge, as always, is finding a single pane of glass through which that goal can be achieved without having to hire additional specialists just to manage a specific class of workloads.


