Google OpenRL Tames AI Model Tuning, Kubernetes-Style
Google has created OpenRL to manage the fine-tuning of large language models (LLMs) in much the same way its Kubernetes container orchestrator streamlines the management of containers.
An open source project from the labs of Google Kubernetes Engine (GKE), OpenRL consolidates many of the discrete operations required to finalize a model through reinforcement learning (RL).
The final stage of building a model, fine-tuning requires a lot of coordination. Datasets must be selected and cleansed. Hardware must be provisioned. Training loops must be debugged. Reward signals must be managed. Inference mismatches need to be dealt with.
Google has found that “decoupling the infrastructure from AI research can make these problems more tractable,” wrote GKE engineers Sunil Arora, Shuby Mishra and Chuang Wang, in a blog post introducing the technology.
“Kubernetes abstracted out the infrastructure and made application developers and SREs life easier,” the trio wrote. Also developed by Google and released as open source, Kubernetes provides a single (albeit somewhat complex) interface for managing fleets of containers (and, later, other cloud-native resources) on a set of cloud compute resources.
Other frameworks have been built to manage model-creation workflows in distributed environments, though none offers much infrastructure management. OpenRLHF runs on the Ray distributed controller and the vLLM inference and serving library. Slime offers post-training management with the SGLang serving framework and NVIDIA’s Megatron training library.
These stacks are tailored for training logic, and though they can be run on Kubernetes, they don’t offer much in the way of the hardware-level abstraction needed to streamline production-level model creation and maintenance. OpenRL understands Kubernetes natively.
Tinker Provides the Base for OpenRL
To build OpenRL, Google implemented the Tinker design pattern from Thinking Machines. This approach provides four APIs, each covering a different aspect of the fine-tuning process.
One API covers the transfer of data in and out of the training environment. A second API updates model weights. A third can be called to generate samples, and the fourth API saves the weights.
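To make the four-call split concrete, here is a minimal sketch in Python. It is a toy, in-memory stand-in with a single weight, and every class and method name in it (ToyTrainingClient, submit_batch, update_weights, sample, save_weights) is hypothetical, chosen to mirror the descriptions above rather than taken from the Tinker or OpenRL documentation.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTrainingClient:
    """Hypothetical stand-in for a remote fine-tuning service (one weight)."""
    weight: float = 0.0
    _grad: float = 0.0
    _pending: int = 0
    checkpoints: dict = field(default_factory=dict)

    def submit_batch(self, batch):
        """Call 1: send (x, y) pairs in, get the mean squared error back.

        Gradients accumulate on the service side until update_weights().
        """
        loss = 0.0
        for x, y in batch:
            err = self.weight * x - y
            loss += err * err
            self._grad += 2 * err * x
        self._pending += len(batch)
        return loss / len(batch)

    def update_weights(self, lr=0.1):
        """Call 2: apply the accumulated gradient (plain SGD), then clear it."""
        if self._pending:
            self.weight -= lr * self._grad / self._pending
        self._grad, self._pending = 0.0, 0

    def sample(self, x):
        """Call 3: generate an output from the current weights."""
        return self.weight * x

    def save_weights(self, name):
        """Call 4: store a named checkpoint of the current weights."""
        self.checkpoints[name] = self.weight
        return name


def run_loop(client, batch, steps=50):
    """The loop the researcher writes; it never touches the hardware."""
    for _ in range(steps):
        client.submit_batch(batch)
        client.update_weights()
    client.save_weights("final")
    return client.sample(3.0)


if __name__ == "__main__":
    # The data follows y = 2x, so the learned weight should approach 2.
    print(run_loop(ToyTrainingClient(), [(1.0, 2.0), (2.0, 4.0)]))
```

The point of the sketch is the shape of run_loop: the training logic is a few lines of ordinary Python, while everything behind the four calls could live on a remote cluster.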
“Once you separate out the infrastructure behind the APIs, you start to see the gains in user experience of developing the RL loop because AI researchers no longer have to wrangle the complex Python dependencies like CUDA,” the researchers wrote.
Getting More From Your GPUs
Once an organization commits to managing its own models, it will need to procure GPUs, either in-house or through a cloud or neocloud provider. It is then incumbent on that organization to keep those GPUs humming with work. OpenRL can help here as well.
Running jobs one at a time isn’t very efficient, because each stage of an RL loop must wait for the one before it. “The trainer waits for the sampler to finish rollouts, the sampler waits for the environment to score rewards (which is often bound by slow CPU/network tasks), and the whole loop sits blocked,” the Google team wrote.
Using the self-hosted OpenRL, ML developers can run multiple RL jobs concurrently, maximizing GPU usage. They can manage their RL loops on their own local machines, and draw compute from the training APIs running on Kubernetes clusters.
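The effect of overlapping jobs can be illustrated with a small simulation. The sketch below is not OpenRL code: it models one shared GPU as an asyncio lock, and the step durations are made-up numbers chosen so that reward scoring, which runs off the GPU, takes 60% of each iteration.

```python
import asyncio
import time

# Made-up durations, in seconds, for the three stages of one RL iteration.
SAMPLE_S, REWARD_S, TRAIN_S = 0.02, 0.06, 0.02


async def rl_step(gpu: asyncio.Lock) -> None:
    """One iteration: sample on the GPU, score off the GPU, train on the GPU."""
    async with gpu:
        await asyncio.sleep(SAMPLE_S)  # sampler produces rollouts
    await asyncio.sleep(REWARD_S)      # environment scores rewards; GPU is free
    async with gpu:
        await asyncio.sleep(TRAIN_S)   # trainer updates the weights


async def run_jobs(n_jobs: int, concurrent: bool) -> float:
    """Return wall-clock seconds for n_jobs one-step jobs sharing one GPU."""
    gpu = asyncio.Lock()
    start = time.perf_counter()
    if concurrent:
        # All jobs are in flight at once; the lock serializes only GPU work.
        await asyncio.gather(*(rl_step(gpu) for _ in range(n_jobs)))
    else:
        # One job at a time: the GPU idles during every reward-scoring wait.
        for _ in range(n_jobs):
            await rl_step(gpu)
    return time.perf_counter() - start


def compare(n_jobs: int = 3) -> tuple[float, float]:
    """Run the same jobs back to back, then overlapped, and time both."""
    sequential = asyncio.run(run_jobs(n_jobs, concurrent=False))
    overlapped = asyncio.run(run_jobs(n_jobs, concurrent=True))
    return sequential, overlapped


if __name__ == "__main__":
    seq, par = compare()
    print(f"sequential: {seq:.2f}s, overlapped: {par:.2f}s")
```

In the overlapped run, one job's reward-scoring wait is filled by another job's sampling or training, so the same GPU work finishes in less wall-clock time. A real scheduler also has to deal with memory limits and per-job model state, which this toy ignores.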
To help organizations get started on concurrency, Google has also posted an autoresearch recipe (inspired heavily by Andrej Karpathy’s work) that shows how to run experiments in parallel with the help of agents. Users can also peruse the Tinker Cookbook for other ML-building recipes.
In 2017, Google released an internal project, called Kubeflow, that can be used to operationalize the entire ML development lifecycle on Kubernetes.
Google hasn’t released any documentation yet about how OpenRL could be plugged into Kubeflow’s pipeline, where it would manage the fine-tuning part of the workflow. But if the project proves to be a success, no doubt someone will suss out the details to integrate the two projects.


