GitOps Wasn’t Built for Models, and It Shows
GitOps won the deployment argument. Everything goes in Git, the cluster reconciles itself to match, and your repository becomes the one place that tells you what’s actually running. It’s clean and auditable and, for normal services, it just works.
Then somebody runs a machine learning model through the same pipeline, and it starts coming apart.
Not loudly. That’s the thing. You build the model into a container, send it through the same flow as everything else, let the controller roll it out. It deploys. The pods are healthy. Nothing on the dashboard looks any different from shipping a regular service. The pipeline is perfectly happy, and it has no idea it’s now handling something it was never designed to understand.
Take the whole single-source-of-truth promise. For code that promise is real. The commit describes the artifact completely; build it twice and you get the same thing both times. A model doesn’t play by that rule. What’s in the repository is the code that trained the model, not the model. The actual artifact depends on the training data, the random seeds, which library versions were installed, even the hardware it ran on. Two engineers can check out the exact same commit and walk away with two different models. So Git sits there looking like the source of truth while leaving out the part that decided how the model behaves.
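To make that gap concrete, here is a minimal sketch of a training manifest: a record of the inputs a commit alone leaves out. Everything in it is an assumption for illustration (the field names, the helper functions, the sample values), and the architecture string is only a coarse stand-in for "the hardware it ran on."

```python
import hashlib
import json
import platform
import sys


def fingerprint_bytes(data: bytes) -> str:
    """Return a SHA-256 hex digest identifying this exact content."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(commit: str, dataset: bytes, seed: int, libraries: dict) -> dict:
    """Collect the inputs that shape a model but that a commit alone omits.

    All field names here are hypothetical, chosen for the example.
    """
    return {
        "commit": commit,
        "dataset_sha256": fingerprint_bytes(dataset),
        "seed": seed,
        "libraries": dict(sorted(libraries.items())),
        "python": sys.version.split()[0],
        "machine": platform.machine(),  # coarse proxy for hardware
    }


def manifest_id(manifest: dict) -> str:
    """Hash the manifest itself so a model can be tagged with one short ID."""
    canonical = json.dumps(manifest, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()[:12]


# Same commit, different training data: the manifest IDs differ.
a = build_manifest("abc123", b"rows-v1", seed=42, libraries={"numpy": "1.26.4"})
b = build_manifest("abc123", b"rows-v2", seed=42, libraries={"numpy": "1.26.4"})
print(a["commit"] == b["commit"], manifest_id(a) != manifest_id(b))
```

The point of hashing the manifest rather than just the commit is that the tag on the model now changes whenever the data, seed, or library set changes, even if the code did not.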
A team I worked with got bitten by precisely this. Their pipeline was the kind you’d put in a slide deck. Every deploy traced to a commit, everything reviewed, full audit trail. Then a model started acting strange in production and somebody asked which version of the data it had trained on. The commit didn’t say. The data lived off to the side, versioned by nobody, and the truthful answer was that they couldn’t rebuild it if they tried. They’d recorded everything except the one thing that would have explained what happened.
There’s a second assumption hiding in every CI/CD pipeline: a green build means a working artifact. For code that’s close enough to true. Tests pass, it compiles, you’ve got real signal. For a model it tells you almost nothing. The container built. The endpoint answers. The pipeline turns green and calls it a win, and meanwhile the model might just be worse than the one it replaced. Accuracy slipped. It’s quietly failing on some slice of inputs nobody wrote a test for. The pipeline can’t see any of that, because “did it build” and “is it any good” are different questions, and for a model the answer to the first tells you nothing about the second.
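One way to give the pipeline a view of "is it any good" is a promotion gate that compares the candidate against the model it would replace, slice by slice. What follows is a rough sketch, not a prescription: the function name, the slice names, the accuracy numbers, and the one-point tolerance are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    passed: bool
    reasons: list


def promotion_gate(candidate: dict, baseline: dict, max_drop: float = 0.01) -> GateResult:
    """Compare per-slice accuracy of a candidate model against the baseline.

    Both arguments map a slice name to accuracy in [0, 1]. The gate fails if
    any baseline slice is missing from the candidate's evaluation or has
    dropped by more than max_drop.
    """
    reasons = []
    for slice_name, base_acc in baseline.items():
        cand_acc = candidate.get(slice_name)
        if cand_acc is None:
            reasons.append(f"{slice_name}: not evaluated")
        elif base_acc - cand_acc > max_drop:
            reasons.append(f"{slice_name}: {base_acc:.3f} -> {cand_acc:.3f}")
    return GateResult(passed=not reasons, reasons=reasons)


# Hypothetical numbers: overall accuracy improved, one slice regressed badly.
baseline = {"overall": 0.91, "new_users": 0.88}
candidate = {"overall": 0.92, "new_users": 0.80}

result = promotion_gate(candidate, baseline)
print(result.passed, result.reasons)
```

In this example the headline metric went up and the gate still fails, which is the case a build-only check would wave through. A gate like this can only catch regressions on slices somebody thought to define.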
And then rollback, which everybody treats as the thing that saves them. With GitOps it’s supposed to be trivial. Revert the commit, let the cluster reconcile, you’re back to yesterday. That whole promise rests on an assumption nobody says out loud: that putting the old artifact back gives you the old behavior. Code, sure. A model, maybe not. If the feature pipeline shifted, or the data drifted, or some preprocessing step upstream changed, redeploying last week’s model can hand you behavior you’ve never seen before. You ran the rollback exactly right and still didn’t end up where you were, because the model’s behavior was never really in the artifact. It was in the artifact plus a world that didn’t sit still.
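A rollback can at least check whether the world the old model expected is still there. The sketch below assumes each model was stored alongside a small record of the context it was built against; the keys and version strings are hypothetical, and it only catches declared, versioned dependencies, not drift in the data itself.

```python
def rollback_blockers(model_context: dict, live_context: dict) -> list:
    """List the context keys where the live environment no longer matches
    what the older model was trained and served against."""
    blockers = []
    for key, expected in model_context.items():
        actual = live_context.get(key)
        if actual != expected:
            blockers.append(f"{key}: model expects {expected!r}, live is {actual!r}")
    return blockers


# Hypothetical context: the feature schema moved on after this model shipped.
last_week_model = {"feature_schema": "v3", "preprocessing": "tokenizer-1.2"}
live = {"feature_schema": "v4", "preprocessing": "tokenizer-1.2"}

problems = rollback_blockers(last_week_model, live)
for problem in problems:
    print("rollback is not a clean restore:", problem)
```

An empty list doesn't prove the old behavior will come back. A non-empty one is a fairly strong hint that it won't.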
None of this is an argument against GitOps, or against putting models in pipelines at all. Declarative, version-controlled, auditable deployment is exactly the discipline ML in production has been missing. The trap is assuming the pipeline understands what it’s carrying. It doesn’t. It sees a container, so it treats the model like a container, and a container is the whole of what it knows.
The teams that make this work just stop pretending the model is one more artifact. They version the stuff the pipeline ignores. The dataset gets a version and gets taken as seriously as the code. The training run gets logged so any given model can be traced back to the exact data and settings that made it. “Passing” stops meaning “it built” and starts including whether the model actually holds up on a real evaluation before it’s allowed anywhere near production. And rollback stops meaning “put the old image back” and starts meaning “put the old model and the data context it needed back,” because one without the other is just a slower way to break.
It’s more work, and it’s a lot less tidy than the GitOps story everyone likes to tell at conferences. But that story was written for stateless services, and a model isn’t one. It’s a function of data you never checked into Git, judged by tests you never wrote, with a rollback that doesn’t restore what you think it does.
GitOps will deploy your model without complaint. What it can’t tell you is whether it should have. That call is still yours, and a green pipeline doesn’t make it for you.


