Microsoft on Kubernetes: Chaos Will Reign Until We Embrace Shared Operational Philosophy & Interfaces
Microsoft, like practically every other vendor attending KubeCon + CloudNativeCon Europe 2026 in Amsterdam this week, made a positive play to explain where its stack and roadmap are positioned to underpin Kubernetes development and open source at large.
In a blog penned by Brendan Burns, corporate vice president and technical fellow for Azure OSS and cloud-native at Microsoft, we learned more about how Redmond sees the landscape currently evolving.
Given the inherent and pervasive complexity that cloud-native software engineers experience when deploying Kubernetes, Burns thinks there’s a clear pattern in how complex technology matures.
Fragmentation State of the Nation
“Early on, teams make their own choices: Different tools, different abstractions, different ways of reasoning about failure. It looks like flexibility, but at scale it reveals itself as fragmentation,” he said.
How do we fix these scenarios, then? It’s not a question of adding extra capabilities, functions or service options… it comes down to whether or not teams can adopt a “shared operational philosophy”… so there’s a real methodology-level adoption curve to climb before we get past the next hurdles.
Burns says we know this to be true because Kubernetes proved it, i.e., it didn’t just answer “how do we run containers?” It answered, “How do we change running systems safely?” The community built those patterns, hardened them, and made them the baseline.
AI Infrastructure Chaos is the Norm
“AI infrastructure is still in the chaotic phase. The shift from ‘working versus broken’ to ‘good answers versus bad answers’ is a fundamentally different operational problem that won’t get solved with more tooling. It gets solved the way cloud-native did: open source creating the shared interfaces and community pressure that replace individual judgment with documented, reproducible practice,” asserted Burns.
The Microsoft fellow thinks that the convergence of AI and Kubernetes infrastructure means that gaps in AI infrastructure and gaps in Kubernetes infrastructure are increasingly the same gaps.
“A significant part of our upstream work this cycle has been building the primitives that make GPU-backed workloads first-class citizens in the cloud-native ecosystem,” said Burns.
By primitives, Burns is of course referring to the component parts of a Kubernetes environment, i.e., pods for containers, Deployments for scaling, Services for networking… and volumes for storage. Here we also find ConfigMaps and Secrets for managing settings and credentials, while nodes provide the underlying physical or virtual compute.
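To make those primitives concrete, here is a minimal sketch of how they compose: a Deployment wraps a pod template, the pod draws its settings from a ConfigMap, and a Service would route traffic to the pods via the matching label selector. The manifest is built as a plain Python dict (the same structure you would write in YAML); the names `inference-api` and `app-config` are illustrative, not from Microsoft’s announcement.

```python
def deployment_manifest(name, image, replicas=2, config_map="app-config"):
    """Build a minimal Kubernetes Deployment manifest as a plain dict."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector ties the Deployment (and any Service) to its pods.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        # Settings come from a ConfigMap rather than being
                        # baked into the container image.
                        "envFrom": [{"configMapRef": {"name": config_map}}],
                    }],
                },
            },
        },
    }

manifest = deployment_manifest("inference-api", "registry.example/inference:1.0")
print(manifest["kind"], manifest["spec"]["replicas"])
```

The same selector-label pattern is what lets a Service discover these pods for networking, the third primitive mentioned above.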
On the scheduling side, Microsoft has worked with partners to advance open standards for hardware resource management. Key milestones include Dynamic Resource Allocation (DRA) graduating to general availability, with the DRA example driver and DRA Admin Access also shipping as part of that work.
Workload Aware Scheduling for Kubernetes 1.36 adds DRA support in the Workload API and drives integration into KubeRay, making it more straightforward for developers to request and manage high-performance infrastructure for training and inference.
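What DRA changes in practice is how a pod asks for a GPU: instead of an opaque count such as `nvidia.com/gpu: 1`, the pod references a ResourceClaim object that a DRA driver satisfies. The sketch below shows the pod side of that contract as a plain dict; the claim name `gpu-claim` is a hypothetical example, and the exact ResourceClaim schema varies by API version, so check your cluster’s `resource.k8s.io` version for the precise shapes.

```python
def gpu_pod_with_claim(name, image, claim_name="gpu-claim"):
    """Sketch of a pod that requests a GPU via a DRA ResourceClaim."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # The pod-level entry binds a named ResourceClaim object...
            "resourceClaims": [
                {"name": "gpu", "resourceClaimName": claim_name}
            ],
            "containers": [{
                "name": name,
                "image": image,
                # ...and the container opts into that claim by name,
                # replacing the old extended-resource count request.
                "resources": {"claims": [{"name": "gpu"}]},
            }],
        },
    }

pod = gpu_pod_with_claim("trainer", "registry.example/train:2.1")
```

The indirection is the point: the claim, not the pod, carries the device requirements, which is what lets schedulers and drivers negotiate hardware on the workload’s behalf.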
Securing AI Workloads on Kubernetes
Beyond scheduling, Microsoft says it has continued investing in the tooling needed to deploy, operate and secure AI workloads on Kubernetes. AI Runway is a new open source project that introduces a common Kubernetes API for inference workloads, giving platform teams a centralised way to manage model deployments and adopt new serving technologies as the ecosystem evolves.
Burns explains that this technology ships with a web interface for users who don’t need to know Kubernetes to deploy a model, along with built-in HuggingFace model discovery, GPU memory fit indicators, real-time cost estimates, and support for runtimes including NVIDIA Dynamo, KubeRay, llm-d, and KAITO.
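The “GPU memory fit indicator” idea can be approximated with back-of-the-envelope arithmetic: model weights take roughly parameters times bytes per parameter, plus working-set overhead for KV cache and activations. The 20% overhead factor below is an illustrative assumption of ours, not a figure from AI Runway.

```python
def estimated_vram_gb(params_billions, bytes_per_param=2, overhead=0.20):
    """Rough VRAM needed to serve a model (fp16 = 2 bytes per parameter)."""
    weights_gb = params_billions * bytes_per_param  # 1B params ~ 1 GB per byte
    return weights_gb * (1 + overhead)

def fits(params_billions, gpu_vram_gb, **kw):
    """Would the model fit on a GPU with the given VRAM?"""
    return estimated_vram_gb(params_billions, **kw) <= gpu_vram_gb

# A 7B-parameter model in fp16 needs roughly 16.8 GB: too big for a
# 16 GB card, comfortable on a 24 GB one.
print(fits(7, 16), fits(7, 24))
```

Quantising to 8-bit halves `bytes_per_param`, which is why the same indicator can flip from “doesn’t fit” to “fits” when a different runtime or precision is selected.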
Other projects that Microsoft is taking an active hand in supporting and extending in the Cloud Native Computing Foundation (CNCF) universe include:
- HolmesGPT has joined the CNCF as a Sandbox project, bringing agentic troubleshooting capabilities into the shared cloud-native tooling ecosystem.
- Dalec, a newly onboarded CNCF project, defines declarative specifications for building system packages and producing minimal container images, with support for SBOM generation and provenance attestations at build time.
- Cilium also received a broad set of Microsoft contributions this cycle, including native mTLS ztunnel support for sidecarless encrypted workload communication, Hubble metrics cardinality controls for managing observability costs at scale, flow log aggregation to reduce storage volume, and two merged Cluster Mesh Cilium Feature Proposals (CFPs) advancing cross-cluster networking.
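On the Hubble metrics point above, cardinality is worth a concrete illustration: each distinct combination of label values is a separate time series, so dropping one high-cardinality label (such as source pod name) collapses many series into a few. The snippet below demonstrates the principle with synthetic flow data; it is not Hubble’s actual API.

```python
# Synthetic flow samples: 50 forwarded flows from distinct pods,
# plus one dropped flow. Labels are what a metrics pipeline would keep.
flows = [
    {"verdict": "FORWARDED", "src_pod": f"web-{i}", "namespace": "prod"}
    for i in range(50)
] + [{"verdict": "DROPPED", "src_pod": "web-0", "namespace": "prod"}]

def series_count(samples, keep_labels):
    """Number of distinct time series produced by the kept labels."""
    return len({tuple(sorted((k, s[k]) for k in keep_labels)) for s in samples})

# Keeping the per-pod label yields one series per pod; dropping it
# collapses the same data to one series per verdict.
print(series_count(flows, ["verdict", "src_pod", "namespace"]))  # 51
print(series_count(flows, ["verdict", "namespace"]))             # 2
```

At cluster scale the same trade-off is what drives observability cost: storage and query load grow with series count, not with sample volume alone.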
Azure Kubernetes Service
Of direct relevance to the Kubernetes space is Microsoft’s work that runs alongside its upstream contributions: new capabilities in Azure Kubernetes Service (AKS) have come forward across networking and security, observability, multi-cluster operations, storage and cluster lifecycle management.
“For organizations running workloads across multiple clusters, cross-cluster networking has historically meant custom plumbing, inconsistent service discovery, and limited visibility across cluster boundaries. Azure Kubernetes Fleet Manager now addresses this with cross-cluster networking through a managed Cilium cluster mesh, providing unified connectivity across AKS clusters, a global service registry for cross-cluster service discovery, and intelligent routing with configuration managed centrally rather than repeated per cluster,” explained Burns.
On the storage side, clusters can now consume storage from a shared Elastic SAN pool rather than provisioning and managing individual disks per workload. This simplifies capacity planning for stateful workloads with variable demands and reduces provisioning overhead at scale.
For teams that need a more accessible entry point to Kubernetes itself, Burns notes that AKS desktop is now generally available. It brings a full AKS experience to a user’s desktop, making it straightforward for developers to run, test and iterate on Kubernetes workloads locally with the same configuration they’ll use in production.
Microsoft, the Open Source Company, Right?
Microsoft has clearly been busy; it wants to extend its development efforts into every trunk road and tributary of the Kubernetes universe, and it has moved to embrace a solidified open source proposition.
Long gone are the days of “Linux is a cancer,” as we know. Over the last decade or so, under the leadership of CEO Satya Nadella, Microsoft initially positioned itself in zones where it could reap revenue from open source projects “as long as they worked through the MSFT stack,” and even that initial knee-jerk reaction is now a thing of the past.
As we hear the company talk about the need for shared operational philosophy & interfaces, it almost feels like the Agile manifesto 2.0 all over again, i.e., we need to look at the way we work before starting the work… and this time, work starts with community.