Instant Kubernetes Platform Engineering for the Cloud

Platform engineering is the practice of creating a reusable set of standardized tools, components and automated processes, often referred to as an internal developer platform (IDP). IDPs, and the teams that build them, are important because they ease development by providing “golden paths” for developers to self-serve and self-manage their code. 

At KubeCon 2022, Jason English, Intellyx principal analyst, stated that if there was one big takeaway from the conference, “it would be the reemergence of the platform paradigm in cloud-native clothing.”  

In the platform-centric model, enterprises that successfully deliver industrial-strength cloud-native applications are doing so with a platform strategy that eases Kubernetes development and management for DevOps teams.

Rather than struggle to create IDPs by trial and error, DevOps teams “should instead lean on an expert platform team that can readily package approved ‘golden state’ environments of Kubernetes clusters for them, with scaling, networking and security settings optimized for the target application and infrastructure,” says English.

To further reduce complexity, Kubernetes vendors are now offering these platforms as full-featured turnkey solutions, essentially providing “instant platform engineering.”

Platform Engineering Vs. DevOps

The responsibilities of a platform engineering team should not be confused with those of a site reliability engineering (SRE) team or DevOps team. Although there are similarities between all three roles, understanding the differences can help to explain why platform engineering has become such a significant emerging trend.

Between the late 1990s and early 2000s, most organizations relied on a single system administrator (sysadmin) or operator through whom developers had to go to get anything done. The rise of the cloud, however, created the need to accelerate the development and delivery of software to production, and the traditional “throw it over the wall” workflow began to create bottlenecks for both developers and operations.

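This began to change in 2006, when Amazon’s developers started to deploy and run their own services and applications end to end, an approach Amazon CTO Werner Vogels summed up as “you build it, you run it.” This model helped establish DevOps as the gold standard for agile software development.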

Although DevOps resulted in faster and better software delivery, scalability and stability for advanced enterprises like Amazon and Google, adoption fell short at most other organizations and led to a series of organizational anti-patterns. Where teams lacked dedicated operational expertise, senior developers often took responsibility for infrastructure and operations, either by doing the work themselves or by assisting their junior colleagues. This leads to a “shadow operations” anti-pattern, in which an organization misallocates its most expensive and talented resources (developers) and still cannot ship code to production quickly and efficiently.

This anti-pattern appears in several studies, including Humanitec’s DevOps Setups Benchmarking Report, which found that 44% of low-performing organizations have shadow operations, with some developers doing DevOps on their own while helping less experienced colleagues. This is in sharp contrast to top-performing organizations, in which 100% of developers are fully self-serving and follow a “you build it, you run it” approach.

When organizations underestimate the complexity and importance of operational skills and activities, the result is significant waste and frustration. This, in turn, increases cognitive load on developers and, via Conway’s Law (the principle that a system’s design mirrors the communication structure of the organization that builds it), creates an environment that is difficult to support.

Platform Engineering Vs. SRE

The rise of cloud-native computing increased the need for engineers who can work in production and on operations. Established at Google in 2003 and later popularized by the company, site reliability engineering (SRE) applies software engineering principles to infrastructure and operations problems. Site reliability engineers (SREs) are responsible for ensuring systems are scalable, stable and highly reliable.

While there is nothing wrong with SRE as a concept, problems arise when it is adopted incorrectly, especially at organizations that lack the talent pool and resources of a company like Google. When organizations cannot hire SREs with enough experience to meet the needs of their setup, operations engineers take on SRE responsibilities in name only, and the resulting “fake SRE” role becomes a bottleneck rather than an enabler.

Platform Engineering Solves the Problem

DevOps Topologies research identifies a dedicated platform team that provides an IDP as a product to developers as the best way to overcome the DevOps and fake SRE anti-patterns. When there is too much cognitive load on developers, platform engineering reduces it by packaging complex workflows into golden paths, or paved roads. When fake SREs create bottlenecks for developers, platform engineering prioritizes developer self-service and automation, offering a consistent and flexible developer experience.

According to Puppet’s 2021 State of DevOps Report, the common thread across organizations that are “good at DevOps” is that they have adopted the platform team model; the report found “a high degree of correlation between DevOps evolution and use of internal platforms.” As the report notes, “Not every platform team is automatically successful, but the successful ones treat their platform as a product. They strive to create a compelling value proposition for application teams that is easier and more cost-effective than building their own solutions.”

Gartner named platform engineering one of its Top Strategic Technology Trends for 2023, and Gartner analysts predict that by 2026, 80% of software engineering organizations will establish platform teams and that 75% of those will offer developer self-service portals.

The Rise of the Platform-Centric Model for Kubernetes

As the cloud, Kubernetes and infrastructure as code (IaC) grow in popularity and extensibility, organizations must often manage a complex web of systems without the necessary technical knowledge and skills. To reduce friction and cognitive load on developers, many forward-thinking companies have formed dedicated teams that build and maintain internal platforms and establish best practices to accelerate enterprise software production.

While every development team will have different needs and paths to production, the overall goal of platform engineering remains the same: to accelerate software delivery with as little overhead as possible. Rather than having to understand the inner workings of the IT infrastructure, developers can focus on writing and shipping code efficiently and reliably.

Fortunately, organizations can obtain ready-made “instant platform engineering” solutions from Kubernetes vendors. These fully automated, fully integrated platforms provide a “golden path” for DevOps teams by packaging all the components needed for a production-ready Kubernetes environment in a turnkey solution.

Tobi Knaup

A cloud-native pioneer and evangelist, Tobi Knaup serves as the CEO of D2iQ. Previously, Tobi served as D2iQ’s Chief Technology Officer. As the primary author of the world’s first open source container orchestrator (Marathon) and co-creator of the KUDO toolkit for building Kubernetes Operators, Tobi has the unique ability to understand an organization’s cloud-native journey from all levels: business, technological and talent. And as the driver behind D2iQ’s next-generation Kubernetes platform, Tobi helps make it possible for organizations to navigate the cost and time-intensive challenges associated with enterprise-grade container orchestration. Before co-founding D2iQ, Tobi was one of the first engineers and technology lead at Airbnb, proving the technology’s value at scale in a production environment serving millions of users. A German native, Tobi holds a Bachelor of Science and a Master of Science from the Technical University of Munich.
