CloudNativeDay: Google Sees Containers Improving App Reliability

August 10, 2022, by Mike Vizard

Containers and serverless computing frameworks play a critical role in making environments more resilient as organizations increasingly depend on the availability of applications to drive revenue.

Steve McGhee, a reliability advocate at Google and co-author of Enterprise Roadmap to SRE: How to Build and Sustain an SRE Function, tells attendees at the virtual CloudNativeDay summit that smaller containers, coupled with serverless computing frameworks, make it simpler to build modular components that both isolate dependencies and make it easier to restore services after a disruption.

That approach enables IT teams to architect services around application programming interfaces (APIs), so that software components can be deployed in a manner that advances service objectives, he notes. It also becomes easier to roll back application services if an update has unforeseen consequences, adds McGhee.

However, McGhee also warns that resiliency comes at a cost. IT teams may aspire to provide “five nines of availability,” but each of those nines adds about $10,000 to the total cost of the IT environment. IT leaders need to carefully evaluate what level of resiliency is required for each element of an application environment, explains McGhee. “It can be a fool’s errand,” he says.
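To make the "nines" concrete, the short Python sketch below converts an availability target into the downtime it permits per year. It is an illustration only: the function name `allowed_downtime_minutes` and the 365-day year are assumptions made here, not figures from McGhee's talk.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years (assumption)


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime budget for an availability target with the given number of nines."""
    unavailability = 10 ** (-nines)  # e.g. 5 nines -> 0.00001, i.e. 99.999% available
    return MINUTES_PER_YEAR * unavailability


# Print the downtime budget for two through five nines.
for nines in range(2, 6):
    availability_pct = 100 * (1 - 10 ** (-nines))
    print(
        f"{nines} nines ({availability_pct:.3f}%): "
        f"{allowed_downtime_minutes(nines):,.2f} minutes of downtime per year"
    )
```

Under these assumptions, two nines leaves room for roughly 3.65 days of downtime a year, while five nines leaves only about 5.26 minutes, which helps explain why each additional nine is so much harder to deliver.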

When it comes to containers, much of the focus has been on enabling developers to build and deploy applications faster, but applications based on microservices that are constructed using containers are also generally more resilient. If a microservice suddenly becomes unavailable, calls to its API are rerouted to another service to ensure availability. The challenge then becomes monitoring the application environment to determine when services are being affected, because rerouting API calls tends to degrade application performance over time.
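The rerouting behavior described above can be sketched in a few lines of Python. This is a minimal illustration rather than how any particular platform does it: `call_with_failover`, `AllReplicasFailed` and the two stand-in services are hypothetical names, and a real environment would typically rely on a load balancer or service mesh instead of application code. The function also returns how many reroutes were needed, the kind of signal a monitoring system could track.

```python
from typing import Callable, Sequence, Tuple


class AllReplicasFailed(Exception):
    """Raised when no service in the list could handle the request."""


def call_with_failover(
    replicas: Sequence[Callable[[str], str]], request: str
) -> Tuple[str, int]:
    """Try each service in order; return (response, number of reroutes needed)."""
    failures = 0
    for replica in replicas:
        try:
            return replica(request), failures
        except ConnectionError:
            failures += 1  # this service is unavailable, so reroute to the next one
    raise AllReplicasFailed(f"{failures} services failed for request {request!r}")


# Hypothetical stand-ins for two services exposing the same API.
def unavailable_service(request: str) -> str:
    raise ConnectionError("service is down")


def healthy_service(request: str) -> str:
    return f"handled {request}"


response, reroutes = call_with_failover(
    [unavailable_service, healthy_service], "GET /orders"
)
print(response, f"(after {reroutes} reroute)")
```

Each reroute adds a failed attempt before the successful one, so a rising reroute count is an early sign that performance is degrading even though the application remains available.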

IT teams also must remain wary of creating single points of failure that could take an application offline. It’s often tempting to containerize a large amount of monolithic code at the core of a microservices-based application rather than fully decomposing the application into a set of granular components. The issue is that the larger the container, the more likely it is to contain a single point of failure that can take down the application.

In the age of digital business transformation, there’s more focus on application resiliency simply because more organizations are dependent on applications to drive revenue. Today, application downtime directly equates to suboptimal customer experiences that impact everything from perceived brand reputation to the ability to process transactions. As such, the tolerance among business leaders for application downtime is virtually nil. It’s also why so many organizations are looking to hire site reliability engineers (SREs) to ensure application availability.

The challenge, of course, is that changes to dynamic application environments now come frequently as companies vie to provide better customer experiences via software. Unfortunately, as every IT professional knows, each of those changes adds more downtime risk. From both a business and technical perspective, the need for a structured approach to maintaining application resiliency has never been more critical.