3 Design Antipatterns That Sabotage K8s App Scalability

October 18, 2022 | October 16, 2022 | Patrick Tavares | antipatterns, application development, design patterns, scalability, StormForge

by Patrick Tavares

Software design patterns were popularized in the 1990s by the authors of the influential computer science book Design Patterns: Elements of Reusable Object-Oriented Software. Although the book focuses on software development, design patterns can be used to address many IT engineering challenges, including designing Kubernetes infrastructures.

So what are design patterns? A design pattern is an established, general solution to a common problem. Design patterns evolve from the collective wisdom of experienced practitioners in a given field and provide a template for best practices.

Antipatterns can be thought of as the opposite of design patterns: They are common pitfalls that initially appear to be good solutions but prove to be ineffective and are often counterproductive. Antipatterns may seem like attractive solutions, especially when time or resources are constrained. However, they also introduce or exacerbate problems and have a negative effect overall.

This article describes three common antipatterns that can hinder effective scalability in Kubernetes environments. To explore more common antipatterns, consider reading Optimizing Java: Practical Techniques for Improving JVM Application Performance by Benjamin J. Evans, James Gough, and Chris Newland.

Antipattern One: Distracted by the Simple

This antipattern manifests when we target only the simplest or easiest-to-change parts of a system rather than analyzing and diagnosing the whole system. Plucking the low-hanging fruit can be deceptive because it often seems like we’re making real progress. The reality is that we’re choosing not to optimize the parts of the system we aren’t comfortable with and, even if we effect real change, the result is usually only a local optimum.

For example, let’s say we decide to configure our cluster to autoscale so that our application remains highly available during a half-hour morning spike in traffic. More efficiently provisioned pods and some network traffic analysis might provide a comparable performance boost, but that requires a deeper analysis of our resource profiles. Instead, we end up paying our cloud provider for a two-hour block of cluster resources that sits underutilized for the majority of that time. The price-performance tradeoff is poor. Alternatively, we may assign a high priority to pods we deem critical, only to find that our aggregate service performance suffers because other pods are evicted at disproportionate rates.
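To put rough numbers on that tradeoff, here is a minimal Python sketch. Only the half-hour spike and the two-hour block come from the example above. The hourly rate and the number of extra nodes are hypothetical values chosen for illustration, and the sketch assumes the extra capacity is fully idle outside the spike, which is a simplification.

```python
def utilization(busy_hours: float, billed_hours: float) -> float:
    """Fraction of the billed block during which the extra capacity is busy."""
    if billed_hours <= 0:
        raise ValueError("billed_hours must be positive")
    return busy_hours / billed_hours


def idle_spend(busy_hours: float, billed_hours: float,
               hourly_rate: float, extra_nodes: int) -> float:
    """Money paid for the extra nodes while they sit idle (simplified model)."""
    idle_hours = billed_hours - busy_hours
    return idle_hours * hourly_rate * extra_nodes


SPIKE_HOURS = 0.5    # from the example: half-hour morning spike
BILLED_HOURS = 2.0   # from the example: two-hour block of resources
HOURLY_RATE = 0.40   # hypothetical price per node-hour, in dollars
EXTRA_NODES = 5      # hypothetical number of nodes added by autoscaling

print(f"utilization of the billed block: {utilization(SPIKE_HOURS, BILLED_HOURS):.0%}")
print(f"idle spend per day: ${idle_spend(SPIKE_HOURS, BILLED_HOURS, HOURLY_RATE, EXTRA_NODES):.2f}")
```

Under these assumptions, only a quarter of the billed time does useful work. Real clusters bill and scale down differently, so treat the figures as a way to frame the question rather than as a cost estimate.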

Being distracted by the simple is often a defensive response to scaling challenges that stretch beyond a team’s comfort zone or to challenges that are thought to be tedious and difficult to solve. We can address this antipattern by ensuring that the team obtains a level of understanding necessary to scale each part of an application and is comfortable iterating through various application tunables to understand how each one affects various performance characteristics.

Antipattern Two: Tuning by Folklore

The availability of information on the internet makes it easy to fall prey to this antipattern. Leading responses on Stack Overflow and similar sites are often popular because they offer easily digestible recommendations that deliver immediate benefits with minimal effort. As more users discover the fix, their enthusiasm can create a legend. As with the ‘distracted by the simple’ antipattern, tuning by folklore often feels productive at first. The solutions can seem deceptively simple, and they work!

Unfortunately, as with any legend, much of the truth is masked by a lack of context, and misinformation today is magnified by search engine rankings. Even when a folklore fix works for the specific component versions we use and in the specific environment to which we deploy, the solution is rarely robust or efficient.

Antipattern Three: Missing the Bigger Picture

Missing the bigger picture is one of the most pervasive antipatterns in siloed or small teams. Developers tend to focus on individual settings or components, often relying on benchmarks to inform their configurations without examining the system more holistically.

This antipattern is a product of specialization and of the human tendency to see patterns where there may be none. A single person is unlikely to have the knowledge to examine an entire system, so they focus on what they know and are more likely to attribute differences in effects to the variables they can control.

Even when optimizing small parts of a system produces measurable results, it’s almost always more efficient to consider the whole system instead. In a system as potentially complex as a Kubernetes cluster, it’s unlikely that maximizing the performance of any one component will increase service quality to the degree we’re looking for. In fact, unless we have decisively pinpointed a performance bottleneck, it’s likely that focusing on individual components will actually produce diminishing returns. So unless we’re looking for a bottleneck, it’s best to examine our system holistically to catch interactions and emergent effects that aren’t evident at smaller scales.
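The effect of such interactions can be illustrated with a toy model. The Python sketch below is not taken from any real cluster. The two tunables, their three levels, and the cost function are all hypothetical, chosen so that the settings only pay off when raised together (think of a container memory limit and a runtime heap size). It compares tuning one setting at a time against searching both settings jointly.

```python
from itertools import product

LEVELS = (0, 1, 2)  # hypothetical low / medium / high level for each tunable


def cost(a: int, b: int) -> int:
    """Toy cost (lower is better) with a strong interaction between a and b."""
    mismatch_penalty = 10 * (a - b) ** 2   # the settings hurt when out of step
    headroom_penalty = (2 - a) + (2 - b)   # higher levels are better on their own
    return mismatch_penalty + headroom_penalty


def tune_one_at_a_time(start: tuple[int, int]) -> tuple[int, int]:
    """Change one tunable at a time, keeping a change only if cost drops."""
    current = start
    improved = True
    while improved:
        improved = False
        for index in range(2):
            for level in LEVELS:
                candidate = list(current)
                candidate[index] = level
                candidate = tuple(candidate)
                if cost(*candidate) < cost(*current):
                    current = candidate
                    improved = True
    return current


def tune_jointly() -> tuple[int, int]:
    """Evaluate every combination of both tunables and return the cheapest."""
    return min(product(LEVELS, LEVELS), key=lambda setting: cost(*setting))


local = tune_one_at_a_time((0, 0))
best = tune_jointly()
print("one at a time:", local, "cost", cost(*local))
print("joint search: ", best, "cost", cost(*best))
```

In this toy model, the one-at-a-time search never leaves its starting point because every single-setting change makes things worse, while the joint search finds the combination where both settings are raised. An exhaustive joint search is only feasible here because the grid has nine points.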

Antipattern Root Causes

Why do so many of us end up engaging in these common design antipatterns for Kubernetes application scalability? The major drivers include:

Tedium. Tedium comes from any series of repetitive steps that quickly bore us, even though it may yield useful results and data. Often, the solution to a problem is to gather sufficient data necessary to drive the right decisions. However, the tedious data-gathering is cut short precisely because it is boring, and it is human nature to do what we can to avoid boredom.
Time-intensive tasks. Closely tied to tedium are tasks that are considered time-intensive. They are invariably scrutinized and often trimmed back if it’s expected that the time spent on the activity is not worth the results and insights produced. Again, if we curtail a series of data-gathering tasks before the results are clear, we risk making the wrong decisions.
Difficult data analysis. Even if we have gathered sufficient data, people generally struggle to draw accurate conclusions once more than about four variables interact. Keep in mind that Kubernetes scalability is multi-dimensional and can easily involve ten or more variables.
Ongoing analysis. The above tedious, time-intensive or difficult tasks need to be performed on an ongoing basis for us to adequately respond to the dynamic nature of our applications. Given these conditions, it’s understandable that over-provisioning resources is often viewed as a satisfactory solution, even when we know it’s wasteful and potentially costly.
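A quick calculation shows why these drivers compound. The Python sketch below counts how many combinations an exhaustive test of every setting would require. The variable counts of four and ten come from the list above. The five candidate values per variable and the ten minutes per trial are hypothetical figures used only for illustration.

```python
def grid_size(values_per_variable: int, variable_count: int) -> int:
    """Number of combinations when every variable is tried at every value."""
    return values_per_variable ** variable_count


def hours_to_test(combinations: int, minutes_per_trial: float) -> float:
    """Total hours needed if each combination is tested once, back to back."""
    return combinations * minutes_per_trial / 60


VALUES_PER_VARIABLE = 5   # hypothetical: five candidate values per tunable
MINUTES_PER_TRIAL = 10    # hypothetical: one load test takes ten minutes

for variable_count in (4, 10):
    combinations = grid_size(VALUES_PER_VARIABLE, variable_count)
    hours = hours_to_test(combinations, MINUTES_PER_TRIAL)
    print(f"{variable_count} variables: {combinations:,} combinations, "
          f"about {hours:,.0f} hours of testing")
```

With these assumed figures, four variables give 625 combinations, while ten variables give 9,765,625. That growth is why brute-force experimentation stops being practical, and why the work is tempting to cut short.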

How Machine Learning Can Help

Automation solutions are available today to offload tedious and time-intensive tasks, including gathering the data necessary to make informed decisions. However, this type of automation alone is not powerful enough to overcome our difficulty in analyzing data with many variables.

Machine learning becomes crucial here, as it augments our abilities—analyzing data in a manner that we humans simply cannot. When combined with automation, ML performs this difficult data analysis on an ongoing basis. It efficiently addresses the dynamic nature of our applications by continually adjusting settings and making recommendations based on analysis that we would simply miss.

How to Scale Efficiently

To learn about successful design patterns for Kubernetes application scalability, read the StormForge white paper on the topic. And make sure to visit StormForge.io and request a demo to see how ML can help you implement these design patterns and avoid common antipatterns when scaling your Kubernetes applications.

To hear more about cloud-native topics, join the Cloud Native Computing Foundation and the cloud-native community at KubeCon + CloudNativeCon North America 2022, October 24-28, 2022.