FinOps for Kubernetes: Engineering Cost Optimization
As companies expand their cloud footprints, the potential for financial waste increases. Nobody wants to waste money or cut corners where they should not.
As the FinOps Foundation’s 2024 State of FinOps report underscores, engineering leaders considering a FinOps model in Kubernetes are motivated to prioritize savings and cost optimization. However, the practical elements of cost estimation and optimization are a black hole for both platform engineering and finance teams.
One option: a Kubernetes governance platform can be an initial step toward clarity into resource use. With policy-based control for cloud-native environments, such a platform helps teams grasp and adopt cost-control strategies and make informed financial decisions about Kubernetes.
Some engineering teams are addressing engineering cost optimization challenges by employing cost models to help measure the TCO of their services and applications. However, Laurent Gil, cloud neutrality advocate and co-founder of Cast AI, cautions that cost models rarely provide more than an informed starting point.
For a long-term solution, start by considering the cost drivers. CPU, memory and storage are allocated to each service and execution in Kubernetes. Workloads also grow larger over time, as do the costs related to hosting, integrating, running, managing and securing cloud workloads. While some charges relate directly to computation, data transfer and storage consumption, other factors add complexity. There is also tooling, as well as integrations with other cloud services, to factor into TCO calculations.
Design for Efficient Cloud Usage
If you run Kubernetes yourself, you need a strong engineering team. It’s challenging to build out yourself and to understand the nuances. Unless your business is close to containerization and microservices technology, it’s essentially just a cost center and an inefficient use of resources. You have to hire a team for anything you want to run at any reasonable level of reliability.
Richard Hartmann, CTO of Grafana Labs, shares two fundamental approaches to efficient cloud usage. One is to “Go all-in on bespoke services, leveraging whatever you can to reduce undifferentiated heavy lifting and focus on solving problems that drive your business forward.” Alternatively, he suggests, “Use as few bespoke services as possible, relying solely on the baseline across all providers.” This approach allows you to maintain control and facilitates easy migration between clouds. “Both approaches have merit,” Hartmann says, but he cautions that sitting in between is usually not ideal, as it exposes you to the drawbacks of both.
Both approaches share a problem: cloud computing is expensive, and providers have little incentive to offer great cost controls. Hartmann points out the inherent conflict of interest: better cost controls “would enable users to pay less.” Certainly, no cloud provider would want that.
As a result, everyone wants to figure out the right model, with a happy balance between knowing what’s running in production and also how to set up a dev-test environment effectively.
“Many of our customers today are looking at different models internally for their projects running in the cloud through managed Kubernetes offerings, whether a showback or chargeback type model,” commented GitLab Field CTO Lee Faus. “We’ve had a few customers who tried to implement quotas around what they’re allowed to spend using high water, low water marks. But in doing so, they’ve realized that because of how most managed Kubernetes clusters work, they incentivize you to build things like auto-scaling.”
There are many reasons why organizations end up with over-provisioned clusters, which not only hurt efficiency from a CPU and memory perspective but also ultimately result in a negative experience for end users interacting with the applications.
To counteract the risk of uncontrolled spending, Hartmann says, “We have implemented deep control over what specifically we do, and built our cost controls for self-managed clusters as well as managed platforms.” This approach helps scrutinize operations and ultimately enforces a chargeback program that encourages a shared sense of accountability across stakeholders.
Both Hartmann and Faus highlight challenges in managing costs and finding the right balance between control and cost efficiency. FinOps practices, they affirm, help organizations to anticipate, control, check and optimize their cloud investments on a proactive and reactive basis.
Here are some (among many) technical considerations for the well-optimized Kubernetes architecture:
Bin Packing
Bin packing lets organizations save costs by reducing the number of nodes necessary to support their applications. This technique helps efficiently allocate resources within a cluster to minimize the number of nodes required for running a workload.
Bin packing in Kubernetes involves strategically placing containers (the items) into nodes (the bins) to maximize resource utilization while minimizing waste. Done effectively, bin packing results in more efficient use of hardware resources and lower infrastructure costs.
But bin packing isn’t without its challenges. Engineering a balance between density and workload isolation is important, as is avoiding the risk of over-packing a node, which can lead to resource contention and performance degradation, potentially affecting the stability of the entire cluster.
Thankfully, Kubernetes offers several built-in features that support scheduling strategies such as resource requests and limits, pod affinity and anti-affinity rules, and pod topology spread constraints. These strategies also encompass best practices for bin packing, including careful planning and testing, right-sizing nodes and containers and continuous monitoring and adjustment.
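If you control the scheduler configuration (self-managed clusters do; many managed offerings do not expose it), one way to bias scheduling toward tighter packing is the NodeResourcesFit plugin’s MostAllocated scoring strategy, which scores nodes higher the fuller they already are. A minimal sketch, assuming a recent Kubernetes release:

```yaml
# kube-scheduler configuration sketch: prefer nodes that are already highly
# allocated, packing pods onto fewer nodes.
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: MostAllocated   # default is LeastAllocated, which spreads pods
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1
```

The weights are illustrative; if workload isolation matters more than density, the default LeastAllocated strategy spreads pods instead.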
One often overlooked benefit of bin packing is its positive environmental impact by reducing energy consumption and lowering greenhouse gas emissions.
Resource Asymmetry
There can be a lot of unused capacity when workloads require disproportionately more memory than CPU, or vice versa. It is a classic situation where the resource demands of workloads, such as applications or processes running within Kubernetes clusters, are significantly skewed toward one type of resource over another.
This creates an imbalance between the two types of resources. It leads to resource usage inefficiencies within the Kubernetes cluster. While a single node can accommodate multiple workloads (Pods) simultaneously, comprehending how and when a specific workload can be efficiently allocated to a node with available resources can be complex and confusing.
This problem restricts optimum cost control, as underutilized resources can lead to higher overall costs. Rightsizing your nodes helps tackle this problem. Use techniques like resource requests and limits to manage skewed resource demands and improve cost efficiency. FinOps strategies and efficient cost monitoring and management at the pod level can provide a more granular and optimized approach.
Set Requests and Limits
Most teams lack visibility into resource usage, often due to inconsistently configured CPU and memory requests and limits. This lack of visibility makes it challenging to optimize compute costs, especially in a dynamic environment.
Requests and limits are critical parameters that define and control the resource usage (CPU and memory) of containers within pods. Since a pod can contain multiple containers, it’s imperative to set sensible requests and limits for each container’s CPU and memory usage. By considering these parameters for all containers, you can determine the aggregate request and limit for the entire pod.
Setting limits too low or too high can cause problems. For instance, if memory limits are set too low, Kubernetes is bound to terminate an application for exceeding its limits. But if set too high, resources will be over-allocated, thereby increasing costs.
With appropriate requests and limits defined, the Kubernetes scheduler can allocate resources to pods efficiently, ensuring they function normally without starving other workloads of resources. This, in turn, minimizes resource wastage and promotes better stability.
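A minimal sketch of how this looks in a pod spec (the names and values here are illustrative); the pod’s effective request is the sum across its containers:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-and-cache          # illustrative pod with two containers
spec:
  containers:
    - name: web
      image: nginx:1.25
      resources:
        requests:              # what the scheduler reserves for placement
          cpu: "250m"
          memory: "256Mi"
        limits:                # hard ceiling; exceeding memory triggers an OOM kill
          cpu: "500m"
          memory: "512Mi"
    - name: cache
      image: redis:7
      resources:
        requests:
          cpu: "100m"
          memory: "128Mi"
        limits:
          cpu: "200m"
          memory: "256Mi"
# Aggregate request for the pod: 350m CPU and 384Mi memory.
```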
Use Spot Instances for Non-Critical Workloads
Non-production infrastructure is like the behind-the-scenes crew in a play – it supports the customer experience but doesn’t directly serve them. This includes environments for testing teams, customer success and engineering.
Many companies spend a lot replicating the production environment for these internal needs. But that’s like building a full stage set for every rehearsal! Third-party platforms offering spot instances are a real game-changer. These services typically help companies save around 70% on costs because non-production environments are, by definition, not mission critical. Even if a specific API or something else fails for a short while, it shouldn’t significantly impact overall productivity.
Some companies are reluctant to use spot instances due to concerns over their perceived instability because cloud providers can reclaim them with as little as 30 seconds’ notice. But if your non-critical environment can handle these brief interruptions (like a hiccup in a play), spot instances are a good way to optimize costs. With the right tools and processes, they offer a reliable and cost-effective way to run workloads.
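Steering a non-critical workload onto spot capacity usually comes down to node labels and tolerations. The label and taint below are assumptions tied to the provider: GKE spot node pools expose cloud.google.com/gke-spot, while EKS managed node groups use eks.amazonaws.com/capacityType, so adjust this sketch to your environment:

```yaml
# Sketch: pin a non-production workload to spot nodes via a nodeSelector and
# tolerate the (optional) taint applied to those nodes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: e2e-test-runner                      # illustrative non-production workload
spec:
  replicas: 3
  selector:
    matchLabels:
      app: e2e-test-runner
  template:
    metadata:
      labels:
        app: e2e-test-runner
    spec:
      nodeSelector:
        cloud.google.com/gke-spot: "true"    # provider-specific label (assumption)
      tolerations:
        - key: cloud.google.com/gke-spot     # only needed if spot nodes are tainted
          operator: Equal
          value: "true"
          effect: NoSchedule
      containers:
        - name: runner
          image: registry.example.com/e2e-runner:latest   # placeholder image
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
```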
Optimize Workload Resource Allocation With Auto-Scaling
Kubernetes’ auto-scaling function ensures that costs are only incurred for the resources consumed. But without requests and limits on your pods, it becomes a problem. During spikes, requests can surge, triggering aggressive auto-scaling: multiple node pools and nodes are added to the cluster, and costs climb linearly. How can you prepare for this?
Kubernetes provides two pod auto-scaling mechanisms:
- Horizontal Pod Autoscaler: Scales the number of pods up or down based on resource usage. This is great if you want to run multiple replicas of a single service to handle surges in traffic. HPA automatically scales your service horizontally, ensuring enough resources are available to serve more customers without significant performance degradation, but note that HPA itself might not directly decrease costs. (A minimal manifest sketch follows this list.)
- Vertical Pod Autoscaler: VPA optimizes resource usage within existing pods. It monitors how much CPU and memory your pods use and then suggests adjustments to their resource requests. By getting these requests dialed in just right, VPA can potentially help reduce the need for HPA to create extra pods during traffic spikes. This can lead to more efficient resource utilization and potentially minimize cost increases when demand rises.
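Here is a minimal HPA sketch; the Deployment name “web” and the thresholds are illustrative. It keeps average CPU utilization, measured against the pods’ requests, near 70%:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:              # the workload being scaled (illustrative name)
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU use vs. requests exceeds 70%
```

VPA is installed separately and configured through its own VerticalPodAutoscaler resource; running both autoscalers against the same CPU or memory metric on the same workload is generally discouraged.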
With autoscaling, Kubernetes can quickly adapt to changing demand, ensuring the right size and number of pods are in use. However, some upfront work ensures optimal performance and cost savings. This involves running initial profiling exercises to determine resource requests for your applications and identify services that might consume more resources than expected. With those deliberate optimizations, configure CPU and memory requests and limits for each component in the cluster. This will improve performance while reducing resource wastage and cost by preventing excessive scaling.
Configure Quality of Service
Kubernetes offers three quality of service (QoS) classes for pods: Guaranteed, Burstable and BestEffort. These classes help ensure predictable resource allocation for critical pods, while the Burstable and BestEffort classes provide more flexibility for less critical workloads.
Based on the resource requests and limits specified, pods are categorized into one of these three classes. The QoS classes help Kubernetes schedule pods onto nodes effectively to optimize resource utilization, and they determine the order in which pods are evicted when a node experiences resource scarcity. Guaranteed pods are evicted last, ensuring the continued operation of critical workloads.
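As a quick sketch (the pod name is illustrative): setting requests equal to limits for every container is what earns a pod the Guaranteed class.

```yaml
# Requests equal to limits for every container -> Guaranteed QoS.
# Requests lower than limits -> Burstable; no requests or limits -> BestEffort.
apiVersion: v1
kind: Pod
metadata:
  name: critical-worker
spec:
  containers:
    - name: worker
      image: busybox:1.36
      command: ["sleep", "3600"]
      resources:
        requests:
          cpu: "500m"
          memory: "512Mi"
        limits:
          cpu: "500m"          # equal to the request
          memory: "512Mi"
```

You can confirm the assigned class with `kubectl get pod critical-worker -o jsonpath='{.status.qosClass}'`.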
FinOps for Kubernetes Cost Control Strategies
Cost management on the cloud side can get out of control, and most of that stems from not having good rigor in the software development lifecycle, where things are pushed into production before they’re ready or before they’ve been adequately tested from a performance perspective. There has never been a more important time to adopt the FinOps principles of inform, optimize and operate, because existing solutions do not capture the nuances needed to economically achieve the right balance of cost and performance.
FinOps is the discipline of shared responsibility: it brings together all stakeholders (tech, business and finance) to establish policies and best practices for usage that are programmatically enforced. Adopting a FinOps approach can help platform engineering teams dramatically increase their visibility, which is necessary to find ways to reduce costs without affecting performance.
When DevOps gives developers tools and guardrails to build, deploy, and fully own an application, it’s important to also educate about overall cost management. This is because empowering teams to act is the top challenge. And it’s usually not until the bill comes due at the end of the month that finance teams realize there is an issue with sudden cost spikes.
From supporting clients, I can tell you that most organizations using Kubernetes struggle to manage their cloud expenses because their processes lack a proper review and refinement cycle, and skilled workers in this segment are scarce.
Faus, in our conversation, stipulated, “There’s a term that we’re starting to see a lot of companies use, which revolves around value streams.” Value streams allow us to map back to key performance indicators (KPIs). These KPIs are defined at CEO, CIO and CFO levels where budgets are drawn, resource hiring is planned and new product lines are decided. “This provides a high-level mapping back to those elements and around those value streams. When we drive throughout the given year, we need to have a way to ensure that we are actively tracking these aspects throughout the SDLC and in our cost management.”
Whatever you call it, empowering development teams becomes imperative when using Kubernetes. Taking responsibility for informed decisions will, in turn, make Kubernetes cost management timely, proactive and cost-effective. As budgets tighten, there is a great need for cost-control strategies: build your cost controls from a knowledgeable foundation and implement a third-party solution, whether commercial or open source, to avoid linear cost increases. These strategies work with any cloud provider, and even with Kubernetes on bare-metal infrastructure.
Case Study: LambdaTest
A perfect showcase for the lasting impact of FinOps practices is LambdaTest. This young company, which provides a cloud platform for online browser and operating system testing, quickly scaled up its services after securing initial funding. But it encountered challenges with sudden spikes in cloud costs during subsequent funding rounds.
As its senior DevOps engineering leader, Shahid Ali Khan was responsible for developing LambdaTest’s Kubernetes infrastructure and the wider infrastructure systems. He shared invaluable insights on navigating the platform engineering challenges and adopting FinOps principles, which are imperative for optimizing cloud resources and reducing cloud costs.
This case study highlights LambdaTest’s journey to FinOps maturity, emphasizing cost optimization. It outlines a systematic approach with insights from notable leaders to navigate these hurdles. The study discusses technology solutions, strategies, outcomes and lessons from my one-on-one interviews with these leaders.
The Challenges of Managing Infrastructure
As LambdaTest expanded its offerings, its infrastructure complexity also increased, relying on AWS and self-managed Kubernetes to support the company’s data-heavy customers. This architecture allowed them to scale rapidly, and Mudit Singh, the head of growth and marketing, reflects on their initial decision: “When we started with Kubernetes, no cloud provider offered a static and stable solution. Around 2017, AWS released Managed Kubernetes, which remained in testing for an extended period. As a startup with a talent shortage, we were unsure about managing our cluster.”
As the firm’s usage increased, each month ended with sudden cost spikes that raised more questions about its expenditures, such as “How much are we spending? Is this normal? How should cost be divided between teams, applications and business units?” Questions like “What is the problem: over-provisioning, or using too much compute or memory?” remained unanswered. This situation, Singh shares, forced “our DevOps leaders, platform engineering and finance teams (including the founders) to invest a significant amount of time and attention in understanding the hefty incoming invoices.”
These issues escalated over time. As usage grew, so did the risk of losing cost control. The company sourced tools to report on cloud consumption but struggled to identify and address cost drivers. LambdaTest also faced challenges in building a team with a FinOps culture and in achieving cost visibility.
Singh stressed that identifying and addressing the underlying problems driving up costs proved to be a struggle while managing data centers across continents. Khan expounded on the cross-functional initiatives the company took to gain clarity on cost drivers, achieving visibility and transparency into spending and cloud usage.
Create a Tagging Framework
It is difficult to align reports to business context without insight into workload allocations, and the industry has seen the adoption of more structured approaches to resource management.
Khan detailed, “We have multiple products that are running, and there are shared services among those products. It was getting hard for us to identify which service was contributing to the cost of each product.” Tagging with labels also helps identify over-provisioned resources. “We began by implementing tagging and labeling and utilizing different node pools. This enabled us to precisely determine the cost allocation for each product in terms of specific resources, understand how their requirements have scaled over time, and effectively address those needs.”
It has now become common to use namespaces for each product or service in Kubernetes, with a clear bifurcation of services. This approach lays the groundwork for resource management and supports isolation, resource quotas and simplified access control, enhancing operational efficiency. Most importantly, it makes cost analysis, reporting and optimization easier for individual teams, services or business lines.
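A minimal sketch of that bifurcation, with illustrative label keys (product, team, cost-center) that a cost tool could group spend by, plus a ResourceQuota to cap the namespace:

```yaml
# One namespace per product, labeled for cost allocation and capped by a quota.
apiVersion: v1
kind: Namespace
metadata:
  name: product-a
  labels:
    product: product-a
    team: platform
    cost-center: cc-1234       # illustrative label keys and values
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: product-a-quota
  namespace: product-a
spec:
  hard:
    requests.cpu: "40"         # total CPU the namespace's pods may request
    requests.memory: 80Gi
    limits.cpu: "60"
    limits.memory: 120Gi
```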
From Data to Action
The volume of data subject to analysis for cost optimization is always considerable. Being able to vectorize that data and understand where there are errors, where there might be memory spikes, CPU spikes — these are areas where you can not only optimize for the cost structure of managing applications but also provide feedback to the engineering teams. Faus underlines, “This involves actions like automatically promoting an issue or a ticket on the product side to ensure that something is going to be done about that cost as part of a current sprint.”
This process should also involve analyzing time-series data, which helps to identify inefficiencies, make informed decisions about resource allocation and find potential automation opportunities. There are other strategies and optimization tactics you could adopt but consider what you are optimizing for.
So what is optimizing for cost? It’s another metric to consider. For example, I already manage CPUs, pods, memory, storage and compute capabilities. Each one is a piece of the larger puzzle. So adding cost into the mix doesn’t change the fundamental approach; it just integrates another element into the array of resources we’re already balancing.
Hartmann emphasizes the key focus is on enabling “our tech personnel to efficiently extract and manipulate data to align with our business objectives, rather than the other way around. This strategy, which we also implement internally at Grafana Labs, underscores the critical importance of fostering internal collaboration and knowledge exchange to effectively bridge the organization’s technological and business divides.”
Cost Visibility
Monitoring is the most important aspect of optimizing cost and running a cluster without impacting performance and usage. It is the core pillar for building awareness and informing FinOps optimization strategies.
“We tried a lot of solutions and multiple plugins, but we could not get a clear understanding of the volume of requests, the performance of the cluster, or the overall system status,” Khan said. “We implemented distributed tracing inside the cluster to monitor every request, which helped us to identify how services are being used and pinpoint optimization opportunities within the system. This helped us to identify inefficiencies, which increased accountability, informing service owners to take action while enabling things like internal chargeback and showback models.”
Visibility underpins FinOps metrics (idle resources, under-optimized infrastructure) for tracking progress. But the key metric to consider is normalized cost, which, when adjusted for your operating business metrics, provides a more holistic view of your cloud spending relative to your business activities.
Drill-Down Granularity
What you do next depends on where your baseline is. Allowing issues to persist over time makes controlling costs at a later stage challenging. Even if you attempt to control them later, the effort required from your team would be substantial, diverting focus from implementing features.
A tagging framework with a Kubernetes cost management tool becomes helpful here. This helps you drill down into the layers of your environment so you can see exactly how each application impacts your costs, enabling proactive recommendations for cost savings.
Khan shared their approach, “We began observing all attributes that influence pricing, and based on that, we looked deeper into why there has been an increase, what could have caused it, and then took a rather difficult educational path to show individual teams how their environment impacts their department’s resources.”
The ideal solution, according to Khan’s recommendation, “should provide time-saving features.” An example would be a prioritized list of your environment’s most expensive components, ranked by cost. This allows you to focus on the areas that yield the most significant cost savings first. “We realized this and implemented a proactive approach across teams, ensuring work could proceed in a manner that does not affect production.”
Real-Time Alerting
In cloud-native environments with auto-scaling enabled, your cluster or nodes can scale up or down at any time. Implementing budgets and alerts within the cloud system is therefore imperative, as untracked scaling can lead to expenses that the solution’s value won’t justify. Applying custom rules programmatically allows you to receive notifications when costs increase, enabling you to take corrective action for specific requests.
“FinOps practices have significantly changed how we work,” Khan said. “We measure costs to a significant extent and set budgets for each product. For example, each product has clusters, and some share services. With simple tagging, we can set specific budgets for each product. And when we allocate a certain amount to a product, we get alerts if spending goes over a predetermined threshold.” A small increase triggers an amber alert, and a big jump triggers a red alert.
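A sketch of what that amber/red tiering might look like as Prometheus alerting rules. It assumes the Prometheus Operator’s PrometheusRule resource and a cost exporter such as OpenCost; the node_total_hourly_cost metric name and the thresholds are assumptions to adapt to your own exporter and budget:

```yaml
# Two-tier cost alerts: a warning (amber) for a modest overrun and a critical
# (red) alert for a large jump. Metric name and thresholds are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cluster-cost-alerts
spec:
  groups:
    - name: cost
      rules:
        - alert: ClusterHourlyCostHigh
          expr: 'sum(node_total_hourly_cost) > 25'
          for: 30m
          labels:
            severity: warning
          annotations:
            summary: "Cluster hourly cost above the amber threshold"
        - alert: ClusterHourlyCostCritical
          expr: 'sum(node_total_hourly_cost) > 40'
          for: 15m
          labels:
            severity: critical
          annotations:
            summary: "Cluster hourly cost above the red threshold"
```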
The optimal solution should alert you to abnormal cost spikes in real-time so you can immediately examine and remediate the issue rather than waiting for weekly or monthly reports. Sometimes, these spikes may serve as an early warning sign of a cyberattack, which requires an immediate and proactive response to safeguard your infrastructure and data integrity.
Encourage FinOps Practices
A more intentional approach to planning for change targets the places where change takes effect soonest. But the worst thing you could do as an organization is to say, “We’re going to inform” without understanding the extent to which overspending is ingrained in your operations and, importantly, where some of the key drivers are coming from.
The best way to manage Kubernetes at scale is to take a holistic and intentional approach, which also helps in calculating the total cost of ownership and allocating budgets. However, most companies are not doing this, nor are they bifurcating their resources. Many companies manage huge infrastructures but lack a dedicated FinOps team. Their reactive approach to incidents, in terms of cost management, leads to significant financial burdens.
Cloud lets you accelerate, but it can also be a double-edged sword without a proactive approach. According to the CNCF microsurvey report, over-provisioning, or having more resources than necessary, is one of the most common factors leading to overspending.
“We’ve analyzed usage data on thousands of applications, and there are three primary reasons companies overspend: Over-provisioning, pod requests being set too high and low usage of Spot instances,” Gil said. The biggest source of overspending is an overestimation of the real CPU/memory usage. “For more than 97% of the applications we analyzed, the pre-optimized utilization of CPU is only 12%. That means that, on average, nearly 90% of compute is paid for, but goes unused.” And these percentages, he added, “are consistent across application sizes and cloud providers.”
The underlying reason that causes most companies to overspend is the lack of education and empowerment. Tech, DevOps and infrastructure teams often lack cost awareness. Change is not easy because building a culture of transparency and openness requires sharing pricing information with engineers and creating a safe space for open communication.
This is difficult for nearly all companies because people are first concerned about not stepping on anyone else’s toes, and it doesn’t help when few people are bold enough to get involved. The key is to get everyone on the same page regarding the business objectives. This means sharing the plan, how things will look soon, what kind of services are planned for a wider rollout and even the company’s gross margin. “It is transparency that spurs on shared understanding,” Hartmann said. When everyone sees the bigger picture, they can feel the “real pain” of overspending and how their work directly impacts it. This shared understanding empowers team members to contribute to cost-control strategies.