Grafana Labs Dives Deeper Into Kubernetes Observability

Grafana Labs today extended its observability support for Kubernetes environments to make it simpler to configure its monitoring tools, troubleshoot issues and reduce costs.

IT teams no longer need to rely solely on the Grafana Agent or Grafana Agent Operator to manually configure the collection of infrastructure data. Instead, they can now use a Grafana Kubernetes Monitoring Helm chart to send customized metrics, logs, events, traces and cost metrics to Grafana Cloud. In addition, the Helm chart adds support for IBM Cloud as a configurable option.
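As a rough sketch of what that workflow looks like, the chart can be installed from Grafana's public Helm repository. The repository URL and chart name (`k8s-monitoring`) come from Grafana's published charts; the release name, namespace and `--set` keys below are illustrative assumptions rather than a definitive configuration:

```shell
# Add Grafana's public Helm chart repository and refresh the index
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

# Install the Kubernetes Monitoring chart into its own namespace.
# The release name, namespace and --set keys here are illustrative;
# consult the chart's values.yaml for the authoritative options and
# the Grafana Cloud endpoints for your own stack.
helm install grafana-k8s-monitoring grafana/k8s-monitoring \
  --namespace monitoring --create-namespace \
  --set cluster.name=my-cluster
```

Once installed, the chart's collectors ship the configured metrics, logs, events and traces to the Grafana Cloud endpoints defined in its values.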

Troubleshooting Kubernetes clusters has also been simplified, with a “Pods in trouble” section on the home page and the Grafana Alerts page that makes it easier to respond to specific issues in a cluster.

At the same time, Grafana Labs has added to that page a comprehensive overview and detailed analysis of the health and performance of a cluster, making it easier to gauge the severity of the alerts being generated. In addition, a time picker makes it possible to analyze historical data.

Grafana Labs has also added a summary view for clusters, nodes, workloads and namespaces to correlate CPU, memory and storage usage, along with a plug-in that uses machine learning algorithms to generate CPU and memory usage forecasts.

The company is now making it possible to break down costs on a per-container or per-pod basis via its monitoring tool. That tool is also now listed in the Amazon Web Services (AWS) marketplace as an add-on for Elastic Kubernetes Service (EKS). There is also now integration with ClickHouse, InfluxDB and Presto databases.

Finally, Beyla, an open source auto-instrumentation tool that runs at the kernel level in Linux operating systems using the extended Berkeley Packet Filter (eBPF), now fully supports Kubernetes.
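For context on how Beyla is typically invoked, the sketch below follows the environment-variable-driven usage described in Beyla's documentation; the port and endpoint values are illustrative placeholders, not a definitive setup:

```shell
# Beyla is configured largely through environment variables and
# needs elevated privileges to attach its eBPF probes.
# BEYLA_OPEN_PORT tells it which service port to instrument;
# the OTLP endpoint below is a placeholder for a local collector.
sudo BEYLA_OPEN_PORT=8080 \
     OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 \
     beyla
```

Because the probes hook syscalls at the kernel level, the instrumented service itself needs no code changes or restarts.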

A survey of 306 IT professionals published today by Grafana Labs finds nearly all (98%) are using open source observability tools, with Grafana dashboards (91%) and Prometheus (72%) being the most widely used monitoring software. A full 85% are also using OpenTelemetry agent software to collect data.

In total, well over half (57%) described their observability efforts as proactive, compared to 24% who admitted to being reactive. However, only 41% are collecting application performance metrics, and even fewer (26%) are monitoring service-level objectives in production environments.

Overall, the biggest observability concerns identified are the complexity of managing multiple systems (57%), cost (56%), the cardinality of the data being collected (47%) and the signal-to-noise ratio of the alerts generated (46%).

Richard Hartmann, director of community for Grafana Labs, said the survey makes it clear that as more data is collected, cost concerns rise. Nevertheless, as IT environments become more complex, the need for observability tools that can surface the root cause of issues only becomes more pressing, he noted. The challenge is defining a set of goals that moves teams beyond traditional “war rooms,” where IT teams remove potential suspects via a laborious process of elimination, without driving storage costs too high.

It may be a while before most organizations fully master observability, but one thing is certain: Legacy approaches to monitoring a pre-defined set of metrics no longer meet the challenges at hand.

Mike Vizard

Mike Vizard is a seasoned IT journalist with over 25 years of experience. He also contributed to IT Business Edge, Channel Insider, Baseline and a variety of other IT titles. Previously, Vizard was the editorial director for Ziff-Davis Enterprise as well as Editor-in-Chief for CRN and InfoWorld.
