10+ Open Source Kubernetes-Native Observability Tools

February 22, 2023 / February 21, 2023 | Bill Doerrfeld | Tags: cncf, Grafana, kubernetes, metrics, monitoring, observability, Prometheus

by Bill Doerrfeld

Kubernetes is revolutionizing the way modern enterprises manage their cloud infrastructure. But as the technology develops, so does the need for visibility into systems running on Kubernetes. Without proper observability, it’s hard to monitor the health of applications and services running on Kubernetes. Luckily, plenty of Kubernetes-native observability tools can help provide visibility and monitoring of your cloud-native applications.

Below, we’ll explore a handful of the open source Kubernetes-native observability tools currently available, such as Prometheus, Grafana and the ELK Stack, among others. We’ll briefly look at the features of each one, note their Kubernetes Operators where applicable and see how each can help improve the observability of your environment.

Built-In Kubernetes Monitoring Features

Kubernetes itself includes some features that aid monitoring. For example, the kubelet embeds cAdvisor, a tool that tracks the resource usage and performance of containers. The Kubernetes Dashboard, an optional web UI add-on, can also provide a snapshot of cluster resources. Liveness and readiness probes monitor container health, and the Horizontal Pod Autoscaler can automatically scale the number of pods based on observed metrics such as CPU utilization.
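
To make the autoscaling behavior concrete, here is a minimal sketch of the replica calculation described in the Kubernetes documentation for the Horizontal Pod Autoscaler: the desired count is the current count scaled by the ratio of the observed metric to its target, rounded up. The function name and the sample numbers are illustrative assumptions, and the real controller also accounts for details (pod readiness, stabilization windows) that are left out here.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     tolerance: float = 0.1) -> int:
    """Approximate the documented HPA rule:
    desired = ceil(current_replicas * current_metric / target_metric).
    """
    ratio = current_metric / target_metric
    # When the ratio is close enough to 1.0, no scaling action is taken.
    # 0.1 is the documented default tolerance.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    # Hypothetical workload: 4 pods averaging 200m CPU against a 100m target.
    print(desired_replicas(4, 200, 100))
```

The tolerance check explains why a workload hovering just above or below its target does not trigger constant rescaling.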

Prometheus

Prometheus is an open source monitoring solution that provides metrics collection, querying and alerting. Its query language, PromQL, lets users select and aggregate time-series data to generate graphs, tables and alerts. Prometheus provides client libraries for instrumenting application code, along with exporters and integrations that connect it to storage and alerting tools. The Prometheus Operator supports Kubernetes-native deployment and management. Prometheus is a graduated project with the Cloud Native Computing Foundation (CNCF). It can be downloaded from the project website and is available as open source on GitHub under the Apache 2 License.
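
As a rough illustration of how PromQL is used programmatically, the sketch below builds a request for the Prometheus HTTP API's instant-query endpoint (`/api/v1/query`) and reads samples out of a vector response. The base URL, the `http_requests_total` metric and the sample response body are assumptions made up for the example; no network call is made.

```python
import json
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Build the URL for an instant query against the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def vector_samples(response_body: str):
    """Return (labels, value) pairs from an instant-vector response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError("query failed")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected an instant vector")
    # Each sample carries its labels and a [timestamp, "value"] pair;
    # the value is encoded as a string.
    return [(s["metric"], float(s["value"][1])) for s in data["result"]]


# Hypothetical response for a per-second request rate over five minutes.
SAMPLE_RESPONSE = (
    '{"status":"success","data":{"resultType":"vector","result":'
    '[{"metric":{"job":"api"},"value":[1677024000,"0.25"]}]}}'
)

if __name__ == "__main__":
    print(instant_query_url("http://localhost:9090",
                            "rate(http_requests_total[5m])"))
    print(vector_samples(SAMPLE_RESPONSE))
```

Tools such as Grafana issue queries of this kind on your behalf, so in practice you rarely need to call the API by hand.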

OpenTelemetry

OpenTelemetry (OTel) is a CNCF incubating project that provides a vendor-neutral, open source observability framework for instrumenting, generating, collecting and exporting telemetry data such as traces, metrics and logs. It is used to create and collect telemetry data from services and ship it to various tools for analysis. This ability to export observability data in various formats is a key benefit of using OTel.

OTel integrates with popular libraries and frameworks such as Spring, ASP.NET Core, Express and Quarkus, and installation and integration can be done relatively painlessly. It is adopted and supported by many vendors in the observability space. OpenTelemetry also has a reference architecture and an OpenTelemetry Operator for Kubernetes, which “manages collectors and auto-instrumentation of the workload using OpenTelemetry instrumentation libraries.”

Jaeger

Jaeger is described as an open source, end-to-end distributed tracing platform designed to help troubleshoot complex distributed systems. Jaeger helps with distributed transaction monitoring, performance and latency optimization, root cause analysis, service dependency analysis and distributed context propagation.

As of v1.35, Jaeger supports the OpenTelemetry Protocol (OTLP), which allows it to receive trace data directly from OpenTelemetry. Jaeger also offers a Jaeger Operator to easily get started on Kubernetes. Jaeger is a great solution for organizations looking to monitor and troubleshoot distributed cloud-native systems in an efficient way.
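
Distributed context propagation is easier to picture with an example. The sketch below is not Jaeger code. It illustrates the W3C Trace Context `traceparent` header, the default propagation format in OpenTelemetry SDKs: a service that receives a request keeps the trace ID and issues a new span ID for its own work. The function names are assumptions chosen for illustration.

```python
import re
import secrets

# Version 00 layout: version - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex)
TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace with random trace and span IDs."""
    flags = "01" if sampled else "00"
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-{flags}"


def child_traceparent(incoming: str) -> str:
    """Continue an incoming trace: same trace ID and flags, fresh span ID."""
    match = TRACEPARENT.match(incoming)
    if match is None:
        raise ValueError("malformed traceparent header")
    trace_id, _parent_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


if __name__ == "__main__":
    incoming = new_traceparent()
    print(incoming)
    print(child_traceparent(incoming))
```

Because every hop shares the same trace ID, a tracing backend can stitch the individual spans back into a single end-to-end request.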

Grafana

Grafana is an open monitoring and observability platform to help visualize data. It can query metrics, logs and traces from many different data sources, such as Prometheus, Loki, Elasticsearch, InfluxDB, Postgres and others. More than 100 plug-ins are available, both cloud and self-hosted.

In addition to visualizations, you can use Grafana to create dynamic dashboards, filter and query your metrics and explore logs. You can also set up custom rules to send alerts to various systems like Slack, PagerDuty, VictorOps and OpsGenie. The Grafana Operator makes it easy to integrate Grafana monitoring with Kubernetes.

The ELK Stack

The ELK Stack is a powerful combination of tools for Kubernetes observability. ELK stands for Elasticsearch, Logstash and Kibana. Elasticsearch provides a fast and efficient way to store, search and analyze logs. Logstash filters and parses logs into a structured form that is easy to work with. Kibana is the interface for visualizing logs and metrics, allowing users to gain insights quickly and identify correlations.
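
To show what parsing logs into a structured form means in practice, here is a small Python sketch of the kind of transformation a Logstash filter performs. It is not Logstash configuration. The regular expression, the field names and the sample access-log line are assumptions made for illustration.

```python
import json
import re
from datetime import datetime

# Matches a common web access-log layout:
# client - - [timestamp] "METHOD path protocol" status bytes
ACCESS_LINE = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_access_log(line: str):
    """Turn one raw log line into a dict ready for indexing, or None."""
    match = ACCESS_LINE.match(line)
    if match is None:
        return None
    timestamp = datetime.strptime(match["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return {
        "@timestamp": timestamp.isoformat(),
        "client": match["client"],
        "method": match["method"],
        "path": match["path"],
        "status": int(match["status"]),
        "bytes": 0 if match["bytes"] == "-" else int(match["bytes"]),
    }


if __name__ == "__main__":
    raw = ('192.0.2.10 - - [21/Feb/2023:10:15:32 +0000] '
           '"GET /healthz HTTP/1.1" 200 512')
    print(json.dumps(parse_access_log(raw), indent=2))
```

Once each line is a set of typed fields rather than free text, a search backend can filter by status code or aggregate by path.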

With the ELK Stack, Kubernetes observability is improved, as users can easily monitor system events and performance metrics, detect anomalies and identify the root causes of any issues. The ELK Stack is a popular combination of tools for observability and many use it to improve troubleshooting and maintain better system performance.

Fluentd/Fluent Bit

Fluentd is an open source project licensed under the Apache License v2.0 and hosted by the CNCF. The tool aims to translate incompatible logging formats and procedures into a unified logging layer. Fluentd can collect events from many sources such as web apps, mobile apps, NGINX logs and others. Fluentd centralizes these logs and can also forward them to external systems and database solutions like Elasticsearch, MongoDB or Hadoop. The Fluentd documentation explains how to deploy Fluentd in Kubernetes using the Fluentd DaemonSet.

Kubewatch

Kubewatch is a watcher that tracks changes to Kubernetes clusters. The tool can send notifications to collaboration hubs and notification channels when resource changes and events occur. It can help track the history of deployments, application metrics, version drift and other events. Keep in mind that VMware has stopped driving the project, but it is now maintained externally in a fork by Robusta.dev.

Kube-state-metrics

Kube-state-metrics (KSM) is a simple utility that listens to the Kubernetes API server and generates metrics about the state of objects inside the cluster. It can be used to generate health reports about objects like deployments, nodes and pods. The tool provides raw, unmodified data and exports metrics in plain text on the HTTP endpoint /metrics, which listens on port 8080 by default. This data is designed to be consumed by Prometheus or a similar scraper. You can also open the /metrics endpoint in a browser to view current Kubernetes cluster data.
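
The plain-text output is simple enough to read with a few lines of code. The sketch below parses a hand-written sample in the Prometheus text format and flags deployments with fewer available replicas than requested. The sample values and deployment names are made up, and the parser is deliberately simplified (it does not handle escaped quotes inside label values), so treat it as an illustration rather than a replacement for a real scraper.

```python
import re

SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')

# Hypothetical excerpt of what a /metrics response might contain.
SAMPLE_TEXT = """\
# TYPE kube_deployment_spec_replicas gauge
kube_deployment_spec_replicas{namespace="default",deployment="web"} 3
kube_deployment_spec_replicas{namespace="default",deployment="worker"} 2
# TYPE kube_deployment_status_replicas_available gauge
kube_deployment_status_replicas_available{namespace="default",deployment="web"} 3
kube_deployment_status_replicas_available{namespace="default",deployment="worker"} 1
"""


def parse_metrics(text: str):
    """Return (name, labels, value) tuples, skipping comments and blanks."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue
        labels = dict(LABEL.findall(match["labels"] or ""))
        samples.append((match["name"], labels, float(match["value"])))
    return samples


def under_replicated(samples):
    """List (namespace, deployment) pairs with available < desired replicas."""
    desired, available = {}, {}
    for name, labels, value in samples:
        key = (labels.get("namespace"), labels.get("deployment"))
        if name == "kube_deployment_spec_replicas":
            desired[key] = value
        elif name == "kube_deployment_status_replicas_available":
            available[key] = value
    return sorted(k for k, want in desired.items() if available.get(k, 0) < want)


if __name__ == "__main__":
    print(under_replicated(parse_metrics(SAMPLE_TEXT)))
```

In a real setup this comparison would more likely be written as a PromQL alert rule, with Prometheus doing the scraping.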

Thanos

Thanos is an open source, highly available Prometheus setup with long-term storage capabilities, allowing users to scale their Prometheus setup. Thanos makes it possible to query Prometheus metrics across multiple Prometheus servers and clusters. It supports object storage services such as Google Cloud Storage, Amazon S3, Azure, Swift and Tencent COS, allowing for unlimited retention. It’s compatible with the Prometheus Query API, allowing users to take advantage of any tool that supports the API, such as Grafana.

Additionally, Thanos supports downsampling to increase the performance of querying large time ranges or for aiding data retention procedures. Thanos is not directly tied to Kubernetes, but several community applications can help install Thanos on Kubernetes. Founded by Improbable, Thanos is an incubating project with the CNCF.
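
The idea behind downsampling can be sketched in a few lines. The example below is not Thanos's implementation. It simply groups raw samples into fixed windows and keeps a handful of aggregates per window, which is the general technique that makes long-range queries cheaper. The five-minute window and the function name are assumptions for illustration.

```python
def downsample(samples, window_s: int = 300):
    """Aggregate (unix_seconds, value) samples into fixed-size windows.

    Returns a dict mapping each window's start time to its count, sum,
    min and max, ordered by window start.
    """
    buckets = {}
    for ts, value in samples:
        start = ts - ts % window_s  # align to the start of the window
        agg = buckets.get(start)
        if agg is None:
            buckets[start] = {"count": 1, "sum": value,
                              "min": value, "max": value}
        else:
            agg["count"] += 1
            agg["sum"] += value
            agg["min"] = min(agg["min"], value)
            agg["max"] = max(agg["max"], value)
    return dict(sorted(buckets.items()))


if __name__ == "__main__":
    raw = [(0, 1.0), (60, 3.0), (299, 2.0), (300, 10.0)]
    for start, agg in downsample(raw).items():
        print(start, agg)
```

Keeping count and sum lets a query recover the average for a window, while min and max preserve spikes that a plain average would hide.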

Cortex

Cortex is another incubating project within the CNCF. It specializes in enabling highly available, multi-tenant, long-term storage for Prometheus. Data can be kept for a long time with Cortex, which can help with capacity planning. Cortex is designed to make PromQL queries extremely quick through parallelization and caching. Cortex also provides a comprehensive overview of Prometheus time-series data which can be used for analysis. Cortex is scalable, allowing data from multiple Prometheus servers to be sent to a single cluster, surpassing what a single machine can provide.
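
One way to picture the parallelization and caching is to split a long query into interval-aligned pieces and cache each piece separately. The sketch below illustrates that general idea and is not Cortex's code: the one-day interval, the function names and the `execute` callback are all assumptions, and the pieces run sequentially here where a real system would dispatch them in parallel.

```python
def split_range(start: int, end: int, interval: int = 86400):
    """Split [start, end) in unix seconds at interval-aligned boundaries."""
    parts = []
    cursor = start
    while cursor < end:
        boundary = (cursor // interval + 1) * interval
        parts.append((cursor, min(boundary, end)))
        cursor = boundary
    return parts


def run_split_query(promql: str, start: int, end: int, execute, cache: dict):
    """Run a range query piece by piece, reusing cached pieces when possible.

    `execute(promql, start, end)` is a hypothetical callback that runs one
    sub-query against the backend.
    """
    results = []
    for part in split_range(start, end):
        key = (promql, part)
        if key not in cache:
            cache[key] = execute(promql, *part)
        results.append(cache[key])
    return results


if __name__ == "__main__":
    cache = {}
    fake_backend = lambda q, s, e: f"{q}[{s}:{e}]"
    print(run_split_query("up", 43200, 129600, fake_backend, cache))
```

Because the pieces are aligned to fixed boundaries, two dashboards asking for overlapping time ranges end up sharing cache entries.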

Final Thoughts on Kubernetes-Native Monitoring

Kubernetes is revolutionizing the way modern enterprises manage their cloud infrastructure. Yet, it doesn’t come without its hurdles—one of which is centralizing the increasingly disparate state of cloud-native monitoring and metrics.

Fortunately, several open source Kubernetes-native observability tools are available to help provide visibility and monitoring of your Kubernetes environment. As you can see, many of these tools are compatible with Prometheus, too, letting you leverage the setup you may already use. Each of these Kubernetes-native tools provides unique features and capabilities that can help improve the observability of your Kubernetes environment, making it easier to monitor the performance and health of applications and services running on Kubernetes.

Did we forget any? Are you using a different tool for Kubernetes observability? Let us know in the comments below.