Observability at the Edge with OpenTelemetry

Over the past two years, our world has experienced drastic and unprecedented changes to the way that we live and work. We have all learned, quite poignantly, how global trends shape our individual lives and communities. In this environment, it’s become more important than ever to understand how software and technology work, and how well they work, for each individual end user.

This focus on individual performance and utility has been a driver for edge computing for many years. From relatively humble beginnings, the edge has expanded to encompass the hyper-connected future of 5G, IoT, AI and machine learning. As our distributed systems scale and expand, so must our ability to understand the performance of these systems and the experience of our end users.

Let’s make sure we’re all on the same page before we continue. Broadly, edge computing refers to the “shift left” in workloads away from centralized data centers toward users, either at the service provider level or after the last mile. The smartphone in your pocket or the smart light bulb in your lamp are both examples of edge computing. Observability is, broadly, technologies and practices that help you understand and quantify performance in a system.

It may surprise you to learn that observability is also part of a “shift left” movement in monitoring and performance management. Traditional monitoring looks a lot like the old way of doing things—inscrutable dashboards that don’t keep up with today’s rapidly changing needs, centralized command-and-control and lock-in to proprietary vendor systems.

Crucially, edge computing and observability both look to improve end-user experiences by moving everything closer to the end user logically and, in some cases, physically.

The New Edge Stack

The 2021 State of the Edge Report defines a “new edge stack” that includes three distinct layers (systems, management and deployment) that are bound together by a “common observability layer, providing information tailored to the needs of different stakeholders.” However, traditional logs and metrics alone cannot suffice at the edge. Observability is about more than just collecting data; it’s about asking questions of that data.

Effectively asking questions about our edge systems requires more than just standing up the ELK (Elasticsearch, Logstash, Kibana) stack and Prometheus. We need to think about how we correlate data from not only our software’s runtime but also the three layers of our edge stack. We need to be able to drill down to a single request as it moves from edge device through the public cloud and into our private data center while also pulling back to the 30,000-foot view where we can understand overall system health and performance on a global scale.

This may sound fantastical, but there have been several important milestones over the past year in tooling that are making this a reality. First, the OpenTelemetry project has moved into the incubation phase. With over 3,000 contributors and more than 20,000 pull requests, this is one of the most popular projects in the Cloud Native Computing Foundation (CNCF)—and one of the most exciting observability projects in years.

The Lingua Franca of Observability

OpenTelemetry aims to supply a specification for generating, encoding, transmitting and collecting metrics, logs and trace data from software. This lingua franca of observability data creates a truly vendor- and tool-neutral playing field, integrating with many other CNCF projects such as Prometheus and OpenMetrics, as well as a broad range of proprietary and commercial tools. “Write once, run anywhere” telemetry is the perfect companion to edge workloads as they move up and down the edge stack.

In addition, OpenTelemetry supplies a suite of collection and processing tools, such as the OpenTelemetry Collector, that allow for the capture of telemetry data at the edge. As an open source component, the collector can be customized for low-power or other resource-constrained environments and can act as a processor and proxy for telemetry data, even data that didn’t start in OpenTelemetry format. Need a scalable, cloud-native way to capture Simple Network Management Protocol (SNMP) data and convert it to some other format? The collector can help! In the future, you can imagine custom builds of the collector being strategically deployed to the last mile to collect telemetry from IoT and smart devices as part of a home hub. That telemetry can then be redacted to protect personally identifiable information (PII) and centralized, allowing engineers to understand the real-world conditions that these devices are working under.
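As a rough illustration of the "redact, then centralize" step, the sketch below shows the shape of the logic in plain Python. It is not the OpenTelemetry Collector's API or configuration. The attribute names in PII_KEYS, the salt, and the redact and forward helpers are all assumptions made up for this example.

```python
import hashlib
from typing import Any, Callable, Dict, Iterable, List

# Hypothetical attribute names treated as PII in this sketch.
PII_KEYS = {"user.email", "device.owner", "net.peer.ip"}

Record = Dict[str, Any]


def redact(record: Record, salt: bytes = b"example-salt") -> Record:
    """Return a copy of the record with PII attributes replaced by salted hashes."""
    cleaned: Record = {}
    for key, value in record.items():
        if key in PII_KEYS:
            digest = hashlib.sha256(salt + str(value).encode("utf-8")).hexdigest()
            cleaned[key] = f"redacted:{digest[:12]}"
        else:
            cleaned[key] = value
    return cleaned


def forward(records: Iterable[Record], send: Callable[[List[Record]], None]) -> int:
    """Redact each record at the hub, then hand the batch to an exporter callable."""
    batch = [redact(r) for r in records]
    send(batch)
    return len(batch)


# Usage: a list stands in for the central backend.
sent: List[Record] = []
forwarded = forward(
    [{"device.model": "bulb-v2", "user.email": "a@example.com", "temp_c": 41.5}],
    sent.extend,
)
```

Hashing rather than dropping the value keeps records from the same user correlatable without shipping the raw identifier. Note that a salted hash of a low-entropy value is not strong anonymization, so whether this is enough depends on your privacy requirements.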

Finally, OpenTelemetry’s APIs and SDKs allow it to be integrated into management and systems software as well. For example, Kubernetes v1.22 has added OpenTelemetry support for tracing API server requests; .NET 6 is integrating OpenTelemetry as an extension to its built-in metrics and tracing libraries; and contributors have built CLI wrappers that allow traces to be created as part of shell scripts. OpenTelemetry is already delivering on its potential to be a transformative project for observability, and I’m excited to see how it integrates into the future of edge computing as well.

OpenTelemetry enables observability at the edge. Engineers will be able not only to spot problems in the wild more quickly, but also to proactively understand end-user pain points. Businesses will be able to understand and quantify how changes are affecting the end-user experience with their products. Users will have more reliable, more resilient, higher-performance systems. It’s not here yet, but you can see it all on the horizon.


Austin Parker

Austin Parker has been solving, and creating, problems with computers and technology for most of his life. He is the Principal Developer Advocate at Lightstep and a maintainer of the OpenTracing and OpenTelemetry projects. His professional dream is to build a world where we're able to create and run more reliable software. In addition to his professional work, he's taught college classes, spoken about all things DevOps and Distributed Tracing, and even found time to start a podcast. Austin is also the co-author of the forthcoming book Distributed Tracing in Practice, available in early 2020 from O'Reilly Media.
