OpenTelemetry Offers Business Continuity for Monitoring

June 1, 2020 / May 29, 2020 | Chris Riley | application development, monitoring, observability, open source, toolchain

Very soon, two popular open source projects, OpenCensus and OpenTracing, will be fully merged into OpenTelemetry, which I believe will be one of the most powerful tools for sustainable DevOps environments.

I’m sure you have heard the story. In the world of modern applications, how you monitor them has to change. Why? Because the systems we build with microservices, containers, Kubernetes and serverless are too complex and ephemeral for a human to traverse without help. This is where the new trend “observability” departs from classical monitoring to classify these unique sets of challenges and tooling.

BUT. The first part of monitoring anything is getting the data in. This can be done via APIs at the code level, but more often than not it is done with an agent. Agents take data from the application and send it to monitoring and observability tools. In modern microservices-based architectures, many organizations use the sidecar pattern, which is one step closer to making monitoring a feature of the delivery chain. Agents are great until you have too many. Agent sprawl and agent management are not things DevOps teams should have to think much about. Configuration disparity across a wide range of agent types and versions can also degrade the quality of the data that DevOps engineers, developers and site reliability engineers (SREs) rely on so heavily. These teams should be more concerned with what telemetry data is captured from the application, how it is displayed and how it is utilized.

OpenTelemetry is an open source (OSS) project that helps instrument the collection of metrics, traces and other metadata from an application. It consists of a collector, which can run as an agent or receive data via the API, and tools to manipulate the data it collects and send that data to a specified destination.

OpenTelemetry takes the idea of monitoring as a feature of DevOps delivery chains one step further. OpenTelemetry can act as a standard, removing the ambiguity around types of agents and how information is collected. But in the core of OTel (OpenTelemetry for the cool kids), there is much more. The collector contains pipelines, which are the logic that runs between receivers, where the data comes from, and exporters, where the data goes. In the pipelines, standard business logic can be implemented across the organization no matter what the receiver or exporter is. This includes processors and transformations from one format to another.
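To make the receiver, pipeline and exporter relationship concrete, here is a minimal Python sketch. It is not the OpenTelemetry Collector API: every name in it (Pipeline, app_receiver, tracing_receiver, tag_team and the two backend lists) is hypothetical and exists only to illustrate the shape of the idea.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

# Hypothetical types for illustration only; not the OpenTelemetry Collector API.
Record = Dict[str, object]
Receiver = Callable[[], Iterable[Record]]   # where the data comes from
Processor = Callable[[Record], Record]      # logic applied in the pipeline
Exporter = Callable[[List[Record]], None]   # where the data goes


@dataclass
class Pipeline:
    receivers: List[Receiver]
    processors: List[Processor]
    exporters: List[Exporter]

    def run(self) -> None:
        batch: List[Record] = []
        # Every record passes through the same processors,
        # regardless of which receiver produced it.
        for receive in self.receivers:
            for record in receive():
                for process in self.processors:
                    record = process(record)
                batch.append(record)
        # Every exporter gets the same processed batch.
        for export in self.exporters:
            export(batch)


def app_receiver() -> Iterable[Record]:
    # Stand-in for data pushed by an instrumented application.
    yield {"name": "checkout", "duration_ms": 42}


def tracing_receiver() -> Iterable[Record]:
    # Stand-in for data pulled from an existing tracing system.
    yield {"name": "payment", "duration_ms": 17}


def tag_team(record: Record) -> Record:
    # Organization-wide business logic lives here, once.
    return {**record, "team": "payments"}


# Two in-memory lists stand in for two different backend tools.
metrics_backend: List[Record] = []
tracing_backend: List[Record] = []

pipeline = Pipeline(
    receivers=[app_receiver, tracing_receiver],
    processors=[tag_team],
    exporters=[metrics_backend.extend, tracing_backend.extend],
)
pipeline.run()
```

In this sketch, swapping a backend means changing one entry in the exporters list; the receivers and the shared logic in the processors stay untouched.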

Processors allow you to manipulate data for things such as redaction or additional metadata. They are a powerful tool to centralize and standardize the data that ends up in your monitoring and observability tool. Beyond out-of-the-box manipulations, extensions expand this functionality dramatically and organizations can easily write their own with the pluggable architecture.
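As a rough illustration of what redaction and metadata processors do, consider the Python sketch below. It is an assumption-laden toy, not the collector's processor interface: the function names, the email pattern and the attribute keys are all made up for the example.

```python
import re
from typing import Callable, Dict, List

# Hypothetical types for illustration only; not the collector's processor interface.
Attributes = Dict[str, str]
Processor = Callable[[Attributes], Attributes]

# Deliberately simple email pattern, good enough for a sketch.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact_emails(attrs: Attributes) -> Attributes:
    # Redaction: mask anything that looks like an email address.
    return {key: EMAIL.sub("[REDACTED]", value) for key, value in attrs.items()}


def add_metadata(extra: Attributes) -> Processor:
    # Enrichment: attach organization-wide metadata to every record.
    def process(attrs: Attributes) -> Attributes:
        # Existing attributes win, so enrichment never overwrites source data.
        return {**extra, **attrs}
    return process


def apply_all(attrs: Attributes, processors: List[Processor]) -> Attributes:
    # Run the processors in order; each returns a new dict.
    for process in processors:
        attrs = process(attrs)
    return attrs


span_attrs = {"http.route": "/login", "user": "jane.doe@example.com"}
result = apply_all(
    span_attrs,
    [redact_emails, add_metadata({"deployment.environment": "staging"})],
)
```

Because each processor returns a new dictionary, the original attributes are left unmodified, and the same list of processors can be reused for every source that feeds the pipeline.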

Receivers can push or pull from Jaeger, the application and other sources. Exporters can send data to tools such as Prometheus and/or an observability tool. This architecture offloads responsibilities from the collector and allows data collection and sending to be language- and data source-independent.

Source: https://www.cncf.io/wp-content/uploads/2020/05/How-OpenTelemetry-is-Eating-the-Observability-World.pdf

For the enterprise, all of this means that the pipeline can truly be treated as an application of its own, including the management plane. It will be vendor-agnostic and standardized across the entire organization. Often, when a new tool or process becomes available that an enterprise wants to use, its adoption causes major disruption in an already established delivery chain. With OTel, however, there is no disruption; the mechanism for connecting applications and application infrastructure to management tools stays the same.

OpenTelemetry is being built on a strong community of very active contributors and maintainers. As you would expect, many of the usual vendors are very active. Here are the top 10:

Results as of 5/20/20 Source: https://opentelemetry.devstats.cncf.io/d/5/companies-table?orgId=1

The community also has an impressive governance structure, focused on making sure the community is heard via contrib components. This is not only cool for those who love OSS; it is critical for continuity, because the thing I’m pitching is what provides continuity for your delivery chain.

One of the things that I believe is neglected in many, if not most, modern toolchains is consideration of how to keep application velocity sustainable. Sustainability means that the toolchain can live independently as a product of its own, so that changes in vendor tools or processes do not force a refactoring. With OpenTelemetry as a toolchain feature, continuity is built in. Developers are on the same page about how monitoring is instrumented, and as new services come online, deciding how they’re monitored is about strategy, not collection.