OpenTelemetry is Fundamental to Cloud- and AI-Native Developers

Over the last year, generative AI has taken the world of software development by storm. The launch of ChatGPT in November 2022 kicked off an avalanche of tools, all promising to aid software engineers in their never-ending journey to write better and more reliable software. We shouldn’t miss the forest for the trees here, though—all of the AI assistance in the world doesn’t replace the need for developers to understand their systems.

Indeed, the role of observability has only grown in importance. Regardless of how you’re using AI, whether as a labor-saving assistant or as an integrated part of the applications you write, understanding what that AI is doing is increasingly important. Nearly three-quarters of developers surveyed by Stack Overflow reported using AI tools or planning to use them soon. Generative AI’s economic potential is forecast in the trillions of dollars annually. For this to come to pass, though, we must be able to build trust and confidence in our software, especially as AI adds considerably more complexity to it.

OpenTelemetry is Valuable for Building With AI

OpenTelemetry is the crux of this observability story. Already, we can see that LLMs demand observability because they behave as black boxes. OpenTelemetry can provide out-of-the-box instrumentation for many popular LLMs and other AI services through integrations with client libraries. Projects like OpenLLMetry build on existing OpenTelemetry APIs to provide vendor-agnostic instrumentation of popular vector databases and other LLM tools. Future work in OpenTelemetry will provide standard semantic conventions for things like LLM requests, distinguishing steps in agents or chains, and patterns such as Retrieval-Augmented Generation (RAG). This will improve the ability of analysis tools to provide templated dashboards and queries to help you understand and interrogate AI results in your software.
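As a rough illustration of what standardized LLM request data could look like, the sketch below builds a flat attribute dictionary for a single LLM call. The `llm_request_attributes` helper and every attribute name in it (`llm.request.model` and so on) are hypothetical placeholders chosen for this example, not official OpenTelemetry semantic conventions. In an instrumented application, a dictionary like this could be attached to the active span with the OpenTelemetry Python API's `span.set_attributes(...)`.

```python
from typing import Dict, Union

AttributeValue = Union[str, int, float, bool]


def llm_request_attributes(
    vendor: str,
    model: str,
    prompt: str,
    completion: str,
    prompt_tokens: int,
    completion_tokens: int,
    retrieved_documents: int = 0,
) -> Dict[str, AttributeValue]:
    """Describe one LLM call as flat, dotted-key attributes.

    Key names are illustrative assumptions, not an official convention.
    """
    return {
        "llm.vendor": vendor,
        "llm.request.model": model,
        # Record sizes rather than raw text, which may contain user data.
        "llm.prompt.length": len(prompt),
        "llm.completion.length": len(completion),
        "llm.usage.prompt_tokens": prompt_tokens,
        "llm.usage.completion_tokens": completion_tokens,
        "llm.usage.total_tokens": prompt_tokens + completion_tokens,
        # Marks requests that went through a retrieval (RAG) step.
        "llm.rag.enabled": retrieved_documents > 0,
        "llm.rag.retrieved_documents": retrieved_documents,
    }


attrs = llm_request_attributes(
    vendor="example-vendor",
    model="example-model",
    prompt="Summarize the incident report.",
    completion="The outage was caused by a bad deploy.",
    prompt_tokens=7,
    completion_tokens=9,
    retrieved_documents=3,
)
for key, value in sorted(attrs.items()):
    print(f"{key} = {value}")
```

Keeping the keys flat and consistently namespaced is what would let an analysis tool group requests by model or compare token usage across RAG and non-RAG calls without custom parsing.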

This shift will not just be seen in how we build software or write code, though. Developers will need to become more adept at building with observability itself in mind. OpenTelemetry, again, plays a crucial role in this. By instrumenting their software as they go and providing rich contextual attributes that describe what is happening and to whom, developers will be able to leverage AI-enhanced analytic tools to better understand the impact of changes, reduce downtime and errors and build SLOs that accurately represent user-facing performance.
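To make the link between contextual attributes and SLOs concrete, here is a minimal sketch that computes a "good request" ratio from span-like records. The `RequestSpan` shape, the `good_request_ratio` helper, the 300 ms threshold, and the 75% target are all assumptions made for illustration; a real setup would read exported OpenTelemetry spans from an observability backend rather than from an in-memory list.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RequestSpan:
    """A pared-down stand-in for an exported span (hypothetical shape)."""
    route: str
    user_tier: str
    duration_ms: float
    error: bool


def good_request_ratio(spans: Iterable[RequestSpan], max_latency_ms: float) -> float:
    """Fraction of requests that succeeded within the latency threshold."""
    spans = list(spans)
    if not spans:
        raise ValueError("no spans to evaluate")
    good = sum(1 for s in spans if not s.error and s.duration_ms <= max_latency_ms)
    return good / len(spans)


spans = [
    RequestSpan("/checkout", "free", 120.0, False),
    RequestSpan("/checkout", "paid", 480.0, False),
    RequestSpan("/checkout", "paid", 90.0, True),
    RequestSpan("/checkout", "free", 200.0, False),
]

SLO_TARGET = 0.75           # assumed target: 75% of requests are "good"
LATENCY_THRESHOLD_MS = 300  # assumed user-facing latency budget

ratio = good_request_ratio(spans, LATENCY_THRESHOLD_MS)
print(f"good ratio: {ratio:.2f}, SLO met: {ratio >= SLO_TARGET}")

# The "to whom" attribute allows the same indicator to be sliced by audience.
paid = [s for s in spans if s.user_tier == "paid"]
paid_ratio = good_request_ratio(paid, LATENCY_THRESHOLD_MS)
print(f"paid tier good ratio: {paid_ratio:.2f}")
```

Without the `user_tier` attribute, the overall ratio would hide the fact that, in this made-up data, every slow or failed request belongs to the paid tier.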

How OpenTelemetry Helps Developers Using AI Today

Many of us don’t have time to wait for the future—we’re building it as we speak! If you’re developing with AI, here are three ways that OpenTelemetry can help you today:

  • The OpenTelemetry Registry has dozens of integrations with popular Python packages that underpin most generative AI client libraries. There are even helpers for specific clients, such as the OpenAI client. Since OpenTelemetry data is vendor- and tool-agnostic, it can be used with most analysis tools.
  • OpenTelemetry data itself is ideal for model training and analysis as it’s highly structured and documented. Using open source tools and big data processing, you can create up-to-date models of your system. More commercial observability tools are beginning to adopt this as well, such as Honeycomb’s Query Assistant or New Relic’s Grok.
  • Code assistance tools, like GitHub Copilot, can be used to aid in instrumentation as well. You can ask them to suggest attributes for traces or how to create useful metrics about your application.

As these tools become more sophisticated, we’ll see even more advanced applications and workflows unlocked through AI.

Cloud-Native Will Become AI-Native

For cloud-native developers, the biggest advantage of using AI and large language models is that the AI can be aware of cloud capabilities and suggest them to you. How many times have you run into mysterious Kubernetes problems that you couldn’t interpret due to a lack of data or because you didn’t know what question to ask?

AI can be a huge help here by being aware of the myriad complexities that cloud-native developers and applications face. OpenTelemetry, again, is crucial as it provides high-quality data about cloud-native applications out of the box. The difference between developers and teams that are effective at becoming AI-native and those that aren’t, though, will be traceable to how well they adopt observability principles.

To get the best results, you won’t be able to slap on an agent and call it a day; you’ll need to describe the intent of your system. You’ll need to use the right instruments and layer your telemetry in such a way that analysis tools can use it to build better models, not just of what has happened but of what should happen. This point bears repeating: The difference between good AI systems and bad ones is data quality and structure. OpenTelemetry is critical to AI in observability because it provides semantic conventions to improve quality and a standardized model to improve structure.
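One small, practical way to protect data quality is to check that attribute names stay consistent across services. The sketch below flags attribute keys that break a naming rule. The rule itself (lowercase, dot-separated namespaces) and the `find_inconsistent_keys` helper are assumptions for this example, written in the spirit of dotted attribute naming rather than taken from any official validator.

```python
import re
from typing import Dict, Iterable, List

# Assumed house rule: lowercase, dot-separated namespaces with snake_case segments.
KEY_PATTERN = re.compile(r"^[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)+$")


def find_inconsistent_keys(spans: Iterable[Dict[str, object]]) -> List[str]:
    """Return the sorted attribute keys that break the naming rule."""
    bad = set()
    for attributes in spans:
        for key in attributes:
            if not KEY_PATTERN.match(key):
                bad.add(key)
    return sorted(bad)


# Attribute dictionaries as they might arrive from three different services.
spans = [
    {"http.request.method": "GET", "user.id": "42"},
    {"HTTP_Method": "GET", "userId": "42"},
    {"k8s.pod.name": "checkout-7d9f"},
]

inconsistent = find_inconsistent_keys(spans)
print(inconsistent)
```

A check like this could run in CI or in a collector-side pipeline, so that two teams do not end up describing the same concept under different keys.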

This isn’t just a warmed-over argument in favor of ‘AIOps,’ which often winds up being alert coalescing or basic anomaly detection. AI-native developers will be able to leverage these new tools to ask deeper questions and get answers to their thorniest performance optimization questions about platforms like Kubernetes without having to jump between multiple disconnected tools. You’ll be able to make better decisions faster and have more confidence about what you’re shipping to production. As AI and LLMs play increasingly large roles in not just how we write code but in how we build distributed systems and applications, observability will be an essential ingredient in how teams and organizations deal with the uncertainty that AI can bring.

And, at the end of the day, you’ll have happier users—and less stressful on-call times. Isn’t that what we really want, after all?

To hear more about cloud-native topics, join the Cloud Native Computing Foundation and the cloud-native community at KubeCon+CloudNativeCon North America 2023 – November 6-9, 2023.

Austin Parker

Austin Parker has been solving - and creating - problems with computers and technology for most of his life. He is the Principal Developer Advocate at Lightstep and maintainer on the OpenTracing and OpenTelemetry projects. His professional dream is to build a world where we're able to create and run more reliable software. In addition to his professional work, he's taught college classes, spoken about all things DevOps and Distributed Tracing, and even found time to start a podcast. Austin is also the co-author of the forthcoming book Distributed Tracing in Practice, available in early 2020 from O'Reilly Media.
