Chronological Kontrol: Google Cloud K8s Cluster History Inspector
Container image bloat, container image over-provisioning, container image misconfiguration and perhaps even container image cost and charge-back anomalies. These are just a selection of the headlines from the [insert Kubernetes deployment team name here]’s latest problem report planner for the CIO to lose their temper over.
Aiming to address these continually recurring issues this month is Google Cloud with its new open-source tool that visualizes cluster logs chronologically to simplify troubleshooting in Kubernetes environments. Kubernetes History Inspector (KHI) is now available on GitHub as a tool for cloud-native systems architects, developers and other software engineers who need to debug problems inside Kubernetes clusters.
Why Cluster Logs Matter
Serving as a comprehensive record of events that happen inside a cluster of cloud-native containers (or indeed the topology of a Windows or enterprise Linux environment network at system runtime), cluster logs detail resource operations, cluster configuration changes, error messages and more.
Why Chronology Matters
That’s why cluster logs matter, but Google Cloud’s tool brings a touch of temporal finesse to analytics in this space with its ability to visualize cluster logs chronologically. So we can further say that chronological cluster logs matter even more because they enable us to understand why one system event happens before another in the known K8s universe.
It might sound simplistic to say that A happens before B and then we can expect C and possibly D, but until we know the timeline of events (and whether or not B and C happened concurrently), we’re not in a fully informed position to identify the root cause of container component issues, even with the self-healing capabilities that exist within Kubernetes.
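The A-before-B reasoning above can be sketched in a few lines of Python. This is only an illustration of ordering events and spotting overlaps, not how KHI works internally; the `Event` class, the helper `t` and the sample event names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """A hypothetical cluster event with a start and end time."""
    name: str
    start: datetime
    end: datetime


def order_events(events):
    """Return events sorted by start time (ties broken by name)."""
    return sorted(events, key=lambda e: (e.start, e.name))


def concurrent_pairs(events):
    """Return pairs of event names whose time intervals overlap."""
    ordered = order_events(events)
    pairs = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second.start >= first.end:
                break  # events are sorted by start, so nothing later overlaps
            pairs.append((first.name, second.name))
    return pairs


def t(minute, second=0):
    """Build a timestamp on an arbitrary sample day."""
    return datetime(2025, 1, 1, 12, minute, second)


# Sample events, deliberately listed out of order.
events = [
    Event("C: pod evicted", t(2), t(4)),
    Event("A: node pressure", t(0), t(1)),
    Event("B: image pull", t(1, 30), t(3)),
    Event("D: pod rescheduled", t(5), t(6)),
]

for event in order_events(events):
    print(event.start.time(), event.name)
print("concurrent:", concurrent_pairs(events))
```

Here A clearly precedes everything else, while B and C overlap in time, which is exactly the kind of detail that is hard to see when scrolling through unsorted log lines.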
Tokyo-based Google Cloud technical solutions engineers Kakeru Ishii and Takeie Torinomi say that simply collecting the Kubernetes logs solves only half the problem. The real challenge lies in analyzing them effectively, because many issues we encounter in a Kubernetes deployment are not revealed by a single, obvious error message.
Causal Relationships, It’s Complicated
“Instead, they manifest as a chain of events, requiring a deep understanding of the causal relationships between numerous log entries across multiple components,” explains the pair, on the Google Cloud blog. “Consider the scale: A moderately sized Kubernetes cluster can easily generate gigabytes of log data, comprising tens of thousands of individual entries, within a short timeframe. Manually sifting through this volume of data to identify the root cause of performance degradation, intermittent failure, or configuration error is, at best, incredibly time-consuming… and, at worst, practically impossible for human operators.”
To address this challenge, Google Cloud developed Kubernetes History Inspector after analyzing countless user environments and the customer support tickets they generated.
What a State to Get Into
This technology extracts “state information” for every log collected through Google Cloud Logging, the cloud giant’s fully managed service built to ingest, store and analyze log data from various sources within and outside Google Cloud. That state information is dovetailed with “raw log data” to provide a record of component usage over time. The resulting analysis is presented in a visual timeline format that administrators and cloud engineers can explore using a graphical interface rather than having to write complex queries.
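To make the idea of pairing state information with raw log entries more concrete, here is a minimal sketch in Python. It assumes a simplified, hypothetical log shape of (timestamp, resource, state) tuples and is not KHI’s actual data model; the `LOGS` sample and `build_timelines` function are invented for illustration.

```python
from collections import defaultdict

# Hypothetical, simplified log entries: (UTC ISO timestamp, resource, state).
LOGS = [
    ("2025-01-01T12:03:00Z", "pod/web-1", "Running"),
    ("2025-01-01T12:00:00Z", "pod/web-1", "Pending"),
    ("2025-01-01T12:07:00Z", "pod/web-1", "Failed"),
    ("2025-01-01T12:01:00Z", "pod/db-0", "Running"),
]


def build_timelines(logs):
    """Group entries per resource and turn them into (state, start, end) spans."""
    per_resource = defaultdict(list)
    for timestamp, resource, state in logs:
        per_resource[resource].append((timestamp, state))

    timelines = {}
    for resource, entries in per_resource.items():
        entries.sort()  # identically formatted UTC ISO strings sort chronologically
        spans = []
        for i, (start, state) in enumerate(entries):
            # A state lasts until the next entry; the last one is still open.
            end = entries[i + 1][0] if i + 1 < len(entries) else None
            spans.append((state, start, end))
        timelines[resource] = spans
    return timelines


for resource, spans in build_timelines(LOGS).items():
    print(resource)
    for state, start, end in spans:
        print(f"  {state}: {start} -> {end or 'now'}")
```

Each resource ends up with its own row of state spans, which is roughly the shape of data a timeline view needs before anything is drawn on screen.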
In terms of use, no prior setup is required to get running with KHI: it uses existing logs, with no additional installation or modifications required. Cloud-native engineers and administrators can use KHI to troubleshoot past issues as well, as long as logs are still available at the backend.
Offering what has been lauded as “effortless log collection”, KHI simplifies the process of collecting and visualizing Kubernetes-related logs. As already noted, instead of writing complex queries, users can make use of an interactive GUI. By setting the target cluster type, log types and parameters such as time range and cluster name, KHI automatically generates the necessary queries and collects the logs for visualization.
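For a rough sense of what generating a query from form inputs might involve, the Python sketch below assembles a Cloud Logging style filter string from a cluster name, a time range and a list of log IDs. The queries KHI actually generates are not shown in this article, so the function `build_filter`, its parameters and the sample values are assumptions made purely for illustration.

```python
from datetime import datetime, timezone


def build_filter(cluster_name, start, end, log_ids):
    """Assemble a Cloud Logging style filter string from form-like inputs.

    `start` and `end` must be timezone-aware datetimes.
    """
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    clauses = [
        'resource.type="k8s_cluster"',
        f'resource.labels.cluster_name="{cluster_name}"',
        f'timestamp>="{start.astimezone(timezone.utc).strftime(fmt)}"',
        f'timestamp<="{end.astimezone(timezone.utc).strftime(fmt)}"',
    ]
    if log_ids:
        # Match any of the requested logs.
        any_log = " OR ".join(f'log_id("{log_id}")' for log_id in log_ids)
        clauses.append(f"({any_log})")
    return " AND ".join(clauses)


# Hypothetical inputs a user might otherwise pick in a GUI.
query = build_filter(
    cluster_name="demo-cluster",
    start=datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc),
    end=datetime(2025, 1, 1, 13, 0, tzinfo=timezone.utc),
    log_ids=["events", "cloudaudit.googleapis.com/activity"],
)
print(query)
```

Even this toy version shows why a GUI is attractive: the filter is easy to get subtly wrong by hand once several log types and a precise time window are involved.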
Macroscopic Microscopic Mashup
Aiming to go further than the pre-existing functions for log analytics available in Google Kubernetes Engine (GKE) and AWS’s Elastic Kubernetes Service (EKS), this timeline provides a macroscopic and microscopic view of each Kubernetes environment. The microscopic view shows raw logs and manifests (and their historical changes related to the component) selected in the timeline.
“Effective troubleshooting still requires a solid understanding of Kubernetes concepts and [an] application’s architecture. KHI helps the engineer navigate the complexity by providing a powerful map to view logs and to diagnose issues more quickly and efficiently. While KHI represents a significant leap forward in Kubernetes log analysis, it’s designed to amplify existing expertise, not replace it,” caution Ishii and Torinomi.
The team behind this project calls it a “rich log visualization tool” for Kubernetes clusters and says that KHI transforms vast quantities of logs into an interactive, comprehensive timeline view. This makes it an invaluable tool for troubleshooting complex issues that span multiple components within Kubernetes clusters.