Bridging Observability & Security in Kubernetes: Beyond Just Metrics

September 29, 2025

by Alan Shimel

Kubernetes has delivered on its promise of making applications more dynamic, scalable and portable. But let’s be honest: In making all of that possible, it has also blown our attack surface wide open. What used to be a tidy, if brittle, infrastructure stack is now a swirling ocean of containers, pods, and ephemeral services — all changing constantly, sometimes by the second.

To cope with this volatility, the cloud-native community built a powerful observability stack: Metrics, logs, traces, dashboards, and now entire open standards like OpenTelemetry. We can track CPU spikes across pods, follow transactions across microservices, and detect latency down to the millisecond.

That’s all great for uptime and performance. But here’s the problem: Most organizations stop there. They treat observability as an SRE or DevOps problem when in reality it is just as critical for security. The two worlds — observability and security — have lived side by side, looking at the same telemetry through different lenses, and yet they rarely converge. And in 2025, that disconnect is becoming untenable.

Observability Today: Rich, But Narrow

Let’s take stock of where we are.

Most Kubernetes shops run a familiar playbook:

Metrics with Prometheus.
Logs via ELK stacks or Loki.
Traces using Jaeger or the newer OpenTelemetry pipelines.

That observability stack answers one primary question: Is my cluster healthy? If a pod is in a crash loop, if latency spikes on a service mesh route, if resource utilization is off the charts — you’ll know.

But what about why it happened? What if that crash loop is caused by a crypto-mining container slipped into your pipeline? What if that latency spike is the result of a DDoS against your ingress controller? What if a suspicious service account suddenly starts calling APIs it’s never touched before?

From an ops lens, those are just anomalies. From a security lens, they are attacks in progress. And yet most organizations keep those perspectives in separate silos.

Security Needs Observability and Vice Versa

The recent “Kubernetes in the Wild 2025” report from Dynatrace made this point bluntly: More than half of organizations are now using some form of Kubernetes-focused security tooling, but the integration with observability is poor. Security teams still chase alerts from their own consoles, while platform teams drown in dashboards.

This is a recipe for blind spots. Misconfigurations, privilege escalations, and lateral movement rarely show up as tidy CVE numbers or firewall alerts. They show up as runtime anomalies: Unusual traffic patterns, odd container restarts, workloads that don’t behave the way they did yesterday.

Observability data is a goldmine of security signals. Security data adds context to know which signals matter. Without convergence, both teams are running half blind.
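To make "doesn’t behave like yesterday" concrete, here is a minimal sketch of first-seen detection for service accounts. It assumes audit events have already been flattened into simple dictionaries with hypothetical `user`, `verb`, and `resource` keys; real Kubernetes audit events are nested and would need a small mapping step first. The function name and the sample account are illustrative only.

```python
from collections import defaultdict


def new_api_access(baseline_events, today_events):
    """Return (user, verb, resource) tuples seen today but never in the baseline.

    Each event is a flat dict with the assumed keys "user", "verb", "resource".
    """
    # Build the per-account history of (verb, resource) pairs.
    seen = defaultdict(set)
    for ev in baseline_events:
        seen[ev["user"]].add((ev["verb"], ev["resource"]))

    findings = []
    for ev in today_events:
        key = (ev["verb"], ev["resource"])
        # An account with no history at all is flagged for everything it does.
        if key not in seen[ev["user"]]:
            findings.append((ev["user"], *key))
            seen[ev["user"]].add(key)  # report each new combination only once
    return findings


if __name__ == "__main__":
    sa = "system:serviceaccount:shop:cart"  # hypothetical service account
    baseline = [{"user": sa, "verb": "get", "resource": "configmaps"}]
    today = [
        {"user": sa, "verb": "get", "resource": "configmaps"},
        {"user": sa, "verb": "list", "resource": "secrets"},
    ]
    for finding in new_api_access(baseline, today):
        print(finding)
```

An ops dashboard would call that output "new behavior." A security analyst would want to know why a cart service is suddenly listing secrets. Same data, two readings.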

The Case for Convergence

Let’s make it real with a few examples:

Crypto-mining in your cluster: A pod begins consuming abnormal CPU cycles. Observability tells you “high utilization.” Security tells you “this pod pulled an unverified image yesterday.” Together, you realize you’ve got a miner squatting in your infrastructure.
Command-and-control traffic: Network metrics show a sudden spike of east-west traffic. Service mesh traces confirm requests are moving between namespaces that usually don’t talk. Observability shows the anomaly; security interprets it as potential C2 activity.
Lateral movement via service accounts: Audit logs record a service account accessing APIs it has no history with. Observability flags “new behavior”; security reads it as an attempted privilege escalation.
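The crypto-mining example boils down to a join between two data sets that usually live in different tools. Here is a rough sketch, assuming you can already export per-pod CPU utilization (as a 0 to 1 fraction) and a per-pod flag saying whether the image passed verification. The function name, the 0.9 threshold, and the pod names are all hypothetical.

```python
def suspected_miners(cpu_by_pod, image_verified_by_pod, cpu_threshold=0.9):
    """Return pods that are both CPU-hot and running an unverified image.

    cpu_by_pod: pod name -> CPU utilization as a fraction (observability side).
    image_verified_by_pod: pod name -> True if the image passed verification
        (security side). Pods missing from this map are treated as unverified.
    """
    return sorted(
        pod
        for pod, cpu in cpu_by_pod.items()
        if cpu >= cpu_threshold and not image_verified_by_pod.get(pod, False)
    )


if __name__ == "__main__":
    cpu = {"checkout-7d9": 0.97, "batch-report-1": 0.95, "web-5c4": 0.30}
    verified = {"checkout-7d9": False, "batch-report-1": True, "web-5c4": False}
    print(suspected_miners(cpu, verified))
```

Neither signal is conclusive alone: the batch job is busy but trusted, and the idle web pod is unverified but quiet. Only the pod that trips both checks deserves an immediate look.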

The trendline is clear: Observability without security is blind to intent. Security without observability is blind to reality. The future is in merging the two.

We’re already seeing early attempts at this. Vendors are experimenting with AI-driven correlation across metrics, logs and traces to highlight security-relevant anomalies. OpenTelemetry is being extended to include semantic conventions for security signals. This space is moving fast.

Barriers to Bridging the Gap

If convergence is so obvious, why aren’t we there yet?

Organizational Silos
Security teams don’t live in Grafana. Ops teams don’t parse security alerts. They don’t share dashboards or even vocabulary.
Tool Sprawl
One team runs Prometheus, another Splunk, another Wiz or Aqua. Each claims to be the “truth.” In reality, the truth is fragmented.
Skill Gaps
Security analysts aren’t fluent in Kubernetes primitives like pods, sidecars, or CRDs. SREs aren’t trained to spot MITRE ATT&CK techniques in runtime data.
Cultural Inertia
Observability has been pigeonholed as a reliability function. Security has been treated as an after-the-fact audit exercise. Both need reframing.

Building the Bridge

So what does success look like?

Unified Telemetry Pipelines
The same OpenTelemetry stream that powers performance dashboards should also feed anomaly detection for security. Stop duplicating effort.
Policy + Observability
Tools like OPA and Kyverno can codify security policies. Observability can confirm in real time whether those policies are being violated.
Platform Teams as Bridge Builders
The group that owns Kubernetes, usually the platform engineering team, should own the convergence. They understand the workloads, the telemetry, and the pipelines. They’re positioned to design systems that serve both ops and security.
Cross-Training
Ops teams need to learn threat models. Security teams need to learn Kubernetes. Shared knowledge is the grease that makes convergence real.
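What might a unified pipeline with a policy check look like in miniature? The sketch below takes one stream of events and hands it to two consumers: one asks the ops question (who is breaching the latency SLO?) and one asks the security question (who is running an image from outside the approved registry?). Everything here is an assumption for illustration: the `Event` shape, the registry prefix, and the 500 ms SLO are made up, and the policy is plain Python standing in for what you would actually express in OPA or Kyverno.

```python
from dataclasses import dataclass

# Hypothetical policy: images must come from this registry.
ALLOWED_REGISTRY = "registry.example.com/"


@dataclass
class Event:
    """A simplified telemetry record; real pipelines carry far more fields."""
    pod: str
    image: str
    latency_ms: float


def performance_view(events, slo_ms=500.0):
    """Ops consumer: pods with at least one request over the latency SLO."""
    return sorted({e.pod for e in events if e.latency_ms > slo_ms})


def security_view(events):
    """Security consumer: pods running images outside the allowed registry."""
    return sorted({e.pod for e in events if not e.image.startswith(ALLOWED_REGISTRY)})


def fan_out(events):
    """Feed the same events to both consumers instead of collecting them twice."""
    events = list(events)  # materialize once so both views read identical data
    return {
        "performance": performance_view(events),
        "security": security_view(events),
    }


if __name__ == "__main__":
    stream = [
        Event("web-1", "registry.example.com/web:1.4", 120.0),
        Event("web-2", "registry.example.com/web:1.4", 900.0),
        Event("job-9", "docker.io/unknown/tool:latest", 80.0),
    ]
    print(fan_out(stream))
```

The point is not the ten lines of filtering. It is that both teams are reading the same records, so when their findings overlap, nobody has to reconcile two versions of the truth.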

Shimmy’s Take

We don’t need yet another siloed tool screaming into the void. We need a single truth stream where observability and security feed each other.

Kubernetes isn’t static infrastructure; it’s dynamic, ephemeral, alive. That means runtime data is everything. Metrics, logs and traces aren’t just “keeping the lights on”; they’re your first line of defense against attackers who thrive in the noise.

Stop treating observability as an uptime insurance policy. Stop treating security as an afterthought bolted onto clusters. The two are inseparable.

The future is observability-driven security: Securing by watching, correlating, and acting in real time. If your teams aren’t crossing these streams, you’re not securing Kubernetes — you’re just hoping. And hope, as I’ve said many times, is not a strategy.