SRE
It Worked Last Tuesday: What Operators Teach Us About Platform Reality
Infrastructure as code defined the cloud era, but Kubernetes operators are redefining how DevOps keeps systems reliable. Instead of “apply and hope,” operators continuously reconcile reality with intent — automating change, reducing ...
Avery Pennarun | Atlanta, automation, CI/CD, cloud infrastructure, cloud native, cloud operations, CloudNativeCon 2025, cluster management, configuration management, continuous delivery, control loops, declarative infrastructure, DevOps automation, DevOps culture, GitOps, IaC, infrastructure as code, intent-based automation, KubeCon 2025, kubernetes, kubernetes best practices, Kubernetes controller, Kubernetes operators, Kubernetes reconciliation loop, microservices, observability, operational excellence, operator pattern, platform engineering, platform stability, reconciliation, resilience engineering, self-healing systems, service reliability, SRE
Service Mesh Evolution: Ambient Mode, Gateways & The Return of Simpler Architectures
Service mesh is evolving beyond sidecars. Ambient mode and Gateway APIs deliver security, observability, and traffic control with less overhead. Teams benefit from leaner, more flexible architectures ...
Bridging Observability & Security in Kubernetes: Beyond Just Metrics
Kubernetes has increased agility but has also expanded the attack surface. Alan argues that observability and security can no longer live in silos: metrics, logs, and traces already hold critical security signals, while ...
Alan Shimel | anomaly detection, C2 traffic, cloud native security, convergence, cross-training, crypto-mining, devops, kubernetes, lateral movement, logs, metrics, observability, observability-driven security, OpenTelemetry, organizational silos, platform engineering, runtime security, security, SRE, tool sprawl, traces
From Observability to Actionability: Why Metrics Alone Aren’t Enough
Observability has plateaued. The next step is actionable observability—using AI, automation, and SLOs to turn telemetry into reliable outcomes ...
Alan Shimel | actionable observability, AIOps, anomaly detection, auto-remediation, cloud native, continuous verification, devops, ELK stack, golden paths, internal developer platforms, metrics logs traces, observability, OpenTelemetry, platform engineering, SLO-driven operations, SRE, telemetry automation
5 Reasons You Need Application Mapping for Containerized Apps
Application mapping is especially beneficial in a containerized environment where performance issues can quickly escalate ...
How Kubernetes Adoption Fosters Cloud Resiliency
In the last few years, we’ve seen Kubernetes become businesses’ default container orchestration tool, and it’s easy to understand why. With IT teams’ reliance on containers growing as they increasingly prioritize agile ...
Making Sure Your Cloud-Native Applications Can Fail
Make sure your applications can fail. Sounds weird, doesn’t it? But nothing is more critical to creating a highly reliable, cloud-native application than ensuring you can fail successfully. The key is ...
Transform Your DevOps, DevSecOps and SRE to Cloud Native
It is crucial to appreciate the transformative potential of cloud-native technology in shaping the future of business. Cloud-native represents a paradigm shift in designing, building and deploying applications, fully harnessing the benefits ...
SRE Use Cases for AI-Assisted Kubernetes
As indicated in the article Cloud Automation in 2021 – the new normal in the tech industry, an AI-assisted Kubernetes orchestrator can serve many use cases to optimize cloud costs for DevOps, ...
DevSecOps Use Cases for AI-Assisted Kubernetes
As indicated in my blog DevOps Use Cases for AI-Assisted Kubernetes, an AI-assisted Kubernetes orchestrator has a number of different use cases to optimize cloud costs for DevOps, DevSecOps and SRE. This ...

