The Cilium Story So Far
Cilium has become the de facto networking solution for the cloud-native world, having been adopted by all major cloud providers and many end users across every vertical. But with an ever-expanding list of features and new sub-projects like Tetragon, I often get asked: What exactly does Cilium do, and where is it headed next? As we celebrate the first-ever CiliumCon this week, let’s recap the story of Cilium so far. I’ll walk through the development of the project from a simple container networking solution to an eBPF-powered connectivity, observability and security platform.
This post lays out the Cilium story to show where the project came from and to chart where it is going. We’ll start with the state of networking and security in 2016, when the project was founded, see why the project expanded to network security and observability and then deepened into runtime security and observability, and finally understand why the Cilium website headline is “eBPF-powered connectivity, observability and security,” and how that drives the vision for the future of the project.
Let’s Bee-gin!
Revolutionary Roots: Containers and eBPF
Many of the founders and maintainers of Cilium have their roots in Linux kernel networking communities, such as Open vSwitch and iptables, which led the transition from physical networking to virtual networking. In 2015, Thomas Graf, Daniel Borkmann, Andre Martins and Madhu Challa were working together and noticed that a new technology called containers was starting to sweep across the infrastructure space. Having already seen how virtualization completely changed networking, they saw an opportunity to again disrupt how networking worked, this time with containers. At the same time, Daniel, Thomas and engineers from several hyperscalers were working on a then-little-known technology called extended Berkeley Packet Filter (eBPF) that was meant to revolutionize how the kernel worked by making it programmable. The Cilium project was founded at the collision of these two technology revolutions.
The original vision for Cilium was to build an intent- and identity-based, high-performance container networking platform with eBPF. The founders foresaw that the traditional networking solutions they had built themselves would not be able to keep up with the dynamic and ephemeral nature of cloud-native applications, and that a new approach was required. eBPF provided the platform to create new ways of programming beyond the traditional constraints of the Linux kernel. The project was originally even IPv6-only, but that proved a little too revolutionary for the time.
Zero-Trust: Network Security and Observability
Cilium gained popularity quickly, and Thomas Graf and Dan Wendlandt co-founded Isovalent along with the early Cilium engineers to focus on the Cilium project. The company was initially called Covalent and was later renamed Isovalent. Dan Wendlandt was an early employee at Nicira Networks, which created Open vSwitch, and that was where Dan and Thomas first met.
With a company now in place, they wanted to start building the features that enterprises would require to adopt cloud-native infrastructure principles. With eBPF as the foundation, the scalability of Cilium was not an issue. Instead, they turned toward network security, adding network policy and encryption to the project to build the next generation of zero-trust security infrastructure. Once this got into the hands of customers and end users, the team quickly realized that securing the network was not enough. Cilium also needed to observe it, because many users could not fully understand what was going on in their network. The famous Star Wars demo was also created to help people understand the power of eBPF for network security.
Hubble was added to Cilium to provide network observability, including flow logs, metrics, a service map and troubleshooting capabilities. This provided users the ability to see what was happening in their network and what wasn’t working. The network policy editor was also created at this time to help users create, visualize and understand the sometimes complex and esoteric workings of network policies.
Scaling to Cloud Native: Multi-Cluster, Multi-Cloud and Community Growth
Kubernetes and cloud-native were also moving out of their infancy, and end users were starting to put them into production at a serious scale, running 10,000+ nodes across multiple clusters and multiple clouds. Cilium Cluster Mesh was introduced to provide cross-cluster and cross-cloud connectivity. It was also integrated with all of the previous features of Cilium, so users had a seamless connectivity, observability and security experience across all of their infrastructure.
Cilium has always been run as a community project, with contributors from many organizations. As the project continued to grow, it made sense to bring it under a foundation, and it was donated to the CNCF in 2021. Since then, the growth of the project has only accelerated, with the number of public adopters increasing from 20 to 92, public case studies from eight to 29, contributors from 263 to 522 and Slack members from 5,800 to 15,000. Cilium is one of the fastest-moving projects in the CNCF ecosystem, alongside projects like Kubernetes, Envoy and Prometheus. All of the major cloud providers also chose to integrate Cilium into their Kubernetes offerings, making it the default for users across the cloud-native ecosystem.
All of this growth in size, scope and community raised the question: Where would the project go next?
Deeper Security with Tetragon and Service Mesh
Since eBPF is at the core of Cilium, it provides a very flexible platform that can hook into and modify behavior anywhere in the kernel. Tetragon was launched to provide advanced security observability and runtime enforcement beyond just networking, in areas like file integrity and binary execution monitoring, extending the reach of Cilium. Tetragon is such an adaptable tool that it is already being applied to new use cases like software supply chain security.
Service mesh has also been a hot area for network security. Cilium originally integrated with Istio for service mesh, but users eventually convinced us to integrate service mesh capabilities directly into Cilium. Cilium Service Mesh was launched as the first-ever sidecar-free mesh at ServiceMeshCon 2022. Service mesh has now become a feature within the whole Cilium platform.
Up to this point, Cilium had focused mainly on Kubernetes-based clusters, but there is much more infrastructure outside the cloud-native world than in it.
To Kubernetes and Beyond
Infrastructure can be built in isolation, but it only really becomes useful when it can integrate with other systems, and large legacy IT estates migrate slowly to new technology. Cloud-native also isn’t just Kubernetes, and there are some workloads that will never be able to “go cloud native.” Truly transforming infrastructure engineering is only possible when you can grapple with these challenges.
With these challenges in mind, Cilium built features to connect Kubernetes to the world and integrate existing infrastructure into Kubernetes. Cilium first built a Layer 4 load balancer to bring existing traffic into the Kubernetes world, and the results were stunning. It allowed Seznam.cz to double their throughput while also reducing CPU usage by 72x. This load balancer uses the same technology as companies like Google and Facebook. It’s called Maglev, and it allows commodity Linux machines to run as high-end, fault-tolerant load balancers. BGP is the backbone of the internet and the key to many data center architectures, so Cilium added BGP support to integrate into these networks and provide load balancing in bare metal clusters. Finally, to ease the transition from IPv4 to IPv6, Cilium has a NAT46/64 gateway that allows IPv4 services to talk with IPv6 ones and vice versa. Early support for connecting and securing external workloads has also been added. Taken together, all of these provide a seamless connectivity, observability and security layer enabled by eBPF wherever end users need to go.
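To make the Maglev idea a little more concrete, here is a minimal sketch of the consistent-hashing lookup table described in Google’s published Maglev paper. It is not Cilium’s eBPF implementation. The function names, the SHA-256 stand-in hash and the table size of 251 (any prime works) are assumptions chosen for illustration.

```python
import hashlib
from typing import List, Optional


def _hash(name: str, salt: str) -> int:
    """Stand-in hash (an assumption; real load balancers use faster hashes)."""
    digest = hashlib.sha256(f"{salt}:{name}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def build_lookup_table(backends: List[str], table_size: int = 251) -> List[str]:
    """Fill a Maglev-style lookup table; table_size must be prime."""
    if not backends:
        raise ValueError("at least one backend is required")

    # Each backend gets its own permutation of the table slots,
    # defined by an offset and a skip derived from two hashes.
    offsets = [_hash(b, "offset") % table_size for b in backends]
    skips = [_hash(b, "skip") % (table_size - 1) + 1 for b in backends]

    next_index = [0] * len(backends)
    table: List[Optional[str]] = [None] * table_size
    filled = 0

    # Backends take turns claiming the next free slot in their permutation,
    # so every backend ends up with a nearly equal share of the table.
    while True:
        for i, backend in enumerate(backends):
            slot = (offsets[i] + next_index[i] * skips[i]) % table_size
            while table[slot] is not None:
                next_index[i] += 1
                slot = (offsets[i] + next_index[i] * skips[i]) % table_size
            table[slot] = backend
            next_index[i] += 1
            filled += 1
            if filled == table_size:
                return table  # type: ignore[return-value]


def pick_backend(table: List[str], flow_key: str) -> str:
    """Map a flow (for example, its 5-tuple as a string) to a backend."""
    return table[_hash(flow_key, "flow") % len(table)]


backends = ["backend-a", "backend-b", "backend-c"]
table = build_lookup_table(backends)
chosen = pick_backend(table, "10.0.0.1:4321->192.0.2.10:443/tcp")
```

Because every load balancer node that builds the table from the same backend list gets the same table, any node can forward a given flow to the same backend without sharing connection state, which is where the fault tolerance comes from.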
“As applications shift toward being a collection of API-driven services driven by cloud native paradigms, the security, reliability, observability and performance of all applications become fundamentally dependent on a new layer driven by eBPF,” says Dan Wendlandt, CEO and co-founder of Isovalent. “It’s going to be a critical layer in the new cloud-native infrastructure stack.”
While the future of open source projects can be difficult to predict because they are driven by who shows up to do the work rather than by top-down demands, there are a few upcoming features that I’m excited about. Support for the mTLS datapath has already been merged into Cilium, and I’m looking forward to the complete control plane, with SPIFFE and SPIRE integrations, being merged so that mTLS and even more secure networks become available to everyone with just one line of YAML.
If all of this excites you, there are many ways to get involved in the project. Check out the coverage of the first-ever CiliumCon, held on April 18, 2023, at KubeCon + CloudNativeCon in Amsterdam, with end user stories from Bloomberg, Datadog, Robinhood, Sky and The New York Times. Help us grow the buzz!