Sidecars or Sharing: A Practical Guide to Selecting Your Service Mesh
Over the last year or so, we’ve seen a lot of confusion about “sidecars or sidecarless” in the context of service meshes. This architectural choice turns out to be very important – not because of sidecars per se, but because it deeply affects how the mesh approaches sharing and multi-tenancy.
In this article, we’ll dive deep into these architectural issues and their tradeoffs, so that you can evaluate which approach is the best fit for your environment, rather than just following the marketing hype.
What’s a Sidecar?
To make sure we’re all on the same page, let’s define some terms:
- A proxy is a component that intercepts network communications in order to mediate and measure them. For our purposes here, proxies are always software. All current meshes use proxies.
- A sidecar, by contrast, is a Kubernetes design pattern that involves inserting an auxiliary container (the “sidecar”) into a running pod, without altering the main container. Most current meshes use sidecars, but not all.
It may seem odd to go to the trouble of adding another container to a running pod, but sidecars offer real advantages to application developers and platform engineers.
- Sidecars don’t change application code. Putting a new feature into a sidecar means you don’t have to modify the application code. Sidecars can even add functionality to applications for which you don’t have the source code!
- Sidecars are language-independent. Sidecars are completely isolated from the main application, so they don’t need to use the same libraries, runtime, or even the same language. For example, Linkerd’s micro-proxy is written in Rust and happily runs in a sidecar no matter what the main application uses.
- Sidecars let application developers focus. Sidecars can be owned and managed by the platform team since they’re not part of the main application. This separation of concerns lets the application developers focus on business needs, not low-level platform annoyances.
- Sidecars provide a clear operational model. Sidecar containers belong to a pod, so Kubernetes already knows how to manage their operation and lifecycle. This consistency makes sidecars predictable and straightforward to manage.
Kubernetes sidecars have been used for features like automatic encryption, retries, timeouts, logging, distributed tracing, and more – all without the trouble and risk of modifying, recompiling, and redistributing the application.
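As a concrete (and hypothetical) illustration of the pattern, here’s a minimal pod spec where a log-forwarding sidecar reads the application’s logs from a shared volume. The container names and images are placeholders, not a real deployment:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar        # hypothetical example
spec:
  containers:
  - name: app                   # the main application, unmodified
    image: example.com/my-app:1.0
    volumeMounts:
    - name: logs
      mountPath: /var/log/app
  - name: log-forwarder         # the sidecar: ships logs without touching app code
    image: example.com/log-forwarder:1.0
    volumeMounts:
    - name: logs
      mountPath: /var/log/app
      readOnly: true
  volumes:
  - name: logs
    emptyDir: {}                # shared scratch volume connecting the two containers
```

Note that the application container has no idea the sidecar exists; the platform team can swap or upgrade the log forwarder without any change to the application image.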
Why Would a Mesh Use Sidecars?
Given the kinds of cross-cutting concerns that service meshes have to address, one natural approach for a mesh is to deploy a sidecar proxy next to each application pod. The proxy intercepts all inbound and outbound network traffic from the main application pod, ultimately giving the mesh the ability to:
- Enforce security policies such as automatic mutual TLS (mTLS) encryption, authentication, and authorization;
- Collect and publish observability data like the golden metrics or distributed tracing; and
- Provide advanced traffic management such as per-request load balancing, automatic retries, and timeouts.
Most current meshes, including Linkerd, use sidecar proxies as a way to simplify the operational model. While most meshes use Envoy as their sidecar, Linkerd’s sidecar is an ultralight, ultrafast Rust-based proxy.
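In practice, sidecar meshes inject the proxy automatically rather than asking you to write it into every pod spec. With Linkerd, for example, annotating a namespace (or a pod template) opts its workloads into proxy injection; a sketch with a hypothetical namespace name:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-app                    # hypothetical namespace
  annotations:
    linkerd.io/inject: enabled    # Linkerd's injector adds the proxy sidecar
                                  # to every pod created in this namespace
```

From then on, any pod scheduled into this namespace gets the proxy sidecar added at admission time, with no changes to the application manifests themselves.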
Why Don’t All Meshes Use Sidecars?
Sidecars definitely do have downsides. They aren’t necessarily deal-breakers but rather trade-offs that should be carefully considered.
- “Side trucks”. If what’s in the sidecar is particularly resource-heavy, the resource needs of the sidecar can overwhelm those of the application. Weighing down a lightweight microservice with a massive “side truck” (say, an entire Envoy proxy) sharply limits the value proposition of microservices.
- Pod immutability. Because pods are immutable, any change to a sidecar container requires restarting the pod and all the containers within it, including the application container itself. Cloud-native software generally handles this with aplomb, but for some applications it’s a burden.
- Visibility and attention. Sidecars are visible components within the pod, unlike software libraries or network functions, and Kubernetes tracks their resource consumption separately from everything else. Splitting out the bill can easily make it seem like sidecars use significantly more resources than embedding functionality in the application or handling it elsewhere, even though that’s usually not the case.
- Rough spots in the past. Kubernetes Jobs often used to stall at completion because the sidecar kept the pod from exiting. Improvements in how Kubernetes handles sidecars have basically resolved this, but it’s still on many platform engineers’ minds.
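The improvement in question is Kubernetes “native sidecars”: a sidecar declared as an init container with `restartPolicy: Always` runs alongside the main containers but is shut down automatically once they finish, so Job pods can actually complete. This has been enabled by default since Kubernetes 1.29. A sketch with placeholder image names:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-with-sidecar        # hypothetical example
spec:
  template:
    spec:
      restartPolicy: Never
      initContainers:
      - name: proxy               # a native sidecar: starts before, and runs
        image: example.com/proxy:1.0
        restartPolicy: Always     # this field is what marks it as a sidecar
      containers:
      - name: worker              # when this container exits, the sidecar is
        image: example.com/batch-worker:1.0   # stopped and the Job completes
```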
Where Does Sharing Come Into It?
Sidecars, of course, aren’t the only architectural approach meshes can take. There are three broad categories of mesh architectures, with the most critical difference being whether – and how – proxies are shared within the mesh:
- Sidecar proxies are never shared. In a sidecar mesh like Linkerd or Istio Legacy, each pod always has its own proxy.
- Node proxies share proxies across an entire node. In a mesh with node proxies, every node has its own proxy, which is shared among all the application pods for the entire node.
- Ambient mesh shares proxies in a more complex way. The ambient architecture uses two proxies: a node proxy handles Layer 4 functionality and a separate shared proxy handles Layer 7 functionality. The mesh administrator has to choose the scope at which the Layer 7 proxy is shared, with popular choices being per-node, per-namespace, or per-pod (yes, you can effectively run an ambient proxy as a sidecar).
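To make the ambient split concrete: in Istio’s ambient mode, enrollment and Layer 7 scoping are controlled by namespace labels. The sketch below uses a hypothetical namespace and assumes a waypoint proxy with the default name `waypoint` has been deployed in it; check your Istio version’s documentation for the exact workflow:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-app                         # hypothetical namespace
  labels:
    istio.io/dataplane-mode: ambient   # L4 traffic handled by the per-node ztunnel
    istio.io/use-waypoint: waypoint    # L7 traffic routed through the shared
                                       # waypoint proxy deployed in this namespace
```

The first label alone gets you the shared L4 node proxy; the second opts the namespace’s workloads into the shared L7 proxy, which is where the scoping choice (per-node, per-namespace, or per-pod) comes in.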
Shared Proxies and Multi-Tenancy
Proxy sharing might seem like a fairly minor point, but it is a critical difference between meshes. Sharing proxies creates a number of challenges in contended multi-tenant environments like Kubernetes clusters, where independent, potentially uncooperative applications must share the same resources:
- Noisy neighbor effects. One application might overload the shared proxy, affecting the performance of other applications.
- Single points of failure. If a shared proxy fails, all applications depending on that proxy (often an unpredictable set) are impacted.
- Resource allocation challenges. Appropriately sizing a shared proxy that serves multiple applications can be difficult.
- Security challenges. Since the proxy is where authentication happens, it has to hold the secrets used for identity for every pod it serves. Compromising a sidecar proxy exposes one identity, but compromising a shared proxy can expose many.
To be clear, these are not new issues. The very first service mesh, Linkerd 1.x, used node-based JVM proxies written in Scala. Linkerd 2.x moved to sidecars because of the lessons learned by Linkerd 1.x customers wrestling with the problems described above. More recent attempts to go back to sharing proxies bring these tradeoffs back with them.
To Share, or Not to Share
This is the question:
- Sidecars don’t share proxies, so they bring strong isolation and an operational model that’s easy to reason about, though the need for restarts can be annoying.
- Node proxies and ambient can lessen required restarts and resource usage (if your proxies are “side trucks”), but give up isolation and add complexity.
Be careful with blanket statements like “sidecars use more resources”, though: in recent benchmarking, we found that Linkerd’s sidecars are lightweight enough that Istio’s ambient mode actually used more resources. What you run in the sidecar matters far more than whether you use sidecars at all.
Ultimately, selecting the best service mesh requires understanding these tradeoffs and understanding your use case and requirements. Here’s hoping this article helps make your decision-making process a little easier.