Case Study: Why InfluxData’s Storage Team Wrote a Kubernetes Controller

Using Kubernetes for production container orchestration enables InfluxData to run our SaaS database product across multiple cloud providers. In the beginning, we chose to run the InfluxDB storage engine as a StatefulSet. It didn’t take long before we realized that the StatefulSet fell short of our particular needs. We needed more fine-grained control over pod management, so we wrote a Kubernetes controller.

Here’s the story behind that decision.

Background

InfluxData is the company behind InfluxDB. InfluxDB is available in multiple forms, one of which is a fully managed cloud option. To provide developers with deployment flexibility, we wanted to offer our SaaS product on multiple cloud platforms. However, deploying and managing the “microservice” components of InfluxDB Cloud on multiple cloud service providers seemed overwhelming. On the other hand, if we had chosen to stick with one cloud provider, we would have given up meaningful revenue.

We turned to Kubernetes to solve this problem. From the InfluxData software engineer’s perspective, Kubernetes functions as an abstraction layer, hiding the differences between cloud service providers behind a single infrastructure model and API.

Among the components that make up InfluxDB Cloud, the “storage engine” is our stateful workload. The storage engine is horizontally scalable: capacity grows with the number of concurrently running storage engine instances. So, within Kubernetes, a storage engine instance is deployed as a Pod, and each of those “storage pods” is assigned a Persistent Volume. The volumes comprise the “data” in the “database,” and the pods supporting a workload comprise a “storage cluster.”

Each storage pod is assigned a specific portion of the storage cluster’s data through a simple hash of the data.
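
To make that concrete, here is a minimal sketch of hash-based assignment in Go. The key name, function, and modulo scheme are my own illustration of “a simple hash of the data,” not InfluxDB’s actual partitioning logic, and it ignores replication:

```go
package sharding

import "hash/fnv"

// assignPod maps a piece of data (identified here by a hypothetical series
// key) to the index of the storage pod responsible for it.
func assignPod(seriesKey string, storagePods int) int {
	h := fnv.New32a()
	h.Write([]byte(seriesKey))
	// Same key always hashes to the same pod index.
	return int(h.Sum32()) % storagePods
}
```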

The Problem

In the beginning, we used StatefulSets to deploy and manage storage clusters but soon ran into dissonance between storage clusters and StatefulSets.

Here’s an example. When a storage pod initializes, which portion of which cluster is it assigned? The answer was derived from:

  • The desired quantity of storage engine instances
  • The desired data replication factor of this storage cluster
  • The index of the Pod within its StatefulSet

Because we leaned too heavily on the StatefulSet abstraction, the storage engine process had to derive these values indirectly, parsing environment variables like HOSTNAME and POD_NAMESPACE and even inspecting the StatefulSet resource itself.
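
For illustration, a pod’s StatefulSet ordinal can only be recovered indirectly, for example by parsing the HOSTNAME environment variable, since StatefulSet pod names end in “-&lt;ordinal&gt;”. This is a sketch of that kind of lookup, not our actual code:

```go
package assignment

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// podOrdinal recovers a pod's index from its hostname, which for StatefulSet
// pods has the form "<statefulset-name>-<ordinal>", e.g. "foo-5".
func podOrdinal() (int, error) {
	host := os.Getenv("HOSTNAME")
	i := strings.LastIndex(host, "-")
	if i < 0 {
		return 0, fmt.Errorf("hostname %q has no ordinal suffix", host)
	}
	return strconv.Atoi(host[i+1:])
}
```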

Here’s another example. As it turns out, the Kubernetes StatefulSet controller rolls out changes to the Pod with the highest index first, so changes to StatefulSet “foo,” configured with six Pods, are deployed beginning with Pod foo-5, then foo-4 and so on.

InfluxDB Cloud is deployed with CI/CD principles: every code change triggers full integration tests, then deployment of all services that depend on that change. When code changes arrive frequently, Pod foo-5 would cycle with every code change, while Pod foo-0 would cycle only after long stretches without any changes.

Our stateful, distributed storage engine needed better management for these reasons and others. So began the “storage controller” project.

Some Challenges

When we started writing the storage controller in 2018, many of our engineers were still relatively new to Kubernetes. Interacting directly with the Kubernetes API revealed some interesting behaviors.

For example, almost every Kubernetes API call returns a success or failure status immediately, but the requested change may take several seconds or minutes to actually take effect. The API client (our storage controller) cannot assume that the desired change is complete, or even visible via the Kubernetes API, until some time has passed.

In fact, writing code against the Kubernetes API is a lot like interacting with Kubernetes via the kubectl CLI tool: you request a change, then wait and watch for it to converge. Automating in this context can be a challenge at first, but in the long term, InfluxData benefited in two important ways, which I’ll highlight at the end.
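
As a sketch of what that patience looks like in practice with a recent client-go, the snippet below polls until a Pod the controller expects is actually visible and running. The function name and timing values are my own choices, not part of our controller:

```go
package waiting

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForPodRunning polls until the named Pod is both visible via the API and
// in the Running phase. Even after a create call returns success, the Pod may
// not be observable for a little while.
func waitForPodRunning(ctx context.Context, client kubernetes.Interface, namespace, name string) error {
	return wait.PollUntilContextTimeout(ctx, 2*time.Second, 5*time.Minute, true,
		func(ctx context.Context) (bool, error) {
			pod, err := client.CoreV1().Pods(namespace).Get(ctx, name, metav1.GetOptions{})
			if apierrors.IsNotFound(err) {
				return false, nil // not visible yet; keep polling
			}
			if err != nil {
				return false, err
			}
			return pod.Status.Phase == corev1.PodRunning, nil
		})
}
```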

Configuration: The StorageCluster CRD

Kubernetes controllers often pair with CustomResourceDefinitions, and we are no exception: a storage cluster is defined by an instance of our StorageCluster custom resource. The StorageCluster began life with just a few fields (which should look familiar):

  • The desired quantity of storage engine instances
  • The desired data replication factor of this storage cluster
  • The index of the Pod within its StatefulSet
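
Below is a minimal Go sketch of what such a custom resource type could look like; the field names are hypothetical stand-ins, not InfluxData’s actual StorageCluster schema:

```go
package v1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// StorageClusterSpec holds the desired state. Field names are illustrative.
type StorageClusterSpec struct {
	// Desired number of storage engine instances (pods).
	Instances int32 `json:"instances"`
	// Desired data replication factor for the cluster.
	ReplicationFactor int32 `json:"replicationFactor"`
}

// StorageClusterStatus holds the observed state written back by the controller.
type StorageClusterStatus struct {
	ReadyInstances int32 `json:"readyInstances"`
}

// StorageCluster pairs the spec and status, like any Kubernetes resource.
type StorageCluster struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   StorageClusterSpec   `json:"spec,omitempty"`
	Status StorageClusterStatus `json:"status,omitempty"`
}
```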

Since then, the StorageCluster CRD and storage controller have evolved, adding features such as:

  • Canary releases
  • Compaction configuration and status
  • Pod deployment concurrency (without losing database availability)

Code: The Storage Controller

The storage controller is an application running in our Kubernetes environments that creates, updates, and deletes the storage clusters defined by StorageCluster resources.

The storage controller is implemented as a loop running in a single goroutine. It requires very little compute and memory. Each loop iteration follows these steps:

  • Get the desired state.
    • Query StorageClusters CRDs via the Kubernetes API.
  • Get the actual state.
    • Query Pods and PVCs via the Kubernetes API.
    • Query application-specific health metrics from the storage pod API.
  • For each StorageCluster:
    • IF the actual state does not match the desired state
    • AND IF all of the actual storage pods are healthy
    • THEN implement the smallest possible change to nudge the actual state closer to the desired state.
  • Forget everything and sleep for a few seconds.

In a bit of irony, the storage controller manages a stateful workload without itself maintaining any state. Because it gathers the desired and actual state fresh on every loop iteration, it never acts on stale information.
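
To make the loop concrete, here is a condensed Go sketch of that shape. The types and method names are hypothetical placeholders for the Kubernetes API queries and storage pod health checks, not our actual implementation:

```go
package controller

import (
	"context"
	"log"
	"time"
)

// DesiredCluster and ActualCluster are hypothetical summaries of the desired
// and observed state of one storage cluster.
type DesiredCluster struct {
	Name              string
	Instances         int
	ReplicationFactor int
}

type ActualCluster struct {
	Name        string
	RunningPods int
	AllHealthy  bool
}

// Clients stands in for the Kubernetes API and storage pod health queries the
// loop performs on every iteration.
type Clients interface {
	ListStorageClusters(ctx context.Context) (map[string]DesiredCluster, error)
	ObserveClusters(ctx context.Context) (map[string]ActualCluster, error)
	NudgeTowardDesired(ctx context.Context, want DesiredCluster, have ActualCluster) error
}

// runControlLoop follows the steps above: fetch desired state, fetch actual
// state, make at most one small change per cluster, forget everything, sleep.
func runControlLoop(ctx context.Context, c Clients) {
	for {
		desired, err := c.ListStorageClusters(ctx)
		if err != nil {
			log.Printf("list StorageClusters: %v", err)
		}
		actual, err := c.ObserveClusters(ctx)
		if err != nil {
			log.Printf("observe clusters: %v", err)
		}
		for name, want := range desired {
			have, ok := actual[name]
			if !ok {
				// No pods observed yet; vacuously healthy, needs bootstrapping.
				have = ActualCluster{Name: name, AllHealthy: true}
			}
			// Act only when the states differ and every existing pod is healthy.
			if have.AllHealthy && have.RunningPods != want.Instances {
				if err := c.NudgeTowardDesired(ctx, want, have); err != nil {
					log.Printf("nudge %s: %v", name, err)
				}
			}
		}
		// No state survives the iteration; sleep and start fresh.
		select {
		case <-ctx.Done():
			return
		case <-time.After(10 * time.Second):
		}
	}
}
```

A real loop would also need to handle clusters that should no longer exist, rolling image updates, canary behavior, and the other features listed above.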

Results

The business logic of automatically managing a storage cluster is delicate and must avoid permanent customer data loss at any cost. This, together with the patience required when working with the Kubernetes API, led to a storage controller design with a simple architecture.

Choosing to implement a custom Kubernetes controller benefited InfluxData in two important ways.

First, by automating Kubernetes Pod management through a custom controller, we have much better control over individual Pods within the workload. This allows us to respond more quickly to incidents, fix bugs and launch features and performance improvements for our customers.

Second, simply choosing to write and maintain a Kubernetes controller deepened the engineering expertise of our organization. Every software engineer who has worked on the storage controller understands Kubernetes much better than they would otherwise.

Overall, adopting Kubernetes in general, and writing a custom Kubernetes controller for our specific requirements in particular, made our SaaS product more efficient and ensured a consistent experience for both our engineers and our customers.

Jacob Marble

Jacob Marble is an Engineering Manager at InfluxData. He has implemented, maintained, and sunset several backend services, APIs, and Big/Medium/Small Data pipelines. When he isn't writing code, Jacob hangs out with his wife and kids and flies small airplanes.
