Managing Databases in Kubernetes: What to Consider
Kubernetes continues to grow in popularity. According to the Cloud Native Computing Foundation (CNCF), 84% of respondents to its 2019 annual survey ran containers in production. The organizations responding are likely to skew toward companies already running cloud-native applications, but the figure still suggests that adoption of containers, and of Kubernetes to orchestrate them, is growing quickly. More importantly, Kubernetes has matured beyond being used only in testing and development.
Orchestrating containers is an essential part of building new applications: containers can scale up and down to meet demand, and they keep applications portable. However, data needs the same approach.
Kubernetes: Looking at Applications and Data Together
Kubernetes is commonly used to support application portability between different cloud providers, or to make hybrid cloud approaches across public and private cloud work in practice. This approach was developed with applications in mind—not data.
Any application will create data, and that data will normally be stored in a database of some kind. For containerized applications, the first question is whether that database should run in containers as well. Databases typically run for as long as the application is operational, and in some cases even longer when data retention for compliance is factored in. By contrast, containers were designed for stateless uses, where images could be used to add more compute where needed and scale back afterward.
It is possible to create stateful containers that will exist for as long as they are needed. These containers are bound to specific storage and database instances, which store the data created by the application. Kubernetes operators for databases manage the life cycle of these containers, adding more when new workloads are created and replacing containers that become unavailable.
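As a rough illustration of what "stateful" means in practice, the sketch below builds a manifest for a StatefulSet, the built-in Kubernetes resource that gives each replica a stable name and its own persistent volume claim. It is a minimal example under stated assumptions, not a production configuration: the names `example-db` and `example/database:1.0`, the mount path, and the storage size are all hypothetical, and a database operator would normally generate and manage a resource like this for you.

```python
import json


def database_statefulset(name: str, image: str, replicas: int, storage: str) -> dict:
    """Build a StatefulSet manifest as a plain dictionary."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            # Headless Service that gives each pod a stable network identity.
            "serviceName": name,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,  # hypothetical image name
                            "volumeMounts": [
                                # Hypothetical data directory inside the container.
                                {"name": "data", "mountPath": "/var/lib/db"}
                            ],
                        }
                    ]
                },
            },
            # One PersistentVolumeClaim is created per replica from this template.
            "volumeClaimTemplates": [
                {
                    "metadata": {"name": "data"},
                    "spec": {
                        "accessModes": ["ReadWriteOnce"],
                        "resources": {"requests": {"storage": storage}},
                    },
                }
            ],
        },
    }


manifest = database_statefulset("example-db", "example/database:1.0", 3, "10Gi")
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

The printed JSON could be saved to a file and applied with `kubectl apply -f`, since Kubernetes accepts manifests in JSON as well as YAML. The point to notice is the pairing of `volumeMounts` with `volumeClaimTemplates`: the storage is declared alongside the container, so the data can outlive any individual container.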
There are benefits to running your database in a traditional way alongside your containerized application, just as there are to running your database in containers as part of the whole application. Running databases separately from your containerized applications makes sense if you are running a private cloud, because your data stays in one place alongside your application. You can also use your existing database skills to run the database and its associated infrastructure. However, this can be more difficult to manage over time, particularly given how much change can take place around containers that are ephemeral by their very nature. It can also defeat the point of using Kubernetes for automation.
Instead, running stateful containers that hold the database can be a better option for containerized applications, as the same management approach can be applied to both. When a new database container is needed, either for holding more data or to support a new service, it can be provisioned automatically by the Kubernetes operator. If a database container goes down, the Kubernetes management layer will automatically detect this and replace it with a new container started from the same image.
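The detect-and-replace behavior described here follows a reconciliation pattern: compare the desired state with what is actually running, then act on the difference. The sketch below is a toy model of that idea in plain Python. It does not call the Kubernetes API, and the `Replica` class, the `reconcile` function, and the `db-N` naming are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    image: str
    healthy: bool = True


def reconcile(observed: dict, desired_replicas: int, image: str) -> list:
    """Bring the observed replicas in line with the desired count.

    `observed` maps replica names to Replica objects and is updated in place.
    Returns the list of actions taken, for illustration.
    """
    actions = []
    for ordinal in range(desired_replicas):
        name = f"db-{ordinal}"
        replica = observed.get(name)
        if replica is None:
            # Not running yet: provision it.
            observed[name] = Replica(name, image)
            actions.append(f"started {name}")
        elif not replica.healthy:
            # Failed: replace it under the same name, from the same image.
            observed[name] = Replica(name, image)
            actions.append(f"replaced {name}")
    # Scale down: stop any replica beyond the desired count.
    extra = [n for n in observed if int(n.rsplit("-", 1)[1]) >= desired_replicas]
    for name in sorted(extra):
        del observed[name]
        actions.append(f"stopped {name}")
    return actions


IMAGE = "example/database:1.0"  # hypothetical image name
observed = {}
first_pass = reconcile(observed, 3, IMAGE)   # initial provisioning
observed["db-1"].healthy = False             # simulate a container going down
second_pass = reconcile(observed, 3, IMAGE)  # the failure is detected and repaired
third_pass = reconcile(observed, 3, IMAGE)   # nothing left to do
print(first_pass, second_pass, third_pass)
```

Keeping the replica name stable across replacement is a deliberate choice in this sketch: for a database, a stable identity is what allows a replacement container to be matched back to the storage that its predecessor was using.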
Kubernetes itself is complex. Applications running in containers have more moving parts, and orchestration becomes more essential the more you scale. However, Kubernetes does a great job of managing those applications over time. Managing your databases through the same processes can help to simplify your operations.
Thinking About Application and Data Portability
Another reason why developers like containers and Kubernetes is that this approach improves application portability. When you run in containers you can take images from one cloud provider to another, or from your private cloud to a public cloud service. At least, that is the theory.
Of course, more dependencies exist around applications than developers tend to account for. Data is one of those dependencies. Let’s take an example of running containerized applications and then connecting to a separate database service based on MongoDB. Moving this to another cloud service should be simple; after all, containers can run anywhere. However, this is not always the case when it comes to data.
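One common way to keep the application side of this example portable is to keep the database endpoint out of the container image and read it from the environment at startup. The sketch below assumes a hypothetical `DATABASE_URI` environment variable and a hypothetical local default. It only parses the connection string with the Python standard library and does not connect to anything.

```python
import os
from urllib.parse import urlsplit


def database_settings(environ=os.environ) -> dict:
    """Read the database endpoint from the environment, not from the image."""
    # DATABASE_URI is a hypothetical variable name chosen for this example.
    uri = environ.get("DATABASE_URI", "mongodb://localhost:27017/app")
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,  # None when the URI does not specify a port
        "database": parts.path.lstrip("/") or None,
    }


# The same image can point at different database services per environment.
private_cloud = database_settings({"DATABASE_URI": "mongodb://db.internal:27017/orders"})
public_cloud = database_settings({"DATABASE_URI": "mongodb+srv://cluster.example.net/orders"})
print(private_cloud)
print(public_cloud)
```

This makes the container itself easy to move, but it only relocates the dependency: the service behind the connection string still has to exist, and behave the same way, at the destination.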
As part of this example, consider how you would use MongoDB in the cloud. If you want to run on MongoDB Atlas, you are tied specifically to the company’s version of MongoDB. This provides great functionality around data backup and management, but it is only available from MongoDB, the company. If you want to run the open source community edition of MongoDB, this is possible in the cloud, but it lacks some enterprise functionality that many companies need. There are also MongoDB-compatible services available, but these are not the full MongoDB either.
When moving this application to another location and cloud provider, you would therefore need to answer several questions:
- Am I able to run the version of MongoDB that will support my application appropriately?
- Am I able to get a compatible version of MongoDB that is fully open source?
- Do I want to learn another cloud platform, or invest in training my staff in another platform?
- Will I be able to move out again, if things don’t go the way I want them to?
- What will the service cost for the database side as well as the application side?
Asking these questions should help you make the right decision around portability for your data, and your database, alongside any container service that you might use.