Data Protection for Stateful Applications in Kubernetes

April 29, 2021 / January 10, 2022 | Michael Cade | data protection, database, kubernetes, stateful applications

by Michael Cade

Stateful applications: Whether or not you believe they belong in Kubernetes, the fact is most organizations run a combination of stateful and stateless workloads in containers. Stateful applications and workloads require back-end storage, whereas stateless applications do not. According to 451 Research, stateful applications make up more than half of containerized applications for 56% of enterprises. And although developers may say they don’t have any stateful applications running in Kubernetes, they are likely referring to where the data lives, rather than to the application itself.

For comprehensive data protection, the goal is to back up and recover the entire application, including the front-end service, the back-end microservices, the persistent storage volumes and the cloud database. A backup policy on a cloud database has no awareness of the other components of the application or of dependencies such as microservices. You can’t successfully back up the entire application at the infrastructure layer in Kubernetes; you need a data protection strategy that starts at the application level itself.

Data storage for Kubernetes is accomplished using one of three approaches:

The application includes data services, all within Kubernetes. You have pods with state and pods without state.
Data services are within Kubernetes, but logically separated from the application, and deployed and managed independently. In this database-as-a-service model, applications communicate with the databases via control APIs.
Applications use managed data services that are outside of Kubernetes, such as Cloud SQL, Amazon RDS or others.

These are all valid approaches — and they all have pros and cons. But regardless of where the data lives, having a data protection strategy in place is critical.

Misconceptions about Data Protection in Kubernetes

At a high level, data protection means that you have systems in place to recover applications and data if something goes wrong, such as data loss or corruption caused by accident or by malicious activity. For example, you may have bugs in your code or application misconfigurations, or you may need to recover from a ransomware attack that caused an infrastructure failure. Perhaps you need to meet regulatory requirements that mandate having a data protection system in place.

When you consider your data protection strategy, certain elements are table stakes: automated backup and recovery, scheduling and retention policies, security and encryption, and recovery SLAs. You must be able to do all of these things at scale, without manual processes, for all types of applications that show up in your cluster.
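As an illustration of what an automated retention policy might look like, here is a minimal sketch. The policy itself is an assumption chosen for the example (keep the most recent `keep_last` backups, and delete anything else older than `max_age_days`), and the function name `select_expired` is hypothetical rather than part of any backup tool.

```python
from datetime import datetime, timedelta


def select_expired(backup_times, now, keep_last=7, max_age_days=30):
    """Return the backup timestamps that the retention policy would delete.

    Assumed policy (illustrative only): always keep the `keep_last` most
    recent backups, and delete any other backup older than `max_age_days`.
    """
    newest_first = sorted(backup_times, reverse=True)
    protected = set(newest_first[:keep_last])
    cutoff = now - timedelta(days=max_age_days)
    return [t for t in newest_first if t not in protected and t < cutoff]


if __name__ == "__main__":
    now = datetime(2022, 1, 10)
    # One backup per day for the last 40 days.
    daily = [now - timedelta(days=d) for d in range(40)]
    expired = select_expired(daily, now)
    print(f"{len(expired)} of {len(daily)} backups are eligible for deletion")
```

Running a check like this on a schedule, rather than by hand, is what keeps the policy workable as the number of applications in the cluster grows.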

However, misconceptions about data protection for Kubernetes may get in the way of implementing a solid data protection strategy. For example, many developers mistake replication for data protection. While replication boosts resiliency and can help with infrastructure failures, it does not guard against accidental or malicious data loss or configuration issues, because a bad write or delete is replicated just as faithfully as a good one. A data protection strategy is still necessary to ensure backup and recovery in those cases.
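The difference between replication and backup can be shown with a toy model. The sketch below is not a real storage system; `ReplicatedStore` is a hypothetical in-memory class in which every write and delete is applied to all replicas, while a backup is a point-in-time copy held outside the replica set.

```python
import copy


class ReplicatedStore:
    """Toy key-value store that applies every change to all replicas."""

    def __init__(self, replicas=3):
        self.replicas = [dict() for _ in range(replicas)]

    def put(self, key, value):
        for replica in self.replicas:
            replica[key] = value

    def delete(self, key):
        # An accidental or malicious delete reaches every replica.
        for replica in self.replicas:
            replica.pop(key, None)

    def backup(self):
        # Point-in-time copy, independent of later changes to the replicas.
        return copy.deepcopy(self.replicas[0])

    def restore(self, snapshot):
        self.replicas = [copy.deepcopy(snapshot) for _ in self.replicas]


if __name__ == "__main__":
    store = ReplicatedStore()
    store.put("orders", [1, 2, 3])
    snapshot = store.backup()
    store.delete("orders")  # the mistake is replicated everywhere
    print("replicas holding 'orders':", sum("orders" in r for r in store.replicas))
    store.restore(snapshot)
    print("after restore:", sum("orders" in r for r in store.replicas))
```

Three replicas survive the loss of a node, but none of them survives the delete; only the copy taken before the mistake does.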

Another misconception is that data backup and recovery features offered by the cloud service or database provider are sufficient. But in a cloud-native application, it’s typical to have a front-end service behind an ingress along with back-end microservices, some of which might be stateful. One microservice might use a persistent storage volume for unstructured data, while another uses a cloud database for structured data. The provider’s backup covers only its own database, not the volume, the services or the configuration that ties them together.

Creating a Blueprint for Data Protection

Creating a blueprint for data protection requires capturing the application configuration and persistent data, mapping these components and resources and then creating an orchestration workflow:

Capture the application configuration: The first step is to determine which Kubernetes resources make up the application. This may include shared secrets or pipeline information, details about the environment the application is running in, or other data. Some of this information, such as runtime state, is available from the API server. You can also refer to the source code or your repository. Applications are dynamic and configurations change frequently, so if you’re backing up data, you need to be sure you capture the configuration that matches that data, for full recovery.
Capture persistent data: The process for capturing persistent data depends on where the data resides. If the data is in persistent volumes, you can leverage volume snapshots, file system backups or both. The volume snapshot provides a consistent copy of the volume, and file system backups can be used to extract what you need. If you’re using data services, you can snapshot the underlying volumes for crash-consistent recovery, or use application-level tools such as pg_dump for Postgres or mongodump for MongoDB. Using both of these together is a powerful approach for large datasets. Finally, if you’re using managed services, you can use application-level tools or the managed service API, which will provide a copy of the data to use as part of the backup set.
Orchestrate the workflow: Once you’ve determined all of the components you need to capture, the next step is to determine the orchestration workflow for backup and restore. Every application has a specific workflow for backup and recovery. There may be application requirements, or you may need to quiesce the application. Or, there may be operations to run before and after the backup and recovery process. You will also need to consider interactions between Kubernetes and the containers. For example, how do you gain access to the application data and volumes? Will you need to shut down and restart services?
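The first two steps can be sketched as small helpers that build, but do not run, the relevant commands and manifests. Several things here are assumptions made for illustration: that the application's resources share an `app=<name>` label, that the cluster has a CSI driver and the VolumeSnapshot CRDs installed, and the function names themselves. The `kubectl get` flags, the `snapshot.storage.k8s.io/v1` VolumeSnapshot fields and the `pg_dump` options are standard.

```python
import json


def config_capture_command(namespace, app_label):
    """Build a kubectl command that exports an application's resources as YAML.

    Assumes every resource of the application carries the label app=<app_label>.
    """
    kinds = ("deployments,statefulsets,services,configmaps,"
             "secrets,persistentvolumeclaims")
    return ["kubectl", "get", kinds,
            "--namespace", namespace,
            "--selector", f"app={app_label}",
            "--output", "yaml"]


def volume_snapshot_manifest(name, namespace, pvc_name, snapshot_class):
    """Build a VolumeSnapshot object for one PersistentVolumeClaim."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": pvc_name},
        },
    }


def logical_dump_command(database, output_path):
    """Build a pg_dump command for an application-level (logical) backup."""
    return ["pg_dump", "--format=custom", "--file", output_path, database]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(" ".join(config_capture_command("shop", "cart")))
    print(json.dumps(
        volume_snapshot_manifest("cart-snap", "shop", "cart-data", "csi-snapclass"),
        indent=2))
    print(" ".join(logical_dump_command("cart", "/backup/cart.dump")))
```

The JSON printed for the snapshot could be piped to `kubectl apply -f -`. Taking the configuration export and the data capture in the same run is what keeps the two in step with each other.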

The Importance of Workflow

The example below illustrates why workflow orchestration is such a critical step to Kubernetes backup and recovery.

Consider the standard procedure for recovering Postgres from physical backups. If Postgres were running in a VM, you would first need to shut down the Postgres server, then restore the database and the log files. Next, you would perform a recovery step, then start Postgres. However, in a containerized environment in Kubernetes, as soon as you shut down the Postgres instance, the container will shut down, and you’ll lose access to the database and files. In fact, you can’t progress any further in the workflow. Another option is to use init containers, but that approach becomes brittle, because you end up embedding scripts alongside your application, for example in a sidecar container.

A better approach for orchestrating backup and recovery in Kubernetes is to scale down the original Postgres instance and start another pod that contains all your tools and the Postgres instance. You can attach your data volumes, then run through the remaining steps of your workflow and restore the database files. When you’ve finished, you can shut down the pod, then scale up the original Postgres application.
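The workflow described above could be orchestrated roughly as follows. This is a sketch under stated assumptions, not a definitive implementation: the workload name, the helper pod manifest `restore-pod.yaml`, the pod name `pg-restore` and the script `/scripts/restore.sh` are all hypothetical, and a production workflow would also wait for the original pods to terminate before attaching the volume. The `kubectl` subcommands and flags used are standard. The `run` parameter is injectable so the sequence can be printed as a dry run.

```python
import subprocess


def kubectl(namespace, *args):
    """Build a kubectl argument list scoped to one namespace."""
    return ["kubectl", "--namespace", namespace, *args]


def run_checked(argv):
    """Default runner: execute the command and raise if it fails."""
    subprocess.run(argv, check=True)


def restore_postgres(namespace, workload, pod_manifest, pod_name,
                     restore_cmd, replicas=1, run=run_checked):
    """Scale down Postgres, restore from a helper pod, then scale back up."""
    # 1. Stop the original Postgres pods so nothing writes to the data volume.
    run(kubectl(namespace, "scale", workload, "--replicas=0"))
    try:
        # 2. Start the helper pod that mounts the data volume and holds the tools.
        run(kubectl(namespace, "apply", "--filename", pod_manifest))
        run(kubectl(namespace, "wait", "--for=condition=Ready",
                    f"pod/{pod_name}", "--timeout=300s"))
        # 3. Run the restore and recovery steps inside the helper pod.
        run(kubectl(namespace, "exec", pod_name, "--", *restore_cmd))
    finally:
        # 4. Remove the helper pod and bring the original workload back,
        #    even if one of the steps above raised an error.
        run(kubectl(namespace, "delete", "pod", pod_name, "--ignore-not-found"))
        run(kubectl(namespace, "scale", workload, f"--replicas={replicas}"))


if __name__ == "__main__":
    # Dry run: print each command instead of executing it.
    restore_postgres("db", "statefulset/postgres", "restore-pod.yaml",
                     "pg-restore", ["/scripts/restore.sh"],
                     run=lambda argv: print(" ".join(argv)))
```

Keeping the scale-up in a `finally` block reflects the main design concern: whatever happens during the restore, the original Postgres workload should not be left scaled to zero.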

Where Will You Store Your Backups?

All of this backup and recovery takes up space, so deciding where to store your backups is another critical step. Object storage is a good choice because it’s cost-effective and scalable and offers multiple tiering options and security controls.

If you’re leveraging data services, volume storage or RDS snapshots, consider the underlying guarantees. Storage appliances take snapshots, but those snapshots aren’t really backups – if the appliance goes away, the snapshots are gone. If you’re storing snapshots in the cloud, your cloud service provider might offer better guarantees.

Portability is another consideration. What environment will you be using to recover the applications? Does it offer the same infrastructure? If it does, you may be able to use a snapshot that’s tied to that infrastructure. If not, portability will be essential.

Lastly, security and encryption are paramount. Consider who can access the data and who can restore the data, and ensure you have the right controls in place.

A New Approach for Seamless, Reliable Data Protection

Deploying applications on Kubernetes without the appropriate data backup and management systems in place could put your organization at risk. While there are many options for backing up stateful applications in Kubernetes, appliance-based approaches and replication services are complex and often incomplete. Protecting data and applications at scale without impeding productivity or innovation requires a new approach.