Understanding the Container Storage Ecosystem

The growing use of containers means storage is increasingly part of the conversation

As full enterprise workloads move to the cloud, containers have become the go-to technology for application developers. Containerization is a packaging mechanism that abstracts an application from the underlying server infrastructure and enables the fast deployment of new features.

Containers isolate applications from one another while allowing each application full access to the system resources it needs. Using containers is in some ways similar to creating a fully virtualized environment, but is much more lightweight. This lightweight isolation allows cloud-native applications to be delivered to customers quickly.

There are several popular container formats, with Docker leading the way, according to some estimates. However, managing containers, such as creating, replicating and deleting them, requires an orchestration layer, and the most popular today is Kubernetes (K8s). Kubernetes, which was open-sourced by Google, a pioneer in container development, is widely available; it is a framework for managing the containers that an application requires, or that a set of applications needs to work together. Today, many developers use Kubernetes as the orchestration software behind containerized applications deployed in public clouds.
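To make this concrete, the short Go program below uses the official Kubernetes client library, client-go, to ask a cluster's API server for the pods in a namespace; this is the same API that Kubernetes' own orchestration machinery is built on. It is a minimal sketch, assuming a reachable cluster, a kubeconfig file in the default location and a recent client-go release (where API calls take a context):

    package main

    import (
        "context"
        "fmt"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/clientcmd"
    )

    func main() {
        // Load credentials from the default kubeconfig (~/.kube/config).
        config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
        if err != nil {
            panic(err)
        }

        clientset, err := kubernetes.NewForConfig(config)
        if err != nil {
            panic(err)
        }

        // List every pod in the "default" namespace via the Kubernetes API.
        pods, err := clientset.CoreV1().Pods("default").List(context.TODO(), metav1.ListOptions{})
        if err != nil {
            panic(err)
        }
        for _, pod := range pods.Items {
            fmt.Printf("%s\t%s\n", pod.Name, pod.Status.Phase)
        }
    }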

Application Deployment Models

There are several options for deploying applications. In the past, an on-premises data center or a bare-metal cloud environment would typically dedicate a new server to a specific set of applications. The OS would be set up, the application(s) installed and, once certified, the system would rarely be touched again. This often resulted in an infrastructure imbalance: demanding applications outgrew their servers, while resources went to waste if the installed applications did not use the underlying hardware to its fullest.

Virtualized deployments sought to overcome this resource imbalance and under-utilization. By allowing full guest OSes to be installed on top of a host OS, entirely different operating environments could share the same resources. However, every guest required a complete OS installation, which made virtual machines “heavy.” Although the underlying resources could now be utilized at a higher level, distributing and managing full OSes was cumbersome and resource-intensive.

Containers are “packaged” solutions that contain only what an application needs to run, including runtime libraries and any required supporting software. No guest OSes need to be installed; the container runtime mediates communication and resource requests between the containers and the base OS. Because the applications share a single operating system, they are referred to as “lightweight”; the result is much higher performance than can be achieved with virtual machines.

Since containers are lightweight, creating a new one and spinning it up is fast and straightforward. New containers for a new service or for further distribution of the workload can be implemented quickly, and the results are apparent immediately. By design, a container is stateless: when the container is removed, the data associated with it disappears as well. This creates a challenge, because critical information or data, whether in memory or on a storage device, will not be available for future use.
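As a hedged illustration of this statelessness, the Go fragment below (using the Kubernetes API types from k8s.io/api; the pod and volume names are hypothetical) defines a pod whose only storage is an emptyDir volume. Kubernetes creates that scratch directory when the pod starts and deletes it when the pod is removed, so nothing written to /scratch outlives the container:

    package demo

    import (
        corev1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    // scratchPod declares a pod whose only storage is an emptyDir
    // volume: the data lives exactly as long as the pod itself and
    // vanishes when the pod is deleted or rescheduled.
    var scratchPod = &corev1.Pod{
        ObjectMeta: metav1.ObjectMeta{Name: "scratch-demo"},
        Spec: corev1.PodSpec{
            Containers: []corev1.Container{{
                Name:    "worker",
                Image:   "busybox",
                Command: []string{"sh", "-c", "echo transient > /scratch/data && sleep 3600"},
                VolumeMounts: []corev1.VolumeMount{{
                    Name:      "scratch",
                    MountPath: "/scratch",
                }},
            }},
            Volumes: []corev1.Volume{{
                Name: "scratch",
                VolumeSource: corev1.VolumeSource{
                    EmptyDir: &corev1.EmptyDirVolumeSource{},
                },
            }},
        },
    }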

Many applications are stateful, meaning that the data must remain available even if the application stops executing. Stateful applications include single-instance databases such as MySQL, PostgreSQL and MariaDB; NoSQL databases such as Cassandra and MongoDB; in-memory databases such as Redis and MemSQL; and data processing and AI/ML workflows such as Hadoop, Spark, TensorFlow, PyTorch, Kubeflow and NGC.

Many applications are also performance-sensitive and use local SSDs to achieve low-latency access to critical data. This creates an issue when containers are the deployment method: if a container goes down, the data stored in that container is lost as well. Leading-edge applications require a persistent storage mechanism that works within the container system and delivers data from SSDs (even faster than local access) across a pod of servers.

In the past, enterprises have relied on local storage and a local file system installed on the application server and accessed through a container runtime. However, as applications become more distributed and data sets grow beyond local storage, increased demands are placed on the I/O system. These environments need persistent file storage that works within the container system and can respond to application requirements. Persistent storage, where the data remains available even when a container is moved or removed, is critical when the data must stay available to other applications and cannot be destroyed. A parallel file system that uses a global namespace, working within a container-based environment, is ideal for stateful applications that require fast access to massive amounts of data.
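As a sketch of what persistent, shareable storage looks like in Kubernetes terms, the Go function below uses client-go to create a PersistentVolumeClaim with ReadWriteMany access against a storage class served by a CSI driver. The claim and class names ("shared-data", "parallel-fs") are hypothetical, and exact field types vary slightly across client-go releases; the point is that any number of pods can mount the claim, and the data survives the removal of every one of them:

    package demo

    import (
        "context"

        corev1 "k8s.io/api/core/v1"
        "k8s.io/apimachinery/pkg/api/resource"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
    )

    // CreateSharedClaim requests 100 GiB of ReadWriteMany storage from
    // a CSI-backed storage class, so the data is provisioned outside
    // any single container and outlives all of them.
    func CreateSharedClaim(ctx context.Context, clientset *kubernetes.Clientset) error {
        className := "parallel-fs" // hypothetical StorageClass backed by a CSI driver
        claim := &corev1.PersistentVolumeClaim{
            ObjectMeta: metav1.ObjectMeta{Name: "shared-data"},
            Spec: corev1.PersistentVolumeClaimSpec{
                AccessModes:      []corev1.PersistentVolumeAccessMode{corev1.ReadWriteMany},
                StorageClassName: &className,
                Resources: corev1.ResourceRequirements{
                    Requests: corev1.ResourceList{
                        corev1.ResourceStorage: resource.MustParse("100Gi"),
                    },
                },
            },
        }
        _, err := clientset.CoreV1().PersistentVolumeClaims("default").
            Create(ctx, claim, metav1.CreateOptions{})
        return err
    }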

A Container Storage Ecosystem

The Container Storage Interface (CSI) enables storage vendors to create a plug-in for adding or removing volumes of storage across different container orchestration systems; CSI was promoted to GA status with the Kubernetes v1.13 release. CSI allows developers and storage vendors to expose their file systems, whatever those may be, to the applications running in a container. Thus, storage becomes much more extensible, and innovative storage products can be made more widely available. By using CSI, for example, a high-performance parallel file system (PFS) can quickly be put to use by an application, meeting the performance requirements of large, distributed applications that require high-speed input and output.
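For a sense of what writing a CSI plug-in involves, the Go sketch below implements the smallest mandatory piece of the specification, the Identity service, using the Go bindings shipped with the CSI spec (github.com/container-storage-interface/spec/lib/go/csi). The driver name, version and socket path are hypothetical; a real plug-in would also implement the Controller and Node services that actually create, attach and mount volumes, and depending on the bindings version the server struct may need to embed csi.UnimplementedIdentityServer:

    package main

    import (
        "context"
        "log"
        "net"

        "github.com/container-storage-interface/spec/lib/go/csi"
        "google.golang.org/grpc"
        "google.golang.org/protobuf/types/known/wrapperspb"
    )

    // identity implements the CSI Identity service, which every plug-in
    // must expose so the orchestrator can discover and health-check it.
    type identity struct{}

    func (identity) GetPluginInfo(ctx context.Context, req *csi.GetPluginInfoRequest) (*csi.GetPluginInfoResponse, error) {
        return &csi.GetPluginInfoResponse{
            Name:          "example.csi.acme.io", // hypothetical driver name
            VendorVersion: "0.1.0",
        }, nil
    }

    func (identity) GetPluginCapabilities(ctx context.Context, req *csi.GetPluginCapabilitiesRequest) (*csi.GetPluginCapabilitiesResponse, error) {
        // Advertise that the driver would also offer a Controller
        // service (volume creation and deletion).
        return &csi.GetPluginCapabilitiesResponse{
            Capabilities: []*csi.PluginCapability{{
                Type: &csi.PluginCapability_Service_{
                    Service: &csi.PluginCapability_Service{
                        Type: csi.PluginCapability_Service_CONTROLLER_SERVICE,
                    },
                },
            }},
        }, nil
    }

    func (identity) Probe(ctx context.Context, req *csi.ProbeRequest) (*csi.ProbeResponse, error) {
        return &csi.ProbeResponse{Ready: wrapperspb.Bool(true)}, nil
    }

    func main() {
        // CSI traffic flows over a UNIX domain socket that the kubelet
        // (or another container orchestrator) dials.
        lis, err := net.Listen("unix", "/tmp/csi.sock")
        if err != nil {
            log.Fatal(err)
        }
        srv := grpc.NewServer()
        csi.RegisterIdentityServer(srv, identity{})
        log.Fatal(srv.Serve(lis))
    }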

Innovative applications need to be able to incorporate the latest developments that users demand. By exposing new storage capabilities through a Kubernetes CSI plug-in, applications can share data across a range of cloud providers with pre-determined security features enabled.

Containers as a service, or CaaS, includes all the components and management services necessary to start, organize, scale, execute, replace and stop containers. An essential addition to a CaaS offering is packaged I/O volumes delivered through the CSI.

CaaS is most commonly positioned as part of an IaaS offering. Some of the benefits of using CaaS include:

  • Reduction in operating expense – Enterprises only pay for what is currently consumed, which includes the compute, the storage capacity and the I/O portion of the application.
  • Scaling up or down as needed – As workloads change, the enterprise can scale resources either up or down with CaaS.
  • Developers can respond quickly – By using CaaS, developers can quickly deploy new or updated software, which includes new application features or new storage options.

A variety of workloads will benefit from Kubernetes CSI plug-ins, including high-performance computing, artificial intelligence, machine learning and databases. Persistent storage that is fast and easily accessible by multiple containers, whether serially or simultaneously, enables new applications that must ingest and act upon massive amounts of data.

Whether implemented on-premises, in a public cloud or as part of a hybrid cloud solution, this approach simplifies the delivery of cloud-native applications. As container-based software becomes the norm for easy deployment and workload scaling, a CSI plug-in is the way to get the most out of a cloud strategy while reducing costs.

Effective Kubernetes CSI plug-ins that deliver CaaS functionality address many of the challenges that innovative software developers and their customers face, such as:

  • Support for stateful and stateless applications.
  • Shareability of data—a single data set can be shared across many containers.
  • Scalability of data—ability to scale to exabytes of storage in a single namespace.
  • Performance of storing or retrieving data—low-latency NVMe performance across a shared file system.
  • Portability across various cloud systems, including multi-cloud, hybrid cloud and on-premises deployments—container mobility provides cloud bursting capability.

With these flexible deployment options and persistent storage, organizations can make faster decisions based on more rapid access to data and have the confidence that valuable data is always available.

Barbara Murphy

Barbara Murphy is VP of Marketing at WekaIO.
