BLOG: Container Storage – The Use and Benefit of Persistent Storage in Your Data Center

Chris Dedham
Solutions Architect

Storage architecture has evolved over the decades as demands for compute power and data capacity have increased substantially. Meanwhile, the delivery of compute power has become more efficient as applications have moved from bare metal, to virtualized, to containerized deployments.

About Containers

Containers are highly efficient software delivery systems because the application’s run-time executables are packaged together with the operating system libraries and dependencies they need, while the kernel of the host is shared. This configuration enables portability, a characteristic that allows a computer program to be used in an operating environment other than the one it was created in without significant adjustments. The benefit of portability is significant because the application can be created and moved easily, without the familiar DevOps issue of: “The application worked in one environment, but it didn’t work in a different environment.”

A popular container platform is Docker. To create a container in Docker, you do the following:

Create a Dockerfile – The blueprint for building a Docker image
Create a Docker image – The template for running a Docker container, built from the Dockerfile
Create a Docker container – The container is a running instance of a Docker image

Think of container creation as a layered cake. Each layer of the cake corresponds to a step within the Dockerfile for building the image. The first layer is the base image of the operating system. The second layer could be a runtime environment like Node.js. The container image is immutable. When a container is started from an image, it gets a clean, writable file system layer on top of the image layers. A container can be thought of as a process, but one that is isolated through kernel namespaces to a greater degree than an ordinary operating system process.
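To make the layered-cake picture concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not how Docker is implemented: the Image and Container classes, the file paths, and the layer contents are all hypothetical. It shows an image as a read-only stack of layers, with each container receiving its own empty writable layer on top.

```python
from dataclasses import dataclass, field
from types import MappingProxyType


@dataclass(frozen=True)
class Image:
    """Toy stand-in for a container image: an ordered, read-only stack of layers."""
    layers: tuple  # each layer is a read-only mapping of path -> content

    def read(self, path):
        # Upper layers shadow lower ones, so search from the top down.
        for layer in reversed(self.layers):
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)


@dataclass
class Container:
    """Toy stand-in for a running container: an image plus a private writable layer."""
    image: Image
    writable: dict = field(default_factory=dict)

    def write(self, path, content):
        # Writes land in the container's own layer and never touch the image.
        self.writable[path] = content

    def read(self, path):
        if path in self.writable:
            return self.writable[path]
        return self.image.read(path)


# Hypothetical layers: a base operating system layer, then a runtime layer on top.
base_os = MappingProxyType({"/etc/os-release": "base OS"})
runtime = MappingProxyType({"/usr/bin/node": "Node.js runtime"})
image = Image(layers=(base_os, runtime))

# Two containers from the same image; only the first one writes a file.
first = Container(image)
second = Container(image)
first.write("/tmp/data.txt", "written by first")
```

In this sketch, the file written by the first container is not visible to the second, and both still read the runtime from the shared image.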

Multiple containers can be spawned from one image. This can be done for scalability and availability or, in container parlance, auto-scaling and auto-healing, respectively. An orchestrator such as Kubernetes manages the containers’ lifecycle.
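The idea behind auto-scaling and auto-healing can be sketched as a simple reconcile loop. The Python below is only an illustration under assumptions: the start_container and reconcile functions and the dictionary layout are hypothetical and do not represent the Kubernetes API. It compares the desired number of replicas with the healthy ones and starts or stops containers from the same image to close the gap.

```python
import itertools

_ids = itertools.count(1)


def start_container(image_name):
    """Hypothetical helper: pretend to start one container from an image."""
    return {"id": f"{image_name}-{next(_ids)}", "image": image_name, "healthy": True}


def reconcile(containers, image_name, desired_replicas):
    """Return a list with exactly `desired_replicas` healthy containers."""
    # Auto-healing: drop containers that failed their health check.
    running = [c for c in containers if c["healthy"]]
    # Scale up: start replacements or extra replicas from the same image.
    while len(running) < desired_replicas:
        running.append(start_container(image_name))
    # Scale down: keep only as many replicas as requested.
    return running[:desired_replicas]


pool = reconcile([], "web", 3)        # scale from 0 to 3 replicas
pool[1]["healthy"] = False            # simulate a crashed container
healed = reconcile(pool, "web", 3)    # the failed replica is replaced
scaled = reconcile(healed, "web", 5)  # demand grows, two more replicas start
```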

Persistent Storage for Containers

Containers can come and go, along with their data, which implies that persistent storage is not required. This is a stateless architecture. But what if you want to keep the data generated by a container?

This can be accomplished by having the container claim a persistent volume that maps a directory on the host system running the container to a mount point inside the container. However, a mount point from a host anchors the container, because container creation is then tied to the specific host compute instance that holds the data, which in turn makes the container less portable. This issue can present a challenge for application designers.
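The anchoring effect of a host mount can be illustrated with a small Python sketch. It is a toy model, assuming hypothetical names throughout (Host, run_container, the /srv/app-data path, and the file contents); no real container runtime is involved. Data written through a host-backed mount is found again only when the container restarts on the same host.

```python
class Host:
    """Toy host: its local disk is visible only to containers running on it."""

    def __init__(self, name):
        self.name = name
        self.local_disk = {}  # host path -> directory contents


def run_container(host, host_path):
    """Start a container whose mount point is backed by a directory on `host`."""
    return host.local_disk.setdefault(host_path, {})


host_a, host_b = Host("host-a"), Host("host-b")

# The container runs on host-a and writes data through its mount point.
mount = run_container(host_a, "/srv/app-data")
mount["orders.db"] = "42 orders"

# Restarted on the same host, the data is still there.
same_host = run_container(host_a, "/srv/app-data")
# Rescheduled to another host, the mount point starts out empty.
other_host = run_container(host_b, "/srv/app-data")
```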

Container Management Challenges

Customers are refactoring their legacy applications into cloud-native microservice architectures to offer user services with better agility. However, containerized workloads that are deployed for microservices don’t always have the same storage and data management maturity as monolithic architectures using bare metal or virtualized compute systems.

Consequently, monolithic systems can be easier to administer from a storage and data management perspective. This is because the data volumes that are mapped into the virtualized compute systems are always persistent due to the tight coupling between compute and storage. The computational systems make an initial claim of the storage resources, which persists throughout the lifecycle of the compute instances.

VMware has enterprise features like vSphere High Availability (HA), vMotion, and the Distributed Resource Scheduler (DRS) that make virtualized compute resources portable. When the compute resources are moved, they stay connected to their underlying storage definitions. This is often enabled by VMware vSAN or similar shared storage technologies. In fact, hyperconverged infrastructure (HCI) is an architecture that works well for virtualization.

VMware also has robust APIs for data management that do not yet have a comparable equivalent for containerized workloads. The VMware vStorage APIs for Data Protection (VADP) provide data protection vendors a connection point to access VMware snapshots for backup, archiving, and disaster recovery operations. For efficiency, VADP is optimized to pass only the data blocks that have changed. These higher-level management features are still evolving for containerized workloads, even though containers often run within virtualized systems.
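The efficiency gain from passing only changed blocks can be shown with a short Python sketch. This illustrates the general changed-block idea only, under assumptions: the TrackedDisk class and its methods are hypothetical and are not the VADP API. The disk records which blocks were written, and each backup copies just those blocks before resetting the record.

```python
class TrackedDisk:
    """Toy virtual disk that remembers which blocks changed since the last backup."""

    def __init__(self, num_blocks):
        self.blocks = [b""] * num_blocks
        # Before the first backup, every block counts as changed.
        self.changed = set(range(num_blocks))

    def write(self, index, data):
        self.blocks[index] = data
        self.changed.add(index)

    def backup(self):
        """Return only the blocks changed since the previous backup, then reset tracking."""
        delta = {i: self.blocks[i] for i in sorted(self.changed)}
        self.changed.clear()
        return delta


disk = TrackedDisk(num_blocks=4)
full = disk.backup()          # first backup copies all four blocks
disk.write(2, b"new data")
incremental = disk.backup()   # second backup copies only block 2
```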

Container Storage Interface (CSI)

Fortunately, storage management functionality for containers is maturing rapidly around a concept called shared persistence, which is manifested in a new volume plugin standard called the Container Storage Interface (CSI). Volume plugins have been around for a while, but their code was embedded within Kubernetes. That architecture had inherent drawbacks because storage vendors had to depend on the Kubernetes release process to add functionality to a plugin or fix bugs. CSI was developed as a standard for providing block and file storage to containerized workloads that is decoupled from Kubernetes. Kubernetes uses it to allocate persistent storage volumes from storage systems without the data storage being tied to a specific container, thereby providing the portability that containers require to be scalable and highly available in a stateful architecture. The result is storage and container orchestration within one platform.
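In Kubernetes, a workload typically asks for such storage through a PersistentVolumeClaim. The Python sketch below builds one as a plain dictionary and serializes it to JSON. The manifest fields follow the standard Kubernetes PersistentVolumeClaim schema, while the make_pvc helper, the claim name app-data, the storage class name example-csi-block, and the 10Gi size are hypothetical values chosen for illustration.

```python
import json


def make_pvc(name, storage_class, size):
    """Build a Kubernetes PersistentVolumeClaim manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # ReadWriteOnce: the volume is mounted read-write by a single node.
            "accessModes": ["ReadWriteOnce"],
            # Hypothetical storage class; in a real cluster it would name a
            # StorageClass whose provisioner is a CSI driver.
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


pvc = make_pvc("app-data", "example-csi-block", "10Gi")
manifest_json = json.dumps(pvc, indent=2)
```

The claim names a storage class and a size rather than a host or a container, which is what keeps the volume independent of where the container happens to run.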

IBM Cloud Pak Solutions and CSI

IBM launched Cloud Pak offerings in August of 2019. Currently, IBM offers six Cloud Pak solutions: Cloud Pak for Applications, Data, Integration, Automation, Multi-Cloud Management (MCM), and Security. The vision behind Cloud Pak is to have a hybrid, multi-cloud platform built on Red Hat’s Kubernetes-based OpenShift Container Platform (OCP), which can be deployed anywhere. The foundation for Cloud Pak is Red Hat Enterprise Linux CoreOS (RHCOS), which is specifically designed to run containerized applications. Integrating IBM Storage with CSI for on-premises OCP is a perfect marriage of compute and storage technology because the provisioning of storage volumes is not tied to any specific container runtime ID. This enables OCP to orchestrate containers for scaling and availability while simultaneously ensuring the correct storage volumes are assigned to the ephemeral containers. Simply put, running OCP without CSI can dilute the benefits of implementing containerized workloads.
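Why it matters that volumes are not tied to a container runtime ID can be seen in a small Python sketch. It is a toy model under assumptions, not OCP or CSI code: the volumes_by_claim table, the attach and start_container functions, and the orders-data claim name are hypothetical. Volumes are looked up by claim name, so a replacement container with a new runtime ID still receives the same data.

```python
import uuid

# Volumes are keyed by claim name, never by container runtime ID.
volumes_by_claim = {}


def attach(claim_name):
    """Return the volume bound to a claim, creating it on first use."""
    return volumes_by_claim.setdefault(claim_name, {})


def start_container(claim_name):
    """Start an ephemeral container with a fresh runtime ID and its claimed volume."""
    return {"runtime_id": uuid.uuid4().hex, "volume": attach(claim_name)}


first = start_container("orders-data")
first["volume"]["orders.db"] = "42 orders"

# The first container is gone; its replacement has a different runtime ID
# but is handed the same volume because it uses the same claim.
replacement = start_container("orders-data")
```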

IBM is Improving Container Management Maturity

Fortunately, IBM is innovating to provide parity between monolithic and microservice architectures as it relates to storage and data management. For instance, CSI is available on IBM’s FlashSystem and Spectrum Scale storage platforms. IBM Spectrum Scale is a high-performance clustered file system that provides extreme scalability, reduces storage costs, and improves security and management efficiency. In fact, IBM Spectrum Scale is a particularly useful platform for containers because its file system can be shared across multiple containers easily, and it has built-in tiering, provisioning, and data protection attributes. Its software-defined approach is important so containers can realize their full potential as it relates to scalability and availability.

Moreover, data management functionality for containerized workloads is provided by Spectrum Protect technology. In fact, in Cloud Pak for Data (CP4D), the Spectrum Protect backup agent for Db2 is built into the container image like a layer in a cake. The Db2 instances can be backed up directly to a Spectrum Protect server upon container startup.

There are additional integrations that are possible with IBM Cloud Pak for Data, such as:

Spectrum Protect Plus server as a container
Container backup enhancements
Enhanced support for Red Hat OpenShift
Container Cluster and App Resources backup/restore
Support for CSI Storage Platforms
Red Hat OCS Ceph RBD
IBM Spectrum Virtualize
Cloud Pak for MCM

More Information

With hundreds of storage certifications, Mainline’s architects and system engineers help our customers select, architect, and implement efficient, secure storage systems that improve their bottom line. For more information on storage solutions, contact your Mainline Account Representative directly, or reach out to us here.
