Containers are definitely “a thing” now
This probably isn’t a surprise to anyone who has been following their development, but containers and cloud-native development are really taking off. In 2020, the Enterprise Strategy Group (ESG) reported that 74% of organizations were using containers, and you can expect that container usage has continued to grow in the year and a half since that report was released.
This means that businesses want to move beyond their traditional development strategy and their Virtual Machine (VM) environment. That doesn’t mean VMs are going away, but if your business is moving towards a DevOps, DevSecOps, or GitOps strategy, VMs become somewhat decoupled from those development practices.
DevOps, DevSecOps, and GitOps
I’ve heard of DevOps. What are DevSecOps and GitOps? Well, let’s get the entire glossary out of the way. First, let’s make sure that we’re all on the same page in terms of containers.
Containers are small, minimal units of computing. Typically, they run on a minimal version of a Linux operating system, such as Red Hat CoreOS (though Windows containers are a thing!), grouped into pods along with whatever runtimes and libraries the code needs. Each container can implement one service in the architecture of an application.
In the “older” (not that old to some of us) Service-Oriented Architecture (SOA), the enterprise was designed around applications running on different VMs and services, and how they all fit together.
With containers, we can look at microservices: each container runs one service in an application. They could be message brokers, web servers, or pretty much anything else. Because the services are disaggregated, each part can be developed or updated separately, allowing for faster delivery of new versions and new features.
Having the basics out of the way, let’s talk about DevOps.
DevOps is a way of managing Development and Operations together. It is a strategy for Continuous Integration and Continuous Delivery (CI/CD). Businesses that implement a DevOps strategy have a pipeline for new versions of applications to be deployed. So, as developers create a new version of an application, it can be put into the pipeline and brought into use fairly quickly because the operations side of the equation is connected.
Containers make this strategy much easier. With container-based architectures, the IT infrastructure team doesn’t need to try to keep the runtimes, libraries, and operating systems in sync across Development, Test, and Production. The environment that needs to be provided is the same minimalist operating system everywhere; the pods contain the runtimes, libraries, and application containers. Since the same pods (with the same components) are used in each phase of development, there is more consistency for all parties involved.
This strategy is commonplace today. When we go to our phone App Stores, we see updates available all the time. Sometimes multiple times in a day for a given app. This is because developers can deliver them that easily. Additionally, if something isn’t working out, it’s easy to return to an earlier version of the container.
Now, DevSecOps builds on this by integrating security into the CI/CD pipeline. Faster development is important, but unless security principles and personnel are interconnected with the pipeline, it is certainly possible to end up in trouble.
GitOps (a term coined by Weaveworks) looks at the environment from a different angle. While Dev(Sec)Ops looks at how the application is delivered from development, through processes, to operations, GitOps looks at the infrastructure. As data centers use more automation (such as Red Hat Ansible Automation Platform and HashiCorp’s Terraform) to maintain their “Infrastructure as Code”, Git has become an indispensable tool for making sure that everyone is working together. Tools like GitHub and GitLab provide the source control management that is critical to ensuring that everyone in the organization is working from the correct versions of templates and playbooks.
To be sure, GitOps certainly is used with VMs. It really comes into its own, though, as part of a strategy of full containerization of the environment.
IBM Spectrum Fusion HCI
This leads us to the main topic.
What is IBM Spectrum Fusion HCI and why would it help me with any of this?
When you look at a container environment in the abstract, it’s easy to understand. But whenever I have a discussion with a customer about setting up Red Hat OpenShift, this question inevitably comes up: “What do I run it on?”
That’s where things get a bit more complicated, because OpenShift can run in many different environments. The compute portion runs on x86, IBM Power, or IBM Z. x86 is certainly the most common, but OpenShift does best when it’s not running in a VM, so ideally you need bare metal servers available specifically for it.
Container storage can be handled a few different ways. Most vendors have Container Storage Interface (CSI) drivers for block devices. But block volumes are generally limited to “ReadWriteOnce” (RWO) access, meaning the volume can be mounted read-write by only a single node. If you need pods on multiple nodes writing to the same volume, you need “ReadWriteMany” (RWX). So, while many block storage systems have CSI drivers and certainly have value in providing data to their applications, they won’t be able to do everything in a container environment.
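To make RWO versus RWX concrete, here’s a generic Kubernetes PersistentVolumeClaim sketch. The claim and storage class names are hypothetical, not tied to any product mentioned here:

```yaml
# Hypothetical PersistentVolumeClaim requesting shared (RWX) access.
# A block-backed storage class typically supports only ReadWriteOnce;
# a file-based or distributed storage class is needed for ReadWriteMany.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-logs                  # hypothetical name
spec:
  accessModes:
    - ReadWriteMany                  # RWX: mountable read-write by many nodes
  resources:
    requests:
      storage: 10Gi
  storageClassName: my-file-storage  # hypothetical file-backed class
```

If the backing storage class only supports RWO, a claim like this simply won’t bind, which is exactly the limitation described above.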
When deploying Red Hat OpenShift, it’s common to use Red Hat OpenShift Data Foundation (ODF), which was, until recently, called OpenShift Container Storage. It’s based on the Ceph open-source project and can be deployed as part of the OpenShift cluster or as its own storage cluster. The right choice between the two depends on the environment.
Then, of course, there’s the networking to make sure that there’s sufficient bandwidth to support the communication between the servers, storage, and clients. It gets complicated quite rapidly and can be an impediment to trying to get a container strategy off the ground.
That’s not to say that nobody should ever try to build their own, because doing so gives a great level of flexibility: every component can be tailored to the environment’s specific needs. But for most environments it really does feel like a headache when these details start coming up.
Spectrum Fusion HCI is IBM’s answer to this specific need: a HyperConverged Platform. It includes storage-rich x86 servers (starting at 6, with up to 20 in a rack) with highly available, high-performance, container-native storage provided by Spectrum Scale: Erasure Code Edition. There is 100 GbE networking, as well as the option of including NVIDIA A100 GPUs for enhanced AI/ML capabilities.
All of this has a single, web-based management interface. Basically, the only missing components are the Red Hat OpenShift license itself (which, spoiler alert, Mainline can provide!) and any IBM Cloud Paks that you would want to run on top of it.
Difference from other HyperConverged Platforms
If you want to run VMs, Nutanix and VxRail are great. Both do a tremendous job at that. What IBM is doing here is creating an HCI platform specifically for the Hybrid Cloud, the environment that nearly everyone wants to implement. These servers can technically run VMs, because OpenShift Container Platform does allow for the creation of VMs through OpenShift Virtualization. (But that’s not the focus, nor should it be. If you want to go HyperConverged and it’s just for VMs, we’ll be happy to discuss other solutions available.) The servers primarily provide bare metal on which to run the container environment, along with the disks that IBM Spectrum Scale will manage and keep available. This is meant to enable easy deployment of Red Hat OpenShift and IBM Cloud Paks.
If your business has a goal of transitioning to a hybrid cloud environment, if your development teams really want to be able to do cloud-native development, or if you have a business need that could be addressed by something like IBM Cloud Pak for Watson AIOps, this is as close to a “drop in” solution as I’ve seen.
Setting this in place can enable a long-term growth plan as well. With OpenShift, you aren’t bound to the Kubernetes platform provided by any given cloud provider (and all the major cloud providers support OpenShift or even provide managed OpenShift services). So, if you want to move an application that you developed locally into a cloud, be it Microsoft Azure, Amazon Web Services (AWS), Google Cloud, or even IBM Cloud, the applications can run and be managed the same as they would in your data center.
Even if you grow beyond the 20 servers in a rack, OpenShift Advanced Cluster Management for Kubernetes (ACM) will allow you to keep on top of your growing cluster environment. ACM provides capabilities to improve application life cycle management, streamline security compliance, and ease multi-cluster management.
Spectrum Scale: Erasure Code Edition (ECE)
Spectrum Scale is the successor to the well-established General Parallel File System (GPFS), which has been used in high-performance computing and multimedia environments for decades. Because it was designed around providing highly available storage in some of the most demanding environments, a key feature of the platform has been GPFS Native RAID (GNR).
If you want a great primer on GNR, check out this video from a researcher at IBM Almaden: GPFS Native RAID for 100,000-Disk Petascale Systems – YouTube. The short version is that GNR is a software-based, declustered RAID technology. It was designed because drives were growing larger and rebuild times were extending, so the chances of losing data to multiple simultaneous drive failures were increasing. This is especially true in environments with 100,000 disk drives, such as in the video above.
GNR was architected to use Reed-Solomon erasure coding to protect the data by spreading the data and parity across all drives and nodes in the cluster. Instead of dedicating spare disk drives, spare capacity is spread out across the cluster to absorb rebuilds.
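As a rough illustration of the idea, here is a minimal Python sketch. It is a toy, assuming a single XOR parity strip in place of GNR’s actual Reed-Solomon codes (which tolerate multiple simultaneous failures), but it shows the core mechanic: any one lost strip can be rebuilt from the survivors.

```python
# Toy sketch of erasure-coded striping. Real GNR uses Reed-Solomon codes
# (surviving multiple failures); a single XOR parity stands in here.

def split_into_strips(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal-length strips, zero-padding the tail."""
    strip_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(strip_len * k, b"\x00")
    return [padded[i * strip_len:(i + 1) * strip_len] for i in range(k)]

def xor_parity(strips: list[bytes]) -> bytes:
    """Compute one XOR parity strip over all data strips."""
    parity = bytearray(len(strips[0]))
    for strip in strips:
        for i, b in enumerate(strip):
            parity[i] ^= b
    return bytes(parity)

def rebuild(strips: list, parity: bytes) -> list[bytes]:
    """Reconstruct the one lost strip (marked None) from the survivors."""
    missing = strips.index(None)
    recovered = bytearray(parity)
    for j, strip in enumerate(strips):
        if j != missing:
            for i, b in enumerate(strip):
                recovered[i] ^= b
    out = list(strips)
    out[missing] = bytes(recovered)
    return out

data = b"spectrum scale protects data across all drives"
strips = split_into_strips(data, 4)   # 4 "drives" worth of data
parity = xor_parity(strips)           # parity on a 5th "drive"

# Simulate losing one drive's strip, then rebuild it from the rest.
damaged = strips.copy()
damaged[2] = None
repaired = rebuild(damaged, parity)
assert repaired == strips
```

In GNR, the strips and parity are scattered across *all* drives in the cluster rather than fixed RAID groups, which is what lets every surviving drive contribute to a rebuild.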
GNR also includes capabilities like the “Disk Hospital”, which keeps an eye on drives that are misbehaving, makes corrections, and tracks disks that have been repeat offenders where errors are concerned. Additionally, every IO (both reads and writes) carries a checksum to make sure that data is written to and read from the disks correctly.
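The end-to-end checksum idea can be sketched like this. This is a toy in-memory “disk”, not GNR’s implementation, but it shows why checksumming every read catches silent corruption that the drive itself never reports:

```python
import zlib

def write_block(storage: dict, block_id: int, data: bytes) -> None:
    # Store a CRC32 checksum alongside the data at write time.
    storage[block_id] = (data, zlib.crc32(data))

def read_block(storage: dict, block_id: int) -> bytes:
    data, stored_crc = storage[block_id]
    # Verify on every read; a mismatch means the bytes on "disk" changed.
    if zlib.crc32(data) != stored_crc:
        raise IOError(f"checksum mismatch on block {block_id}")
    return data

disk = {}
write_block(disk, 0, b"critical data")
assert read_block(disk, 0) == b"critical data"

# Simulate a corrupted sector: the stored data changes but the stored
# checksum does not, so the next read detects the corruption.
data, crc = disk[0]
disk[0] = (b"corrupted data", crc)
try:
    read_block(disk, 0)
    detected = False
except IOError:
    detected = True
assert detected
```

A system that detects the bad read can then rebuild the correct data from parity, which is exactly how the checksums and erasure coding work together.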
But Spectrum Scale is more than the erasure coding that it uses to protect the data. It also has a capability called “Active File Management”, which allows multiple clusters to share cached copies of the same commonly used files. This means that you can have multiple clusters sharing files for availability and for disaster recovery.
With each server holding between two and ten 7.68 TB NVMe flash drives, Spectrum Fusion HCI will be able to use Spectrum Scale: ECE to deliver incredible speed, as well as reliability, across the cluster.
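For a back-of-the-envelope sense of scale, the drive counts above bound the raw flash in a rack (raw capacity before erasure-coding overhead; usable capacity will be lower):

```python
# Raw flash capacity range for a Spectrum Fusion HCI rack, using the
# figures above: 6-20 servers, each with 2-10 drives of 7.68 TB.
DRIVE_TB = 7.68
MIN_SERVERS, MAX_SERVERS = 6, 20
MIN_DRIVES, MAX_DRIVES = 2, 10

min_raw_tb = MIN_SERVERS * MIN_DRIVES * DRIVE_TB
max_raw_tb = MAX_SERVERS * MAX_DRIVES * DRIVE_TB
print(f"Raw flash capacity: {min_raw_tb:.2f} TB to {max_raw_tb:.2f} TB")
# → Raw flash capacity: 92.16 TB to 1536.00 TB
```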
The Bottom Line
If your business needs to move into cloud-native development, take advantage of IBM Cloud Paks, and make DevOps a reality, this platform is a real opportunity. As mentioned before, it has everything needed other than the OpenShift license itself. If you grow, the servers and storage will grow with you. If you find that you need to grow into a new rack, Advanced Cluster Management and Active File Management will enable that.
Rather than trying to build a new cluster in a more manual process, this will let most environments hit the ground running and make the services available for their users. And that’s really the goal.
For more information on building a hybrid cloud environment or gaining more efficiency from your data storage environment, contact your Mainline Account Executive directly or reach out to us here with any questions.
IBM Documentation: IBM Spectrum Scale: Erasure Code Edition Guide
Video: IBM GNR primer: GPFS Native RAID for 100,000-Disk Petascale Systems – YouTube