On August 17, 2020, IBM announced and presented the POWER10 processor at the virtual Hot Chips 32 conference. Next-generation POWER processors have historically followed a three-year rollout cycle, and it has been a little over three years since POWER9 was introduced. IBM POWER10 represents a major advancement of the POWER (Performance Optimization With Enhanced RISC) microchip architecture and demonstrates IBM’s commitment to ensuring that Power Systems support today’s dynamic business trends. This blog outlines the new technology and, while it may be a little technical, relates the advancements to the business needs they address through enhanced performance, scale, security, and packaging options.
The IBM POWER10 Processor Chip by Samsung
First, here is a picture of the chip. Only the chip has been announced at this time, not the servers themselves.
- The POWER10 processor contains 18 billion transistors built on a 7nm process. It is slightly smaller than the POWER9 processor chip, with a 3x improvement in energy efficiency.
- It supports up to 15 cores per chip (16 are manufactured, but for yield and cost reasons at most 15 will be enabled) and uses 18 metal layers, which are the connection layers and pipes.
- The chip will be packaged as either a single-chip module (SCM, one chip per socket) or a dual-chip module (DCM, two chips per socket), depending on the server.
- The chip is built for massive data handling and, to enable compute to keep up, the L2 cache has been increased to 2MB per core and the L3 cache is now 128MB per chip.
- The processor is enabled to support PCIe Gen5 devices once they become available.
- Clock speeds (GHz) will likely be about the same as on the POWER9 processor, but the larger pipes, bandwidth, and interconnect speeds should make per-core performance 25 to 30% faster.
- Performance will depend on the applications you’re running, but those that use large data volumes, have high I/O volumes, and/or span many cores will benefit from throughput gains and the usual improvement in price/performance.
- There are also enhancements for artificial intelligence (specifically AI inference), large database support, large in-memory application support, and security.
Some new features in the IBM POWER10 that stand out as major technology improvements include:
Affinity and AXON
Affinity refers to how close code is to the memory and resources it uses. Poor affinity shows up as delay caused by remoteness to memory, cache misses, connections to other cores for coherency, and connections to other sockets and nodes. In prior Power architectures, these interconnections were handled by what were labeled the A bus and the X bus. In POWER10, these buses have been replaced by AXON on the chip, which has 1 TB/s of bandwidth. It may be that AXON means A bus and X bus on the chip or corner, but that is just a guess. This bus is also used to connect OpenCAPI devices like FPGAs, ASICs, and potentially Storage Class Memory, and it will be the supporting interface for Memory Inception. This change is key to increasing system performance and to making the physical location of code and memory less significant than it has been in the past.
Memory Bandwidth, Memory Inception, and OMI
If you have a large memory requirement for an application like SAP HANA, want a database to be memory resident for speed of access, or desire very large Java heaps, you have historically been limited by the amount of memory that was available or that the system could support, especially since you normally run multiple workloads on Power servers due to the efficiency of PowerVM. In Power Systems Scale-Out Servers, the limit was 4 TB per server. If you needed more memory, you had to scale out and distribute the workload accordingly or move to a more costly, vertically scaling Enterprise Class Power Systems server. With Memory Inception, servers in a cluster can share memory, so a particular server can draw variable amounts of memory from 1 to N other systems. Effectively, the memory available to an application can be quite large. This sharing uses AXON to carry the data from the “other” systems. For on-system memory connection to the processor, there is a new memory interface called Open Memory Interface (OMI) that runs at about 410 GB/s on DDR4 but can expand to 1 TB/s with GDDR or DDR5 when these become available. Using OMI, Memory Inception could expand to two petabytes, but there would be some delay in reaching memory on the other systems because it is transferred over the AXON buses. This supports business needs that involve large, memory-based workloads and dynamic memory requirements that exceed the memory capacity of a single server.
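To put the OMI bandwidth figures above in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes decimal units (1 TB = 1000 GB) and treats the quoted numbers as sustained rates, which real workloads will not reach. The function and constant names are illustrative only. It estimates how long it would take to stream the 4 TB scale-out memory limit once at each rate.

```python
# Rough estimate only: the quoted bandwidths are peak figures, and
# real sustained throughput will be lower. Names here are illustrative.

DDR4_OMI_GB_PER_S = 410.0     # "about 410 GB/s on DDR4" (from the text)
FUTURE_OMI_GB_PER_S = 1000.0  # "1 TB/s with GDDR or DDR5" (from the text)
SCALE_OUT_LIMIT_TB = 4.0      # prior scale-out server memory limit (from the text)


def stream_seconds(data_tb, bandwidth_gb_per_s):
    """Seconds needed to move data_tb terabytes at the given bandwidth."""
    return data_tb * 1000.0 / bandwidth_gb_per_s


ddr4_seconds = stream_seconds(SCALE_OUT_LIMIT_TB, DDR4_OMI_GB_PER_S)
future_seconds = stream_seconds(SCALE_OUT_LIMIT_TB, FUTURE_OMI_GB_PER_S)

print(f"4 TB at 410 GB/s: {ddr4_seconds:.1f} s")
print(f"4 TB at 1 TB/s:   {future_seconds:.1f} s")
```

Under these assumptions, a full pass over 4 TB drops from roughly ten seconds to about four, which is why the faster memory types matter for memory-resident workloads.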
Security
Security is a key concern for everyone, and the number of data breaches keeps increasing. To address this, IBM POWER has always had memory encryption logic on the processor for faster encryption performance. In IBM POWER10, the performance has been increased and workloads using data encryption will now take less time. Additionally, POWER10 has the capability to encrypt the entire main memory. This should greatly increase data and application code security.
Data Center Performance
Everyone wants to know how much faster POWER10 is than the previous systems. Performance is a result of many factors, including instruction set, SMT, I/O, affinity, memory speed, pre-fetch, look-ahead, programming efficiency, number of execution units, dispatch cycles, and queues, which means the answer is “it depends.” IBM has published a chart that compares a POWER9 two-socket system (with 12-core SCMs) to a POWER10 two-socket system (with 15-core DCMs), and the POWER10 system delivers over 3x the performance and energy efficiency. Note, however, that this compares 60 POWER10 cores to 24 POWER9 cores. That 2.5x core ratio explains most of the 3x figure, and additional data suggesting that per-core performance may be 25 to 30% greater on POWER10 accounts for the rest. Remember, results are dependent on the workloads, and there is no doubt that POWER10 systems with the new capabilities will support more workload capacity on a single system. OpenShift container density for hybrid cloud workloads is also optimized on POWER10. The price/performance advantages will be very positive, and the ability to support existing workloads on fewer servers means lower capital costs and greater savings on administration and environmental expenses.
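The arithmetic behind that comparison is easy to check. The short Python sketch below uses only the figures quoted above (two sockets each, 12-core SCMs versus 15-core DCMs, and a 25 to 30% per-core gain). It assumes perfectly linear scaling with core count, which is a simplification, and the function names are illustrative.

```python
# Naive scaling estimate from the figures quoted in the text.
# Assumes throughput scales linearly with core count, which real
# workloads rarely achieve.

def total_cores(sockets, chips_per_socket, cores_per_chip):
    """Total cores in a system built from identical sockets."""
    return sockets * chips_per_socket * cores_per_chip


def projected_speedup(old_cores, new_cores, per_core_gain):
    """Core-count ratio multiplied by the per-core uplift."""
    return (new_cores / old_cores) * (1.0 + per_core_gain)


p9_cores = total_cores(sockets=2, chips_per_socket=1, cores_per_chip=12)   # SCM
p10_cores = total_cores(sockets=2, chips_per_socket=2, cores_per_chip=15)  # DCM

core_ratio = p10_cores / p9_cores
low_estimate = projected_speedup(p9_cores, p10_cores, 0.25)
high_estimate = projected_speedup(p9_cores, p10_cores, 0.30)

print(f"POWER9 cores: {p9_cores}, POWER10 cores: {p10_cores}")
print(f"Core ratio: {core_ratio:.2f}x")
print(f"Projected system speedup: {low_estimate:.3f}x to {high_estimate:.3f}x")
```

With these inputs the core ratio is 2.5x, and the per-core gain lifts the projection to between roughly 3.1x and 3.25x, in line with the "over 3x" claim.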
Matrix Math Accelerator and Nested KVM Environments
Two additional items are particularly interesting. The first is the addition of an embedded Matrix Math Accelerator (MMA). This is an addition to the ISA and will enable matrix math functions (multiply, eigenvalues, FFT, etc.) to be performed by hardware rather than simulated in software. This will be useful for AI inference using FP32, BFloat16, and INT8 calculations as well as for other high-performance computing (HPC) applications. The embedded matrix math accelerator will not be as fast as a GPU, but it will be faster than software simulations.
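For readers wondering what kind of work a matrix math unit takes over, here is a minimal pure-Python sketch of a matrix multiply written as a series of accumulated outer products. This is my assumption about the style of operation such hardware accelerates, not something stated in the announcement. It is plain software that uses no POWER10 instructions or APIs, and the function names are made up for illustration.

```python
# Illustrative only: plain Python, not POWER10 code. Shows a matrix
# multiply built from repeated "accumulator += column x row" updates,
# the kind of inner loop a matrix unit would perform in hardware.

def outer_product_accumulate(acc, col, row):
    """In place: acc[i][j] += col[i] * row[j] for every i, j."""
    for i, a in enumerate(col):
        for j, b in enumerate(row):
            acc[i][j] += a * b
    return acc


def matmul_by_outer_products(A, B):
    """Compute A @ B as a sum of outer products, one per inner index."""
    m, k, n = len(A), len(B), len(B[0])
    acc = [[0] * n for _ in range(m)]
    for p in range(k):
        col = [A[i][p] for i in range(m)]  # p-th column of A
        row = B[p]                         # p-th row of B
        outer_product_accumulate(acc, col, row)
    return acc


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = matmul_by_outer_products(A, B)
print(C)  # [[19, 22], [43, 50]]
```

In software, each of those inner multiply-adds is a separate step. The appeal of doing this in hardware is that a whole tile of the accumulator can be updated at once.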
The other item is the statement that POWER10 systems will be able to run nested KVM environments under PowerVM. This means that all POWER10 systems will be able to run Red Hat OpenShift Container Platform (OCP) in a KVM environment with containers, which was not possible on POWER9 systems unless the Power System used the OPAL hypervisor in place of PHYP. This is good news for customers who need KVM solutions running alongside traditional AIX and IBM i solutions.
Key Support Elements – Future Releases
POWER10 is a tremendous advancement in technology and shows forethought about the capabilities that today’s and tomorrow’s businesses require. While it is optimized for Red Hat OpenShift for enterprise hybrid cloud, exploiting some of its other capabilities will require support and changes in AIX, IBM i, Linux, PowerVM, PHYP, compilers, and applications. The processor is “done”; however, IBM has more work ahead on enablement and exploitation, which will surely arrive with new releases of these key support elements.
For further discussion and planning insights, and to learn how IBM POWER10 could be an asset to your data center, contact your Mainline Account Executive directly.