POWER9 Performance and Performance Modes

March 22nd, 2018

Ron Gordon
Director – Power Systems

Now that POWER9 systems have been announced and shipped, I think the next question on everyone’s mind is performance. IBM has published rPerf and CPW data in the latest Power Performance report, but I think some explanation of the data, and some views on it, may be helpful.

First, POWER9 is a powerful chip with many enhancements over POWER8, and certainly over the POWER7, POWER7+, and earlier chips. POWER9 is very efficient at SMT8, due to more execution units (aka pipelines). Unlike with POWER8, I suggest everyone specify SMT8 on POWER9. The performance reports show this, but if your application is single threaded, then ST is probably better, because it eliminates the overhead of SMT.
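To make the SMT choice concrete, here is a minimal sketch of how the threading mode changes the number of logical CPUs the operating system sees. The 24-core count is an assumed example, not a figure from the performance report, and `logical_cpus` is a hypothetical helper name.

```python
# Hardware threads per core for each threading mode discussed above.
THREADS_PER_CORE = {"ST": 1, "SMT2": 2, "SMT4": 4, "SMT8": 8}


def logical_cpus(cores: int, mode: str) -> int:
    """Return the number of logical CPUs for a core count and SMT mode."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return cores * THREADS_PER_CORE[mode]


EXAMPLE_CORES = 24  # assumption: illustrative core count only

for mode in THREADS_PER_CORE:
    print(f"{mode:>4}: {logical_cpus(EXAMPLE_CORES, mode)} logical CPUs")
```

A single-threaded application can use only one of those threads per core at a time, which is why ST can be the better choice for it.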

Second, we have improved “turbo” modes that can overclock, independent of environmental factors, in Dynamic Performance Mode or Maximum Performance Mode. An interesting implementation detail is that each socket can be optimized separately. For idle time, the Energy Saving mode still exists.

Third, we have several factors that require some action or configuration details. The 25G bus is a 48-lane high-speed bus, but it needs special devices to exploit it, as does CAPI 2.0. Because both require special programming, I believe that IBM will be working with OpenPOWER members to provide “ready to run” solutions. Then there is PCIe Gen4, which will normally require new network switches and adapters, but which will provide greater I/O bandwidth and throughput if the applications need it.

Fourth, we have a very high-speed internal bus (7.2 TB/s) for inter-socket core communication and a faster SMT speed. This will make affinity less of an issue than it was in POWER7, and it goes beyond the improvements that POWER8 provided.

Some items will effectively be unchanged from POWER8 (but will benefit POWER7 upgrades). These include the same speed memory controllers, the same speed DIMMs, and support for PCIe Gen3. It should be noted that the Scale-Out models will use industry-standard memory and hence will have a memory bandwidth of 172 GB/s, as opposed to the buffered memory of POWER8, which runs at 192 GB/s. I personally feel this difference is negligible, and it is offset by larger caches and faster cache coherency.
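For a sense of scale, the two bandwidth figures quoted above can be compared directly. This is a small sketch using only the 172 GB/s and 192 GB/s numbers from this article; `percent_change` is a hypothetical helper name.

```python
def percent_change(old: float, new: float) -> float:
    """Return the relative change from old to new, in percent."""
    return (new - old) / old * 100.0


POWER8_BUFFERED_GBS = 192.0  # buffered memory bandwidth quoted above
POWER9_SCALE_OUT_GBS = 172.0  # industry-standard memory bandwidth quoted above

delta = percent_change(POWER8_BUFFERED_GBS, POWER9_SCALE_OUT_GBS)
print(f"Peak memory bandwidth change: {delta:.1f}%")
```

That is a reduction of roughly 10 percent in peak bandwidth. Whether it matters in practice depends on how memory-bound the workload is.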

So, what are the results? IBM marketing material is stating “up to 1.5 times performance over POWER8.” There is also an analysis that is a little better at showing the performance deltas from POWER7 and POWER8. Below are the POWER8 S824 to POWER9 S924 comparisons. It should be noted that these are not per core comparisons, but rather per system comparisons. Since Scale-Out systems have all cores activated, this table is widely applicable when doing system replacements or upgrades. (You can always divide the numbers by the core counts to determine the per core performance increase.)
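The per-core division mentioned above can be sketched as follows. The scores and core counts in this example are made-up placeholders chosen to show the arithmetic. They are not values from the IBM performance report, and the function names are hypothetical.

```python
def per_core(system_score: float, cores: int) -> float:
    """Convert a per-system score (for example rPerf) to a per-core score."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return system_score / cores


def per_core_gain(old_score: float, old_cores: int,
                  new_score: float, new_cores: int) -> float:
    """Return the ratio of new per-core performance to old."""
    return per_core(new_score, new_cores) / per_core(old_score, old_cores)


# PLACEHOLDER inputs, not published benchmark data.
OLD_SCORE, OLD_CORES = 300.0, 24
NEW_SCORE, NEW_CORES = 450.0, 20

system_gain = NEW_SCORE / OLD_SCORE
core_gain = per_core_gain(OLD_SCORE, OLD_CORES, NEW_SCORE, NEW_CORES)
print(f"Per-system gain: {system_gain:.2f}x")
print(f"Per-core gain:   {core_gain:.2f}x")
```

With these placeholder inputs, the per-core gain is larger than the per-system gain because the newer system has fewer cores. When both systems have the same core count, the two ratios are identical.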

Please contact your Mainline Account Executive directly with any questions.

Mainline