23 June 20, 07:23
Quote:Continue Reading
High performance computing is now at a point in its existence where to be the number one, you need very powerful, very efficient hardware, lots of it, and lots of capability to deploy it. Deploying a single rack of servers to total a couple of thousand cores isn’t going to cut it. The former #1 supercomputer, Summit, is built from 22-core IBM Power9 CPUs paired with NVIDIA GV100 accelerators, totaling 2.4 million cores and consuming 10 MegaWatts of power. The new Fugaku supercomputer, built at Riken in partnership with Fujitsu, takes the top spot on the June 2020 #1 list, with 7.3 million cores and consuming 28 MegaWatts of power.
The new Fugaku supercomputer is bigger than Summit in practically every way. It has 3.05x cores, it has 2.8x the score in the official LINPACK tests, and consumes 2.8x the power. It also marks the first time that an Arm based system sits at number one on the top 500 list.
Due to the onset of the Coronavirus pandemic, Riken accelerated the deployment of Fugaku in recent months. On May 13[sup]th[/sup], Riken announced that more than 400 racks, each featuring multiple 48-core A64FX cards per server, were deployed. This was a process that had started back in December, but they were so keen on getting the supercomputer up and running to assist with the R&D as soon as possible – the server racks didn’t have their official front panels when they started working. There are still additional resources to add, with full operation scheduled to begin in Riken’s Fiscal 2021, suggesting that Fugaku’s compute values on the top 100 list are set to rise even higher.
Alongside being #1 in the TOP500, Fugaku enters the Green500 List at #9, just behind Summit, and below the Fugaku Prototype installation which sits at #4.
At the heart of Fugaku is the A64FX, a custom Arm v8-A CPU-based chip optimised for compute. The total configuration uses 158,976 of these 48+4-core cards, running at 2.2 GHz peak performance (48 cores for compute, 4 for assistance). This allows for some substantial R[sub]peak[/sub] numbers, such as 537 PetaFLOPs of FP64, the usual TOP500 metric. But A64FX also supports quantized models with lower precision, which is where we get into some fun numbers for Fugaku:Due to the design of the A64FX, it also allows for a total memory bandwith of 163 PetaBytes per second.
- FP64: 0.54 ExaFLOPs
- FP32: 1.07 ExaOPs
- FP16: 2.15 ExaOPs
- INT8: 4.30 ExaOPs
To date, the A64FX compute card is the only implementation of Arm’s v8.2-A Scalable Vector Extensions (SVE). The goal of SVE is to allow Arm’s customers to build hardware with vector units ranging from 128-bit to 2048-bit, such that any software that is built to run on SVE will automatically scale regardless of the SVE execution unit size. A64FX uses two 512-bit wide pipes per core, with 48 compute cores per chip, and also adds in four 8 GiB HBM2 links per chip in order to feed the units for 1 TiB/s of total bandwidth into the chip.
...