23 June 20, 07:26
Quote:
One of the milestones we’ve been waiting for since AMD launched its Zen architecture is its return to the top 10 of the TOP500 supercomputer list. The previous best AMD system, built on Opteron CPUs, was Titan, which held the #1 spot in 2012 but had gradually dropped out of the top 10 by June 2019. Now, in June 2020, AMD scores a big win for its Zen 2 microarchitecture by reaching #7. But there’s a twist in this tale.
Placing on the TOP500 list is less about revenue than about prestige. The database still contains systems built over a decade ago, so getting a machine onto the list with the latest and greatest hardware, at a fraction of the size and power, is a big promotional opportunity for the company whose hardware is involved (as well as for the site where the machine ends up). Ever since AMD started introducing its Zen-based processors, marking its return to high-end performance after several years, we’ve been wondering how long it would take for a large-scale AMD deployment to appear.
AMD has had HPC success in the past, most notably with the Titan supercomputer, built on Opteron 6274 CPUs paired with NVIDIA K20X accelerator cards. The machine hit #1 in 2012, and still sits at #12 today. This was a sizeable deployment, coming in at 17.6 PetaFLOPs for 8.2 MegaWatts.
Back in the day, Anand even went to take a look around.
When it comes to AMD’s Zen designs, the two main server CPUs to look for are Naples (1[sup]st[/sup] Gen EPYC) and Rome (2[sup]nd[/sup] Gen EPYC). The latter has been getting a lot of attention for offering up to 64 high-performance cores, plenty of memory bandwidth, and heaps of connectivity for storage and add-in cards.
However, technically the first Zen system on the TOP500 list was neither of those.
The Hygon joint venture actually provided the first Zen-based supercomputer to join the list, debuting at #38 in November 2018. The system was built by Sugon, the company distributing Hygon systems, to showcase the hardware, and it uses 5120 of Hygon’s 32-core CPUs. We’ve reviewed the Hygon hardware and done a deep dive into it. The Hygon joint venture has since dissolved, but the supercomputer built on its hardware is still running, now at #58.
It wasn’t until late 2019 that systems based on AMD EPYC showed up. In the November list, two AMD Naples and two AMD Rome systems pushed AMD’s total up to six (five based on Zen, including the Hygon system, and one on older Opterons). In this week’s June 2020 announcement, another seven AMD Rome systems joined the list, making Rome the 10[sup]th[/sup] most popular processor family for supercomputers. But it’s Selene at #7 that’s making the headlines.
Selene is the new supercomputer sitting at #7. For its host processors it uses AMD’s EPYC 7742 (Rome), the highest-performing commercial part available outside of specialized markets, with a list price of $6950 each. What makes Selene a slightly odd AMD win is that it is built around NVIDIA A100 accelerators, and NVIDIA built it for its own use.
When NVIDIA announced its A100 Ampere compute accelerator, it also announced the DGX A100 ‘SuperPOD’ concept, connecting 140 DGX A100 nodes with 1120 A100 GPUs in total to supply up to 700 PetaOPs of AI performance. It turns out that a SuperPOD also lands at #7 on the TOP500 list straight off the bat, even though that list is ranked on more traditional LINPACK FP64 FLOPs. Each DGX A100 node contains two AMD EPYC CPUs and eight A100 accelerators.
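As a quick sanity check on the SuperPOD figures, the short Python sketch below multiplies out the per-node configuration. The 64-core count for the EPYC 7742 is added here (it is a 64-core Rome part), and the function name `superpod_totals` is purely illustrative.

```python
# Per-node configuration of a DGX A100, as described above.
GPUS_PER_NODE = 8
CPUS_PER_NODE = 2
CORES_PER_EPYC_7742 = 64  # assumption added here: the EPYC 7742 is a 64-core part


def superpod_totals(nodes: int) -> dict:
    """Return GPU, CPU, and CPU-core totals for `nodes` DGX A100 systems (illustrative helper)."""
    cpus = nodes * CPUS_PER_NODE
    return {
        "gpus": nodes * GPUS_PER_NODE,
        "cpus": cpus,
        "cpu_cores": cpus * CORES_PER_EPYC_7742,
    }


totals = superpod_totals(140)
print(totals)  # {'gpus': 1120, 'cpus': 280, 'cpu_cores': 17920}
```

The 1120-GPU result matches the SuperPOD figure NVIDIA quoted, and it shows the AMD side of the machine amounts to 280 host CPUs.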
Selene scores 27.6 PetaFLOPs of FP64 LINPACK throughput at 1.3 MegaWatts. Compared to Titan, with its Opterons and K20X accelerators, that’s 57% more performance for only 16% of the power, making Selene almost 10x more power efficient. Selene uses NVIDIA’s Mellanox HDR InfiniBand for its interconnect and has 560 TiB of memory installed.
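The Titan comparison can be worked through in a few lines of Python, using only the figures quoted in the article (17.6 PetaFLOPs at 8.2 MW for Titan, 27.6 PetaFLOPs at 1.3 MW for Selene). Expressing efficiency in GFLOPS per watt is our own choice of unit, and the variable names are illustrative.

```python
# LINPACK performance and power figures quoted in the article.
titan = {"pflops": 17.6, "megawatts": 8.2}
selene = {"pflops": 27.6, "megawatts": 1.3}


def gflops_per_watt(system: dict) -> float:
    # 1 PFLOPS = 1e6 GFLOPS and 1 MW = 1e6 W, so the scale factors cancel.
    return system["pflops"] / system["megawatts"]


perf_gain = selene["pflops"] / titan["pflops"] - 1       # ~0.57, i.e. 57% more performance
power_ratio = selene["megawatts"] / titan["megawatts"]   # ~0.16, i.e. 16% of the power
efficiency_ratio = gflops_per_watt(selene) / gflops_per_watt(titan)  # just under 10x

print(f"Titan:  {gflops_per_watt(titan):.2f} GFLOPS/W")
print(f"Selene: {gflops_per_watt(selene):.2f} GFLOPS/W")
print(f"{perf_gain:.0%} more performance at {power_ratio:.0%} of the power, "
      f"{efficiency_ratio:.1f}x the efficiency")
```

The efficiency gain is simply the performance ratio divided by the power ratio (about 1.57 / 0.16), which is why it lands just under 10x.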
...