04 March 20, 08:31
(This post was last modified: 04 March 20, 08:36 by harlan4096.)
Quote:Continue Reading
Several years ago, at a local event detailing a new Arm microarchitecture core, I recall a conversation I had with a number of executives at the time: the goal was to get Arm into 25% of servers by 2020. A lofty goal, which hasn’t quite been reached, however after the initial run of Arm-based server designs the market is starting to hit its stride with Arm’s N1 core for data-centers getting its first outings. Out of those announcing an N1-based SoC, Ampere is leading the pack with a new 80-core design aimed at cloud providers and hyperscalers. The new Altra family of products is aiming to offer competitive performance, performance per watt, and scalability up to 210 W and tons of IO, for any enterprise positioning.
The Arm Server Market: 2010-2019 (Abridged)
We’ve seen companies such as Broadcom/Cavium/Marvell, Calxeda, Huawei, Fujitsu, Phytium, Annapurna/Amazon, AppliedMicro/Ampere, and even AMD put Arm-based IP into silicon and subsequently into the server market. Up until recently, most designs have been fairly lackluster – with companies either developing their own core on an Arm architecture license and not getting a performance lift, or using the standard Arm cores and not finding the right mix of performance, power, and software uptake needed to drive home the design. As a result, we’ve seen multiple companies fall by the wayside, be acquired, or limit their activities to specific customers and keep very hush-hush.
A big example of the ‘be acquired’ type of company was Annapurna, whom Amazon acquired and eventually released its Graviton2 processor in recent months. This chip has 64 cores based on Arm’s N1 design, which is the leading microarchitecture layout for Arm server chips at this point. To that end, Ampere (who originally purchased Applied Micro) is now set to release its second generation product, with 80 of the N1-based cores, and it now has a name: Altra.
Ampere Altra
Ampere has already given a number of details away about Altra in an announcement late last year, however this time around we have concrete details and the company has performance projections. On the back of its first generation eMAG product, Ampere is looking to offer better-than-Graviton2 performance to any cloud provider or hyperscaler who isn’t called Amazon, given that Graviton2 is built by Amazon and only available to Amazon. In that regard, Ampere has taking Arm’s full recommendations for its N1 design, building a chip with the most number of cores that N1 is designed to support.
As with other N1-based products, Altra will be single threaded, ensuring that each thread has its own core, its own resources, and removing any potential core-sharing thread security issues that have occurred recently. The Altra SoC is built with containers in mind, ensuring high-levels of quality of service with multiple customers on the same chip, and additional RAS features to ensure consistent performance.
The N1 core is by design what we’ve covered when Arm detailed the microarchitecture design last year. There is a 4-cycle 64 KB L1I/L1D caches per core, along with a 9-11 cycle 1 MB of private L2 per core. This is partnered with 32 MB of system wide LLC distributed through the SoC mesh, and all these caches are ECC with SECDED operation. It’s worth noting that 32 MB across 80 cores is less per core than Amazon’s Graviton2, which has 32 MB for 64 cores. 32 MB is actually half of what Arm recommends, as in Arm’s presentation it stated that it would expect a 64-core design to have 64 MB.
On top of the 80 cores, the SoC will also have eight DDR4-3200 memory channels with ECC support, up to 4 TB per socket. There are also 128 PCIe 4.0 lanes, with which the CPU can use 32 of them to hook up to another CPU for dual socket operation. The dual socket system can then have a total of 192 PCIe 4.0 lanes between it, as well as support for up to 8 TB of memory. We are told that it’s actually the CCIX protocol that runs over these PCIe lanes, which means 25 GB/s per x16 linkup. That’s good for 50 GB/s in each direction.
Each of the PCIe lanes can be bifurcated down to x8/x4/x2, and every different variant of the Altra SoC will only be segmented on core count and frequency: all CPUs will have 4 TB support and 128 lanes of PCIe 4.0. Each CPU can also support up to four CCIX-based accelerators.
Altra is built on TSMC’s 7nm, and while is technically an Arm v8.2 design, it does borrow a couple of features from 8.3 and 8.5, namely hardware based mitigations for side channel attacks and a couple of other small micro-architectural features.
Each of the 80 cores is designed to run at 3.0 GHz all-core, and Ampere was consistent in its messaging in that the top SKU is designed to run at 3.0 GHz at all times, even when both 128-bit SIMD units per core are being used (thus an unlimited turbo at 3.0 GHz). The CPU range will vary from 45W to 210W, and vary in core count - we suspect these SKUs will be derived from the single silicon design, and it will depend on demand as well as binning as to what comes out of the fabs. Exact SKUs are going to be announced later this year.
...