24 May 24, 07:05
Quote:Less energy, more performance.Continue Reading
AMD CEO Lisa Su attended imec’s ITF World 2024 conference to accept the prestigious imec Innovation Award for innovation and industry leadership, joining the ranks of other previous honorees like Gordon Moore, Morris Chang, and Bill Gates. After accepting the award, Su launched into her presentation covering the steps that AMD has taken to meet the company’s 30x25 goal, which aims for a 30x increase in compute node power efficiency by 2025. Su announced that AMD is not only on track to meet that goal, but it also now sees a pathway to more than a 100x improvement by 2026 to 2027.
Concerns about AI power usage have been thrust into the limelight due to the explosion of generative AI LLMs like ChatGPT, but AMD had the vision to foresee problems with AI’s voracious power appetite as far back as 2021. Back then, AMD began work on its 30x25 goal to improve data center compute node power efficiency, specifically citing AI and HPC power consumption as a looming problem. (AMD set its first ambitious energy goal back in 2014 with its inaugural 25x20 goal to improve consumer processor power efficiency by 25x by 2020, which it exceeded with a 31.7x improvement.)
That problem has now come to the forefront. Generative AI is fueling the rapid expansion of data centers as the world’s largest companies vie for AI supremacy, but public power grids aren’t prepared for a sudden surge of power-hungry data centers, making power the new limiting factor. There are hard limits on the amount of power available to data centers as grid capacity, infrastructure, and environmental concerns limit the capacity that can be dedicated to both new and expanding data centers alike. In fact, many new data centers are being constructed next to power plants to ensure the supply of power, and the crushing demand has even reignited the push for nuclear Small Module Reactors (SMRs) to supply individual data centers.
The problem is only intensifying as the compute required to train models increases. Su pointed out that the size of the first image and speech recognition AI models used to double every two years, largely matching the pace of advances in computing power over the last decade. The size of generative AI models is now growing at 20x per year, however, outstripping the pace of computing and memory advancements. Su said that while today’s largest models are trained on tens of thousands of GPUs consuming up to tens of thousands of megawatt-hours, rapidly expanding model sizes could soon require up to hundreds of thousands of GPUs for training, perhaps requiring several gigawatts of power to train a single model. That clearly isn’t tenable.
AMD has a multi-pronged strategy for improving power efficiency, consisting of a broad approach that expands beyond its silicon architectures and advanced packaging strategies to AI-specific architectures, system- and data center-level tuning, and software and hardware co-design initiatives.
...