AMD had its “Advancing AI” event this month, and it’s hitting this market hard with a focused group of solutions that span from PCs to datacenters and from CPUs and GPUs to advanced networking. But AI is starting to hit a wall already with the incredible amount of power required, how quickly power requirements shift between servers based on loading, and how much heat is being generated.
Companies like Microsoft with massive energy needs are looking at buying old nuclear plants and refurbishing them to attempt to manage these vast energy needs, and they have also, in the past, explored solutions like placing datacenters underwater to address the cooling problem.
While AMD is already working on aggressive designs that run more efficiently and can be cooled more effectively, it's also collaborating with many partners who are quietly redesigning the datacenter concept entirely, with efforts so secret that AMD wasn't able to share them at the event.
Solving the datacenter cooling and energy problem could be as important as creating AI in the first place, given that it will rapidly become the biggest obstacle to overcome.
Let’s look at what is likely to come.
Load-Balancing Hybrid AI
One of AMD’s advantages is that it’s aggressively developing NPUs for the desktop and AI accelerators for the datacenter. This positions it to better balance loads between the datacenter and the desktop, potentially reducing datacenter loading while also cutting latency for the user in a win/win kind of scenario.
In the future, the hardware that brings your AI to life will dynamically shift work between the datacenter and the client based on need and loading. Where client processing power is light, say on a smartphone, the datacenter will take more of the load; where it is heavy, on a gaming system, workstation, or powerful desktop computer, much of the processing may be done on the client device to reduce the energy and cooling requirements of the datacenter.
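To make the idea concrete, here is a minimal sketch of what such a placement decision might look like. Everything here is an illustrative assumption: the `Device` type, the TOPS-based capacity check, and the thresholds are hypothetical, not an AMD or Microsoft API.

```python
# Hypothetical hybrid AI placement heuristic: run inference on the client
# when its NPU has spare capacity, otherwise fall back to the datacenter.
# All names and thresholds are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class Device:
    npu_tops: float      # client NPU throughput (TOPS)
    utilization: float   # current client load, 0.0 - 1.0


def place_workload(device: Device, model_tops_needed: float,
                   datacenter_load: float) -> str:
    """Return 'client' or 'datacenter' for one inference request."""
    spare = device.npu_tops * (1.0 - device.utilization)
    # Prefer the client when it can absorb the work: this lowers latency
    # and relieves datacenter power and cooling.
    if spare >= model_tops_needed:
        return "client"
    # If the datacenter itself is saturated, still try the client rather
    # than queue, accepting slower local inference.
    if datacenter_load > 0.9 and spare > 0:
        return "client"
    return "datacenter"


phone = Device(npu_tops=10, utilization=0.5)         # light client
workstation = Device(npu_tops=100, utilization=0.2)  # heavy client

print(place_workload(phone, model_tops_needed=40, datacenter_load=0.5))        # datacenter
print(place_workload(workstation, model_tops_needed=40, datacenter_load=0.5))  # client
```

A real scheduler would also weigh network latency, battery state, and model partitioning, but the core trade the article describes, client capacity versus datacenter load, reduces to a decision like this one.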
Rethinking the Datacenter
AMD is finding that load balancing with AI is far more difficult than it has been with other types of processing. The loads seem to shift far more quickly and be far heavier than with prior application types. This means that datacenters will need to be redesigned for more rapid shifts in cooling and energy delivery. It will likely also accelerate the need for smaller, more distributed datacenters that can be placed closer to the users of these services, further reducing latency and bringing power and cooling requirements down to more manageable levels.
Cooling
AMD shared that the move is toward aggressive water cooling, which should particularly benefit Lenovo, which currently leads in the most efficient type of water cooling: warm-water cooling. However, existing water-cooling solutions were developed for pre-AI loads, and as I’ve noted above, AI loads are not only potentially much higher, they can shift between systems far more quickly. This suggests that eventually we’ll have to rethink even warm-water cooling to make it more responsive to these extremely rapid shifts in thermal load.
I expect we’ll eventually see a reemergence of immersion cooling, but at a scale we haven’t seen before, with robots servicing the submerged hardware, in order to exploit liquid’s far better heat-transfer capabilities compared to air.
Power Generation
Governments are extremely concerned about how much energy AI is pulling from the grid today, but this load is relatively light compared to what will be needed in the future. To address this need, Microsoft is looking into spinning up Three Mile Island’s nuclear plant, and Westinghouse just launched a small nuclear reactor for datacenters.
Future datacenters will likely be required to self-generate the power they need. While many today have their own generators for backup, along with massive solar and wind farms to reduce grid loading, future datacenters will likely have a mix of geothermal and nuclear power sources, which will require the industry to gain competence in advanced energy generation.
Wrapping Up: Change Is Coming
AI is forcing us to rethink datacenters and push harder on hybrid AI to lower latency, spread out and reduce localized energy demand, and create more efficient ways to cool this technology. AMD, among others, is working with an increasing number of companies not only to improve the efficiency of what they build, but to rethink the datacenter, because the datacenters of today won’t be able to handle the kinds of AI loads they’ll need to handle tomorrow.