Energy efficiency hasn’t always been a big deal for high performance computing (HPC) and supercomputing customers. In fact, for the first four decades (beginning in the early 1960s) that supercomputers were commercially produced, HPC owners were far more concerned with systems’ computational capabilities than the electrical energy they consumed.
That was partly due to the unique value of those calculations when custom-built systems like the CDC 6600 (designed by Seymour Cray and delivered in 1964) took on highly complex jobs and performed them faster than ever before. In addition, the price of supercomputers limited interest in the systems to the deepest-pocketed large enterprises and government labs, for which results outweighed virtually any cost.
However, other issues and events began to change that dynamic beginning in the early 2000s in what might be called HPC’s modern era. Those points resonate in Lenovo’s new ThinkSystem SD650, a high-density commercial solution designed to maximize compute performance for HPC workloads and applications while minimizing energy consumption. Let’s consider how power issues are impacting HPC, what Lenovo has achieved and how the new SD650 solutions address modern energy constraints and concerns.
Taming the HPC energy beast
So what were the issues that conspired to make energy efficiency a central question in HPC?
- The arrival of Intel x86-based solutions that fundamentally changed the availability of and interest in commercial HPC.
- Financial crises, including the dot-com bubble and bust and the subprime mortgage debacle, along with their slow recoveries and investors’ heightened focus on companies’ short-term performance.
- Growing interest in renewable energy technologies, coupled with increasing concerns about connections between global warming and traditional energy sources.
It should be noted that these issues are not equally important to every HPC customer, and some are not germane at all in many situations. For example, Intel-based clusters that supplanted custom-built systems have helped businesses of nearly every type to access and use HPC. However, government and university research labs mostly continue to define the leading edge of supercomputing with massive installations sporting thousands or tens of thousands of servers. Those same organizations remain relatively immune to the financial pressures facing publicly held companies.
That said, interest in renewable energy technologies and trends is growing steadily. To cite an extreme example, Iceland’s pioneering work in geothermal power production has made it a leader in energy-intensive computation, like bitcoin mining. Similarly, numerous countries in the Middle East are pursuing solar energy development despite the fact that they’re sitting atop some of the world’s richest petroleum reserves.
Government programs also impact these efforts. For example, many fund research into alternative energy and regulate the cost of services to reward thrifty power customers and penalize spendthrift ones. However, such programs can differ substantially within individual countries, including the U.S., where some states amply reward renewable energy companies and consumers while others prop up locally produced fossil fuel industries, including coal production and power plants.
Similar disparities exist in the U.S. response to global warming. As we’ve seen during the past year, federal policy depends heavily on who is in power and can vacillate significantly as a result. That has inspired officials in states including California to formulate and implement their own climate-related policies and programs.
Lenovo’s ThinkSystem SD650
Given these widely and wildly differing points, the way ahead for global-facing HPC vendors like Lenovo is clear. That is, to focus on innovative, flexible new solutions that address and can be adapted to a variety of use cases and circumstances. The new ThinkSystem SD650 fits that scenario to a “T.”
As noted in a blog by Vinod Kamath, Ph.D., a Lenovo thermal architect, the SD650 arose from a customer project with the Leibniz Supercomputing Center (LRZ) in Germany to develop a highly powerful, highly energy efficient system for HPC. In collaboration with Intel, Lenovo developed a motherboard that uses warm water (up to 50°C/122°F) rather than air to cool numerous system components, including the processors and DIMMs.
That helps the SD650 to run considerably cooler than previous water-based solutions, which cooled only the CPUs and also required water at a lower temperature (45°C/113°F) than the SD650. The new Lenovo systems also utilize unique heat exchanger technologies to enhance overall energy efficiency. I plan to write more about that subject in the future.
Why is innovative water cooling such an important element in the new ThinkSystem SD650? Water conducts heat (thus removing it from the server) far more efficiently than air. In fact, Kamath noted in his blog that the SD650 can deliver up to 90 percent heat removal efficiency—a 2X increase in cooling capacity compared to conventional air-cooled systems. In addition, cooler systems mean that their Intel Xeon processors can continuously run in “Turbo Boost” mode, thus delivering greater computational performance.
There are also other potential benefits to using water cooling. For example, the LRZ used the hot water produced by its SuperMUC HPC installation (a 6.8 petaflop cluster that was deployed in 2012 and leveraged IBM System x iDataPlex dx360M4 technologies) to heat other parts of the facility.
The LRZ is looking at a substantial upgrade with its new cluster, which will consist of some 6,500 Lenovo ThinkSystem SD650 systems and deliver roughly 26.7 petaflops in compute capacity, nearly four times the 6.8 petaflops cited above for the original SuperMUC. However, despite that massive performance boost, the SD650’s warm water cooling capabilities will result in almost no additional demands on the LRZ’s chilled water infrastructure, a critical issue for the facility.
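For readers who want to check the scale of the upgrade, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted in this article (6.8 petaflops, 26.7 petaflops and "some 6,500" systems); the function names are illustrative, and the per-node number is a simple average that assumes the quoted total is spread evenly across nodes.

```python
# Figures quoted in the article; names here are illustrative only.
SUPERMUC_PFLOPS = 6.8       # original SuperMUC, as cited in the article
NEW_CLUSTER_PFLOPS = 26.7   # planned SD650-based cluster
NODE_COUNT = 6_500          # "some 6,500" systems, treated as exact here


def speedup(new_pflops: float, old_pflops: float) -> float:
    """Ratio of new to old compute capacity."""
    return new_pflops / old_pflops


def tflops_per_node(total_pflops: float, nodes: int) -> float:
    """Average teraflops per node, assuming capacity is spread evenly."""
    return total_pflops * 1_000 / nodes


if __name__ == "__main__":
    ratio = speedup(NEW_CLUSTER_PFLOPS, SUPERMUC_PFLOPS)
    per_node = tflops_per_node(NEW_CLUSTER_PFLOPS, NODE_COUNT)
    print(f"Capacity ratio: {ratio:.1f}x")
    print(f"Average per node: {per_node:.1f} teraflops")
```

On those inputs the ratio comes out a little under 4x, with an average of roughly 4 teraflops per system.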
The Lenovo SD650’s energy efficiency will also deliver substantial operational savings, a crucial point in Germany, where energy costs can be 2X to 3X what U.S. companies are charged. For an HPC cluster that consumes 4 to 5MW (like LRZ’s) over a 4- to 5-year operating life, Lenovo estimates that annual facilities savings can amount to over €100,000 (or about $125,000 US).
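To put those numbers in context, the following minimal Python sketch works through the arithmetic. The inputs come from the figures above; everything else is an assumption made for illustration: the cluster is treated as drawing constant power all year, and the "over €100,000" estimate is treated as a flat €100,000 per year. No electricity price is assumed, since the article does not give one.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; ignores leap years and downtime


def annual_energy_mwh(power_mw: float) -> float:
    """Energy used in a year at a constant power draw (an assumption)."""
    return power_mw * HOURS_PER_YEAR


def lifetime_savings_eur(annual_savings_eur: float, years: int) -> float:
    """Cumulative savings if the annual figure holds for every year."""
    return annual_savings_eur * years


if __name__ == "__main__":
    # Power draw and operating life ranges quoted in the article.
    for power_mw in (4, 5):
        print(f"{power_mw} MW -> {annual_energy_mwh(power_mw):,.0f} MWh per year")
    for years in (4, 5):
        total = lifetime_savings_eur(100_000, years)
        print(f"{years} years -> EUR {total:,.0f} in cumulative savings")
    # Exchange rate implied by the article's EUR 100,000 ~ USD 125,000.
    print(f"Implied USD per EUR: {125_000 / 100_000:.2f}")
```

Under these assumptions, the cluster would use on the order of 35,000 to 44,000 MWh a year, and the estimated savings would add up to roughly €400,000 to €500,000 over its operating life.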
Final analysis
Nothing like a “one size fits all” solution exists for HPC or supercomputing, largely because customers focus on and systems are designed for such a wide range of projects, use cases and workloads. The affordability, flexibility and scalability of Intel-based technologies have helped x86-based solutions reach and remain atop the HPC market for the past decade.
But during that same period, associated issues, including energy cost, availability and efficiency, have profoundly impacted the value and affordability of HPC. Climate change and its likely impact on traditional power generation resources, including hydroelectric, mean that HPC customers and the vendors that serve them need to focus their attention and talents on future-focused, energy efficient alternatives.
That underscores the importance and value of innovative solutions, like Lenovo’s ThinkSystem SD650. The company’s new offering delivers the topline performance that customers have come to expect from Intel-based HPC solutions. But by means of highly imaginative and effective thermal engineering, Lenovo is delivering that performance in a substantially more efficient and less operationally expensive package.
That’s great for Lenovo’s customers and their HPC projects, of course. But it’s also excellent news for markets and a world increasingly concerned by the financial and environmental impacts of energy production and consumption.