Ashwin's Roam Garden

As AI and HPC workloads proliferate, the power density of the electronic components increases dramatically. Despite the recent plateau in data center energy use, a doubling of service demand will soon have implications for global energy demand.

Questions What triggers a cooling technology jump?

Data centers need liquid cooling to keep pace with the increase in demand for cloud services

For AI-driven workloads, air-cooled forced convection solutions are limited by heat sink pressure drop and by the noise caused by high air velocities. This makes single-phase liquid cooling a viable option despite the higher upfront cost of building the liquid coolant loop.


Questions Is the technology jump at the server level?

application-driven need to design server configuration and the resulting high-power density chassis

An AI-driven workload expects a certain number of high-power compute nodes to be integrated in a certain fashion at the system level with storage and networking nodes.

Switching from an air-cooled system to liquid is not a decision to be made quickly or lightly; there are many factors and possibilities to consider when improving your thermal management to handle higher heat loads. Although market trends indicate that full liquid cooling systems will eventually be the industry standard for cooling power electronics, there are many options and hybrid solutions that can apply the benefits of both as your system evolves or upgrades. If budget or timeline constraints make a direct switch to liquid unrealistic, optimizing your forced convection solution, either through design improvements or by introducing two-phase cooling or liquid components, is a viable interim solution.

Liquid cooling provides easily trackable bottom-up transference of heat from the component level to the building level

Constraints of single phase liquid cooling

size, weight, thermal performance

limitations on coolant or volume availability

effect on application or facility downtime after implementation at the server level

Two-phase liquid cooling

thermal transient

transient types:

planned

time interval known and rates of ramps known

gentle ramps

unplanned

dynamic predictive avoidance

varsamopoulos2013

volume of coolant in loop

need for lower cooling water temperatures

a $/W measure for comparing the various hybrid cooling designs

top-down hybrid cooling design

baseline data center ashrae2019std90.4

total DC power = 1MW

total number of racks = total DC power / rack density (with rack density in kW per rack)

total number of servers = ITE RU density * total number of racks

total number of compute nodes = node density * total number of servers

ITE/RU density is 1 if all the servers are 1RU
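The sizing chain in the notes above can be sketched in Python as follows. This is only a minimal sketch under stated assumptions, not the ASHRAE 90.4 baseline method. Rack density is assumed to be expressed in kW per rack, so the rack count is the total power divided by it (if rack density is instead expressed as racks per kW, multiply). The `ru_per_rack` parameter and the example values (25 kW per rack, 42 RU per rack, 2 nodes per server) are hypothetical placeholders that do not come from the notes.

```python
import math


def size_data_center(total_power_kw, rack_density_kw, ru_per_rack,
                     ite_per_ru, nodes_per_server):
    """Top-down count of racks, servers, and compute nodes.

    Assumptions (not from the notes): rack_density_kw is kW per rack,
    and ru_per_rack is the number of usable rack units in each rack.
    """
    # Number of whole racks that the power budget can feed.
    racks = math.floor(total_power_kw / rack_density_kw)
    # ITE/RU density is 1 when every server is 1RU.
    servers = math.floor(racks * ru_per_rack * ite_per_ru)
    # Each server hosts a fixed number of compute nodes.
    nodes = servers * nodes_per_server
    return {"racks": racks, "servers": servers, "nodes": nodes}


# Hypothetical example: 1 MW baseline, 25 kW racks, 42 RU, all-1RU servers.
sizing = size_data_center(
    total_power_kw=1000.0,
    rack_density_kw=25.0,
    ru_per_rack=42,
    ite_per_ru=1.0,
    nodes_per_server=2,
)
print(sizing)
```

With these placeholder inputs the 1 MW baseline works out to 40 racks, 1680 servers, and 3360 compute nodes; swapping in measured densities changes only the arguments.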

{{TODO}} Read Appendix G to find out how a baseline building model is defined ashrae2019std90.1

Questions Is there a baseline data center energy model defined in 90.4 similar to 90.1?

regulated

unregulated


bottom-up hybrid cooling design

$W/cm^2$ or $mL/W$

chip, TDP; maximum die temperature

module, effectiveness; $W/cm^2$

server, RU

cabinet

Technologies to improve the energy efficiency of large-scale computers, data centers, and computational infrastructure.

ARPA-E encourages submissions stemming from ideas that still require proof-of-concept R&D efforts as well as those for which some proof-of-concept demonstration already exists. arpa

Submissions requiring proof-of-concept R&D can propose a project with the goal of delivering on the program metric at the conclusion of the period of performance. These submissions must contain an appropriate cost and project duration plan that is described in sufficient technical detail to allow reviewers to meaningfully evaluate the proposed project. If awarded, such projects should expect a rigorous go/no-go milestone early in the project associated with the proof-of-concept demonstration. Alternatively, submissions requiring proof-of-concept R&D can propose a project with the project end deliverable being an extremely creative, but partial solution. However, the Applicants are required to provide a convincing vision how these partial solutions can enable the realization of the program metrics with further development.

Applicants proposing projects for which some initial proof-of-concept demonstration already exists should submit concrete data that supports the probability of success of the proposed project.

ARPA-E will provide support at the highest funding level only for submissions with significant technology risk, aggressive timetables, and careful management and mitigation of the associated risks.

ARPA-E will accept only new submissions under this FOA. Applicants may not seek renewal or supplementation of their existing awards through this FOA.

ARPA-E plans to fully fund your negotiated budget at the time of award.

From Google Doc draft filter

CONCEPT SUMMARY

The energy consumption of data centers is growing rapidly to support cloud computing, the internet of things, and streaming services. Through the miniaturization of devices at the component level, the demand for higher-performance servers has been met at the expense of larger thermal footprints of the components. The energy demand of a compact data center is 100X the energy demand of a regular office of equivalent volume [1]. Rapid heat removal and overall system cost minimization are the driving forces behind this team's research and development of the proposed technologies, with the aim of designing and developing next-generation energy-efficient data centers.

As artificial intelligence (AI) and high-performance computing (HPC) workloads proliferate, the power density of the electronic components increases dramatically. A concerted effort to decrease energy use has mostly offset the growth in total IT device energy use. Despite this plateau in energy use, a doubling of service demands in the coming years will have clear implications for global energy demand. Due to the rise of hyperscale data centers, custom-configured volume servers are expected to represent about 40% of all volume servers operating in the USA. Data center thermal management needs direct liquid cooling (DLC) to keep pace with the increase in demand for cloud services. For AI-driven workloads, air-cooled forced convection solutions are limited by heat sink pressure drop and by the noise caused by high air velocities. This makes single-phase liquid cooling a viable option despite the higher upfront cost of building the liquid coolant loop.

What triggers a cooling technology jump? i.e. air to single phase, air to two-phase, single to two-phase?

When equipped with single-phase liquid cooling, a typical chip power density of W/cm^2 at the module level results in 2000 W/rack-unit at the chassis level, which adds up to 25-30 kW/rack. Two-phase liquid cooling ...

We propose the development of a hybrid cooling technology through the synergistic integration of a scalable bio-inspired evaporative cooling module that targets high-heat-flux systems. In parallel, we will develop an innovative liquid cooling module. We will implement these technologies depending on the specific application, while air from internal fans cools the other components of the server. The center envisions cooling technologies that unite energy-efficient electronic systems with higher reliability, and it will develop new models based on thermodynamics, transport phenomena, and control strategies. The primary goal is to develop and commission next-generation cooling technology that saves energy. The hybrid cooling system integrated with a flow control device (FCD) is a developing technology that will play a major role in thermal and reliability performance. The center will also concentrate on numerical analysis and simulation, which will save time and money and help design and develop an efficient technology.

INNOVATION AND IMPACT

Traditionally, air cooling has been the first choice for cooling IT equipment in a data center. As the power per processor increases, new cooling technologies are being investigated to maintain reliable operating conditions for IT equipment. The increase in the number of transistors per processing unit has been the driving force for implementing liquid cooling in a data center. High heat dissipation requires aggressive cooling strategies to ensure reliable performance of a data center.

The Nanoscale Energy & Interfacial Transport (NEIT) technology lab, led by the Co-PI, uses evaporative cooling, which can remove a heat flux of 735 $W/cm^2$ with a required flow rate of only 10 mL/min. This enables our technology to dissipate heat fluxes with 50X smaller flow rates than state-of-the-art liquid cooling technologies. This evaporative cooling technology also has an exergy potential as high as 81X greater than traditional air cooling systems and can be used as a heat source for buildings or other applications.
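The relationship between flow rate and removable heat quoted above can be checked with a simple energy balance. The sketch below is a rough estimate under assumptions that are not stated in the draft: the coolant is taken to be water, all of the supplied liquid is assumed to evaporate, sensible heating is ignored, and the single-phase comparison assumes a 10 K coolant temperature rise. The function names and property values are illustrative, not the lab's model.

```python
# Assumed water properties (the draft does not name the coolant).
H_FG = 2.257e6   # latent heat of vaporization near 100 C, J/kg
CP = 4186.0      # specific heat of liquid water, J/(kg K)
RHO = 1000.0     # liquid density, kg/m^3


def mass_flow_kg_s(flow_ml_min):
    """Convert a volumetric flow in mL/min to a mass flow in kg/s."""
    return flow_ml_min * 1e-6 * RHO / 60.0


def evaporative_capacity_w(flow_ml_min):
    """Upper bound on heat removed if all supplied liquid evaporates."""
    return mass_flow_kg_s(flow_ml_min) * H_FG


def single_phase_flow_ml_min(power_w, delta_t_k):
    """Flow needed to carry power_w as sensible heat with a rise delta_t_k."""
    return power_w / (CP * delta_t_k) / RHO * 1e6 * 60.0


flow = 10.0        # mL/min, from the text
heat_flux = 735.0  # W/cm^2, from the text

capacity = evaporative_capacity_w(flow)         # W
max_area = capacity / heat_flux                 # cm^2 coolable at this flux
sp_flow = single_phase_flow_ml_min(capacity, delta_t_k=10.0)
ratio = sp_flow / flow                          # single-phase vs evaporative

print(f"capacity ~ {capacity:.0f} W, area ~ {max_area:.2f} cm^2, "
      f"flow ratio ~ {ratio:.0f}x")
```

Under these assumptions, 10 mL/min supports about 376 W, which corresponds to roughly 0.5 $cm^2$ at 735 $W/cm^2$, and a single-phase loop with a 10 K rise would need roughly 54 times the flow. That is the same order as the 50X figure in the text, though the actual comparison basis used in the draft is not given.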

Evaporation from asymmetric micro-droplets can achieve high heat transfer coefficients because a substantial fraction of the heat transfer occurs in the contact line region, i.e., the solid-liquid-vapor interface. The ultra-small thickness of the liquid layer close to the contact line minimizes the thermal resistance across the liquid domain, which can yield a high local heat transfer coefficient exceeding $10^6\ W/m^2 K$.

There are separate guidelines for air and liquid cooling that outline the operational boundaries, such as air temperature and humidity for air cooling and primary (facility) water temperatures for liquid cooling. However, how to apply these standards to a data center whose IT equipment is cooled by both air and liquid remains ambiguous.

The choice of liquid cooling class for a liquid-cooled data center is based on the facility water condition and required temperature limits on the secondary loop inside a data center. As the limits vary and warm-water cooling can be applied, there is the possibility of efficiency improvements with W3 and W4 classes.

But in hybrid cooling, there are auxiliary components cooled by air, which requires the use of a Computer Room Air Conditioner (CRAC) or Computer Room Air Handler (CRAH). There has been a continuous drive to increase the efficiency of air cooling as well, such as using free cooling directly or indirectly (air or water economizers), selecting sites with better air or water availability, and using dry coolers and cooling towers instead of chillers to reduce the mechanical energy required for cooling. But there is no unique guideline for hybrid cooling, as it involves both liquid and air cooling. As a result, the recommended envelopes for liquid and air cooling will not carry over unchanged to hybrid cooling.

Also, there are challenges in implementing liquid cooling in an air-cooled data center rather than a purpose-built liquid-cooled data center. With liquid cooling, there is additional equipment to be installed, and the cost of the components adds to the capital cost. The air and liquid cooling in a data center need to be optimized not just to cool efficiently but also for reliable operation and cost of deployment. A unique guideline can help an air-cooled data center implement liquid cooling and determine the levels of cooling needed, drawing on module, server, rack, and room level knowledge and experience. The emerging technologies address a fraction of the benefits while neglecting the others (Google TPU3). The proposed concept and method will provide a holistic approach, from novel, scalable module-level evaporative cooling to robust system-level air cooling that includes flow control devices and mitigation of contaminants (including COVID-19), and will provide a unique guideline for finding an optimum solution based on the primary and secondary side configuration, cost and time of deployment, and reliability of operation.

Describe how the concept will have a positive impact on at least one of the ARPA-E mission areas in Section I.A of the FOA.

The aim of this innovation is to address the need for a standard for hybrid cooling in a data center. The guidelines will help data center owners, managers, engineers, and researchers select the proper configuration on the primary and secondary side, define areas for design and optimization, and implement the latest research for building, deploying, and monitoring hybrid-cooled data centers. The impact of this innovation will be a unique guideline for improving efficiency and reducing the cost of transforming a data center from traditional air cooling to combined air and liquid cooling.

To the extent possible, provide quantitative metrics in a table that compares the proposed technology concept to current and emerging technologies and to the Technology Category in Section I.D of the FOA.

Referenced in

ARPA-E OPEN 2021

Concept Paper: Holistic Development of Hybrid Cooling for Hyperscale and Edge Data Centers