Email sent to the group at 06:11, March 31st, 2021
The 'standards' narrative has been cut.
I've changed the title accordingly to: "Holistic development of hybrid cooling technology for hyperscale and edge data centers." Note: 'hyperscale' and 'edge' are business jargon
Top-down scaling is straightforward
Assuming a 1MW data center (DC)
total DC power = 1MW
total number of racks = total DC power / rack power density
total number of servers = ITE RU density * total number of racks
total number of compute nodes = node density * total number of servers
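The top-down chain above can be sketched as a back-of-envelope calculation. All density values here (10 kW/rack, 20 servers/rack, 2 nodes/server) are illustrative assumptions, not spec numbers:

```python
# Top-down scaling sketch for a 1 MW data center.
# Density values are illustrative assumptions only.
total_dc_power_w = 1_000_000    # total DC power = 1 MW

rack_power_w = 10_000           # assumed rack power density: 10 kW/rack
servers_per_rack = 20           # assumed ITE RU density
nodes_per_server = 2            # assumed compute node density

num_racks = total_dc_power_w // rack_power_w    # racks from DC power
num_servers = servers_per_rack * num_racks      # servers from RU density
num_nodes = nodes_per_server * num_servers      # compute nodes from node density
print(num_racks, num_servers, num_nodes)
```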
Bottom-up scaling needs to be carried out to differentiate the various cooling technologies
At the module level we consider a) heat flux (W/cm^2) and b) flow rate per watt (mL/min per W) for each technology
We need to identify what triggers a cooling technology upgrade
i.e. air to single-phase, air to two-phase, single-phase to two-phase
Since 40% of servers deployed today are customized, we assume, say, 500 W per compute node and back-calculate the server power from the rack density specification, instead of assuming a fixed number of compute nodes per server
With a rack power of 10 kW, a 1 MW DC gives 100 racks.
If a single rack occupies 2 ft^2, the total white space = 200 ft^2
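The bottom-up back-calculation can be sketched the same way. The per-node power and RU density below are assumed values; the point is that nodes per server falls out of the rack spec rather than being assumed:

```python
# Bottom-up sketch: fix per-node power and back-calculate from the rack
# density spec instead of assuming nodes per server. Numbers are assumptions.
total_dc_power_w = 1_000_000    # 1 MW DC
rack_power_w = 10_000           # rack power density spec: 10 kW
node_power_w = 500              # assumed power per compute node
servers_per_rack = 10           # assumed ITE RU density

server_power_w = rack_power_w / servers_per_rack    # W per server
nodes_per_server = server_power_w / node_power_w    # back-calculated, not assumed
num_racks = total_dc_power_w / rack_power_w
white_space_ft2 = num_racks * 2                     # 2 ft^2 per rack
```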
In a hyperscale DC, the deployments are in terms of pods. This article details how Facebook defines their pod.
The pod specification is largely governed by the hyperscale network topology.
I don't know why TPU v3 needed single-phase liquid cooling, but my wild guess is the hyperscale pod specification
In a hyperscale DC, the building blocks are the pods. Adding a pod is how one fulfills the growing service demands
Fulfilling the 'machine-to-machine' service demands CAN become a bottleneck. Maybe this is what triggered the TPU v2 to TPU v3 cooling technology upgrade. I don't know for sure.
But this shows how we can scale bottom-up starting from the module level heat flux
Flow rate (mL/min) is another factor that, when scaled up, results in different system-level infrastructure
Suppose two cooling technologies have comparable W/cm^2 but one requires a higher coolant flow rate per W than the other. This suggests that the total volume of coolant-in-loop can be the deciding factor sometimes.
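To make the comparison concrete, here is a sketch that scales two hypothetical technologies with the same heat flux but different normalized flow rates up to a 1 MW load. The flow-rate values and the loop residence time are assumptions chosen for illustration:

```python
# Sketch: comparable W/cm^2 but different mL/min per W. Scaling to a 1 MW
# load shows how coolant-in-loop volume can become the deciding factor.
# All values below are illustrative assumptions.
total_it_power_w = 1_000_000
loop_residence_min = 2.0        # assumed minutes of coolant held in the loop

flow_per_watt = {               # mL/min per W, assumed values
    "tech A": 0.015,
    "tech B": 0.030,
}
for tech, ml_per_min_per_w in flow_per_watt.items():
    total_flow_l_per_min = ml_per_min_per_w * total_it_power_w / 1000
    loop_volume_l = total_flow_l_per_min * loop_residence_min
    print(f"{tech}: {total_flow_l_per_min:.0f} L/min, {loop_volume_l:.0f} L in loop")
```

At the module level the difference looks small, but at 1 MW the higher-flow technology needs twice the pumping capacity and twice the coolant volume in the loop.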
Factors to consider when rating different cooling technologies:
size, weight, thermal performance
limitations of coolant or volume availability
effect on application or facility downtime after implementation at the server level
the need for lower cooling water temperature
thermal transients (planned and unplanned)
Concept paper template
Document the various ARPA-E concept paper essentials
Identify how the essentials were addressed in the previous concept papers