Email sent to the group at 06:11, March 31st, 2021
The 'standards' narrative has been cut.
I've changed the title accordingly to: "Holistic development of hybrid cooling technology for hyperscale and edge data centers." Note: 'hyperscale' and 'edge' are business jargon
Top-down scaling is straightforward
Assuming a 1MW data center (DC)
total DC power = 1MW
total number of racks = total DC power / rack power density
total number of servers = ITE RU density * total number of racks
total number of compute nodes = node density * total number of servers
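The top-down chain above can be sketched as a back-of-envelope calculation. All density values here (10 kW/rack, 20 servers/rack, 2 nodes/server) are illustrative assumptions, not spec numbers:

```python
# Top-down scaling sketch for a 1 MW data center.
# Density values are illustrative assumptions only.
total_dc_power_w = 1_000_000    # total DC power = 1 MW

rack_power_w = 10_000           # assumed rack power density: 10 kW/rack
servers_per_rack = 20           # assumed ITE RU density
nodes_per_server = 2            # assumed compute node density

num_racks = total_dc_power_w // rack_power_w    # racks from DC power
num_servers = servers_per_rack * num_racks      # servers from RU density
num_nodes = nodes_per_server * num_servers      # compute nodes from node density
print(num_racks, num_servers, num_nodes)
```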
Bottom-up scaling needs to be carried out to differentiate the various cooling technologies
At the module level we consider a) heat flux (W/cm^2) and b) flow rate per watt (mL/min per W) for each technology
We need to identify what triggers a cooling technology upgrade
i.e. air to single-phase, air to two-phase, single-phase to two-phase
Since 40% of servers deployed today are customized, we assume, say, 500 W per compute node and back-calculate the server power from the rack density specification, instead of assuming a fixed number of compute nodes per server
With a rack power of 10 kW, a 1 MW DC gives 100 racks.
If a single rack occupies 2 ft^2, the total white space = 200 ft^2
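The bottom-up back-calculation can be sketched the same way. The per-node power and RU density below are assumed values; the point is that nodes per server falls out of the rack spec rather than being assumed:

```python
# Bottom-up sketch: fix per-node power and back-calculate from the rack
# density spec instead of assuming nodes per server. Numbers are assumptions.
total_dc_power_w = 1_000_000    # 1 MW DC
rack_power_w = 10_000           # rack power density spec: 10 kW
node_power_w = 500              # assumed power per compute node
servers_per_rack = 10           # assumed ITE RU density

server_power_w = rack_power_w / servers_per_rack    # W per server
nodes_per_server = server_power_w / node_power_w    # back-calculated, not assumed
num_racks = total_dc_power_w / rack_power_w
white_space_ft2 = num_racks * 2                     # 2 ft^2 per rack
```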
In a hyperscale DC, the deployments are in terms of pods. This article details how Facebook defines their pod.
The pod specification is largely governed by the hyperscale network topology.
I don't know why TPU v3 needed single-phase liquid cooling, but my wild guess is the hyperscale pod specification
In a hyperscale DC, the building blocks are the pods. Adding a pod is how one fulfills the growing service demands
Fulfilling the 'machine-to-machine' service demands CAN become a bottleneck. Maybe this is what triggered the TPU v2 to TPU v3 cooling technology upgrade. I don't know for sure.
But this shows how we can scale bottom-up starting from the module level heat flux
Flow rate (mL/min) is another factor that, when scaled up, results in different system-level infrastructure
Suppose two cooling technologies have comparable W/cm^2 but one requires a higher coolant flow rate per W than the other. This suggests that the total volume of coolant-in-loop can be the deciding factor sometimes.
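To make the comparison concrete, here is a sketch that scales two hypothetical technologies with the same heat flux but different normalized flow rates up to a 1 MW load. The flow-rate values and the loop residence time are assumptions chosen for illustration:

```python
# Sketch: comparable W/cm^2 but different mL/min per W. Scaling to a 1 MW
# load shows how coolant-in-loop volume can become the deciding factor.
# All values below are illustrative assumptions.
total_it_power_w = 1_000_000
loop_residence_min = 2.0        # assumed minutes of coolant held in the loop

flow_per_watt = {               # mL/min per W, assumed values
    "tech A": 0.015,
    "tech B": 0.030,
}
for tech, ml_per_min_per_w in flow_per_watt.items():
    total_flow_l_per_min = ml_per_min_per_w * total_it_power_w / 1000
    loop_volume_l = total_flow_l_per_min * loop_residence_min
    print(f"{tech}: {total_flow_l_per_min:.0f} L/min, {loop_volume_l:.0f} L in loop")
```

At the module level the difference looks small, but at 1 MW the higher-flow technology needs twice the pumping capacity and twice the coolant volume in the loop.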
Factors to consider when rating different cooling technologies:
size, weight, thermal performance
limitations of coolant or volume availability
effect on application or facility downtime after implementation at the server level
the need for lower cooling water temperature
thermal transients (planned and unplanned)
Concept paper template
Document the various ARPA-E concept paper essentials
Identify how the essentials were addressed in the previous concept papers