Cooling Solutions for Overworked Datacenters

Discover how Coherent is solving the hottest challenge for AI-fueled datacenters with cold plate cooling.

January 10, 2024 by Coherent

AI is hot. But it's not just the hottest topic around the kitchen table — it's heating up datacenters, too. Fortunately, Coherent offers a range of innovative thermal management solutions to keep high-workload datacenters running efficiently.

Other applications such as cloud computing, graphics-intensive gaming, cryptocurrency mining, and edge computing are also rapidly (and radically) boosting workloads, and temperatures, inside datacenters. Likewise, as transistors shrink and transistor density rises, more heat is generated in an ever more compact area.

Across the board, servers are carrying a processing burden they’ve never experienced before, and this unprecedented energy demand sharply increases the risk of overheating in datacenters. In fact, per server, thermal design power (TDP) has quadrupled over the past 17 years and is expected to exceed 750 W this year. This costly, cumulative workload not only lowers the energy efficiency of datacenters worldwide but also hurts both performance and reliability. Excess heat can shorten the life of critical server components or cause them to malfunction, raise datacenter safety concerns, and, most certainly, increase the cost of keeping a datacenter running smoothly.

Modern Datacenters Are More Demanding

GPU computing is at the heart of training large-scale AI models, in part because a GPU can potentially offer thousands more processing cores than a CPU. To support this acceleration, many on-prem datacenters are shifting to high-density rack solutions that draw more power than traditional CPU processing ever called for and emit heat at levels these facilities are often unequipped to handle.

Achieving an efficient “AI cool down,” or solving other energy-efficiency problems in datacenters, requires strategic thermal management. The process of removing excess heat, known as thermal dissipation, has never been more critical to performance and component lifespans.

Preventing Thermal Damage in Overheated Datacenters

To mitigate thermal damage and safeguard against costly downtime, careful thermal planning and management are essential. There are typically two primary ways to turn down the thermal knob, that is, to dissipate excess heat, in high-demand datacenter environments.

1. Air or immersion cooling: expensive, complex, and environmentally challenging

This macro-cooling (not direct-to-chip) method cools plates and server rack components with high-convection air (top-level) or full liquid immersion (bottom-level), which adds up to a costly solution.

2. Cold plate cooling (direct-to-chip): maximizes heat transfer efficiency and resists corrosion.

Coherent recommends a micro-cooling solution built from high-thermal-conductivity materials, using physical cold plate technology on high-energy chips such as GPUs.

The Advantages of Cold Plate Materials 

Functionally, cold plate cooling — also known as direct-to-chip or simply “micro cooling” — does what it sounds like: it uses a cold plate to directly extract heat from high-energy chips like GPUs.

Similar to a home refrigerator, which uses a condenser to remove heat, cold plate cooling pulls heat away from GPUs by transferring it from the component to a coolant. The cold plate itself maximizes heat transfer efficiency.
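
A cold plate's job reduces to a sensible-heat balance: the heat carried off equals the coolant mass flow times its specific heat times its temperature rise, Q = ṁ·c_p·ΔT. The minimal sketch below estimates the water flow one cold plate would need. It assumes, purely for illustration, that the full 750 W TDP figure cited earlier lands on a single cold plate, uses approximate room-temperature properties of water, and picks a 10 K inlet-to-outlet rise; none of these numbers are Coherent specifications.

```python
# Minimal sketch: coolant flow needed for a cold plate to carry away a heat load.
# Assumptions (illustrative only): water coolant with c_p ~ 4186 J/(kg*K) and
# density ~ 997 kg/m^3, and a 10 K rise between cold plate inlet and outlet.

WATER_CP_J_PER_KG_K = 4186.0
WATER_DENSITY_KG_PER_M3 = 997.0


def required_mass_flow(heat_w: float, delta_t_k: float,
                       cp_j_per_kg_k: float = WATER_CP_J_PER_KG_K) -> float:
    """Mass flow (kg/s) from the sensible-heat balance Q = m_dot * c_p * dT."""
    if delta_t_k <= 0:
        raise ValueError("coolant temperature rise must be positive")
    return heat_w / (cp_j_per_kg_k * delta_t_k)


def mass_flow_to_lpm(m_dot_kg_s: float,
                     density_kg_m3: float = WATER_DENSITY_KG_PER_M3) -> float:
    """Convert a mass flow in kg/s to a volume flow in liters per minute."""
    return m_dot_kg_s / density_kg_m3 * 1000.0 * 60.0


if __name__ == "__main__":
    heat_load_w = 750.0   # assumed: the TDP figure from the text on one cold plate
    rise_k = 10.0         # assumed inlet-to-outlet coolant temperature rise
    m_dot = required_mass_flow(heat_load_w, rise_k)
    print(f"{m_dot:.4f} kg/s (~{mass_flow_to_lpm(m_dot):.2f} L/min)")
```

Under these assumptions the answer comes out near 1.1 L/min. Halving the allowed temperature rise doubles the required flow, which is why the coolant ΔT is a key design trade-off.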

Image: A technician uses infrared imaging to show the buildup of heat in a server stack.

But what makes cold plate cooling more effective? It boils down to higher thermal conductivity. To put it in perspective, a conductor like copper has a thermal conductivity of around 400 watts per meter-kelvin (W/mK), whereas poly-crystalline CVD diamond reaches about 1,500 W/mK, nearly four times that amount.

Material                                            Thermal Conductivity (W/mK)
Copper                                              ~400
Coherent Ceramic + Diamond (SiSiC/70% diamond)      ~670
Coherent Poly-crystalline CVD diamond               ~1500
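
To see why these conductivity values matter, Fourier's law for one-dimensional steady conduction gives the temperature drop across a plate as ΔT = q″·t / k, where q″ is the heat flux and t the plate thickness. The minimal sketch below applies it to the approximate values in the table. The 750 W load, the 20 mm × 20 mm footprint, and the 2 mm thickness are hypothetical, and real cold plates also involve interface and coolant-side resistances that this deliberately ignores.

```python
# Minimal sketch of 1-D steady-state conduction (Fourier's law) through a
# solid plate: dT = q'' * t / k. Geometry and load are hypothetical; interface
# and coolant-side (convective) resistances are deliberately ignored.

CONDUCTIVITY_W_PER_M_K = {   # approximate values from the table above
    "Copper": 400.0,
    "Ceramic + diamond (SiSiC/70% diamond)": 670.0,
    "Poly-crystalline CVD diamond": 1500.0,
}


def conduction_delta_t(heat_w: float, area_m2: float,
                       thickness_m: float, k_w_per_m_k: float) -> float:
    """Temperature drop (K) across a plate carrying a uniform heat flux."""
    heat_flux = heat_w / area_m2          # q'' in W/m^2
    return heat_flux * thickness_m / k_w_per_m_k


if __name__ == "__main__":
    heat_w = 750.0              # assumed heat load
    area_m2 = 0.020 * 0.020     # assumed 20 mm x 20 mm footprint
    thickness_m = 0.002         # assumed 2 mm plate
    for name, k in CONDUCTIVITY_W_PER_M_K.items():
        dt = conduction_delta_t(heat_w, area_m2, thickness_m, k)
        print(f"{name:40s} k = {k:6.0f} W/mK  dT = {dt:5.2f} K")
```

Under these assumptions the drop is about 9.4 K across copper versus 2.5 K across CVD diamond, the same roughly 3.75x ratio as the conductivities themselves, because ΔT scales as 1/k.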

A Thermal Safety Net Across Industries

At Coherent, we’re passionate about solving the heating challenges of datacenters, especially those running hot thanks to the popularity of AI. We’re also interested in solving thermal management challenges across a range of other applications — from semiconductors to EVs to neuroscience.

At the hardware level, thermal management uses tools and technology to stabilize a system efficiently and keep it within its operating temperature range. Coherent’s thermal management materials and systems extend beyond microelectronics such as semiconductor equipment to a broad range of markets and applications, including materials processing, automotive, aerospace & defense, datacom & telecom, and life sciences.

There’s no shortage of applications for differentiated engineering materials and devices across multiple end markets.

Coherent is a world leader in innovative engineered materials and sub-systems for thermal management, delivering strategic, tailored material solutions. 

Our wide range of globally leading, innovative thermal management solutions includes:

Reaction-Bonded Si/SiC

Coherent provides multiple reaction-bonded Si/SiC formulations to meet a broad range of design requirements and product applications, including thermal management. Some of the formulations we offer to the thermal management market combine high thermal conductivity with a coefficient of thermal expansion (CTE) matched to AlN or Si3N4. By adding diamond to Si/SiC materials, Coherent can offer ultra-high thermal conductivity for heat-critical applications.

Moreover, reaction-bonded Si/SiC products can be manufactured by net-shape and near-net-shape fabrication processes. Very complex shapes are possible with net-shape molding, green machining, and/or preform joining. This shape capability supports a broad range of product features, including finned elements and internal micro-cooling channels, which allows us to meet challenging application requirements.
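
The CTE matching to AlN or Si3N4 mentioned above matters because two bonded layers that expand at different rates are strained roughly in proportion to the mismatch, ε ≈ |α₁ − α₂|·ΔT. The minimal sketch below compares an AlN substrate bonded to copper with one bonded to a closely matched material. The AlN and copper CTEs (about 4.5 and 17 ppm/K) are typical handbook values, while the 5.0 ppm/K "matched" grade and the 100 K temperature swing are hypothetical. It ignores layer thickness and stiffness, so it indicates relative risk rather than actual stress.

```python
# Minimal sketch: free thermal-mismatch strain between two bonded layers,
# eps = |alpha_1 - alpha_2| * dT. Thickness and stiffness are ignored, so this
# is a relative indicator only, not an interface stress calculation.

PPM = 1e-6


def mismatch_strain(alpha1_ppm_k: float, alpha2_ppm_k: float,
                    delta_t_k: float) -> float:
    """Dimensionless strain from a CTE mismatch over a temperature swing."""
    return abs(alpha1_ppm_k - alpha2_ppm_k) * PPM * delta_t_k


if __name__ == "__main__":
    alpha_aln = 4.5        # ppm/K, typical handbook value for AlN
    alpha_copper = 17.0    # ppm/K, typical handbook value for copper
    alpha_matched = 5.0    # ppm/K, hypothetical CTE-matched Si/SiC grade
    swing_k = 100.0        # assumed temperature swing
    for name, alpha in (("copper", alpha_copper), ("matched Si/SiC", alpha_matched)):
        eps = mismatch_strain(alpha_aln, alpha, swing_k)
        print(f"AlN vs {name:15s}: strain {eps * 100:.3f} %")
```

With these inputs the copper pairing sees about 25 times the mismatch strain of the matched pairing (0.125% versus 0.005%).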

Metal Matrix Composites

Silicon carbide particle-reinforced aluminum (Al/SiC) metal matrix composites (MMCs) provide distinct advantages for thermal management applications. Because both Al and SiC offer low density and high thermal conductivity, the composite retains these important characteristics. At the same time, its CTE can be tailored by adjusting the ratio of SiC (CTE of 3 ppm/K) to Al (CTE of 23 ppm/K).

MMC products can be manufactured by net-shape and near-net-shape processes. They are fully machinable, including direct threading, and compatible with standard plating processes. Their mechanical and thermal stability is greatly enhanced relative to traditional metals, and they are less fragile than ceramics. Coherent also has patent-protected manufacturing processes for MMC products, which allow us to meet customers’ specific application requirements.
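
The CTE tailoring described above can be estimated with simple mixing rules. The minimal sketch below compares the linear rule of mixtures with Turner's model, which weights each phase's CTE by its bulk modulus times its volume fraction, and inverts each to find the SiC volume fraction that hits a target CTE. The 3 and 23 ppm/K values come from the text; the bulk moduli (about 76 GPa for Al and an assumed 220 GPa for SiC) and the 7 ppm/K target are illustrative assumptions, and neither model represents Coherent's process data.

```python
# Minimal sketch: estimating the CTE of an Al/SiC composite from its SiC volume
# fraction, and inverting for the fraction that hits a target CTE.
# CTE values are from the text; bulk moduli are assumed approximate values.

ALPHA_AL = 23.0   # ppm/K (from the text)
ALPHA_SIC = 3.0   # ppm/K (from the text)
K_AL = 76.0       # GPa, assumed bulk modulus of aluminum
K_SIC = 220.0     # GPa, assumed bulk modulus of SiC


def cte_linear(f_sic: float) -> float:
    """Linear rule of mixtures: volume-weighted average of the two CTEs."""
    return f_sic * ALPHA_SIC + (1.0 - f_sic) * ALPHA_AL


def cte_turner(f_sic: float) -> float:
    """Turner's model: CTEs weighted by bulk modulus times volume fraction."""
    w_al = K_AL * (1.0 - f_sic)
    w_sic = K_SIC * f_sic
    return (ALPHA_AL * w_al + ALPHA_SIC * w_sic) / (w_al + w_sic)


def fraction_for_target_linear(alpha_target: float) -> float:
    """Invert cte_linear(f) = alpha_target for the SiC volume fraction."""
    return (ALPHA_AL - alpha_target) / (ALPHA_AL - ALPHA_SIC)


def fraction_for_target_turner(alpha_target: float) -> float:
    """Closed-form inversion of cte_turner(f) = alpha_target."""
    a = K_AL * (ALPHA_AL - alpha_target)
    b = K_SIC * (alpha_target - ALPHA_SIC)
    return a / (a + b)


if __name__ == "__main__":
    target = 7.0  # ppm/K, hypothetical design target
    for label, solve, model in (("linear", fraction_for_target_linear, cte_linear),
                                ("Turner", fraction_for_target_turner, cte_turner)):
        f = solve(target)
        print(f"{label:6s}: SiC fraction {f:.2f} -> CTE {model(f):.2f} ppm/K")
```

Under these assumptions the linear rule calls for about 80% SiC, while Turner's model suggests about 58%. The gap shows how much the choice of model matters; measured data should take precedence for any real design.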

CVD Diamond

Diamond offers the highest thermal conductivity of any material, roughly four times that of copper, the most commonly used metal for heat transfer. CVD (chemical vapor deposition) diamond can dissipate heat efficiently and prevent overheating of electronic devices such as high-power integrated circuits, prolonging device lifetime, reducing device footprint, and improving efficiency and performance.

CVD diamond has a low coefficient of thermal expansion, which means it doesn’t expand or contract much when heated or cooled. With its wide optical transmission range (UV through long-wave IR) and high thermal shock resistance, it’s ideal for applications like datacom, telecom, semiconductor manufacturing, and life sciences instrumentation.

Single Crystal SiC: High conductivity, wide-ranging applications 

The key advantages of SiC-based electronics include reduced switching losses, higher power density, better heat dissipation, and increased bandwidth capability. As for thermal conductivity, single-crystal SiC reaches about 490 W/mK, more than three times that of silicon (150 W/mK). Because SiC dissipates heat more efficiently than silicon, it lowers the overall need for cooling and improves the reliability and performance of devices over time.

With great chemical stability, high saturated electron drift velocity, and high thermal conductivity, single-crystal SiC is an outstanding material for a wide range of applications, including but not limited to optoelectronics, microwave devices, datacom, telecom, semiconductor manufacturing, electric vehicles (EVs), and life sciences instrumentation.

Why Coherent: Performance, reliability, collaboration 

Coherent offers solutions across different platforms, with a track record of high performance and reliability. You’ll benefit from our patent-protected processes and tailored solutions for thermal dissipation in your datacenter.

Coherent works with teams of all sizes to help provide trusted, flexible custom solutions and capabilities inside datacenter environments and beyond. Efficient, effective thermal management provides cost savings, reduces downtime, and maximizes component lifecycles across many industry applications. 
