
How Chiplet Thermal Co-Design Fails When Package Engineers and RTL Teams Work in Isolation

P. Nakamura

Thermal problems in chiplet packages rarely announce themselves early. They show up after months of parallel work, when a package engineer discovers that the power map the RTL team handed over bears almost no resemblance to what the silicon actually does under load. By then, changing the bump array or the interposer layout is expensive. Changing the floorplan is worse.


This is the real cost of siloed thermal co-design: not the heat itself, but the timing of when you find out about it.

Why the Handoff Model Breaks Down

In a monolithic SoC flow, thermal analysis sits comfortably at the back of physical design. Power intent gets refined through synthesis and place-and-route, and by the time a thermal model is needed, the power numbers are reasonably accurate. The package team gets a single die, a single thermal interface, and a straightforward path to the heat spreader.

Chiplet packages break that model. You now have multiple dies, potentially from different foundries, each with its own power density profile. Each die is also thermally coupled to its neighbors through the interposer, the substrate, and any underfill material between them. A compute chiplet running at 0.9 W/mm² will heat an adjacent HBM stack even if the HBM is barely active. That coupling raises HBM junction temperature, which raises the refresh rate the DRAM needs, which in turn cuts effective memory bandwidth.
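One rough way to picture that coupling is a steady-state thermal resistance matrix, where each entry gives the temperature rise at one die per watt dissipated at another. The sketch below is illustrative only: the node names, the coefficients, the power levels, and the reference temperature are all assumptions, not measured package data.

```python
# Hypothetical steady-state coupling model: T = T_ref + R * P.
# R_C_PER_W[i][j] is the temperature rise (deg C) at node i per watt
# dissipated at node j. All numbers are illustrative assumptions.

NODES = ["compute", "hbm"]

R_C_PER_W = [
    [0.10, 0.04],  # compute row: self-heating, heating caused by the HBM
    [0.04, 0.30],  # hbm row: heating caused by compute, self-heating
]


def junction_temps(power_w, reference_c=45.0):
    """Return the steady-state temperature of each node in deg C."""
    temps = {}
    for i, node in enumerate(NODES):
        rise = sum(R_C_PER_W[i][j] * power_w[j] for j in range(len(NODES)))
        temps[node] = reference_c + rise
    return temps


# Same near-idle HBM power (2 W) in both cases; only the compute die changes.
busy = junction_temps([360.0, 2.0])  # compute die under load (assumed 360 W)
idle = junction_temps([20.0, 2.0])   # compute die idle (assumed 20 W)
hbm_delta_c = busy["hbm"] - idle["hbm"]
```

With these made-up coefficients, the HBM node ends up about 13.6°C hotter when the compute die is busy, even though its own power never changed. A die modeled in isolation only ever sees the diagonal of this matrix.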

The RTL team didn't model that. They modeled their die in isolation.

The Coupling That Gets Ignored

Here's a concrete scenario. A 2.5D package carries a GPU compute die flanked by four HBM stacks on a silicon interposer. The compute die is 400 mm²; each HBM stack occupies roughly 100 mm². The package engineer ran thermal simulations using a uniform power map for the compute die: total TDP divided evenly across die area.

The RTL team, working in parallel, placed the memory controllers and high-activity shader clusters in the die quadrants nearest to the HBM stacks. That placement makes electrical sense: it minimizes die-to-die interconnect distance and reduces latency. But it concentrates the hottest logic within 5 mm of the temperature-sensitive DRAM.

Neither team was wrong. They were just optimizing for different objectives with no shared thermal model to arbitrate.

When the package came back from thermal characterization, the HBM stacks adjacent to those quadrants were running 11°C hotter than the simulation predicted. At those junction temperatures, the HBM vendor's reliability specs require derating. You either throttle the memory interface or you accept a shortened device lifetime. Neither answer is acceptable in a production AI accelerator.
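To see why a uniform power map can hide this kind of result, compare the average power density with the density of the blocks actually placed near the HBM. This is a minimal sketch: the 400 mm² die area comes from the scenario, while the TDP and the block-level split are hypothetical numbers chosen only to make the arithmetic visible.

```python
DIE_AREA_MM2 = 400.0  # compute die area from the scenario
TDP_W = 300.0         # hypothetical total power

# Hypothetical block-level split: (name, area in mm^2, power in W).
BLOCKS = [
    ("shaders_and_mem_ctrl_near_hbm", 120.0, 180.0),
    ("rest_of_die", 280.0, 120.0),
]


def uniform_density(tdp_w, area_mm2):
    """Power density (W/mm^2) when TDP is spread evenly over the die."""
    return tdp_w / area_mm2


def block_densities(blocks):
    """Power density (W/mm^2) of each functional block."""
    return {name: power / area for name, area, power in blocks}


# Sanity check: the block-level map must account for the whole die and TDP.
assert sum(area for _, area, _ in BLOCKS) == DIE_AREA_MM2
assert sum(power for _, _, power in BLOCKS) == TDP_W

uniform = uniform_density(TDP_W, DIE_AREA_MM2)
per_block = block_densities(BLOCKS)
hottest = max(per_block, key=per_block.get)
underestimate_factor = per_block[hottest] / uniform
```

Under these assumptions the uniform map reports 0.75 W/mm² everywhere, while the region next to the HBM stacks is actually dissipating 1.5 W/mm². Both maps carry the same total power; only the block-level one shows where it lands.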

What a Shared Thermal Model Actually Requires

The fix sounds simple: share thermal data earlier. The practice is harder.

A useful shared model needs power maps at realistic granularity (per functional block, not per die), a package thermal resistance network that includes interposer conductivity and TIM properties, and an agreed-upon set of workload scenarios that drive the worst-case power distribution. That last item is the one that stalls most programs. RTL teams work from microarchitectural assumptions; package engineers work from measured silicon. Bridging those two takes a committed co-design loop, not a one-time handoff.
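As a rough illustration of what "shared" could mean in practice, the three ingredients can live in one data structure that both teams read and write. Everything in this sketch is hypothetical: the class names, field names, block names, scenario names, and all numeric values are placeholders, not a real tool's schema.

```python
from dataclasses import dataclass, field


@dataclass
class PackageStack:
    """Package-side inputs. Values supplied below are placeholders."""
    interposer_conductivity_w_per_mk: float
    tim_conductivity_w_per_mk: float


@dataclass
class Scenario:
    """One agreed workload: power per functional block, in watts."""
    name: str
    block_power_w: dict


@dataclass
class SharedThermalModel:
    package: PackageStack
    scenarios: list = field(default_factory=list)

    def worst_case_per_block(self):
        """Map each block to (highest power, scenario that produced it)."""
        worst = {}
        for scenario in self.scenarios:
            for block, watts in scenario.block_power_w.items():
                if block not in worst or watts > worst[block][0]:
                    worst[block] = (watts, scenario.name)
        return worst


model = SharedThermalModel(
    package=PackageStack(
        interposer_conductivity_w_per_mk=120.0,  # placeholder value
        tim_conductivity_w_per_mk=5.0,           # placeholder value
    ),
    scenarios=[
        Scenario("training", {"shader_cluster": 95.0, "mem_ctrl": 30.0}),
        Scenario("inference", {"shader_cluster": 60.0, "mem_ctrl": 42.0}),
    ],
)
worst = model.worst_case_per_block()
```

Note that the worst case for each block can come from a different scenario, which is one reason agreeing on the scenario list tends to be the sticking point.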

The diagram below shows where that loop has to close:

```mermaid
graph TD
    A[RTL Power Intent] --> B(Block-Level Power Map)
    B --> C{Thermal Coupling Simulation}
    C --> D[Package Thermal Model]
    D --> E(Hotspot Identification)
    E --> F{Floorplan Constraint Update}
    F --> A
    D --> G[HBM Junction Temp Check]
    G --> F
```

The loop from hotspot identification back to RTL floorplan is where most programs break. It requires RTL designers to accept placement constraints that originate from package-level physics, not from timing or congestion. That's a cultural shift as much as a technical one.
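A toy version of the step that turns an HBM junction temperature check into a floorplan constraint might look like the following. It is a sketch under stated assumptions: the two-region split, the coupling coefficients, the reference temperature, and the temperature limit are all invented for illustration and do not come from any vendor specification.

```python
def max_near_hbm_power(total_w, limit_c, reference_c, r_near, r_far, step_w=5.0):
    """Find the largest power (in step_w increments) that can sit in the
    region nearest the HBM while the predicted HBM temperature stays at or
    below limit_c. Power not placed near the HBM is assumed to move to a
    far region with weaker coupling. Returns (near_power_w, hbm_temp_c),
    or None if no split satisfies the limit."""
    near = total_w
    while near >= 0:
        hbm_temp = reference_c + r_near * near + r_far * (total_w - near)
        if hbm_temp <= limit_c:
            return near, hbm_temp
        near -= step_w  # shift one step of power away from the HBM
    return None


# All inputs are hypothetical.
result = max_near_hbm_power(
    total_w=210.0,     # power of the blocks that could be placed near the HBM
    limit_c=95.0,      # assumed HBM temperature limit
    reference_c=60.0,  # assumed baseline HBM temperature
    r_near=0.25,       # deg C per watt for logic placed near the HBM
    r_far=0.05,        # deg C per watt for logic placed far from the HBM
)

# Express the outcome as a constraint the floorplanner can consume.
constraint = None
if result is not None:
    constraint = {"region": "near_hbm", "max_power_w": result[0]}
```

The point of the sketch is the output type: the package-side check ends in a power budget for a region of the die, which is something a floorplan can be held to.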

Where the Industry Is Heading

Some EDA vendors are starting to close this gap with tools that ingest package thermal resistance matrices directly into floorplanning engines. The idea is to surface thermal coupling penalties as first-class placement costs alongside wirelength and timing. Cadence's Integrity 3D-IC platform and Synopsys's 3DIC Compiler both include thermal-aware placement capabilities, though their practical adoption depends heavily on whether the package data exists in a compatible format early enough to matter.
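The idea of a thermal term in the placement cost can be shown with a deliberately small example. This is not how any commercial tool computes its cost; the cost function, the weights, and the two candidate placements are assumptions made up for illustration, and timing is left out to keep it short.

```python
def placement_cost(wirelength_mm, hbm_rise_c, w_wire=1.0, w_thermal=0.0):
    """Weighted sum of a wirelength term and a thermal coupling penalty."""
    return w_wire * wirelength_mm + w_thermal * hbm_rise_c


# Two hypothetical placements for a hot block, with made-up metrics.
CANDIDATES = {
    "near_hbm": {"wirelength_mm": 40.0, "hbm_rise_c": 14.0},
    "die_center": {"wirelength_mm": 55.0, "hbm_rise_c": 6.0},
}


def best_placement(w_thermal):
    """Pick the candidate with the lowest cost for a given thermal weight."""
    return min(
        CANDIDATES,
        key=lambda name: placement_cost(**CANDIDATES[name], w_thermal=w_thermal),
    )


wirelength_only = best_placement(w_thermal=0.0)
thermal_aware = best_placement(w_thermal=3.0)
```

With the thermal weight at zero, the placement next to the HBM wins on wirelength alone. Once the HBM temperature rise carries a cost, the center placement wins, which is the trade-off a thermal-aware floorplanner is meant to expose.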

The format problem is underrated. Package engineers often work in Ansys or Siemens tools that export thermal results in formats that RTL and floorplanning tools can't directly consume. Converting those results into something a floorplanner can use requires either scripting effort or a dedicated data exchange layer. UCIe defines electrical interfaces between chiplets; nobody has yet standardized the thermal data exchange that should accompany it.
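The scripting effort in question is often a small conversion like the one sketched here: take point temperatures from a thermal solver and bin them into a coarse tile grid. The CSV layout, the column names, and the sample values are hypothetical and do not represent any specific tool's export format.

```python
import csv
import io

# Hypothetical export: one row per sample point on the die surface.
SAMPLE_EXPORT = """x_mm,y_mm,temp_c
1.0,1.0,71.5
4.0,2.0,74.0
12.0,3.0,88.5
18.0,17.0,69.0
"""


def to_tile_grid(csv_text, die_w_mm, die_h_mm, tile_mm):
    """Bin point temperatures into square tiles, keeping the maximum
    temperature seen in each tile. Tiles with no samples stay None."""
    cols = int(die_w_mm // tile_mm)
    rows = int(die_h_mm // tile_mm)
    grid = [[None] * cols for _ in range(rows)]
    for record in csv.DictReader(io.StringIO(csv_text)):
        # Clamp so points on the far edge fall into the last tile.
        col = min(int(float(record["x_mm"]) // tile_mm), cols - 1)
        row = min(int(float(record["y_mm"]) // tile_mm), rows - 1)
        temp = float(record["temp_c"])
        if grid[row][col] is None or temp > grid[row][col]:
            grid[row][col] = temp
    return grid


# A 20 mm x 20 mm die split into 10 mm tiles gives a 2 x 2 grid.
grid = to_tile_grid(SAMPLE_EXPORT, die_w_mm=20.0, die_h_mm=20.0, tile_mm=10.0)
```

Keeping the per-tile maximum rather than the mean is a conservative choice for hotspot work. The `None` entries make missing coverage visible instead of silently reading as cool.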

Until that exchange becomes routine, the most effective approach remains organizational: put a thermal architect in the room with both teams from the start of the chiplet floorplanning phase. That person's job is to own the shared model, arbitrate trade-offs, and make sure the HBM neighbors of a hot compute die don't become an unpleasant surprise six months into the program.

The silicon will tell you eventually. Better to ask before the masks are cut.
