Get Started with HeteroSTA3D
Introduction
Face-to-face hybrid bonding heterogeneous 3D integrated circuits (ICs) enable the vertical stacking of dies with different process nodes through hybrid bonding technology, achieving heterogeneous integration that significantly improves performance and integration density. HeteroSTA3D is a high-performance static timing analysis (STA) engine specifically designed for analyzing face-to-face hybrid bonding heterogeneous 3D ICs.
HeteroSTA3D is built on top of the HeteroSTA library and requires both HeteroSTA and HeteroSTA3D library files to function. Unless otherwise specified, the HeteroSTA library (header) bundled with HeteroSTA3D can be replaced with a compatible version of the HeteroSTA library (header) and will function correctly without additional configuration.
This guide will walk you through the core concepts of a 3D STA workflow, explaining the essential API calls and their sequence.
Prerequisite: License Initialization
Before you begin any 3D STA workflow, you must initialize and validate both the HeteroSTA and HeteroSTA3D licenses. This must be the very first API call. If either license is not successfully initialized, the subsequent call to heterosta3d_new() will fail by returning NULL, preventing any further interaction with the library.
You can obtain license keys by following the instructions on our getting started page.
- API:
heterosta3d_init_license()
Best Practice: Managing Your License Keys
For better security and flexibility, we recommend storing your license keys in environment variables rather than hardcoding them. This makes your program easier to deploy and migrate across different environments.
1. Setting the Environment Variables
- Temporarily (for the current shell session):
export HeteroSTA_Lic="lic:heterosta:your-license-key-string-here"
export HeteroSTA3D_Lic="lic:heterosta3d:your-license-key-string-here"
- Permanently (for all future sessions):
Add the above lines to your shell's configuration file (e.g., ~/.bashrc, ~/.zshrc).
2. Acquiring the License Keys in Your Code
Here is an example of how to read the licenses from environment variables:
#include "heterosta3d.h"
#include <cstdlib>
#include <iostream>
// 1. Acquire licenses from environment variables.
const char* lic_2d = std::getenv("HeteroSTA_Lic");
const char* lic_3d = std::getenv("HeteroSTA3D_Lic");
if (lic_2d == nullptr || lic_3d == nullptr) {
std::cerr << "[FATAL ERROR] License not found in environment variables." << std::endl;
std::cerr << " Please ensure both 'HeteroSTA_Lic' and 'HeteroSTA3D_Lic' are set." << std::endl;
return 1;
}
// 2. Initialize the licenses.
bool license_ok = heterosta3d_init_license(lic_2d, lic_3d);
if (!license_ok) {
std::cerr << "[FATAL ERROR] Failed to initialize HeteroSTA3D licenses." << std::endl;
return 1;
}
// 3. Proceed with library initialization and workflow.
The Standard 3D STA Workflow
Any 3D timing analysis follows a logical sequence of operations. This section breaks down the essential steps based on the example in run_cpu.cpp, providing a clear and concise workflow.
Step 1: Initialize the Environment
Every session begins with creating a Heterosta3D environment. This object serves as the central context for all 3D STA operations.
- APIs:
heterosta3d_init_license(), heterosta3d_new(), heterosta3d_free()
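Because the environment must be released with heterosta3d_free() on every exit path, including early error returns, one option is to hold the handle in a std::unique_ptr with a custom deleter. The sketch below is not part of the library: FakeSta, fake_new(), and fake_free() are hypothetical stand-ins so that the snippet compiles without the HeteroSTA3D headers. In real code they would be replaced by Heterosta3D, heterosta3d_new(), and heterosta3d_free().

```cpp
#include <memory>

// Hypothetical stand-in for the opaque library handle, so this sketch
// compiles on its own. Replace with Heterosta3D from "heterosta3d.h".
struct FakeSta { int unused; };
static int g_live_handles = 0;  // demo only: handles created but not yet freed

// Stand-ins for heterosta3d_new() and heterosta3d_free().
FakeSta* fake_new() { ++g_live_handles; return new FakeSta{0}; }
void fake_free(FakeSta* p) { --g_live_handles; delete p; }

// Deleter that forwards to the C-style free function.
// std::unique_ptr never invokes the deleter on a null pointer.
struct StaDeleter {
    void operator()(FakeSta* p) const { fake_free(p); }
};
using StaHandle = std::unique_ptr<FakeSta, StaDeleter>;

// Returns an owning handle. The handle is empty if creation returned NULL,
// which the guide says happens when license initialization has not succeeded.
StaHandle make_sta() { return StaHandle(fake_new()); }
```

With this pattern a caller checks `if (!handle)` once after creation and passes `handle.get()` to the C API; the free call then happens automatically when the handle goes out of scope.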
Example:
#include "heterosta3d.h"
#include <cstdlib>
#include <iostream>
// Initialize licenses (std::getenv returns nullptr if a variable is unset)
const char* lic_2d = std::getenv("HeteroSTA_Lic");
const char* lic_3d = std::getenv("HeteroSTA3D_Lic");
if (lic_2d == nullptr || lic_3d == nullptr ||
!heterosta3d_init_license(lic_2d, lic_3d)) {
std::cerr << "[FATAL ERROR] License keys missing or failed to initialize." << std::endl;
return 1;
}
// Create and initialize the 3D STA environment.
Heterosta3D* sta = heterosta3d_new();
if (!sta) {
std::cerr << "[FATAL ERROR] Failed to create Heterosta3D instance." << std::endl;
return 1;
}
// ... perform all analysis ...
// Free the environment at the end.
heterosta3d_free(sta);
Step 2: Create Liberty Sets
Before the netlist can be understood, HeteroSTA3D needs the characterization data for the standard cells in both top and bottom dies. You must create Liberty sets for each die, and each set requires both EARLY (min/hold) and LATE (max/setup) timing corners.
- API:
heterosta3d_create_liberty_set_batch()
Example:
// Setup liberty sets for 4 combinations: top/btm × ss/ff = 2×2
// Each liberty set needs both Early and Late timing corners
const char *top_early_ss[] = {"simple_top_Early_ss.lib"};
const char *top_late_ss[] = {"simple_top_Late_ss.lib"};
const char *top_early_ff[] = {"simple_top_Early_ff.lib"};
const char *top_late_ff[] = {"simple_top_Late_ff.lib"};
const char *btm_early_ss[] = {"simple_btm_Early_ss.lib"};
const char *btm_late_ss[] = {"simple_btm_Late_ss.lib"};
const char *btm_early_ff[] = {"simple_btm_Early_ff.lib"};
const char *btm_late_ff[] = {"simple_btm_Late_ff.lib"};
bool ok = true;
// Create liberty set "top_ss"
ok &= heterosta3d_create_liberty_set_batch(sta, EL_EARLY, "top_ss", top_early_ss, 1);
ok &= heterosta3d_create_liberty_set_batch(sta, EL_LATE, "top_ss", top_late_ss, 1);
// Create liberty set "top_ff"
ok &= heterosta3d_create_liberty_set_batch(sta, EL_EARLY, "top_ff", top_early_ff, 1);
ok &= heterosta3d_create_liberty_set_batch(sta, EL_LATE, "top_ff", top_late_ff, 1);
// Create liberty set "btm_ss"
ok &= heterosta3d_create_liberty_set_batch(sta, EL_EARLY, "btm_ss", btm_early_ss, 1);
ok &= heterosta3d_create_liberty_set_batch(sta, EL_LATE, "btm_ss", btm_late_ss, 1);
// Create liberty set "btm_ff"
ok &= heterosta3d_create_liberty_set_batch(sta, EL_EARLY, "btm_ff", btm_early_ff, 1);
ok &= heterosta3d_create_liberty_set_batch(sta, EL_LATE, "btm_ff", btm_late_ff, 1);
Step 3: Create Delay Corners
A delay corner combines a top die Liberty set with a bottom die Liberty set, representing a specific process corner combination. You can create multiple delay corners to analyze different combinations (e.g., ss_ss, ss_ff, ff_ss, ff_ff). Each delay corner can be assigned to a specific CPU or GPU device for parallel analysis.
- API:
heterosta3d_create_delay_corner()
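When the number of process corners grows, the top/bottom combinations can be generated rather than typed by hand. The following is a minimal sketch under the naming scheme used in this guide ("top_<corner>", "btm_<corner>", and "<top>_<btm>" for the delay corner). CornerSpec and make_corner_specs() are hypothetical helpers, not part of the HeteroSTA3D API.

```cpp
#include <string>
#include <vector>

// One delay corner: its name plus the Liberty set used for each die.
struct CornerSpec {
    std::string name;     // e.g. "ss_ff"
    std::string top_set;  // e.g. "top_ss"
    std::string btm_set;  // e.g. "btm_ff"
};

// Builds every top/bottom combination of the given process corners,
// iterating the top die in the outer loop and the bottom die in the inner loop.
std::vector<CornerSpec> make_corner_specs(
        const std::vector<std::string>& process_corners) {
    std::vector<CornerSpec> specs;
    specs.reserve(process_corners.size() * process_corners.size());
    for (const std::string& top : process_corners) {
        for (const std::string& btm : process_corners) {
            specs.push_back({top + "_" + btm, "top_" + top, "btm_" + btm});
        }
    }
    return specs;
}
```

For the input {"ss", "ff"} this yields the four corners ss_ss, ss_ff, ff_ss, and ff_ff in that order; each field can then be passed to heterosta3d_create_delay_corner() via c_str().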
Example:
// Create 4 delay corners: top/btm × ss/ff = 2×2
const char *corner_names[] = {"ss_ss", "ss_ff", "ff_ss", "ff_ff"};
const char *top_sets[] = {"top_ss", "top_ss", "top_ff", "top_ff"};
const char *btm_sets[] = {"btm_ss", "btm_ff", "btm_ss", "btm_ff"};
for (int i = 0; i < 4; ++i) {
// Use HETEROSTA3D_CPU_DEVICE_ID for CPU mode, or 0, 1, ... for GPU devices
ok &= heterosta3d_create_delay_corner(sta, corner_names[i], top_sets[i], btm_sets[i], HETEROSTA3D_CPU_DEVICE_ID);
}
Step 4: Load the Design
Provide the circuit's logical structure from a Verilog netlist file.
- API:
heterosta3d_read_netlist()
Important Note on Cell Naming:
Cell names in the Verilog netlist must carry a _top or _bottom suffix to indicate the die on which the cell is placed. For example:
- NAND2_X1_top: this cell is on the top die
- INV_X1_bottom: this cell is on the bottom die
Example:
// Read netlist
ok &= heterosta3d_read_netlist(sta, "simple.v");
if (!ok) {
std::cerr << "Failed to read netlist" << std::endl;
heterosta3d_free(sta);
return 1;
}
Step 5: Prepare the Timing Graph
After the netlist is loaded, it must be finalized into a performance-optimized format and used to construct the timing graph. This is a mandatory step before analysis can proceed.
- APIs:
heterosta3d_flatten_all(), heterosta3d_build_graph()
Example:
// Finalize the loaded data. This is a one-way operation.
heterosta3d_flatten_all(sta);
// Construct the internal timing graph for analysis.
heterosta3d_build_graph(sta);
Step 6: Apply Constraints
With the graph built, apply timing constraints from an SDC file for each delay corner.
- API:
heterosta3d_read_sdc()
Example:
// Read SDC for all corners
for (int i = 0; i < 4; ++i) {
heterosta3d_read_sdc(sta, "simple.sdc", corner_names[i]);
}
Step 7: Extract 3D RC Parasitics
To accurately model signal propagation time in 3D ICs, the delay calculator needs the resistance (R) and capacitance (C) for each net, including the vertical connections through HBTs. This function extracts RC parasitics from 3D placement data.
- API:
heterosta3d_extract_rc_from_placement()
Key Parameters:
- pos_x, pos_y: Arrays of pin coordinates (indexed by internal pin order)
- hbt_x, hbt_y: Arrays of HBT (Hybrid Bonding Terminal) coordinates per net
- unit_cap_x_top / y_top: Unit capacitance for top die (fF)
- unit_res_x_top / y_top: Unit resistance for top die (kΩ)
- unit_cap_x_btm / y_btm: Unit capacitance for bottom die (fF)
- unit_res_x_btm / y_btm: Unit resistance for bottom die (kΩ)
- hbt_r: Vertical link resistance (kΩ)
- hbt_c: Vertical link capacitance (fF)
- flute_accuracy: Accuracy of the FLUTE algorithm for Steiner tree construction
Memory Requirements:
- For GPU corners: pos_x, pos_y, hbt_x, hbt_y must be in GPU memory
- For CPU corners: pos_x, pos_y, hbt_x, hbt_y must be in host memory
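To see how the unit parameters and the HBT values combine, consider a textbook Elmore-delay estimate for a single two-pin net that starts on the top die, crosses one HBT, and ends on the bottom die. This is only an illustrative sketch: it assumes the unit values are per unit of placement distance, routes each die segment as a simple Manhattan path, and lumps each element's capacitance at its downstream end. The extraction performed by HeteroSTA3D (which builds Steiner trees with FLUTE) is not described in this guide and may differ. All type and function names below are hypothetical.

```cpp
#include <cmath>

// Lumped RC of one wire segment. Resistance in kOhm, capacitance in fF.
struct SegmentRC {
    double r;
    double c;
};

// Per-direction unit parasitics of one die (assumed here to be per unit of
// placement distance; check the API Reference for the exact definition).
struct DieUnits {
    double unit_res_x, unit_res_y;  // kOhm per unit length
    double unit_cap_x, unit_cap_y;  // fF per unit length
};

// RC of a Manhattan route between two points on the same die.
SegmentRC wire_rc(const DieUnits& u, double x0, double y0, double x1, double y1) {
    const double dx = std::fabs(x1 - x0);
    const double dy = std::fabs(y1 - y0);
    return {u.unit_res_x * dx + u.unit_res_y * dy,
            u.unit_cap_x * dx + u.unit_cap_y * dy};
}

// Elmore delay in ps (kOhm * fF = ps) along the chain
// driver -> top wire -> HBT -> bottom wire -> sink.
// Each resistance is multiplied by the total capacitance downstream of it.
double elmore_through_hbt(const SegmentRC& top, double hbt_r, double hbt_c,
                          const SegmentRC& btm, double sink_cap) {
    const double after_btm = btm.c + sink_cap;   // seen by the bottom wire
    const double after_hbt = hbt_c + after_btm;  // seen by the HBT
    const double after_top = top.c + after_hbt;  // seen by the top wire
    return top.r * after_top + hbt_r * after_hbt + btm.r * after_btm;
}
```

With the unit values used in this guide (0.0005 kΩ and 0.002 fF per unit, hbt_r = 0.003 kΩ, hbt_c = 0.6 fF), a 100-unit top segment, a 200-unit bottom segment, and a hypothetical 1 fF sink load, the estimate is 0.05 × 2.2 + 0.003 × 2.0 + 0.1 × 1.4 = 0.256 ps.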
Example:
// Prepare placement data
std::vector<float> pos_x{500.f, 600.f, 700.f, ...}; // Pin X coordinates
std::vector<float> pos_y{500.f, 600.f, 700.f, ...}; // Pin Y coordinates
std::vector<float> hbt_x{2.f, 12.f, 22.f, ...}; // HBT X coordinates per net
std::vector<float> hbt_y{2.f, 2.f, 2.f, ...}; // HBT Y coordinates per net
float unit_cap_x_top = 0.002f, unit_cap_y_top = 0.002f;
float unit_res_x_top = 0.0005f, unit_res_y_top = 0.0005f;
float unit_cap_x_btm = 0.002f, unit_cap_y_btm = 0.002f;
float unit_res_x_btm = 0.0005f, unit_res_y_btm = 0.0005f;
float hbt_r = 0.003f, hbt_c = 0.6f;
// Extract RC for all corners
for (int i = 0; i < 4; ++i) {
heterosta3d_extract_rc_from_placement(
sta, pos_x.data(), pos_y.data(), hbt_x.data(), hbt_y.data(),
unit_cap_x_top, unit_cap_y_top, unit_res_x_top, unit_res_y_top,
unit_cap_x_btm, unit_cap_y_btm, unit_res_x_btm, unit_res_y_btm,
hbt_r, hbt_c, /*flute_accuracy=*/4, corner_names[i]);
}
Step 8: Run Timing Analysis
With the graph built and parasitics extracted, run the core analysis functions. The sequence of these calls is critical.
- APIs:
- heterosta3d_update_delay(): Calculates delays for all cell and net arcs. Must be called before heterosta3d_update_arrivals().
- heterosta3d_update_arrivals(): Propagates arrival times through the graph to determine slack.
Example:
// Run the core STA calculations in order for all corners.
for (int i = 0; i < 4; ++i) {
heterosta3d_update_delay(sta, corner_names[i]);
heterosta3d_update_arrivals(sta, corner_names[i]);
}
Step 9: Retrieve Timing Results
Finally, retrieve the results as either summary metrics (WNS/TNS) or detailed slack arrays.
- APIs:
- heterosta3d_report_wns_tns_max(): Reports setup WNS/TNS
- heterosta3d_report_wns_tns_min(): Reports hold WNS/TNS
- heterosta3d_report_slacks_at_max(): Gets setup slack array
- heterosta3d_report_slacks_at_min(): Gets hold slack array
- heterosta3d_dump_paths_max_to_file(): Exports setup path report
- heterosta3d_dump_paths_min_to_file(): Exports hold path report
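For readers new to the summary metrics, the sketch below shows one common way WNS and TNS relate to a list of endpoint slacks: WNS is the most negative slack and TNS is the sum of all negative slacks. It is an illustration of the definitions only. Which pins HeteroSTA3D counts as endpoints, and whether it reports a positive WNS when no path is violating, is not stated in this guide; here WNS is clamped at zero as an assumption. SlackSummary and summarize_slacks() are hypothetical names.

```cpp
#include <algorithm>
#include <vector>

struct SlackSummary {
    float wns;  // most negative slack, or 0 if no slack is negative
    float tns;  // sum of all negative slacks, or 0 if none
};

// Computes WNS/TNS from a flat list of endpoint slacks.
// Non-negative slacks (met constraints) do not contribute to either metric.
SlackSummary summarize_slacks(const std::vector<float>& endpoint_slacks) {
    SlackSummary s{0.0f, 0.0f};
    for (float slack : endpoint_slacks) {
        if (slack < 0.0f) {
            s.wns = std::min(s.wns, slack);
            s.tns += slack;
        }
    }
    return s;
}
```

For example, the slacks {0.5, -0.25, -1.0, 0.0} give WNS = -1.0 and TNS = -1.25 under this convention.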
Example:
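// Report results for all corners
// num_pins is the number of pins in the design; obtaining it is outside this snippet.
std::vector<float> slack(num_pins * 2); // two slack values per pin
float (*slack_ptr)[2] = reinterpret_cast<float (*)[2]>(slack.data());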
for (int i = 0; i < 4; ++i) {
float wns_max, tns_max, wns_min, tns_min;
heterosta3d_report_wns_tns_max(sta, &wns_max, &tns_max, corner_names[i]);
heterosta3d_report_wns_tns_min(sta, &wns_min, &tns_min, corner_names[i]);
heterosta3d_report_slacks_at_max(sta, slack_ptr, corner_names[i]);
// Process slack data...
std::printf("Corner %s:\n", corner_names[i]);
std::printf(" Setup: WNS=%.3f, TNS=%.3f\n", wns_max, tns_max);
std::printf(" Hold: WNS=%.3f, TNS=%.3f\n", wns_min, tns_min);
// Dump timing paths to files
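char paths_max_file[64], paths_min_file[64];
// snprintf bounds the write, so a long corner name cannot overflow the buffers
std::snprintf(paths_max_file, sizeof(paths_max_file), "paths_max_%s.rpt", corner_names[i]);
std::snprintf(paths_min_file, sizeof(paths_min_file), "paths_min_%s.rpt", corner_names[i]);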
heterosta3d_dump_paths_max_to_file(sta, 10, 1, paths_max_file, corner_names[i]);
heterosta3d_dump_paths_min_to_file(sta, 10, 1, paths_min_file, corner_names[i]);
}
Key Features
Multi-Corner Management
HeteroSTA3D uses the delay corner concept to manage different process corner combinations for top and bottom dies. Each delay corner can be independently configured with different Liberty sets.
3D Placement Modeling
HeteroSTA3D supports accurate modeling of 3D IC layouts, including:
- Separate RC parameters for top and bottom dies
- HBT (Hybrid Bonding Terminal) coordinates for vertical interconnects
- Vertical link resistance and capacitance parameters
Cell Naming Convention
Cell names in the Verilog netlist must include a _top or _bottom suffix to indicate which die they belong to. This allows HeteroSTA3D to automatically determine die location and apply the correct timing library.
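A netlist can be checked against this convention before it is loaded. The sketch below classifies a cell name by its suffix; it is a hypothetical pre-check and not the library's own matching logic, which may differ (for example, in case sensitivity). Die, ends_with(), and die_of_cell() are illustrative names.

```cpp
#include <string>

enum class Die { kTop, kBottom, kUnknown };

// True if `s` ends with `suffix`.
bool ends_with(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Maps a cell name to its die according to the _top / _bottom suffix.
// Names without either suffix are reported as kUnknown so the caller can
// flag them before handing the netlist to the timer.
Die die_of_cell(const std::string& cell_name) {
    if (ends_with(cell_name, "_top")) return Die::kTop;
    if (ends_with(cell_name, "_bottom")) return Die::kBottom;
    return Die::kUnknown;
}
```

Here NAND2_X1_top maps to the top die, INV_X1_bottom to the bottom die, and a name such as INV_X1 is reported as unknown.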
Multi-GPU Parallel Analysis
Different delay corners can be assigned to different GPU devices, enabling parallel analysis across multiple corners. This significantly improves throughput when analyzing multiple process corner combinations.
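One simple way to spread corners across devices is round-robin assignment. The sketch below is an assumption about how a caller might choose the device argument for each delay corner; it is not a scheduling policy provided by HeteroSTA3D. It relies only on the guide's statement that GPU devices are numbered 0, 1, ... and that CPU mode uses HETEROSTA3D_CPU_DEVICE_ID, whose numeric value is not given here, so it is passed in as a parameter. assign_devices() is a hypothetical helper.

```cpp
#include <cstddef>
#include <vector>

// Assigns corner i to GPU (i mod num_gpus). If no GPU is available, every
// corner gets `cpu_device_id` (pass HETEROSTA3D_CPU_DEVICE_ID in real code).
std::vector<int> assign_devices(std::size_t num_corners, int num_gpus,
                                int cpu_device_id) {
    std::vector<int> devices(num_corners, cpu_device_id);
    if (num_gpus <= 0) return devices;  // CPU-only fallback
    for (std::size_t i = 0; i < num_corners; ++i) {
        devices[i] = static_cast<int>(i % static_cast<std::size_t>(num_gpus));
    }
    return devices;
}
```

With four corners and two GPUs the assignment is 0, 1, 0, 1. Note that placement arrays passed to RC extraction for a GPU corner must reside in GPU memory, so the device choice also determines where that data has to live.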
Next Steps
For detailed information on every function, including all parameters and data structures, please refer to the complete API Reference document.