Introduction

Amid continuous advances in semiconductor technology, heterogeneous integration has emerged as a transformative paradigm in chip design1,2,3. By tightly packaging chiplets fabricated with different materials, processes, and functionalities, heterogeneous integration has enabled unprecedented improvements in system performance and functional density4. However, this highly integrated architecture represents a double-edged sword: while delivering substantial computational power, it also introduces severe thermal management challenges5,6,7. Heat dissipation becomes increasingly constrained within densely stacked microstructures, and localized overheating has emerged as a critical bottleneck affecting system stability and long-term reliability.

In this context, accurate prediction of chip thermal resistance—an essential indicator of heat dissipation capability—has become a crucial task in heterogeneous integrated circuit design8,9. Traditional solutions have primarily relied on physics-based numerical simulations. Although such methods are theoretically rigorous, they typically require complex model construction and extensive computational time, often taking hours or even longer for a single evaluation. As a result, they struggle to meet the demands of rapid iteration in modern chip design workflows10,11,12. Conversely, overly simplified empirical formulas lack the ability to capture the intricate structural and material variations inherent in heterogeneous integration. Engineering practice has therefore long faced a fundamental trade-off between prediction accuracy and computational efficiency13.

Recent studies have explored artificial intelligence techniques for thermal analysis; however, most existing approaches still treat the chip as a “black box,” focusing mainly on the macroscopic mapping between input parameters and output thermal metrics14,15,16. The thermal behavior of heterogeneous integrated chips is intrinsically governed by their multi-layered, multi-material physical structures17,18. Training models solely on final temperature or thermal resistance data, while neglecting explicit structural characteristics, leads to shallow representations and may severely limit generalization capability19,20. A truly reliable prediction model must therefore be capable of perceiving and understanding the internal geometric and material composition of the chip.

Motivated by this insight, the present study incorporated internal structural parameters and material properties of heterogeneous integrated chips as explicit inputs to the neural network, enabling the model to directly learn the causal relationships between physical configuration and thermal performance. A classical yet interpretable backpropagation (BP) neural network was selected as the core architecture to construct an intelligent prediction framework that maps chip “anatomical” descriptions to thermal resistance metrics. By integrating explicit physical representations into the learning process, the proposed approach allowed the model to move beyond simple correlation learning toward a deeper understanding of underlying physical mechanisms, thereby enabling more accurate and reliable predictions when applied to previously unseen heterogeneous integration designs.

Literature review

The rapid development of heterogeneous integration technology has made thermal management a critical concern for both academia and industry. As emphasized by Zhou et al.21 in their authoritative work on advanced packaging, three-dimensional stacking and chiplet-based integration significantly improved bandwidth and energy efficiency, but simultaneously rendered heat flow paths highly complex, giving rise to a new bottleneck commonly referred to as the “thermal wall.” In the traditional field of thermal analysis, physics-based modeling approaches long remained the dominant methodology. For instance, Pfromm et al.22 systematically demonstrated how high-fidelity modeling tools could be employed to analyze thermal behavior in 2.5D packaging structures. However, such approaches faced a fundamental dilemma: computational cost grew exponentially with the fidelity of the physical model.

To balance efficiency and accuracy, several studies attempted to develop simplified analytical or semi-analytical models. Zhou et al.23 proposed a fast estimation method based on thermal resistance networks, in which complex structures were simplified into series–parallel combinations of thermal resistances to enable rapid computation. Although this approach substantially improved computational efficiency, its accuracy depended heavily on the modeler’s experience and struggled to precisely capture strong three-dimensional nonlinear thermal effects inherent in heterogeneous integration. As noted by Dang et al.24, when packaging structures entered the submicron scale and involved diverse material systems, prediction errors from simplified models could become unacceptable.

In recent years, machine learning techniques provided a new perspective for addressing this challenge. Renno and Petito25 were among the first to apply random forest algorithms to junction temperature prediction in central processing units, achieving remarkable accuracy using only a limited set of input features such as power consumption and operating frequency. Nketiah et al.26 employed recurrent neural networks to process temporal thermal data and predict temperature evolution under dynamic power conditions. These studies convincingly demonstrated the applicability and potential of artificial intelligence in thermal analysis tasks.

As machine learning has penetrated the field of engineering science, its applications have expanded from general-purpose prediction to modeling high-dimensional, complex problems that deeply integrate physical mechanisms. In particular, numerous insightful studies have emerged in the multi-scale performance prediction of materials and structures. For example, Zhang et al.27 improved model reliability under small-sample conditions in dam piping prediction through data augmentation and interpretable ensemble learning. Liu et al.28 employed interpretable machine learning to model the thermal conductivity of polymer nanocomposites at multiple scales, while also quantifying prediction uncertainty. These studies demonstrate the potential of machine learning to handle strongly nonlinear and multi-physics coupled problems. In computational materials design and surrogate modeling, Liu et al.29 developed a networked expert system platform for materials design, enabling efficient management of high-fidelity model calls and data flow. Liu and Lu30 provided a systematic review of machine learning surrogate models applied to stochastic multi-scale modeling of composites, offering a methodological framework for addressing computationally expensive problems. Additionally, in predicting mechanical behavior involving fine geometries and progressive damage, such as Lin et al.’s31 study on composite bolted joints, machine learning has shown potential as a substitute for traditional high-cost simulations.

However, most of these studies focus on improving prediction efficiency or managing high-dimensional data, with weak explicit integration of underlying physical laws. Model interpretability is often treated as a post hoc analysis rather than a design constraint. This limitation is particularly evident in the field of heterogeneous integrated chip thermal management, where existing research primarily predicts temperature or thermal resistance end-to-end using algorithms, without systematically embedding heat transfer knowledge into model architectures or training processes. In contrast, the core contribution of this study is not to use a BP neural network as a black-box predictor. Instead, this study treats it as a flexible framework for embedding physical knowledge. By constructing physics-guided feature systems, imposing sign constraints on weight layers, and designing attention interaction layers, this study explicitly encodes thermal principles into the network. This approach ensures that the model predictions remain accurate while the responses align with physical intuition, providing direct insight for design optimization. The study addresses the lack of physics guidance in existing data-driven methods during early chip design and offers a concrete engineering example of physics-informed machine learning for thermal management.

Research methodology

Overall research framework and technical route

This study adopted a research paradigm that integrated physics-based modeling, data-driven validation, and iterative design optimization, establishing a systematic framework as illustrated in Fig. 1. The proposed framework organically combined heat transfer principles with machine learning techniques, forming a complete closed-loop workflow from problem formulation to engineering application.

Fig. 1. Research framework for thermal resistance prediction using a physics-integrated BP neural network.

The proposed framework started from the three-dimensional stacked structure of heterogeneous integrated chips and analyzed heat conduction paths across multilayer materials to identify key physical factors affecting thermal resistance. Based on physical mechanism analysis, a structured feature system was constructed. These features included not only directly measurable parameters, such as chip dimensions and material thermal conductivities, but also composite features derived from fundamental heat transfer principles. For example, by calculating heat flux density distributions across different material interfaces, critical indicators characterizing interfacial thermal resistance were formulated.

During the training process, particular emphasis was placed on preserving the physical plausibility of the model. In addition to conventional accuracy-driven optimization objectives, physics-based constraint conditions were introduced to ensure that model behavior conformed to fundamental thermodynamic principles. Model validation adopted a multi-level strategy. Once trained, the model was capable of rapidly evaluating the thermal performance of new designs. Engineers only needed to input structural parameters and material properties to obtain accurate thermal resistance predictions within seconds, thereby significantly improving design efficiency.

Construction of a key feature system based on heat transfer mechanisms

To transform the complex physical problem into features learnable by a machine learning model, the core feature system was constructed around four fundamental physical categories of heterogeneous integrated chips: geometry, materials, power distribution, and boundary conditions. These physical factors were translated into a feature set with clear engineering significance based on heat transfer principles. The overall architecture of the feature system is illustrated in Fig. 2.

Fig. 2. Architecture of the key feature system for thermal resistance prediction in heterogeneous integrated chips.

As shown in Fig. 2, geometric features precisely describe the chip’s physical layout and three-dimensional dimensions, which directly determine the spatial paths of heat flow. Key parameters and their design space are based on mainstream 2.5D/3D integration technologies. In this study, the number of vertically stacked layers in the chip is set to 2–6. The thickness of each material layer is defined as follows: active silicon layers 20–100 μm, interposer layers 100–200 μm, and packaging substrates 200–800 μm. The chip footprint, defined by its length L and width W, ranges from 8 mm × 8 mm to 15 mm × 15 mm. From these dimensions, the chip area \({A}_{chip}\) is calculated as a basic feature:

$${A}_{chip}=L\times\:W$$
(1)

For key vertical interconnect structures such as micro-bumps and through-silicon vias (TSVs), this study extracts core geometric parameters that define their heat dissipation capacity. The micro-bump pitch is set according to the interconnect density between the chip and interposer or between the interposer and package, with values of 40 μm, 55 μm, and 130 μm. Bump diameters and heights are scaled proportionally to the pitch, with diameter ranges of 20–40 μm and height ranges of 25–50 μm. The bump area density \({\rho}_{bump}\) quantifies the richness of heat conduction paths:

$${\rho}_{bump}=\frac{{N}_{bump}}{{A}_{chip}}$$
(2)

where \({N}_{bump}\) is the total number of bumps. TSV features are parameterized in a similar manner.

Material features define the thermal transport capabilities of each layer and the thermal resistance bottlenecks at interfaces. The intrinsic thermal conductivity range for each layer is 0.2–400 W/(m K). To account for the impact of manufacturing processes on actual interface thermal resistance, this study defines a composite feature, the interface quality factor \({Q}_{interface}\). This factor is based on an ideal interface thermal resistance model. First, the ideal contact thermal resistance \({R}_{c,ideal}\) is determined by the theoretical contact conductance \({h}_{c}\) and the contact area \({A}_{contact}\):

$${R}_{c,ideal}=\frac{1}{{A}_{contact}\cdot\:{h}_{c}}$$
(3)

Then, \({Q}_{interface}\) characterizes the closeness of the actual interface to the ideal case:

$${Q}_{interface}=\alpha\:\cdot\:\left(1-\frac{{R}_{c,ideal}}{{R}_{c,actual}}\right)$$
(4)

Here \(R_{c,actual}\ge R_{c,ideal}>0\) and \(0<\alpha\le 1\). \({R}_{c,actual}\) denotes the estimated actual interfacial thermal resistance obtained through calibration or from the literature, while α is a correction factor that accounts for surface roughness and bonding quality. The closer the value of \({Q}_{interface}\) is to 1, the closer the actual interfacial thermal resistance is to the ideal condition (i.e., lower thermal resistance), indicating higher interfacial contact quality.
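For illustration, Eqs. (3) and (4) can be combined into a single helper routine. The following Python sketch is illustrative only; the function name and the numerical values in the example are hypothetical rather than taken from this study.

```python
def interface_quality_factor(h_c, A_contact, R_actual, alpha=1.0):
    """Interface quality factor per Eqs. (3)-(4).

    h_c:       theoretical contact conductance, W/(m^2 K)
    A_contact: contact area, m^2
    R_actual:  calibrated actual interfacial thermal resistance, K/W
    alpha:     correction factor for roughness/bonding quality, 0 < alpha <= 1
    """
    R_ideal = 1.0 / (A_contact * h_c)          # Eq. (3)
    return alpha * (1.0 - R_ideal / R_actual)  # Eq. (4)

# Hypothetical example: a 1 mm^2 bond region with h_c = 1e7 W/(m^2 K)
# (giving R_ideal = 0.1 K/W) and a calibrated actual resistance of 0.25 K/W.
q = interface_quality_factor(h_c=1e7, A_contact=1e-6, R_actual=0.25, alpha=0.9)
print(q)  # 0.54: the interface recovers roughly half of the ideal conductance
```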

Power distribution fundamentally determines the heat generation field within the chip and serves as the source term driving the temperature field. The total chip power ranges from 10 to 200 W. The chip layout is simplified as an m×n grid, with each functional unit’s power assigned to its physical location to form a two-dimensional power density matrix Φ(x, y). From this matrix, three key statistical features are extracted: the average power density \(\overline{\varphi}\), the maximum local power density \({\varphi}_{max}\) (10–200 W/mm²), and the spatial power non-uniformity index \({U}_{p}\), which quantifies the concentration of power distribution:

$${U}_{p}=\frac{{\varphi}_{max}}{\overline{\varphi}}$$
(5)
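As a minimal sketch, the three statistics can be read directly off the power-density matrix; the NumPy helper below and its example grid are illustrative, not part of the study’s pipeline.

```python
import numpy as np

def power_features(phi):
    """Average density, peak density, and non-uniformity index (Eq. 5)
    from a 2-D power-density matrix phi in W/mm^2 on the m x n grid."""
    phi_mean = phi.mean()
    phi_max = phi.max()
    return phi_mean, phi_max, phi_max / phi_mean  # U_p, Eq. (5)

# Hypothetical layout: a uniform 10 x 10 grid with one strong hotspot.
phi = np.full((10, 10), 0.5)
phi[4, 4] = 20.0
mean_phi, max_phi, u_p = power_features(phi)  # u_p is roughly 28.8
```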

Boundary conditions parameterize the chip’s thermal exchange with the external environment. The top cooling condition is simplified as an equivalent convective heat transfer coefficient, ranging from 5 to 1000 W/(m² K), corresponding to natural convection up to forced liquid cooling. The bottom packaging interface is described by an equivalent contact thermal resistance from the chip bottom to the package casing, ranging from 0.1 to 2 K/W.

After constructing the physics-based feature system, this study carefully considered feature selection and dimensionality reduction. Automated reduction techniques were not used because the primary objective of this model is to maintain physical interpretability. Although automated reduction compresses data dimensions, it generates features that cannot be directly mapped to physical entities, weakening the model’s utility for design guidance. Instead, each feature in this physically informed set has clear engineering significance, such as the interface quality factor and equivalent convective coefficient. The total number of features is kept deliberately compact through this physical refinement. Correlation analysis confirmed low linear dependence among features, preventing redundancy at the source. Importantly, the grouped input design and attention interaction layers in this network architecture enable dynamic identification and reinforcement of key physical feature interactions under different operating conditions during training. This approach provides a flexible, context-aware, and adaptive feature processing mechanism.

BP neural network architecture integrating physical knowledge

When selecting the model architecture, this study considered the specific characteristics of the thermal resistance prediction problem for heterogeneous integrated chips and chose the classical BP neural network as the core framework, as illustrated in Fig. 3. While convolutional neural networks (CNNs) excel at processing image-like data with spatial locality and translation invariance, and Transformer models are advantageous for capturing long-range dependencies, this study deals with highly structured, non-image numerical feature vectors. These features have been physically analyzed and extracted as scalars or low-dimensional vectors with clear engineering significance, and their intrinsic topological structure has already been deconstructed. Consequently, the inductive biases leveraged by CNNs or Transformers are unnecessary for this problem and may introduce needless complexity. In contrast, BP neural networks, with their fully connected and highly flexible structure, can most directly and effectively learn the complex nonlinear mapping between the engineered physical features and thermal performance metrics. Their transparent structure also facilitates the direct incorporation of the physical constraints and attention mechanisms described earlier, ensuring strong predictive capability while maintaining physical consistency and engineering interpretability.

Fig. 3. Schematic of the BP neural network architecture.

As illustrated in Fig. 3, the proposed architecture adopts a hierarchical design in which the input layer directly corresponds to the constructed feature system. The input features are grouped into four channels according to their physical attributes. The geometric structure channel receives parameters such as layer thickness, chip dimensions, and layout information. The material property channel processes thermal conductivities and interface quality factors. The power distribution channel accepts power-related features, while the boundary condition channel receives heat dissipation parameters. This grouped-input strategy enables the network to recognize the physical meaning of different feature categories at an early stage of learning. The design of the hidden layers reflects a deep integration of physical knowledge. The first hidden layer employs a grouped connection strategy, allowing features within the same physical category to interact internally before cross-category coupling occurs. This design is motivated by the hierarchical nature of heat transfer: heat is first transferred within individual physical subsystems and then couples across subsystems.

This study imposes sign constraints on the network weights, motivated by the fundamental principles of heat conduction and the need for model interpretability in engineering design. In thermal conduction, the relationship between key physical parameters and the overall thermal resistance often exhibits inherent monotonicity. For instance, according to Fourier’s law, increasing the thermal conductivity λ of a material or improving the contact interface quality should monotonically reduce the overall thermal resistance \({R}_{ja}\); conversely, increasing the heat power P or the thickness of a low-conductivity layer should monotonically increase \({R}_{ja}\). These monotonic relationships serve as reliable physical priors. This study encodes this prior knowledge directly into the network by constraining the sign of the weights connecting specific input features to the first hidden layer. This ensures that the network’s input–output response aligns with physical intuition. Compared with other potential physics-informed strategies, this approach achieves the same goal through simple parameter reparameterization with minimal computational overhead during training.

This guarantees that the model’s sensitivity to design parameters is physically meaningful, which is crucial for designers using the model to guide optimization. For example, they can be confident that improving the thermal interface material will never degrade performance. By restricting the hypothesis space to physically consistent functions, the approach reduces the risk of learning spurious correlations from limited or noisy data, thereby enhancing generalization.

A physics-based constraint mechanism is introduced at this layer by imposing sign constraints on network weights to ensure that model behavior conforms to fundamental thermodynamic principles. Mathematically, for a physical feature known to be negatively correlated with the thermal resistance \({R}_{ja}\), all associated weights \({w}_{ij}\) are constrained to satisfy:

$$w_{{ij}} \ge 0$$
(6)

Conversely, for features known to be positively correlated with thermal resistance, such as geometric thickness or interfacial thermal resistance, the weights are constrained as:

$${w}_{ij}\le\:0$$
(7)

During training, these constraints are implemented using a parameter reparameterization technique. Specifically, each constrained weight \({w}_{ij}\) is expressed as a function of an unconstrained parameter \({\theta}_{ij}\). For non-negative constraints, the transformation is defined as:

$${w}_{ij}=\operatorname{softplus}\left({\theta}_{ij}\right)=\log\left(1+\exp\left({\theta}_{ij}\right)\right)$$
(8)

For non-positive constraints, the transformation is defined as:

$${w}_{ij}=-\operatorname{softplus}\left({\theta}_{ij}\right)$$
(9)

The training process therefore optimizes the unconstrained parameters \({\theta}_{ij}\), while strictly ensuring that the resulting weights \({w}_{ij}\) always satisfy the prescribed sign constraints. This approach allows the network to learn complex nonlinear mappings from data while preserving physically consistent monotonic relationships. For example, it guarantees that an increase in material thermal conductivity always leads to a decrease in the predicted thermal resistance. The above reparameterization approach ensures that, during forward propagation, the weights \({w}_{ij}\) always satisfy the prescribed sign constraints. During backpropagation, gradient computation is performed with respect to the unconstrained parameters \({\theta}_{ij}\). The gradient of the loss function L with respect to \({\theta}_{ij}\) is calculated via the chain rule:

$$\frac{\partial\:L}{\partial\:{\theta}_{ij}}=\frac{\partial\:L}{\partial\:{w}_{ij}}\cdot\:\frac{\partial\:{w}_{ij}}{\partial\:{\theta}_{ij}}$$
(10)

where \(\frac{\partial\:{w}_{ij}}{\partial\:{\theta}_{ij}}\) is the derivative of the reparameterization function. For non-negative constraints, this derivative equals σ(\({\theta}_{ij}\)), i.e., the sigmoid function; for non-positive constraints, it equals − σ(\({\theta}_{ij}\)). The optimizer then updates \({\theta}_{ij}\) directly:

$${\theta}_{ij}\leftarrow\:{\theta}_{ij}-\eta\:\cdot\:\frac{\partial\:L}{\partial\:{\theta}_{ij}}$$
(11)

This procedure allows \({\theta}_{ij}\) to be updated freely, while the forward reparameterization function ensures that the corresponding weights \({w}_{ij}\) always satisfy the physical sign constraints. In this way, constraint enforcement is transformed from a difficult optimization problem into a deterministic forward computation, maintaining strict compliance with physical priors during training without compromising gradient-based optimization efficiency.
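A compact PyTorch sketch of this reparameterization is given below. It is a minimal illustration of Eqs. (6)–(11), not the study’s released code: the class name, layer sizes, and the example sign pattern are assumptions. Autograd differentiates through the softplus transform, so the chain-rule update of Eqs. (10)–(11) happens automatically.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SignConstrainedLinear(nn.Module):
    """Linear layer whose weight signs are fixed per input feature.

    signs[j] = +1 keeps column j non-negative (Eq. 6); signs[j] = -1
    keeps it non-positive (Eq. 7). The free parameter theta is trained,
    and the forward pass rebuilds a sign-compliant weight via softplus
    (Eqs. 8-9), so the constraint holds at every optimization step.
    """
    def __init__(self, in_features, out_features, signs):
        super().__init__()
        self.theta = nn.Parameter(0.1 * torch.randn(out_features, in_features))
        self.register_buffer("signs", torch.as_tensor(signs, dtype=torch.float32))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        w = self.signs * F.softplus(self.theta)  # |w_ij| = softplus(theta_ij)
        return F.linear(x, w, self.bias)

# Hypothetical grouping: thermal conductivity (negatively correlated with
# R_ja, hence w >= 0) versus layer thickness and interfacial resistance
# (positively correlated, hence w <= 0), following the paper's convention.
layer = SignConstrainedLinear(3, 8, signs=[+1.0, -1.0, -1.0])
y = layer(torch.randn(4, 3))  # batch of four feature vectors
```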

The second hidden layer focuses on feature interactions across different physical dimensions. A fully connected structure is adopted to learn complex coupling relationships, such as geometry–material and power–boundary interactions. An attention mechanism is introduced to enable the network to automatically identify which feature interactions are most critical for thermal resistance prediction. The attention weight \({\alpha}_{ij}\) is computed as:

$${\alpha}_{ij}=\frac{\exp\left({\mathbf{W}}_{a}\left[{\mathbf{h}}_{i};{\mathbf{h}}_{j}\right]\right)}{\sum_{k}\exp\left({\mathbf{W}}_{a}\left[{\mathbf{h}}_{i};{\mathbf{h}}_{k}\right]\right)}$$
(12)

where \({\mathbf{h}}_{i}\) and \({\mathbf{h}}_{j}\) denote representation vectors of different feature groups, [ ; ] represents vector concatenation, and \({\mathbf{W}}_{a}\) is a learnable parameter matrix. This attention-based design fundamentally enhances the model’s ability to capture context-dependent physical relationships. Unlike standard fully connected layers that apply static weights to all feature interactions, the attention mechanism dynamically recalibrates the importance of each interaction based on the specific design input. This capability is crucial because dominant thermal coupling relationships in heterogeneous integrated chips are highly scenario-dependent. For example, when a design exhibits extremely high local power density, the interaction between power distribution and the vertical thermal conductivity of materials becomes critical for dissipating heat from hotspot regions. Conversely, for a design with poor interface bonding quality, the interaction between material properties and packaging boundary conditions may become the limiting factor. The attention layer enables the model to simulate this adaptive, physics-informed prioritization of interactions.
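A possible PyTorch realization of Eq. (12) over the four feature-group embeddings is sketched below; the module name, the embedding dimension, and the final attention-weighted mixing step are illustrative assumptions rather than the study’s implementation.

```python
import torch
import torch.nn as nn

class GroupAttention(nn.Module):
    """Pairwise attention over feature-group embeddings, after Eq. (12)."""
    def __init__(self, dim):
        super().__init__()
        self.w_a = nn.Linear(2 * dim, 1, bias=False)  # learnable W_a

    def forward(self, h):
        # h: (groups, dim), e.g. geometry / material / power / boundary.
        g, d = h.shape
        pairs = torch.cat(
            [h.unsqueeze(1).expand(g, g, d),    # h_i broadcast over j
             h.unsqueeze(0).expand(g, g, d)],   # h_j broadcast over i
            dim=-1)                             # concatenation [h_i; h_j]
        scores = self.w_a(pairs).squeeze(-1)    # (g, g) raw scores
        alpha = torch.softmax(scores, dim=-1)   # normalize over k, Eq. (12)
        return alpha, alpha @ h                 # weights and mixed features

attn = GroupAttention(dim=16)
alpha, mixed = attn(torch.randn(4, 16))  # four physical channels
```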

Furthermore, the resulting attention weights, \({\alpha}_{ij}\), provide valuable interpretability insights, aligning the model’s decision-making with engineering intuition. In the subsequent analysis, the model consistently assigns high attention weights to interactions involving inter-layer thickness and via density in three-dimensional stacked structures. This directly reflects a known physical principle: vias serve as critical vertical heat pathways that alleviate inter-layer thermal accumulation. Therefore, the attention layer not only improves prediction performance by focusing on the most relevant nonlinear couplings but also functions as an embedded physics-guided model explanation tool, revealing which cross-domain interactions the model considers most critical for a given design.

The output layer is designed with engineering practicality in mind. The primary output node directly predicts the total junction-to-ambient thermal resistance \({R}_{ja}\), defined as the ratio of the temperature difference between the chip junction temperature \({T}_{j}\) and the ambient temperature \({T}_{a}\) to the total chip power consumption \({P}_{total}\):

$${R}_{ja}=\frac{{T}_{j}-{T}_{a}}{{P}_{total}}$$
(13)

This quantity constitutes the main optimization target of the model. In addition, auxiliary output nodes are included to predict derivative metrics of high practical interest. The maximum hotspot temperature is defined as:

$${T}_{max}=\max_{(x,y)\in\Omega}T\left(x,y\right)$$
(14)

where \(T(x,y)\) denotes the temperature at spatial location (x, y) within the chip region Ω. The temperature non-uniformity index is defined as:

$${\Delta}T={T}_{max}-{T}_{min}$$
(15)

where \({T}_{min}\) and \({T}_{max}\) represent the minimum and maximum temperatures within the effective chip area, respectively. A multi-task learning framework is adopted so that these highly correlated tasks share the same underlying feature extraction network. The overall loss function is defined as

$${\mathcal{L}}_{total}={\lambda}_{R}{\mathcal{L}}_{{R}_{ja}}+{\lambda}_{T}{\mathcal{L}}_{{T}_{max}}+{\lambda}_{{\Delta}}{\mathcal{L}}_{{\Delta}T}$$
(16)

where \({\mathcal{L}}_{{R}_{ja}}\), \({\mathcal{L}}_{{T}_{max}}\), and \({\mathcal{L}}_{{\Delta}T}\) denote the loss functions corresponding to each output task, respectively. The balancing parameters \({\lambda}_{R}\), \({\lambda}_{T}\), and \({\lambda}_{{\Delta}}\) are set as fixed hyperparameters. Through a combination of grid search and performance feedback on the validation set, this study determines the final values as \({\lambda}_{R}\) = 1.0, \({\lambda}_{T}\) = 0.5, and \({\lambda}_{{\Delta}}\) = 0.2. This configuration reflects engineering priorities: the total thermal resistance \({R}_{ja}\) is the primary optimization target and is therefore assigned the highest weight. The maximum temperature \({T}_{max}\) and temperature non-uniformity \({\Delta}T\) serve as important auxiliary regularization tasks. Their weights are tuned to promote shared representation learning while avoiding interference with the main task. This design yields dual benefits. On the one hand, shared representations improve data utilization efficiency and accelerate training convergence. On the other hand, learning to predict strongly physics-related indicators such as \({T}_{max}\) and \({\Delta}T\) acts as an effective form of regularization, forcing the network to learn feature representations that are more consistent with realistic thermal distribution patterns.

For the choice of activation functions, the Swish function is adopted in place of the traditional ReLU function. The Swish function exhibits smooth and non-monotonic characteristics, making it more suitable for modeling the nonlinear behavior inherent in heat transfer processes. It is defined as

$$f\left(x\right)=x\cdot\:\sigma\:\left(\beta\:x\right)$$
(17)

where the parameter β is learned and adaptively adjusted during training, enabling the activation function to accommodate different feature scales. The loss function design integrates data fitting objectives with physical constraints. The overall loss function is expressed as

$$\mathcal{L}={\mathcal{L}}_{MSE}+{\lambda}_{1}{\mathcal{L}}_{mono}+{\lambda}_{2}{\mathcal{L}}_{energy}$$
(18)

where \({\mathcal{L}}_{MSE}\) represents the mean squared error between the predicted and ground-truth thermal resistance values. \({\mathcal{L}}_{mono}\) denotes the monotonicity constraint loss, which ensures that the thermal resistance response to variations in specific parameters is consistent with known physical laws. \({\lambda}_{1}\) and \({\lambda}_{2}\) are balancing hyperparameters. \({\mathcal{L}}_{energy}\) is a soft energy conservation constraint that enforces approximate consistency between the predicted heat dissipation power and the input total power. This approach does not require the model to explicitly predict or compute the boundary heat flux. Instead, it relies on a simplified physical relationship: under steady-state conditions, the chip’s total power \({P}_{total}\) should approximately equal the dissipated power inferred from the predicted total thermal resistance \({R}_{ja}^{pred}\) and the temperature difference with the environment. Accordingly, the energy-based loss \({\mathcal{L}}_{energy}\) is defined as the squared relative error between the predicted dissipated power and the input total power:

$${\mathcal{L}}_{energy}={\left(\frac{{P}_{total}-\frac{{T}_{j}^{pred}-{T}_{a}}{{R}_{ja}^{pred}}}{{P}_{total}}\right)}^{2}$$
(19)

\({T}_{j}^{pred}\) denotes the junction temperature predicted by the model, and \({T}_{a}\) is the given ambient temperature. This loss penalizes predictions where the “input power” and the “model-implied dissipation capacity” are severely mismatched. In effect, it incorporates the law of energy conservation as a prior, serving as a soft constraint to guide the model toward physically plausible predictions. Equation (18) thus defines the main task loss \({\mathcal{L}}_{{R}_{ja}}\), which consists of a data-fitting term \({\mathcal{L}}_{MSE}\) and two physics-based regularization terms, \({\mathcal{L}}_{mono}\) and \({\mathcal{L}}_{energy}\). Substituting \({\mathcal{L}}_{{R}_{ja}}\) into Eq. (16) yields the overall loss function used to train the multi-task network:

$${\mathcal{L}}_{total}={\lambda}_{R}({\mathcal{L}}_{MSE}+{\lambda}_{1}{\mathcal{L}}_{mono}+{\lambda}_{2}{\mathcal{L}}_{energy}\:)+{\lambda}_{T}{\mathcal{L}}_{{T}_{max}}+{\lambda}_{{\Delta}}{\mathcal{L}}_{{\Delta}T}$$
(20)

\({\mathcal{L}}_{{T}_{max}}\) and \({\mathcal{L}}_{{\Delta}T}\) are the mean squared error losses for the maximum temperature and temperature non-uniformity prediction tasks, respectively. The network training thus jointly optimizes \({\mathcal{L}}_{total}\), which balances multi-task objectives while enforcing physical constraints.
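For concreteness, Eq. (20) can be assembled as in the PyTorch fragment below. This is an illustrative sketch rather than the study’s code: the monotonicity penalty is assumed to be computed elsewhere and passed in, the predicted maximum temperature stands in for \({T}_{j}^{pred}\) in Eq. (19), and the values of \({\lambda}_{1}\) and \({\lambda}_{2}\) are placeholders, since the study does not report them.

```python
import torch
import torch.nn.functional as F

def total_loss(pred, target, T_a, P_total, mono_penalty,
               lam_R=1.0, lam_T=0.5, lam_D=0.2, lam1=0.1, lam2=0.1):
    """Multi-task objective of Eq. (20); lam1/lam2 are placeholder values."""
    R_pred, Tmax_pred, dT_pred = pred.unbind(dim=-1)     # model outputs
    R_true, Tmax_true, dT_true = target.unbind(dim=-1)   # simulation labels

    mse_R = F.mse_loss(R_pred, R_true)

    # Soft energy-conservation term, Eq. (19): the dissipated power implied
    # by the predictions, (T_j - T_a) / R_ja, should match the input power.
    # Here Tmax_pred is used as a proxy for the predicted junction temperature.
    P_implied = (Tmax_pred - T_a) / R_pred
    l_energy = ((P_total - P_implied) / P_total).pow(2).mean()

    l_main = mse_R + lam1 * mono_penalty + lam2 * l_energy   # Eq. (18)
    return (lam_R * l_main
            + lam_T * F.mse_loss(Tmax_pred, Tmax_true)
            + lam_D * F.mse_loss(dT_pred, dT_true))          # Eq. (20)
```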

From the outset, ensuring that the model remains physically consistent under unseen test scenarios was established as a core goal. To this end, this study developed an integrated physics-informed framework. It begins with physics-guided feature engineering, where geometry, material, power, and boundary condition features are physically distilled at the input level, eliminating non-physical parameters and establishing a foundation for physically meaningful learning. Within the network architecture, strict sign constraints are applied to the first hidden layer weights. This ensures that, regardless of input feature combinations, the model’s responses to key parameters obey the fundamental monotonicity dictated by Fourier’s law. These constraints act as hard physical rules during forward propagation and are unconditionally valid, even under extreme operating conditions.

However, monotonicity alone is insufficient to capture complex thermal couplings. Therefore, an attention-driven feature interaction layer is introduced, enabling the model to dynamically assess and emphasize the interactions between different physical dimensions based on specific inputs. This adaptivity allows the model to identify dominant physical processes in scenarios that are underrepresented in the training data and to simulate nonlinear responses that align with actual mechanisms. Simultaneously, the multi-task learning framework, which requires accurate predictions of total thermal resistance, maximum temperature, and temperature non-uniformity, converts deep knowledge of overall heat dissipation and local distribution into a strong regularization constraint. This forces the hidden layers to learn generalized feature representations capable of explaining multiple related physical phenomena, preventing overfitting or the learning of non-physical shortcuts. Consequently, even for design parameters far outside the training distribution, the model produces predictions that are physically plausible, rather than mathematically possible but physically unreasonable.

Experimental design and performance evaluation

Datasets collection

A representative 2.5D heterogeneous integrated chip thermal simulation model is constructed. Key design parameters are combined using scientifically designed sampling strategies, resulting in a complete dataset consisting of 1500 distinct design configurations. For each configuration, high-fidelity thermal simulations are performed to compute steady-state thermal performance. The primary outputs include the overall chip thermal resistance, the maximum operating temperature, and the surface temperature uniformity. Consequently, each data sample explicitly corresponds to a set of design inputs and a set of thermal performance outputs. Prior to model training, all data are standardized and randomly divided into three independent subsets following a 70:15:15 ratio. Among them, 1050 samples are used as the training set to enable the model to learn the complex relationships between design parameters and thermal performance.

The validation set (225 samples) is employed during training to assist in model tuning and to prevent overfitting. The test set comprises the remaining 225 samples and is exclusively used for the final objective evaluation of the model’s predictive capability. The test set samples are constructed to ensure uniform coverage across the full ranges of all key design parameters, including power density, geometric dimensions, and material properties, while remaining independent of the training set. This guarantees both diversity and representativeness. By analyzing the distribution of test samples within the space defined by the principal features, the set effectively spans design scenarios from typical to extreme boundary conditions. Therefore, performance metrics evaluated on this test set reliably reflect the model’s generalization capability and robustness for entirely new, unseen chip design configurations.
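The standardization and 70:15:15 split can be reproduced with standard tooling, for example as in the sketch below (assuming the 1500 samples are held in arrays X and Y; the seed is arbitrary, and fitting the scaler on the training partition alone is a common leakage-avoiding choice).

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# X: (1500, n_features) design inputs; Y: (1500, 3) outputs (R_ja, T_max, dT).
X_train, X_tmp, Y_train, Y_tmp = train_test_split(
    X, Y, test_size=0.30, random_state=42)          # 1050 training samples
X_val, X_test, Y_val, Y_test = train_test_split(
    X_tmp, Y_tmp, test_size=0.50, random_state=42)  # 225 / 225 split

scaler = StandardScaler().fit(X_train)  # statistics from training data only
X_train, X_val, X_test = map(scaler.transform, (X_train, X_val, X_test))
```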

Experimental environment

All experiments are conducted on a unified hardware and software platform. The hardware environment is equipped with two Intel Xeon Gold 6248R processors, 512 GB of system memory, and four NVIDIA RTX A6000 GPUs to accelerate neural network training. The software environment is based on the Ubuntu 20.04 LTS operating system. Model development and experimentation primarily rely on Python 3.9, with the PyTorch 1.12 deep learning framework used for neural network construction, training, and evaluation. Auxiliary data preprocessing, feature engineering, and result visualization are performed using scientific computing libraries including NumPy, Pandas, and Matplotlib. Finite element thermal simulations are batch-executed in the ANSYS 2022 R1 environment.

Parameters setting

This study aims to evaluate the effectiveness of the proposed physics-constrained BP neural network model. To this end, it is compared with a series of representative baseline models. All models are trained and tuned using the same training and validation datasets. This study systematically tested networks of varying depths (1–3 hidden layers) and different neuron counts per layer (16, 32, 64, 128). The [16, 32, 3] architecture achieves the best trade-off between performance and complexity on the validation set. Deeper or wider networks, such as [16, 64, 32, 3], provide negligible performance gains (validation R2 increase < 0.003) while increasing the risk of overfitting, indicating that the nonlinear mapping in the current problem can be effectively captured by a moderately complex network.

The hidden stage is further split into two functionally distinct layers, the constraint layer and the interaction layer. Extensive ablation studies and hyperparameter searches are conducted, testing different neuron combinations for these two layers, including [48, 16], [32, 32], and [32, 16]. The [32, 16] configuration achieves the best predictive performance for the main task Rja on the validation set. This study also evaluates multi-head attention with 2, 4, and 8 heads, finding that 4 heads provide the optimal balance between capturing cross-dimensional interactions and computational efficiency. A grid search determines the Adam optimizer learning rate as 0.0008, with the multi-task loss weights set to λR = 1.0, λT = 0.5, and λΔ = 0.2, as reported above. This configuration ensures stable training convergence and optimal generalization performance. Finally, the key hyperparameter settings are summarized in Table 1.

Table 1 Key hyperparameter settings of the compared models.

Performance evaluation

The predictive performance of the models is quantitatively evaluated using three widely adopted regression metrics: Mean Squared Error (MSE), Mean Absolute Error (MAE), and the coefficient of determination (R2). Their corresponding formulations are given as follows:

$$\text{MSE}=\frac{1}{n}\sum\limits_{i=1}^{n}{\left({y}_{i}-{\widehat{y}}_{i}\right)}^{2},$$
(21)
$$\text{MAE}=\frac{1}{n}\sum\limits_{i=1}^{n}\left|{y}_{i}-{\widehat{y}}_{i}\right|,$$
(22)
$${R}^{2}=1-\frac{\sum_{i=1}^{n}{\left({y}_{i}-{\widehat{y}}_{i}\right)}^{2}}{\sum_{i=1}^{n}{\left({y}_{i}-\overline{y}\right)}^{2}},$$
(23)
$$\overline{y}=\frac{1}{n}\sum\limits_{i=1}^{n}{y}_{i}$$
(24)

\({y}_{i}\) and \({\widehat{y}}_{i}\) represent the ground-truth value and the predicted value of the i-th sample, respectively; \(\overline{y}\) denotes the mean of the ground-truth values, and n is the total number of test samples. Smaller values of MSE and MAE indicate higher prediction accuracy, whereas an R2 value closer to 1 reflects a stronger capability of the model to explain the variance in the data.
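For reference, the three metrics of Eqs. (21)–(24) reduce to a few lines of NumPy; the helper below is a minimal illustration.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """MSE, MAE, and R^2 as defined in Eqs. (21)-(24)."""
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    mae = np.mean(np.abs(err))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - np.mean(y_true)) ** 2)
    return mse, mae, r2
```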

Figure 4 compares the performance of different models on the primary task of total thermal resistance prediction. Multiple linear regression achieves an R2 value of only 0.847, confirming the presence of pronounced nonlinear relationships between the thermal resistance of heterogeneous integrated chips and their design parameters, which linear models fail to capture accurately. Support vector regression and random forest models exhibit substantial performance improvements by introducing nonlinear mapping capabilities. The standard BP neural network further increases the R2 value to 0.968, demonstrating the superiority of neural networks in handling high-dimensional nonlinear problems. In contrast, the proposed physics-constrained BP neural network delivers the best performance, achieving an R2 of 0.982, with its MSE (0.021) reduced by approximately 45% compared with that of the standard BP neural network.

Fig. 4. Comparison of total thermal resistance prediction performance (test set).

In addition to total thermal resistance, the performance of all models on the maximum temperature Tmax prediction task is shown in Fig. 5.

Fig. 5. Comparison of maximum temperature prediction performance (test set).

As illustrated in Fig. 5, predicting the maximum temperature is particularly challenging because it represents a local extreme value, influenced by complex interactions among local power density, material interfaces, and neighboring heat sources. The performance ranking of the models remains consistent with that observed for total thermal resistance; however, the R2 values of all models are generally slightly lower than those for total thermal resistance prediction, indicating the greater difficulty of the Tmax task. The proposed model still achieves the best performance, with R2 = 0.969 and MAE = 2.43 °C. Considering that practical engineering applications typically require temperature prediction errors within ± 5 °C, this level of accuracy is already of practical significance. This result validates that the feature interaction layer and attention mechanism effectively capture the key factor combinations influencing local hotspots. To further explore the benefits of multi-task learning, this study designed a comparative experiment, and the results are shown in Fig. 6. Model convergence is defined as the point at which the validation loss does not decrease for 10 consecutive epochs.

Fig. 6. Performance gain analysis of multi-task learning on core tasks.

It is evident that the multi-task learning architecture brings dual benefits. In terms of performance, the predicted R2 for total thermal resistance increases by 0.004 compared with the single-task version, and the MSE decreases by 16%. More importantly, the number of training epochs required to achieve optimal performance is reduced by 25%. Although the improvement in the main task’s performance is numerically modest, in a regime where high-precision predictive models are approaching performance saturation, this gain is both statistically significant and meaningful in engineering terms. Furthermore, the interplay among tasks in the multi-task architecture provides valuable insight into the model’s generalization behavior. Analysis indicates that the auxiliary effect of the Tmax prediction task is the most pronounced. When the model is required to accurately predict both \({R}_{ja}\) and Tmax, it is compelled to learn feature representations that capture both the overall thermal resistance and local hotspot formation. This dual requirement acts as a strong regularizer, preventing the network from learning overly specialized or even non-physical mappings that could result from optimizing \({R}_{ja}\) alone, thereby enhancing the generalization of the main task. In contrast, the additional contribution from the temperature non-uniformity (ΔT) task is relatively small, likely because ΔT is highly correlated with Tmax and the overall thermal field, containing limited independent information. Nevertheless, including the ΔT task helps the model capture more nuanced spatial distributions of the temperature field. Therefore, the observed R2 improvement of 0.004 reflects the multi-task learning framework’s ability to drive the model toward more fundamental, physically meaningful, and generalizable thermal feature representations, rather than a simple accumulation of predictive accuracy.

Finally, from an engineering applicability perspective, the study evaluated the model’s robustness across different design complexity levels. The test samples were grouped into low, medium, and high complexity categories based on interconnect density and the number of material layers as composite indicators. The predictive performance of the proposed model in each group was then analyzed. As shown in Fig. 7, prediction accuracy (R2) decreases as complexity increases, which is expected because more complex designs involve more intricate thermal coupling. Nevertheless, even in the high-complexity group, the model achieves excellent R2 values of 0.972 and 0.955 for \({R}_{ja}\) and Tmax, respectively. The performance decline is gradual, with R2 for \({R}_{ja}\) decreasing from 0.990 to 0.972 rather than dropping sharply. This strongly demonstrates the model’s robust generalization capability. The underlying reason is that the model learns the “fundamental principles” of heat transfer through physical constraints and feature interactions, rather than merely memorizing specific patterns in the training set.

Fig. 7. Robustness of the BP model across different design complexity levels.

To evaluate the model’s scalability in practical design workflows, this study conducted a computational efficiency analysis covering the entire pipeline from data generation to model inference. Generating a complete simulation dataset of 1,500 design configurations constitutes the main upfront cost. Using 16 CPU cores in parallel, this process takes approximately 625 h (about 26 days), producing a total dataset of roughly 4.7 GB. Training the proposed model on this dataset requires only about 45 min. Once trained, single predictions for new designs are extremely fast, averaging just 8.2 milliseconds. In comparison, high-fidelity finite element thermal simulations used to generate the training data require an average of 25 min per design for steady-state analysis. Consequently, the trained model achieves an inference speedup exceeding 180,000×. Even accounting for the one-time data generation and training overhead, the total time for processing dozens of design iterations or parameter scans remains far below that of conventional simulation methods. This clearly demonstrates the model’s tremendous potential for enabling rapid design exploration and optimization.

Discussion

The effectiveness and advantages of physical information fusion

The proposed physics-constrained BP neural network achieves significantly better performance in thermal resistance prediction compared with traditional methods. This result validates the effectiveness of deeply integrating physical knowledge into the neural network architecture. Compared with the purely data-driven random forest model of Wang et al.32, this model provides more reliable predictions in data-scarce regions, avoiding outputs that may violate physical principles. This advantage mainly arises from the weight sign constraint mechanism introduced in the network’s first hidden layer, which ensures that the input-output relationships comply with known thermodynamic monotonicity laws, aligning with the principles of physics-informed machine learning proposed by Coenen et al.33. Another key advantage of the model is its strong physical interpretability. Compared with CNN-based thermal image analysis methods by Kim et al.34 and Song et al.35, the approach offers clear practical benefits. The model can be built using only simulation data, greatly reducing data acquisition costs, while providing a direct mapping from design parameters to thermal performance. This feature makes it more suitable for integration into automated design workflows.

Concrete demonstration of model interpretability

A direct manifestation of the model’s physical interpretability is its ability to quantify and reveal the impact of key physical features on predictions. For instance, sensitivity analysis of the trained model shows that the interface quality factor has the most significant marginal effect on total thermal resistance. Holding other factors constant, a 10% improvement in this factor leads to an average reduction of approximately 8.2% in predicted total thermal resistance. This intuitively confirms the physical understanding that interfaces are critical thermal bottlenecks in heterogeneous integration. Similarly, for the maximum temperature, local power density contributes the most. Its interaction with the vertical effective thermal conductivity is consistently assigned high weights by the attention mechanism, indicating that the model correctly attributes local hotspots to an imbalance between “heat generation” and “vertical heat dissipation capability.”

Method scalability: cross-architecture and packaging potential

The model and its core framework are expected to generalize to other chip architectures or packaging technologies. Although the current training data are based on a specific 2.5D integration model, the learned physical principles—such as the dominant role of interface thermal resistance, the spatial coupling of three-dimensional heat flow, and the interactions among power, materials, and geometry—are generalizable beyond a particular packaging type. For example, applying this approach to chiplet-based 3D integration or fan-out wafer-level packaging requires only corresponding adjustments or additions to the feature system to describe unique structural parameters, such as chiplet spacing, redistribution layer thickness, or mold compound properties. The model’s physical constraint layer and attention-driven feature interaction layer do not require structural changes.

Handling model boundaries and edge cases

Despite strong performance on the test set, any data-driven model has its limits. For extreme edge cases outside the training distribution, prediction reliability may decrease. The training set covers designs with uniform to moderately non-uniform power distributions. Designs featuring highly localized, ultra-intense micro-hotspots—unseen during training—may lead the model to underestimate the maximum temperature. This is because the model has not learned such extreme nonlinear thermal saturation effects. The model also parameterizes the bottom packaging equivalent thermal resistance and top convective heat transfer coefficient.

However, introducing new cooling schemes outside the training range, such as extremely enhanced microchannel liquid cooling or thermoelectric cooling, may produce equivalent boundary parameters in extrapolated regions of the feature space, causing prediction deviations. The incorporation of physical constraints and multi-task learning enhances robustness against these challenges. Future work could proactively expand the training set to include broader parameter ranges for critical edge cases, or implement a simple uncertainty quantification mechanism. When new design input features lie far from the training feature centroids, the model can provide a confidence warning, suggesting that designers perform high-fidelity simulations for calibration.

Limitations in output dimensionality and design guidance

Although the model achieves high accuracy in predicting key scalar metrics, such as total thermal resistance and peak temperature, it has a notable practical limitation. It does not generate a complete three-dimensional temperature field. Consequently, when the model flags issues such as excessive thermal resistance or hotspot risk, designers cannot obtain direct spatial guidance for layout modification, heat dissipation optimization, or power redistribution. The output is low-dimensional and aggregated. It confirms the presence of a thermal problem but does not describe its spatial distribution or structural origin. Therefore, the model is better suited for rapid assessment and risk screening than for detailed thermal design refinement. Its ability to function as a standalone design tool remains limited.

Future work should address this limitation. A feasible solution is a two-stage prediction framework. The first stage uses the current lightweight model for fast evaluation and early warning. If a design is identified as problematic, a second-stage model can be triggered. For example, a conditional generative adversarial network or a physics-informed neural network (PINN) could reconstruct an approximate three-dimensional temperature field. This reconstruction would support hotspot localization and thermal flow visualization, thus providing spatial design guidance. A simpler alternative is to improve spatial interpretability within the existing architecture. Attention weights or gradient-based attribution methods can identify influential features. These features can then be linked to approximate regions in the chip layout. Such spatially informed explanations would enhance design insight while preserving computational efficiency.

Robustness analysis against noise in physical parameters

In practical chip design, key physical parameters such as material properties and interface thermal resistance often involve measurement uncertainty and manufacturing variability. Therefore, the model’s robustness to noise in these inputs is critical. The core design of this study inherently provides a degree of resilience. First, the physics-informed feature system itself has a denoising effect. Input parameters are transformed via physical formulas into features with clear thermal meaning, which naturally smooths out part of the random noise.

More importantly, the combination of weight sign constraints and the multi-task learning mechanism guides the model to learn robust physical relationships between inputs and outputs, rather than precisely fitting noisy numerical values. For example, even if a material’s thermal conductivity fluctuates due to measurement error, the model respects the hard-coded negative correlation between thermal conductivity and thermal resistance. As a result, the predicted resistance still changes along the correct physical trend, substantially preventing prediction errors or trend reversals caused by input noise. Of course, if input noise exceeds the parameter variation range encountered during training, prediction accuracy may degrade. Future work could explicitly enhance robustness by injecting controlled random noise into the training data or by integrating Bayesian neural networks to quantify the propagation of parameter uncertainty to prediction outputs. This would provide more reliable confidence intervals for engineering decision-making.

Comparison with alternative machine learning methods

The BP neural network was chosen as the core architecture for its excellent nonlinear mapping capability, flexible scalability, and ease of integrating physical priors. Other machine learning methods, such as gradient boosting machines (GBM) and Gaussian processes (GP), are effective and have distinct advantages. GBM often achieves high predictive accuracy and training efficiency on small to medium datasets, but its tree-based structure makes it difficult to directly embed continuous physical constraints. GP inherently provides uncertainty quantification, which is valuable for evaluating prediction confidence, but its computational complexity grows cubically with sample size, limiting scalability for large datasets. GP also does not naturally support structured multi-task learning or attention-driven feature interactions.

In contrast, the core goal of this study is to build an interpretable, physically consistent prediction system capable of handling high-dimensional, complex couplings. The fully connected, end-to-end differentiable structure of the BP neural network provides an ideal framework for systematically injecting physical knowledge—from feature engineering to network-layer constraints to multi-objective learning. While GBM or GP may perform well under specific data conditions, the proposed “deep physical-information integration” paradigm emphasizes structured neural network design to ensure physical interpretability and engineering applicability. Future work could explore hybrid modeling approaches, for instance combining the strong fitting capability of neural networks with the uncertainty estimation of Gaussian processes, to further enhance model information output and reliability.

Conclusion

Research contribution

This study proposes a novel approach that combines physical principles with neural networks to address the thermal resistance prediction problem in heterogeneous integrated chips. The study develops a BP neural network model incorporating heat transfer knowledge, enabling rapid and accurate prediction of chip thermal performance. The model surpasses traditional methods in accuracy and offers clear physical interpretability and practical utility. The main contributions of this work lie in methodological innovation and engineering application. Methodologically, the study departs from conventional machine learning approaches that treat physical problems as black boxes. Starting from fundamental heat transfer laws, a feature system is constructed to capture the complex relationships among chip structure, materials, power distribution, and heat dissipation. These physical insights are explicitly embedded into the network design, such as by imposing weight sign constraints to ensure predictions comply with thermodynamic laws, and by simultaneously learning multiple correlated targets, including thermal resistance and temperature metrics. This deep integration allows the model to fit complex data accurately while avoiding physically implausible predictions. This approach provides a novel physics-informed, data-driven modeling paradigm for chip thermal design. Its core value lies in creating a trustworthy, interpretable, and high-speed thermal digital twin to facilitate the intelligent automation of the design workflow.
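To make the two mechanisms highlighted above more concrete, the sketch below is a minimal PyTorch illustration, not the actual architecture or hyper-parameters used in this work: it shows one way a sign constraint could be projected onto the first-layer weights tied to a thermal-conductivity feature while the network jointly predicts two correlated thermal targets. Enforcing a fully monotonic trend would additionally require constraints on downstream layers; only the projection step is shown here.

```python
import torch
import torch.nn as nn

class MultiTaskThermalNet(nn.Module):
    """Illustrative BP-style network with two output heads (thermal resistance, peak temperature)."""
    def __init__(self, n_features: int = 8, hidden: int = 32):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(n_features, hidden), nn.Tanh())
        self.head_resistance = nn.Linear(hidden, 1)
        self.head_temperature = nn.Linear(hidden, 1)

    def forward(self, x):
        h = self.shared(x)
        return self.head_resistance(h), self.head_temperature(h)

model = MultiTaskThermalNet()

# Hypothetical index of an "effective thermal conductivity" feature whose first-layer
# weights are clamped to be non-positive after each optimizer step, reflecting the
# known negative correlation between conductivity and thermal resistance.
K_FEATURE = 0

def apply_sign_constraint():
    with torch.no_grad():
        w = model.shared[0].weight        # shape: (hidden, n_features)
        w[:, K_FEATURE].clamp_(max=0.0)

# In a training loop, apply_sign_constraint() would be called after optimizer.step();
# the two heads would be trained with a weighted sum of their individual losses.
```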

Future works and research limitations

The physics-guided feature construction and network-constraint embedding framework proposed in this study has clear methodological potential to transfer to other chip reliability prediction tasks. For instance, in electromigration prediction, a feature system can be constructed from Black’s equation, including current density, temperature, and interconnect geometry, while enforcing within the network the physically known negative correlation between current density and time to failure. For thermo-mechanical stress prediction, features can be derived from material constitutive equations and coefficients of thermal expansion, with positive correlations enforced between stress, temperature gradients, and Young’s modulus. Future research could extend this framework into a multi-physics reliability co-prediction platform, in which shared chip geometry and material features serve as inputs for parallel prediction of thermal, electrical, and mechanical reliability metrics.
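As an illustration of the electromigration case, the snippet below derives log-domain features from Black’s equation, MTTF = A·J⁻ⁿ·exp(Ea/(kB·T)); the activation energy, current exponent, and feature choices are placeholders rather than calibrated constants.

```python
import numpy as np

K_B = 8.617e-5  # Boltzmann constant, eV/K

def black_features(current_density, temperature, activation_energy=0.9, n_exponent=2.0):
    """Physics-informed features for electromigration derived from Black's equation.

    Taking the logarithm of MTTF = A * J**(-n) * exp(Ea / (kB * T)) yields terms that
    are linear in log(J) and 1/T, which are convenient neural-network inputs.
    The default Ea and n values are illustrative placeholders.
    """
    j = np.asarray(current_density, dtype=float)  # A/cm^2
    t = np.asarray(temperature, dtype=float)      # K
    return np.column_stack([
        -n_exponent * np.log(j),         # current-density term (negatively correlated with MTTF)
        activation_energy / (K_B * t),   # thermally activated term
        np.log(j) / t,                   # simple coupling feature
    ])

features = black_features([1e5, 2e5], [350.0, 400.0])
```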

Despite these advances, several limitations remain. The foremost is the model’s reliance on simulation data. Although simulations are carefully calibrated, real-world variability in material properties, interface conditions, and manufacturing processes introduces additional complexity. Therefore, experimental validation using measured data is essential to verify the model’s engineering effectiveness. Future studies could fabricate a series of representative test chips or packaging structures and, under controlled thermal testing conditions, measure steady-state junction temperatures or critical point temperatures under varying power levels. These measurements can then be systematically compared with model predictions to assess deviations and determine the model’s applicability boundaries in real physical systems. Additionally, cross-validation against high-fidelity commercial simulation software on identical designs would further confirm predictive accuracy. Currently, the model predicts steady-state thermal resistance, whereas in practical operation, chip temperatures are dynamic. Future work should extend the model to capture transient thermal behavior, which will also require transient experimental or simulation data for training and validation.

From a machine learning perspective, deeper integration of governing physical equations into the network represents an important frontier. For example, PINNs incorporate partial differential equation residuals directly into the loss function, leveraging the governing equations rather than relying solely on data36. Future studies could introduce the governing heat conduction equations of the chip as soft constraints during training to further enhance physically consistent predictions in data-sparse regions. Moreover, from a solver perspective, a trained neural network could assist in inverting or solving equivalent thermal governing equations via automatic differentiation. This represents a paradigm shift from “replacing solvers with neural networks” to “enabling neural networks and solvers to collaborate and discover new insights.” Achieving this will require addressing challenges in training stability and multi-physics coupling, but it points toward a promising direction for next-generation intelligent chip thermal design tools.
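For concreteness, the fragment below sketches how a steady-state heat conduction residual, k∇²T + q = 0 with constant conductivity, could be added as a soft penalty alongside a data loss; the network, collocation sampling, material values, and weighting factor are illustrative assumptions rather than a worked-out PINN for the actual chip geometry.

```python
import torch
import torch.nn as nn

# Small network mapping spatial coordinates (x, y) to temperature; purely illustrative.
temp_net = nn.Sequential(
    nn.Linear(2, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 1),
)

k = 150.0   # assumed thermal conductivity, W/(m*K)
q = 1.0e7   # assumed volumetric heat source, W/m^3

def pde_residual(xy):
    """Residual of k * (T_xx + T_yy) + q = 0 at collocation points xy."""
    xy = xy.clone().requires_grad_(True)
    T = temp_net(xy)
    grads = torch.autograd.grad(T.sum(), xy, create_graph=True)[0]
    T_x, T_y = grads[:, 0], grads[:, 1]
    T_xx = torch.autograd.grad(T_x.sum(), xy, create_graph=True)[0][:, 0]
    T_yy = torch.autograd.grad(T_y.sum(), xy, create_graph=True)[0][:, 1]
    return k * (T_xx + T_yy) + q

# Inside a training loop, the physics term would be weighted and added to the data loss:
collocation = torch.rand(256, 2)
physics_loss = pde_residual(collocation).pow(2).mean()
# total_loss = data_loss + lambda_pde * physics_loss   # lambda_pde is a tuning weight
```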

To further enhance the model’s engineering applicability, future work could also focus on: (1) Incorporating synthetic data generation or physics-based perturbation augmentation strategies to explicitly improve robustness against parameter measurement noise and manufacturing variations. (2) Applying systematic global sensitivity analysis methods (e.g., Sobol indices) to precisely quantify the influence of each input parameter on thermal performance, identify critical design “leverage points,” and guide design optimization. Such approaches have been successfully applied in materials and multi-physics modeling37,38. (3) Systematically extending the framework to a wider range of advanced packaging technologies through the construction of appropriate feature sets and transfer learning, thereby empirically validating and expanding the generality of the proposed method.
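As an example of point (2), a Sobol analysis could be wired around the trained predictor roughly as follows, using the SALib library; the parameter names, bounds, and the toy surrogate standing in for the trained network are placeholders for the actual model interface.

```python
import numpy as np
from SALib.sample import saltelli
from SALib.analyze import sobol

# Hypothetical design parameters and ranges; real bounds would come from the design space.
problem = {
    "num_vars": 3,
    "names": ["die_thickness_um", "tim_conductivity", "power_density"],
    "bounds": [[50, 300], [1.0, 10.0], [0.1, 2.0]],
}

def predict_thermal_resistance(params):
    """Placeholder for the trained network's prediction on one parameter vector."""
    thickness, k_tim, power = params
    return 0.5 * thickness / 100.0 + 1.0 / k_tim + 0.2 * power  # toy surrogate

X = saltelli.sample(problem, 1024)   # Saltelli sampling for Sobol index estimation
Y = np.array([predict_thermal_resistance(x) for x in X])
Si = sobol.analyze(problem, Y)

print("first-order indices:", dict(zip(problem["names"], Si["S1"])))
print("total-order indices:", dict(zip(problem["names"], Si["ST"])))
```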

Engineering application prospects

The physics-informed thermal resistance prediction model developed in this study offers significant potential for practical chip design workflows. Looking ahead, the model could be deeply integrated into electronic design automation (EDA) toolchains to establish an intelligent, “thermally aware” design environment.

1. Early-stage design exploration plugin: The model can be packaged as a lightweight software module or API (a minimal interface sketch follows this list) and integrated into the architecture exploration and early physical design stages. Designers can receive real-time feedback on thermal performance while adjusting layouts, power budgets, or packaging schemes, enabling a rapid “design–thermal evaluation” iterative loop.

2. Driving thermal design rule checking: The model’s predictive capability can be translated into quantitative thermal design rules. During layout design, EDA tools can invoke the model to automatically screen for potential local hotspots or overall thermal violations, providing early warnings and shifting thermal management from late-stage verification to proactive mitigation.

3. Supporting system-level co-optimization: Within a chip–package–system co-design platform, the model can serve as an efficient thermal surrogate, working alongside electrical performance, timing, and reliability models. This enables multi-physics, multi-objective system-level automated optimization.
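A minimal sketch of how the first item could look as a service endpoint is given below, using FastAPI; the field names and the toy surrogate standing in for the loaded network are hypothetical and would depend on the deployed toolchain.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Thermal resistance prediction service (illustrative)")

class DesignSpec(BaseModel):
    # Hypothetical subset of the physics-informed inputs; names are placeholders.
    die_thickness_um: float
    tim_conductivity: float
    total_power_w: float

@app.post("/thermal/predict")
def predict(spec: DesignSpec):
    """Return a predicted junction-to-ambient thermal resistance for one design."""
    # In a real deployment the trained network would be loaded once at startup
    # and evaluated here; a toy linear surrogate stands in for it.
    r_pred = 0.3 * spec.die_thickness_um / 100.0 + 1.2 / spec.tim_conductivity
    return {"thermal_resistance_k_per_w": r_pred, "power_w": spec.total_power_w}
```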

Through these avenues, the present work not only provides an advanced predictive tool but also promotes a new intelligent design paradigm driven by the integration of data and physics, offering a comprehensive methodology-to-practice solution to overcome the “thermal wall” challenges inherent in heterogeneous integration.