Abstract
The targeted design of functional materials often requires the concurrent optimization of multiple interdependent properties. For boron-doped graphene (BDG), both the band gap and work function critically influence performance in electronic and catalytic applications, yet existing machine learning (ML) approaches typically focus on single-property prediction and rely on hand-crafted features, limiting their generality. Here we present an adaptive edge-aware graph convolutional neural network with multi-task learning (AEGCNN-MTL) for simultaneous prediction of multiple material properties. On a DFT-computed BDG dataset of 2613 structures, AEGCNN-MTL achieved high accuracy (R² = 0.9905 for band gap and 0.9778 for work function), and under identical training budgets, outperformed representative single-task GNN baselines. When transferred to the QM9 benchmark, the framework delivered competitive performance across 12 diverse quantum chemical properties, demonstrating strong generalization capability. These results highlight the potential of AEGCNN-MTL as a scalable and accurate tool for high-throughput, multi-property screening and the data-driven discovery of multifunctional materials.
Introduction
Graphene, a prototypical two-dimensional (2D) material, has attracted significant scientific interest since its isolation in 2004, owing to its exceptional electronic, mechanical, and thermal properties1,2,3,4,5,6. However, its intrinsic zero band gap and chemically inert surface severely limit its direct application in semiconducting and catalytic systems7,8. To address these limitations, a wide range of modification strategies such as heteroatom doping, vacancy engineering, and strain modulation have been extensively explored to modulate the intrinsic electronic and chemical properties of graphene9,10,11. Among these strategies, boron doping has emerged as an effective approach for modifying the electronic structure and physicochemical characteristics of graphene, resulting in BDG with enhanced catalytic activity and electronic tunability2,12,13,14. Notably, such structural modifications often lead to concurrent changes in multiple material properties. In particular, the band gap and work function are two critical electronic properties that govern the functionality of devices in fields such as nanoelectronics, photovoltaics, and catalysis15,16. For instance, Ganose et al. demonstrated that doping SnO₂ could simultaneously modulate its band gap and work function, thereby enhancing its transparent conducting performance in photovoltaic systems17. Both properties are strongly correlated with the material’s electronic structure and bonding configuration, and can be simultaneously modulated by atomic-level structural perturbations. Thus, a comprehensive understanding of how multiple material properties co-evolve under atomic-scale perturbations is essential for the targeted design of graphene-based functional materials.
DFT has been extensively employed to calculate material properties with high accuracy, such as band gap, formation energies, adsorption energies, and mechanical properties18,19,20. However, its high computational cost, particularly in large-scale or high-throughput screening scenarios, poses significant challenges to its scalability. ML has emerged as a powerful complementary paradigm that enhances the efficiency of materials discovery pipelines. ML models trained on data obtained from DFT simulations or experimental measurements have shown the ability to predict material properties at a fraction of the computational cost, with comparable accuracy7,21,22,23. However, conventional ML models often rely heavily on manually engineered descriptors, which are designed based on domain-specific heuristics and may fail to capture the full complexity of atomic interactions and structural diversity.
Graph neural networks (GNNs) have recently gained significant attention in materials science due to their ability to represent materials as graphs, in which atoms are treated as nodes and chemical bonds as edges24,25,26. This graph-based representation enables GNNs to effectively capture both structural and chemical information, facilitating the end-to-end modeling of complex relationships between atomic configurations and material properties27. A pioneering example is the Crystal Graph Convolutional Neural Network (CGCNN) proposed by Xie et al.28, which employed local neighbor-based message passing to accurately predict crystal properties such as formation energies, band gap, and elastic moduli, while maintaining model interpretability. However, CGCNN is limited in its ability to account for long-range interactions or global state information, which are essential for accurately modeling collective electronic correlation effects. MEGNet introduced global state vectors to encode external conditions such as temperature and pressure, thereby enhancing the model’s transferability across diverse material classes29. Nevertheless, MEGNet relies on static and relatively simplistic edge features, which may be insufficient to model subtle interatomic interactions, thereby limiting its expressiveness. Due to the planar geometry of 2D materials, variations in bond angles and torsional deformations are inherently restricted, rendering the angular-dependent encodings in models like ALIGNN30, DimeNet++31, and GemNet32 marginally informative. In addition, such models often incur substantial computational overhead due to the use of complex directional filters or hierarchical graph constructs. This overhead hinders scalability in high-throughput screening scenarios while yielding only limited performance improvements for low-dimensional systems with geometrically constrained topologies.
Despite the growing adoption of MTL in computational materials science, existing frameworks remain hindered by intrinsic challenges in scalability, cross-domain generalization, and structural representation. For instance, models such as MT-MD DimeNet++33 and DeepFM34, although effective within specific property domains, rely on task-specific architectural assumptions or manually engineered features, while H-CLMP35 employs a composition-driven contrastive learning strategy without incorporating explicit structural information. Although these methods have demonstrated success in certain applications, they are inherently constrained by limited structural awareness and lack the capacity to jointly model multiple interdependent properties across materials with diverse atomic topologies. This reveals an unmet need for a generalizable and structure-aware framework that can effectively learn from the coupled evolution of multiple properties across complex atomic environments. Although all tasks share the same molecular-graph input, multi-task learning remains challenging: band gap is governed by local bond topology, whereas work function reflects long-range surface charge, so a shared representation drifts to a compromise that hurts both. Without task-specific decoders or balanced gradient terms, the network also suffers from conflicting gradients and unstable convergence. Cross-stitch networks, proposed by Misra et al.36, mitigate this by allocating a separate backbone to each task, but the parameter count grows linearly with the number of properties and becomes prohibitive for datasets such as QM9.
These limitations highlight the necessity for a unified and robust GNN framework that can achieve accurate and scalable multi-property prediction across materials with diverse structural and compositional complexity. In this work, we propose AEGCNN-MTL, a novel architecture that encodes edge-level chemical features, captures long-range atomic interactions, and supports the simultaneous prediction of multiple interrelated material properties. The model is evaluated on both a newly constructed BDG dataset and the widely used QM9 benchmark, demonstrating superior accuracy, generalization ability, and scalability in multi-property prediction tasks.
The main contributions of this study are summarized as follows:
1. We introduce a novel framework combining an Adaptive Edge-aware Graph Convolution and a Multi-head Attention Layer, which collaboratively models local chemical interactions and long-range atomic dependencies, thereby enhancing the representation of complex molecular and crystalline structures.
2. We design a multi-task collaborative learning structure that enhances training stability and generalization by combining shared latent representations, task-specific decoder architectures, and synchronization-aware regularization strategies, and demonstrate robust cross-domain generalization on QM9 through correlation-guided task grouping across 12 quantum-chemical properties.
3. We develop a high-quality BDG prediction dataset encompassing diverse doping concentrations and configurations, providing a comprehensive data foundation for advancing AI-driven performance modeling methods for two-dimensional materials.
Results
Results on the BDG dataset
This subsection details the performance evaluation of the AEGCNN-MTL model on the constructed BDG dataset, which comprises 2613 samples of two-dimensional graphene supercell structures featuring diverse boron doping configurations. Each sample incorporates DFT-calculated values for band gap and work function, serving as dual-task targets. Figure 1 illustrates the regression performance of the AEGCNN-MTL model on the BDG dataset for two key electronic properties. In both subplots, the x-axis represents the DFT-calculated reference values, and the y-axis denotes the corresponding predictions made by the model. Blue and red markers indicate training and test samples, respectively. In Fig. 1a, the predicted band gap values show a well-aligned distribution along the identity line, demonstrating that the model can accurately approximate DFT-calculated band gaps across the full range of values. The dispersion of testing samples is limited, indicating effective generalization beyond the training set. The absence of outliers or skewed regions suggests that the model captures the influence of boron doping on the electronic band structure in a consistent manner. In Fig. 1b, a similarly strong correlation is observed for the prediction of work function. The data points follow the diagonal trend, with a slightly greater spread compared to the band gap subplot. Despite this, the alignment remains clear for both training and testing sets. This indicates that the model maintains reliable prediction performance even for surface-sensitive properties like the work function, which are often more challenging to model due to their sensitivity to local atomic environments.
a Predicted versus DFT-calculated band gap. Blue circles denote training samples; red circles denote test samples. b Predicted versus DFT-calculated work function. Blue circles denote training samples; red circles denote test samples.
To further assess the effectiveness of AEGCNN-MTL on the BDG dataset, we conducted comparative experiments against several widely used GNN baselines, including GATConv37, GraphSAGE38, GINConv39, GCNII40, and SchNet41. We did not include GemNet32, ALIGNN30, or equivariant Transformers42 on the BDG benchmark because these architectures are primarily optimized for three-dimensional materials where angular or directional descriptors are essential. The chosen baselines span complementary paradigms and are widely used in both graph learning and materials informatics: GATConv represents attention-based message passing, GraphSAGE reflects inductive neighborhood sampling, GINConv is theoretically maximally expressive for distinguishing graph structures, GCNII addresses over-smoothing in deep GCNs, and SchNet is a chemistry-specific model that incorporates interatomic distances into message passing. All models were trained and evaluated under identical data splits, preprocessing, and optimization settings to ensure fairness. The results are summarized in Table 1. Although all baselines achieve good band gap accuracy, with R² values greater than 0.90, their performance on work function scatters widely, with R² values ranging from 0.75 to 0.97. GATConv attains an R² of 0.9898 for band gap prediction, whereas work-function performance is much weaker for several baselines, with GCNII and SchNet reaching R² values of only 0.8939 and 0.7472, respectively. AEGCNN-MTL simultaneously delivers the lowest MAE for both properties, 0.0031 eV for band gap and 0.0073 eV for work function, and records the highest average R² of 0.9842, demonstrating that the proposed multi-task architecture yields genuine gains over strong single-task baselines.
Furthermore, we evaluated the same AEGCNN-MTL architecture in two modes: single-task and multi-task. As shown in Table 2, for band gap the multi-task mode reduces MAE from 0.0049 to 0.0031 and RMSE from 0.0168 to 0.0131, with R² increasing from 0.9870 to 0.9905. For work function, MAE decreases from 0.0084 to 0.0073, while RMSE rises slightly from 0.0260 to 0.0270 and R² dips from 0.9799 to 0.9778. These results provide a direct within-architecture comparison of single-task and multi-task training on the BDG dataset.
Generalization evaluation on the QM9 dataset
On the QM9 dataset, we evaluated the generalization capability of the proposed AEGCNN-MTL model. The 12 QM9 properties were divided into four groups according to their correlations and physical meanings (Fig. 2 for the correlation analysis and Table 3 for the grouping criteria).
Upper triangle: heat map of Pearson coefficients, where red indicates positive correlation, blue indicates negative correlation, and near-white indicates weak correlation; cell numbers show the coefficient values. Diagonal panels: red histograms and density curves show the marginal distributions. Lower triangle: blue scatter plots show pairwise relationships. The color bar on the right encodes correlation strength from −1 to +1.
All experiments adhere to the same training/test splits as applied to the QM9 dataset, with performance evaluated using the MAE metric. Baseline models included several classical graph neural networks, such as SchNet41, MEGNet29, PhysNet43, DimeNet44, ET45, R-MAT26, and GeoT46, as shown in Table 4. The performance of AEGCNN-MTL within each group is analyzed as follows. Group A includes isotropic polarizability (α) and heat capacity at 298 K (Cv), which are moderately correlated and primarily influenced by the overall molecular size and bonding flexibility. The model achieved MAEs of 0.052 $a_0^3$ for α and 0.030 cal/mol·K for Cv, comparable to those reported by strong baselines such as DimeNet and GeoT.
Group B contains four thermodynamic properties, atomization energy at 0 K (U₀), internal energy at 298 K (U), enthalpy (H), and Gibbs free energy (G), which are closely related and show extremely high pairwise correlations (Pearson r > 0.99). AEGCNN-MTL yielded MAEs of 10.3, 10.5, 10.2, and 10.7 meV for U₀, U, H, and G, respectively. These results are comparable to those of leading single-task models. Group C includes HOMO, LUMO, and the HOMO-LUMO energy gap (Δε), which describe the frontier orbital characteristics of molecules. Among them, HOMO and Δε exhibit a particularly strong Pearson correlation (r = 0.89). Within this group, AEGCNN-MTL achieved MAEs of 23.1 meV for HOMO and 32.1 meV for the HOMO-LUMO energy gap, one of which is the lowest reported MAE across all baselines evaluated. As for LUMO, the model achieved an MAE of 22.0 meV, outperforming the majority of baseline models, including SchNet (34.0 meV), MEGNet (31.0 meV), PhysNet (24.7 meV), and R-MAT (29.0 meV). These results demonstrate that, when the target properties are physically and statistically coupled, multi-task learning can lead to meaningful improvements in prediction accuracy through shared representation learning. Group D includes dipole moment (μ), electronic spatial extent (R²), and zero-point vibrational energy (ZPVE). These properties exhibit weak correlations with the other tasks in the dataset. In preliminary trials, joint training of these tasks with others resulted in unstable convergence and reduced accuracy; as a result, AEGCNN-MTL employed its single-task mode for this group. Under this configuration, the model achieved MAEs of 0.053 D for μ, 0.832 $a_0^2$ for R², and 1.90 meV for ZPVE. While these values are slightly higher than those of some baselines, the outcomes remain within an acceptable range. The relatively lower performance in Group D, compared to Groups B and C, is primarily attributed to the absence of substantial inter-task correlations, which constrained the model's ability to benefit from its multi-task learning architecture. This finding reinforces the importance of task-relatedness for effective shared representation learning and highlights the strength of AEGCNN-MTL in exploiting interdependent tasks for improved predictive performance.
Ablation studies
To ascertain the individual contributions of key modules within the AEGCNN-MTL architecture, a series of ablation experiments were conducted on the BDG dataset. These experiments systematically constructed and compared five sub-models (M1-M4 and the full model) to evaluate the roles of the AEGC, MHA, and task-specific modeling components (Set2Set pooling and independent Feedforward Networks) in multi-task crystal property prediction.
Table 5 summarizes the module composition of each model. Model M1 serves as the baseline architecture, employing traditional GCN with mean pooling and a shared feedforward network. Model M2 leverages Set2Set pooling to generate richer and more expressive graph-level representations by adaptively aggregating node features across the entire graph structure. Model M3 integrates the AEGC module, which strengthens the network’s capability to capture local structural information by incorporating edge-weighted interactions that reflect the underlying chemical environments. Model M4 employs a distance-modulated attention mechanism to effectively capture long-range dependencies within the crystal structure, enabling the model to learn global relational patterns critical for property prediction. Finally, the complete AEGCNN-MTL framework integrates task-specific FFNs, allowing the model to generate personalized predictions tailored to each target property within the multi-task learning setting.
Table 6 presents the performance results for the Egap and WF regression tasks, evaluated using MAE, RMSE, and R² metrics. The incorporation of Set2Set pooling (M2) significantly reduced the MAE, by approximately 29% for Egap and 26% for WF compared to M1, indicating the effectiveness of Set2Set pooling in capturing graph-level structural variations. The introduction of the AEGC module (M3) led to further improvements, with MAE reduced by an additional 16% for Egap and 5% for WF compared to M2, highlighting the benefit of edge-weighted interactions in local structural modeling. The attention mechanism (M4) resulted in noticeable gains in R², especially for the band gap task, reflecting the advantage of modeling global dependencies for handling long-range structural effects. The full AEGCNN-MTL model, equipped with task-specific decoders, delivered the best overall performance, with MAE values reduced to 0.0031 eV for Egap and 0.0073 eV for WF.
Figure 3 presents the ablation analysis of AEGCNN-MTL on the BDG dataset, showcasing the progressive impact of individual architectural modules on the prediction performance of two key electronic properties: band gap and work function. The figure illustrates both MAE and RMSE values across five model configurations (M1 to M4 and the complete AEGCNN-MTL) using bar and line plots. Specifically, blue and orange bars represent MAE for band gap and work function, respectively, while green and purple bars depict their corresponding RMSE values. A clear downward trend in both MAE and RMSE is observed as the model evolves from M1 to the complete version, confirming that each module contributes to reducing prediction error.
Prediction errors for band gap and work function across different models: M1–M4 and the AEGCNN-MTL. Blue bars represent the mean absolute error (MAE) of band gap, orange bars represent the MAE of work function, green bars represent the root mean square error (RMSE) of band gap, and purple bars represent the RMSE of work function. Error values consistently decrease as modules are added.
Discussion
The results of this study demonstrate that the proposed AEGCNN-MTL framework can achieve accurate and robust prediction of multiple material properties by combining adaptive edge-aware convolution, distance-modulated attention, and task-specific decoders within a unified multi-task architecture. On the proprietary BDG dataset, the model achieves state-of-the-art prediction performance for both band gap and work function, consistently surpassing representative GNN baselines. Moreover, a direct comparison of single-task and multi-task training within the same architecture reveals that shared representation learning provides a marked advantage for band gap prediction, while maintaining competitive accuracy for the more surface-sensitive work function. These findings suggest that multi-task learning can be particularly beneficial when multiple properties share common structural descriptors, whereas its gains may be limited when the target properties are governed by distinct local environments.
The evaluation on the QM9 dataset provides further insights into the scope of multi-task learning. For property groups with strong physical and statistical correlations, such as the thermodynamic energy set (U₀, U, H, G) and the frontier orbital set (HOMO, LUMO, HOMO-LUMO energy gap), the model benefits significantly from parameter sharing and yields lower prediction errors than strong single-task baselines. This indicates that the proposed architecture can effectively capture latent physical regularities underlying related properties and exploit them for joint learning. In contrast, properties with weak coupling, such as dipole moment (μ), electronic spatial extent (R²), and zero-point vibrational energy (ZPVE), exhibit limited benefits from joint optimization. In these cases, single-task training proves more stable and accurate, underscoring the importance of task-relatedness as a prerequisite for successful multi-task modeling. Importantly, the ability of AEGCNN-MTL to flexibly accommodate both joint and independent training modes within the same framework mitigates the risk of negative transfer and enhances its adaptability across diverse property sets.
The ablation studies further elucidate the contributions of individual architectural components. The introduction of Set2Set pooling improves graph-level representation quality by adaptively aggregating structural information across nodes. The adaptive edge-aware convolution strengthens local chemical environment modeling through edge-weighted interactions, while the distance-modulated attention mechanism enhances the capture of long-range structural dependencies. Finally, the task-specific decoders ensure that property-specific features are retained during multi-task optimization, preventing overly generic representations. The progressive improvement observed across these modules confirms that the superior performance of the full AEGCNN-MTL model arises from the synergistic integration of local, global, and task-specific learning mechanisms.
In this work, we proposed AEGCNN-MTL to tackle the challenge of simultaneously predicting multiple material properties. The model introduces edge feature encoding and a dynamic attention-based weighting mechanism to effectively integrate topological and relational information during message propagation. In addition, AEGCNN-MTL incorporates task-specific Set2Set pooling and a gated attention fusion module to better capture inter-task dependencies, demonstrating strong predictive performance and interpretability. Comprehensive evaluations demonstrate the efficacy of AEGCNN-MTL. On our newly constructed BDG dataset, the model achieves state-of-the-art accuracy for the simultaneous prediction of band gap (R² = 0.9905) and work function (R² = 0.9778), significantly outperforming single-task models and highlighting the benefits of shared representation learning and task co-regularization. More importantly, AEGCNN-MTL exhibits exceptional generalization capability. When evaluated on the diverse QM9 benchmark encompassing 12 quantum chemical properties, it delivers performance comparable to or surpassing specialized state-of-the-art single-task GNNs across most tasks. Given the modularity and scalability of AEGCNN-MTL, it can serve as a powerful tool for high-throughput screening and accelerated discovery of materials with tailored multi-functional properties. Future work will focus on enriching the edge encoding with explicit domain knowledge, exploring the model's potential for inverse materials design, and integrating it with generative frameworks to enable the autonomous discovery of novel multi-property-optimized materials.
Methods
Construction of the boron-doped graphene dataset
To systematically investigate the effects of varying doping concentrations and configurations on the electronic properties of BDG, we constructed a high-quality dataset using high-throughput DFT calculations. The dataset encompasses a broad range of representative doping scenarios. All configurations were built from a 4 × 4 × 1 graphene supercell containing 32 carbon atoms, which served as the structural template for boron substitution. As shown in Fig. 4, a diverse set of doping configurations was generated by substituting 1 to 8 carbon atoms with boron in unique, non-redundant combinations, yielding doping concentrations ranging from approximately 3.1% to 25.0%. For each concentration level, multiple spatially distinct configurations were considered, resulting in 747 unique BDG structures.
Purple spheres denote carbon atoms and green spheres denote boron atoms; all panels show a 4 × 4 × 1 graphene supercell, exemplifying the diversity of doping configurations and concentrations explored in this study. Boron concentrations: a 3.12%; b 6.25%; c 9.37%; d 12.50%; e 15.62%; f 18.75%; g 21.88%; h 25.00%.
To leverage the inherent high spatial symmetry of the graphene lattice, we further applied a symmetry expansion strategy, employing mirror operations to generate equivalent configurations. As illustrated in Fig. 5, this symmetry expansion substantially improves the dataset's coverage of the configurational space. Figure 5a shows an original BDG configuration with multiple boron atoms (green) substituted into a 4 × 4 × 1 graphene supercell (carbon atoms in purple). Figure 5b–d shows new configurations obtained by applying different mirror symmetry operations (black dashed lines) to the original structure, illustrating how symmetry expansion increases the diversity of equivalent doping arrangements in the dataset. After eliminating redundant structures, we finalized a comprehensive BDG dataset comprising 2613 samples.
Purple spheres denote carbon atoms and green spheres denote boron atoms; black dashed lines indicate symmetry axes. a Original configuration. b Diagonal symmetric (left). c Diagonal symmetric (right). d Inversion-symmetric configuration.
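To make the expansion step concrete, the following minimal Python sketch applies mirror permutations to doping patterns and discards duplicates. The site indexing, the base configurations, and the `mirror_maps` permutations are illustrative placeholders rather than the exact operations used to build the dataset.

```python
N_SITES = 32  # carbon sites of the 4 x 4 x 1 supercell, indexed 0..31

# Placeholder symmetry operations: each map is a permutation of site indices
# induced by one mirror axis (cf. Fig. 5). Only the identity is shown here;
# the real permutations depend on how sites are indexed in the supercell.
mirror_maps = [tuple(range(N_SITES))]

def expand_by_symmetry(config, maps):
    """Return all distinct mirror images of one boron-doping pattern."""
    return {tuple(sorted(m[i] for i in config)) for m in maps}

# Hypothetical base configurations (tuples of substituted site indices);
# the 747 base structures of the paper would take their place.
base_configs = [(0,), (0, 5), (0, 5, 10, 15)]

dataset = set()
for config in base_configs:
    dataset |= expand_by_symmetry(config, mirror_maps)  # expand, then dedup
print(f"{len(dataset)} configurations after expansion and deduplication")
```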
Computational methods and parameter settings
All first-principles calculations were performed using the Vienna Ab initio Simulation Package (VASP) within the framework of DFT. To avoid spurious interaction between periodic graphene images, a vacuum thickness of 15.0 Å was included in the supercell. The planar lattice constant of the graphene supercell was fixed at 4a₀, where a₀ = 2.468 Å is the calculated lattice constant of graphene. The interaction between ions and valence electrons was described by the projector-augmented wave method, and the exchange-correlation energy was treated with the Perdew–Burke–Ernzerhof (PBE) functional under the generalized gradient approximation. The plane-wave basis set was truncated at a kinetic energy cutoff of 450 eV, which was validated through convergence testing. The electronic self-consistent field (SCF) calculations were performed with an energy convergence criterion of 1 × 10⁻⁶ eV, while geometry optimization was conducted until the residual atomic forces were less than 0.01 eV/Å. Although the PBE functional is known to underestimate absolute band gap values, it has been shown to reliably reproduce the doping-dependent band gap trends in boron-doped graphene systems, serving as a good approximation for electronic property modulation in such materials47.
The electronic band gap (Egap) was computed for all structures as the energy difference between the conduction band minimum (CBM) and valence band maximum (VBM), extracted from the calculated electronic band structures:

$$E_{gap} = E_{CBM} - E_{VBM}$$

where $E_{CBM}$ and $E_{VBM}$ denote the energies of the conduction band minimum and valence band maximum, respectively.
The work function (Φ) was computed via the vacuum slab method as the difference between the vacuum electrostatic potential and the Fermi level:

$$\Phi = V_{vacuum} - E_{Fermi}$$

where $V_{vacuum}$ is the average electrostatic potential in the vacuum region perpendicular to the graphene surface (obtained through planar averaging), and $E_{Fermi}$ is the Fermi level from the SCF calculation.
The calculated band gaps and work functions across different doping concentrations agree well with previous reports2,3,4. Figure 6 presents the relationships between the number of dopant atoms and the calculated properties. Figure 6a reveals a general increasing trend in band gap with rising boron concentration, suggesting a strong correlation between doping level and electronic structure modulation. Notably, configurations containing 4 and 8 dopants exhibit the broadest band gap distributions, with maximum values exceeding 1.2 eV, indicating a pronounced sensitivity to atomic configuration at higher doping levels. Figure 6b shows a monotonic increase in work function from approximately 4.45 eV to over 5.5 eV with increasing boron content, consistent with the p-type doping effect and the associated upward shift in surface potential. The observed variability in both band gap and work function arises from the diversity of local atomic environments, which modulate π-electron delocalization and influence the surface electrostatic potential. While both the band gap and work function originate from the electronic structure, they represent fundamentally different physical quantities. The band gap is a bulk property defined by the energy difference between the conduction and valence-band edges, whereas the work function is surface-sensitive and defined relative to the vacuum electrostatic potential. This distinction is reflected in the dataset: the band gap exhibits large variance at intermediate doping levels, while the work function increases nearly monotonically with boron concentration. The contrasting statistical behaviors demonstrate that their simultaneous prediction is a nontrivial multi-task challenge, rather than a simple outcome of their shared electronic origin.
a Boxplots of band gap values for BDG with 1-8 boron atoms. Red boxes indicate the distribution range, with central marks denoting the median; b Boxplots of work function values for BDG with 1-8 boron atoms. Blue boxes represent the distributions, with central marks denoting the median.
QM9 dataset
The QM9 dataset is a widely used benchmark in quantum chemical modeling and machine learning, comprising approximately 134,000 small organic molecules containing up to nine heavy atoms (C, N, O, and F). All molecular structures and their quantum chemical properties were calculated using DFT methods5. The dataset offers comprehensive molecular property data, including structural parameters, orbital energies, thermodynamic functions, and electronic responses, covering over 20 physicochemical properties. It serves as a crucial foundation for research in molecular representation, property prediction, transfer learning, and multi-task learning6,48,49. In this work, we select 12 representative quantum chemical properties from QM9 as prediction targets for multi-task learning. To better understand the interdependencies among prediction targets and to inform subsequent model design, we applied Pearson correlation analysis to evaluate the statistical relationships between quantum chemical properties in the QM9 dataset.
As illustrated in Fig. 2, the upper triangle presents pairwise Pearson correlation coefficients, color-coded from blue (strongly negative) to red (strongly positive), indicating the degree of linear association between each property pair. The diagonal plots show the marginal distributions of each property, while the lower triangle displays scatter plots that capture nonlinear trends and distributional patterns. This tri-level visualization facilitates a detailed assessment of both global statistical dependencies and local variances within the dataset. Several notable correlation patterns are evident. Thermodynamic properties, including u₀, u₂₉₈, h₂₉₈, and g₂₉₈, exhibit near-perfect linear correlations (r ≈ 1.00), reflecting their shared physical basis in atomization energy computations. Likewise, strong correlations are observed among the frontier orbital features LUMO, HOMO, and the energy gap Δε, with the (LUMO, Δε) pair showing the highest correlation (r = 0.89). In contrast, properties such as dipole moment (μ), heat capacity (Cᵥ), and electronic spatial extent (R²) display weak or negligible correlations with other variables (|r| < 0.3), suggesting limited information redundancy. These observations indicate that highly correlated tasks are suitable for grouped multi-task training, leveraging shared representations to enhance learning efficiency. Conversely, tasks exhibiting weak correlations or large disparities in numerical scale may introduce interference or imbalance when trained jointly, and are thus better handled independently.
Guided by this correlation analysis and further informed by the physical interpretation, units, and statistical ranges of each property, we divided the 12 prediction targets into four semantic groups, as outlined in Table 3. Each task group reflects a specific category of physical or chemical properties to facilitate multi-task modeling and analysis. Group A consists of α (isotropic polarizability) and Cᵥ (heat capacity), which are both response properties despite their weak coupling. Group B encompasses thermodynamic quantities (u₀, u₂₉₈, h₂₉₈, g₂₉₈), which form a strongly interdependent set. Group C includes orbital energy-related features (HOMO, LUMO, Δε), and Group D comprises tasks such as μ, R², and ZPVE, which demonstrate minimal inter-task correlation and exhibit considerable differences in physical nature and data magnitude. Consequently, Group D tasks were treated as standalone single-task models to avoid performance degradation due to negative transfer. This grouping strategy ensures semantic coherence within task clusters and provides a structured basis for heterogeneous multi-task learning.
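A minimal sketch of this correlation-guided grouping is given below, assuming the 12 targets are columns of a pandas DataFrame loaded from a hypothetical file; the column names and the |r| threshold are placeholders, and the published grouping also weighs physical meaning, units, and value ranges.

```python
import pandas as pd

# Hypothetical input: one column per QM9 target (file and names assumed).
df = pd.read_csv("qm9_targets.csv")
targets = ["mu", "alpha", "homo", "lumo", "gap", "r2",
           "zpve", "u0", "u298", "h298", "g298", "cv"]
corr = df[targets].corr(method="pearson")        # 12 x 12 Pearson matrix

# Greedy grouping: properties whose mutual |r| exceeds a threshold are merged.
threshold = 0.8                                   # assumed value
groups, assigned = [], set()
for t in targets:
    if t in assigned:
        continue
    group = [t] + [s for s in targets
                   if s != t and s not in assigned
                   and abs(corr.loc[t, s]) >= threshold]
    assigned.update(group)
    groups.append(group)
print(groups)   # the u0/u298/h298/g298 block clusters together (r ~ 1.00)
```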
Overview of the AEGCNN-MTL framework
AEGCNN-MTL was designed to address the challenges of predicting multiple properties in molecular and crystalline systems. The architecture of the proposed AEGCNN-MTL model encompasses feature embedding, graph structure encoding, multi-task decoders, and a joint optimization mechanism. As illustrated in Fig. 7, the AEGCNN-MTL framework follows a "shared encoder + task-specific decoder" design principle and is composed of three functional stages: feature embedding, task-shared representation learning, and task-specific prediction heads.
The model is composed of three major parts. The adaptive edge-aware graph convolution module (left) projects edge features through linear layers and a multilayer perceptron, followed by feature update with residual addition. The task-shared distance-modulated multi-head attention module (middle) computes query, key, and value interactions weighted by interatomic distances to capture both short- and long-range dependencies. The task-specific decoders (right) include feed-forward networks, Set2Set pooling layers, and linear heads to generate predictions for individual target properties.
In the first stage, atomic graphs constructed from molecular or crystalline structures are processed through a multi-channel embedding module that encodes atomic number, degree, spatial position, and bond type into learnable feature vectors. These embeddings form the input to the task-shared backbone. The second stage comprises two complementary modules: an Adaptive Edge-aware Graph Convolution module and a Multi-head Attention Layer module. The AEGC module dynamically integrates edge features and node features using an attention mechanism, enabling refined local message passing sensitive to chemical bonding environments. Following this, the MHA module captures long-range dependencies via an attention mechanism, which is further modulated by interatomic distances encoded through radial basis functions. Together, these two components enable joint modeling of local and global structural contexts, yielding expressive graph-level representations. Finally, the third stage employs task-specific decoders to address the heterogeneity among different prediction targets. Each property is processed through a dedicated pathway comprising a feedforward network, a Set2Set pooling layer, and a linear output head. This design ensures that the model can disentangle shared representations while accommodating task-specific feature distributions. To mitigate optimization conflicts across tasks, a synchronized loss function is introduced to balance task-specific gradients and stabilize multi-task training.
Adaptive Edge-aware graph convolution
In the AEGCNN-MTL model, each molecular graph is constructed based on atomic connectivity, where nodes represent atoms and edges denote interatomic bonds. The encoding strategy for nodes and edges adopted in AEGCNN-MTL is illustrated in Fig. 8. Node features consist of three key components: atomic type (z), degree (d), and positional encoding (PE). These features are transformed into d-dimensional embeddings using multilayer perceptrons (MLPs), as defined below:

$$h_z = \mathrm{Emb}_z(z), \quad h_d = \mathrm{Emb}_d(d), \quad h_{PE} = \mathrm{MLP}(PE)$$

where $\mathrm{Emb}_z$ and $\mathrm{Emb}_d$ denote the embedding functions for atomic type and degree, respectively, and MLP represents a multilayer perceptron applied to the positional encoding.
On the left, four types of input descriptors are shown: atom type, degree, position, and bond type. These are converted into embeddings through corresponding embedding layers (orange, green, blue, and red rectangular blocks). The embedded node features are projected into three branches, represented by purple, green, and yellow networks, which produce query, key, and value representations. Edge features are integrated into the key and value branches. The resulting query, key, and value are combined within the attention module, represented by the blue grid, to calculate attention weights. These weights are applied to the value branch to perform weighted message passing, shown as stacked blue blocks labeled as convolution.
The initial representation of node i was obtained by summing these embeddings:

$$x_i^{(0)} = h_z + h_d + h_{PE}$$
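A minimal PyTorch sketch of this multi-channel embedding is shown below; the feature dimension, vocabulary sizes, and positional-encoding width are assumptions rather than the paper's settings.

```python
import torch
import torch.nn as nn

class NodeEmbedding(nn.Module):
    """Sketch of the multi-channel node embedding (sizes are assumed)."""
    def __init__(self, dim=128, max_z=100, max_deg=8, pe_dim=8):
        super().__init__()
        self.emb_z = nn.Embedding(max_z, dim)     # atomic type z
        self.emb_d = nn.Embedding(max_deg, dim)   # node degree d
        self.mlp_pe = nn.Sequential(              # positional encoding PE
            nn.Linear(pe_dim, dim), nn.SiLU(), nn.Linear(dim, dim))

    def forward(self, z, deg, pe):
        # x_i^(0) = h_z + h_d + h_PE
        return self.emb_z(z) + self.emb_d(deg) + self.mlp_pe(pe)
```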
We introduce the AEGC layer to enhance the model’s capacity to capture fine-grained atomic interactions and bond-specific dependencies within molecular graphs. As a core component of the GNN backbone, AEGC integrates node-level semantics with edge-level structural cues. This fusion enables the generation of localized, context-aware representations critical for downstream property prediction.
The AEGC module enables unified message passing by incorporating both node and edge features, as illustrated in Fig. 8. Distinctively, the AEGC layer incorporates an attention mechanism that adaptively adjusts the weights assigned to messages exchanged between neighboring atoms. In this process, node features are projected into query (Q), key (K), and value (V) vectors, while edge features are embedded via a learnable MLP and incorporated into the key and value branches. This dynamic attention weighting enables the model to better capture variations in chemical environments and to learn context-sensitive local topological representations. By explicitly modeling edge-aware interactions, the AEGC module enhances the network’s representational capacity for capturing bond-order-sensitive patterns and structural anisotropy.
Specifically, for each edge (i, j) in the graph, node features were first projected into Q, K, and V vectors:

$$Q_i = W_Q x_i, \quad K_j = W_K x_j, \quad V_j = W_V x_j$$
If an edge feature $e_{ij}$ existed, it was further embedded via an MLP to enrich the key and value representations:

$$K_{ij} = K_j + \mathrm{MLP}(e_{ij}), \quad V_{ij} = V_j + \mathrm{MLP}(e_{ij})$$
The attention weight $\alpha_{ij}$ between nodes i and j was computed using scaled dot-product attention, followed by normalization across all neighbors of node i:

$$\alpha_{ij} = \mathrm{softmax}_{j \in \mathcal{N}(i)}\!\left(\frac{Q_i \cdot K_{ij}}{\sqrt{d}}\right)$$
The aggregated message for node i was a weighted sum of its neighbors' values:

$$m_i = \sum_{j \in \mathcal{N}(i)} \alpha_{ij} V_{ij}$$
The final representation of node i is updated via a nonlinear transformation, residual connection, and batch normalization:

$$x_i^{\prime} = \mathrm{BatchNorm}\!\left(x_i + \mathrm{MLP}(m_i)\right)$$
To model the evolution of edge features, AEGC included an edge update path. The updated edge embedding depends on both node features and the original edge attributes:

$$e_{ij}^{\prime} = \mathrm{MLP}\!\left(\left[x_i \,\|\, x_j \,\|\, e_{ij}\right]\right)$$
Through this design, the AEGC module enables structure-aware modeling via node-edge interactions, enhancing expressive power and physical consistency on complex crystal structures and heterogeneous molecular graphs, thereby supporting accurate property prediction in multi-task learning.
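The following PyTorch Geometric sketch assembles these equations into a single message-passing layer. It is a simplified reading of the AEGC module: the additive fusion of edge embeddings into keys and values and the activation choices are assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import MessagePassing
from torch_geometric.utils import softmax

class AEGCLayer(MessagePassing):
    """Sketch of an adaptive edge-aware convolution layer."""
    def __init__(self, dim, edge_dim):
        super().__init__(aggr="add")
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.edge_mlp = nn.Sequential(              # embeds e_ij
            nn.Linear(edge_dim, dim), nn.SiLU(), nn.Linear(dim, dim))
        self.update_mlp = nn.Sequential(nn.Linear(dim, dim), nn.SiLU())
        self.bn = nn.BatchNorm1d(dim)
        self.scale = dim ** -0.5

    def forward(self, x, edge_index, edge_attr):
        m = self.propagate(edge_index, q=self.q(x), k=self.k(x),
                           v=self.v(x), edge_attr=edge_attr)
        return self.bn(x + self.update_mlp(m))      # residual + batch norm

    def message(self, q_i, k_j, v_j, edge_attr, index):
        e = self.edge_mlp(edge_attr)                # edge embedding
        # alpha_ij = softmax over neighbors of the scaled dot product
        alpha = softmax((q_i * (k_j + e)).sum(dim=-1) * self.scale, index)
        return alpha.unsqueeze(-1) * (v_j + e)      # weighted message
```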
Multi-head attention layer
To capture long-range and higher-order interactions, we introduce a multi-head attention layer informed by interatomic geometric distances. The input node representations are projected into multiple attention heads, each of which linearly transforms the features into separate Q, K, and V vectors. This layer was constructed based on a Transformer framework50. This module addresses the receptive field constraints of traditional graph neural networks, which rely primarily on local neighborhood aggregation. By explicitly incorporating interatomic distance information, the model is able to capture global interactions between distant atoms, which is particularly important for modeling phenomena such as conjugated systems, long-range polarization, and electronic delocalization.
As illustrated in Fig. 9, the MHA module takes the node-level representations x_local as input and first projects them into Q, K, and V vectors through three independent linear transformations. These vectors are computed in parallel across attention heads, enabling the model to learn diverse interaction patterns from different relational perspectives. Simultaneously, the Euclidean distances between atom pairs are embedded using radial basis functions (RBF) to produce continuous, multi-scale geometric descriptors. These RBF-based embeddings are incorporated into the attention mechanism to modulate interaction strengths based on interatomic proximity. Within each attention head, distance-aware key and value vectors are used to compute attention weights, which are further modulated by a cosine-based cutoff function. This mechanism enables the model to prioritize short-range interactions while preserving physically meaningful long-range dependencies. The resulting weighted messages are aggregated and fused across heads via concatenation or summation. To ensure training stability and information retention, the output is further processed via residual connections and a feedforward MLP, followed by layer normalization.
The module converts local atomic features into query, key, and value representations using parallel linear layers, illustrated as rectangular boxes. A grid denotes the attention matrix that computes interaction weights. Distance cues modulate the heads (molecular icons with percentage labels indicate different spatial ranges only, not specific values). Outputs are aggregated by residual addition (circles with plus signs) and passed to a multilayer perceptron (network of connected circles) to form the global feature.
Specifically, for each node input feature $x_i$, linear transformations were applied to generate the Q, K, and V representations, as defined previously for the AEGC layer. Interatomic Euclidean distances $d_{ij}$ were expanded using radial basis functions (RBF), yielding a rich multi-scale representation:

$$\mathrm{RBF}_k(d_{ij}) = \exp\!\left(-\gamma_k \left(d_{ij} - \mu_k\right)^2\right), \quad k = 1, \ldots, K$$

where $\mu_k$ and $\gamma_k$ denote the center and width of the k-th basis function.
The expanded RBF vectors were further mapped into the key and value vectors through nonlinear transformations:

$$K_{ij} = K_j \odot \mathrm{MLP}_K\!\left(\mathrm{RBF}(d_{ij})\right), \quad V_{ij} = V_j \odot \mathrm{MLP}_V\!\left(\mathrm{RBF}(d_{ij})\right)$$
The attention weight $\alpha_{ij}$ was modulated by a distance-decay function and a learnable scalar coefficient α, resulting in:

$$\alpha_{ij} = \mathrm{softmax}_j\!\left(\frac{Q_i \cdot K_{ij}}{\sqrt{d}}\right) \cdot C(d_{ij})^{\alpha}$$

where $C(d_{ij})$ is a cosine-shaped distance cutoff function used to suppress the influence of long-range interactions, and α is a learnable parameter that controls the sensitivity to distance decay. The output of each attention head was computed as the weighted aggregation of values from neighboring nodes:

$$h_i^{(k)} = \sum_{j} \alpha_{ij} V_{ij}$$
The outputs from all attention heads were concatenated and linearly projected to obtain the final node embedding:

$$x_i^{\mathrm{global}} = W_O\!\left[h_i^{(1)} \,\|\, h_i^{(2)} \,\|\, \cdots \,\|\, h_i^{(H)}\right]$$
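The sketch below combines the RBF expansion, cosine cutoff, and learnable decay coefficient into one attention module for a single structure (batching across molecules would additionally require masking cross-structure pairs). The basis parameters and the way the distance bias enters the logits are assumptions.

```python
import math
import torch
import torch.nn as nn

class DistanceModulatedMHA(nn.Module):
    """Sketch of distance-modulated multi-head attention over atom pairs."""
    def __init__(self, dim=128, heads=4, cutoff=5.0, n_rbf=32):
        super().__init__()
        assert dim % heads == 0
        self.h, self.dk, self.cutoff = heads, dim // heads, cutoff
        self.qkv = nn.Linear(dim, 3 * dim)
        self.register_buffer("centers", torch.linspace(0.0, cutoff, n_rbf))
        self.gamma = n_rbf / cutoff               # shared RBF width (assumed)
        self.rbf_mlp = nn.Linear(n_rbf, heads)    # per-head distance bias
        self.alpha = nn.Parameter(torch.ones(1))  # learnable decay coefficient
        self.out = nn.Linear(dim, dim)

    def forward(self, x, pos):                    # x: [N, dim], pos: [N, 3]
        N = x.size(0)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(N, self.h, self.dk)
        k = k.view(N, self.h, self.dk)
        v = v.view(N, self.h, self.dk)
        d = torch.cdist(pos, pos)                                   # [N, N]
        rbf = torch.exp(-self.gamma * (d.unsqueeze(-1) - self.centers) ** 2)
        bias = self.rbf_mlp(rbf).permute(2, 0, 1)                   # [h, N, N]
        cut = 0.5 * (torch.cos(math.pi * d.clamp(max=self.cutoff)
                               / self.cutoff) + 1.0)                # C(d)
        logits = torch.einsum("ihd,jhd->hij", q, k) / math.sqrt(self.dk)
        attn = torch.softmax(logits + self.alpha * bias, dim=-1) * cut
        out = torch.einsum("hij,jhd->ihd", attn, v).reshape(N, -1)
        return x + self.out(out)                                    # residual
```

In this form the cosine cutoff gates the attention after normalization, so rows need not sum to one; renormalizing after the gate is an equally plausible reading of the paper's description.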
Task-specific decoders and synchronous loss optimization
To address the inherent heterogeneity across different physicochemical property prediction tasks, AEGCNN-MTL adopts a modular decoding strategy coupled with a synchronized loss optimization mechanism. This design enables the model to effectively decouple shared and task-specific representations and to adapt to property-specific learning objectives. Importantly, even when tasks share the same input modality, they can depend on different aspects of the structure-property relationship. The band gap reflects bulk electronic structure and is sensitive to local bonding environments and the periodic potential, whereas the work function is defined as the difference between the vacuum electrostatic level and the Fermi level and is strongly influenced by surface dipoles and near-surface charge redistribution. Consequently, their predictive signals may arise from partially distinct cues. Using task-specific decoders helps retain task-dependent information while still enabling efficient sharing of common representations.
As illustrated in Fig. 10, the diagram depicts the decoding pipeline and the synchronized loss optimization strategy adopted in the AEGCNN-MTL framework. The left section shows the task-specific prediction branches, which include a feedforward network (FFN), Set2Set pooling, property-specific projection layers, and output heads corresponding to each target property. The right section displays the loss computation module, which comprises mean squared error (MSE) loss evaluation, task-wise statistics estimation, and adaptive loss weighting based on learnable coefficients β and γ.
Each task input is first processed by a feed-forward network, shown as rectangular boxes, and then aggregated using Set2Set pooling. The pooled representations are passed through linear layers and task-specific decoders to produce predictions for each property at the output layer. On the right, the loss coordination module combines predictions and targets. Mean squared error loss is calculated, statistical information is computed, and weighting factors are introduced, denoted as beta and gamma blocks, to balance different tasks. These components together form the coordinated loss that guides training.
In the left module, task-specific decoders receive shared graph representations $x_{\mathrm{global}}$ as input. Each decoder comprises a stack of feedforward networks, which perform task-dependent transformation of node features:

$$h_t = \mathrm{FFN}_t\!\left(x_{\mathrm{global}}\right)$$
After stacking multiple layers of feedforward networks, each task-specific output layer used Set2Set pooling to generate task-specific graph-level representations, followed by a linear layer and task-specific output head:

$$\hat{y}_t = W_t\, \mathrm{Set2Set}\!\left(h_t\right) + b_t$$
This architecture ensures that both shared and task-specific patterns are effectively captured, thereby enhancing the adaptability and generalization of the model in multi-task learning scenarios.
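A minimal sketch of one such decoding pathway, using the Set2Set readout available in PyTorch Geometric, is given below; layer counts and widths are assumptions.

```python
import torch.nn as nn
from torch_geometric.nn import Set2Set

class TaskDecoder(nn.Module):
    """Sketch of one task-specific head: FFN -> Set2Set -> linear output."""
    def __init__(self, dim=128, ffn_layers=2):
        super().__init__()
        blocks = []
        for _ in range(ffn_layers):
            blocks += [nn.Linear(dim, dim), nn.SiLU()]
        self.ffn = nn.Sequential(*blocks)              # task-dependent FFN_t
        self.pool = Set2Set(dim, processing_steps=3)   # graph-level readout
        self.head = nn.Linear(2 * dim, 1)              # Set2Set doubles dim

    def forward(self, x, batch):
        h = self.ffn(x)                                # per-node transform
        g = self.pool(h, batch)                        # [num_graphs, 2*dim]
        return self.head(g).squeeze(-1)                # one scalar per graph
```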
To promote balanced and coordinated learning, we designed a multi-task synchronous loss function with three components:

$$L_{total} = \bar{L} + \beta D + \gamma N$$

where:
- Task-averaged loss: $\bar{L}=\frac{1}{T}\sum_{t=1}^{T}L_{t}$ evaluates the average loss over all tasks and reflects overall performance;
- Deviation penalty: $D=\frac{1}{T}\sum_{t=1}^{T}|L_{t}-\bar{L}|$ penalizes large discrepancies in learning progress across tasks and suppresses task-wise instability;
- Non-descent penalty: $N=\frac{1}{T}\sum_{t=1}^{T}\mathrm{ReLU}(L_{t}-L_{t}^{prev})$ discourages loss rebounds over epochs to promote stable convergence;
- Dynamic weights: $\beta=\mathrm{softplus}(\log\beta^{\prime})$ and $\gamma=\mathrm{softplus}(\log\gamma^{\prime})$ are learnable non-negative parameters that adaptively balance the contributions of the three terms.
By simultaneously regulating task equilibrium, synchronizing gradient behavior, and leveraging dynamic adaptability, this loss function effectively mitigates inter-task interference and negative transfer during optimization, thereby improving collaborative generalization across all tasks.
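The three-term loss can be sketched as follows; the initialization of the learnable weights and the treatment of the previous losses (stored per call here, per epoch in the paper) are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SyncLoss(nn.Module):
    """Sketch of L_total = L_bar + beta * D + gamma * N."""
    def __init__(self):
        super().__init__()
        self.log_beta = nn.Parameter(torch.zeros(1))   # beta' (assumed init)
        self.log_gamma = nn.Parameter(torch.zeros(1))  # gamma' (assumed init)
        self.prev = None            # per-task losses from the previous step

    def forward(self, task_losses):                    # [T] tensor of MSEs
        l_bar = task_losses.mean()                     # task-averaged loss
        dev = (task_losses - l_bar).abs().mean()       # deviation penalty D
        if self.prev is None:
            nond = torch.zeros((), device=task_losses.device)
        else:
            nond = F.relu(task_losses - self.prev).mean()  # non-descent N
        self.prev = task_losses.detach()
        beta = F.softplus(self.log_beta)
        gamma = F.softplus(self.log_gamma)
        return l_bar + beta * dev + gamma * nond
```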
Hyperparameter settings
To comprehensively evaluate the effectiveness of the proposed model in material property prediction tasks, we performed systematic experiments on two distinct datasets, BDG and QM9, which vary in scale and task characteristics. All models were developed using the PyTorch Geometric library and trained on a single NVIDIA RTX 3090 GPU.
The BDG dataset, comprising 2613 samples, was employed to evaluate the model's performance under small-sample conditions, with a focus on the concurrent prediction of two pivotal electronic properties, Egap and WF, in two-dimensional boron-doped graphene materials. Given the limited sample size, a lightweight model architecture was selected to mitigate the risk of overfitting. The specific hyperparameter configurations for the BDG dataset are delineated in Table 7.
In contrast, the QM9 dataset served as a large-scale benchmark for molecular property modeling. In this study, tasks were grouped based on Pearson correlation coefficients among properties, and the neural network depths were tailored for each task group to reflect the varying degrees of modeling complexity and representational needs. The principal hyperparameter settings for the QM9 dataset are summarized in Table 7. All listed hyperparameters were used for experiments on the BDG and QM9 datasets. “Model Dimension” refers to the feature dimension of the neural network layers. In the QM9 multi-task experiments, the number of network layers was set to 6 for task groups A and D, 12 for task group B, and 4 for task group C. All experiments used the AdamW optimizer with the specified learning rate scheduling and weight decay strategies.
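For completeness, a minimal sketch of the optimizer and scheduler wiring is given below; the placeholder module and all numeric values are illustrative, not the Table 7 settings.

```python
import torch

model = torch.nn.Linear(8, 1)   # placeholder standing in for AEGCNN-MTL
# Learning rate, weight decay, and scheduler values are assumed placeholders.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, factor=0.5, patience=10)   # step with the validation loss
```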
Data availability
The QM9 dataset used in this study is publicly available and can be accessed from the Quantum Machine repository (https://figshare.com/collections/Quantum_chemistry_structures_and_properties_of_134_kilo_molecules/978904). The boron-doped graphene (BDG) dataset generated during the current study is not publicly available due to project-specific restrictions, but can be obtained from the corresponding author upon reasonable request.
Code availability
The code used for data processing and model training in this study is not publicly available but can be obtained from the corresponding author upon reasonable request.
References
Novoselov, K. S. et al. Electric field effect in atomically thin carbon films. Science 306, 666–669 (2004).
Bhushan, R. et al. Microwave graphitic nitrogen/boron ultradoping of graphene. npj 2D Mater. Appl. 8, 19 (2024).
Chen, X., Yao, Y., Yao, H., Yang, F. & Ni, J. Topological p + i p superconductivity in doped graphene-like single-sheet materials BC₃. Phys. Rev. B 92, 174503 (2015).
Dieb, M., Hou, Z. & Tsuda, K. Structure prediction of boron-doped graphene by machine learning. J. Chem. Phys. 148, 241716 (2018).
Ramakrishnan, R., Dral, P. O., Rupp, M. & von Lilienfeld, O. A. Quantum chemistry structures and properties of 134 kilo molecules. Sci. Data 1, 140022 (2014).
Nandi, S., Vegge, T. & Bhowmik, A. MultiXC-QM9: Large dataset of molecular and reaction energies from multi-level quantum chemical methods. Sci. Data 10, 783 (2023).
Jiang, X. et al. Interpretable machine learning applications: A promising prospect of AI for materials. Adv. Funct. Mater. 35, 2507734 (2025).
Koepfli, S. M. et al. Controlling photothermoelectric directional photocurrents in graphene with over 400 GHz bandwidth. Nat. Commun. 15, 7351 (2024).
Chen, W. et al. Heteroatom-doped flash graphene. ACS Nano. 16, 6646–6656 (2022).
Thiemann, F. L., Rowe, P. N., Zen, A., Müller, E. A. & Michaelides, A. Defect-dependent corrugation in graphene. Nano Lett. 21, 7231–7237 (2021).
Guo, M., Qian, Y., Qi, H., Bi, K. & Chen, Y. Experimental measurements on the thermal conductivity of strained monolayer graphene. Carbon 157, 185–190 (2020).
Yu, X. et al. Boron-doped graphene for electrocatalytic N₂ reduction. Joule 2, 1610–1622 (2018).
Chowdhury, S., Jiang, Y., Muthukaruppan, S. & Balasubramanian, R. Effect of boron doping level on the photocatalytic activity of graphene aerogels. Carbon 128, 237–248 (2018).
Liu, J., Liang, T., Tu, R., Lai, W. & Liu, Y. Redistribution of π and σ electrons in boron-doped graphene from DFT investigation. Appl. Surf. Sci. 481, 344–352 (2019).
Belfar, A. Simulation study of the a-Si:H/nc-Si:H solar cells performance sensitivity to the TCO work function, the band gap and the thickness of i-a-Si:H absorber layer. Sol. Energy 114, 408–417 (2015).
Zhang, Q. et al. Bandgap renormalization and work function tuning in MoSe₂/hBN/Ru(0001) heterostructures. Nat. Commun. 7, 13843 (2016).
Ganose, A. M. & Scanlon, D. O. Band gap and work function tailoring of SnO₂ for improved transparent conducting ability in photovoltaics. J. Mater. Chem. C. 4, 1467–1475 (2016).
Formalik, F., Shi, K., Joodaki, F., Wang, X. & Snurr, R. Q. Exploring the structural, dynamic, and functional properties of metal–organic frameworks through molecular modeling. Adv. Funct. Mater. 34, 2308130 (2024).
Lejaeghere, K. et al. Reproducibility in density functional theory calculations of solids. Science 351, aad3000 (2016).
Zhang, Y. et al. Screening dual variable-valence metal oxides doped calcium-based material for calcium looping thermochemical energy storage and CO₂ capture with DFT calculation. J. Energy Chem. 107, 170–182 (2025).
Fedik, N. et al. Extending machine learning beyond interatomic potentials for predicting molecular properties. Nat. Rev. Chem. 6, 735–755 (2022).
Huang, G., Huang, F. & Dong, W. Machine learning in energy storage material discovery and performance prediction. Chem. Eng. J. 492, 152294 (2024).
Wang, C. et al. A guided review of machine learning in the design and application for pore nanoarchitectonics of carbon materials. Mater. Sci. Eng. R. Rep. 165, 101010 (2025).
Reiser, P. et al. Graph neural networks for materials science and chemistry. Commun. Mater. 3, 93 (2022).
Fung, V., Zhang, J., Juarez, E. & Sumpter, B. G. Benchmarking graph neural networks for materials chemistry. npj Comput. Mater. 7, 84 (2021).
Omee, S. S. et al. Scalable deeper graph neural networks for high-performance materials property prediction. Patterns 3, 100491 (2022).
Ciano, G., Rossi, A., Bianchini, M. & Scarselli, F. On inductive–transductive learning with graph neural networks. IEEE Trans. Pattern Anal. Mach. Intell. 44, 9335498 (2022).
Xie, T. & Grossman, J. C. Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Phys. Rev. Lett. 120, 145301 (2018).
Chen, C., Ye, W., Zuo, Y., Zheng, C. & Ong, S. P. Graph networks as a universal machine learning framework for molecules and crystals. Chem. Mater. 31, 3564–3572 (2019).
Choudhary, K. & DeCost, B. Atomistic line graph neural network for improved materials property predictions. npj Comput. Mater. 7, 185 (2021).
Gasteiger, J., Giri, S., Margraf, J. T. & Günnemann, S. Fast and uncertainty-aware directional message passing for non-equilibrium molecules. Preprint at https://doi.org/10.48550/arXiv.2011.14115 (2020).
Gasteiger, J., Becker, F. & Günnemann, S. GemNet: Universal directional graph neural networks for molecules. Adv. Neural Inf. Process. Syst. 34, 13268–13280 (2021).
Liang, C. et al. Multi-task mixture density graph neural networks for predicting catalyst performance. Adv. Funct. Mater. 34, 2404392 (2024).
Feng, M. et al. Finding the optimal CO₂ adsorption material: Prediction of multi-properties of metal–organic frameworks (MOFs) based on DeepFM. Sep. Purif. Technol. 302, 122111 (2022).
Kong, S., Guevarra, D., Gomes, C. P. & Gregoire, J. M. Materials representation and transfer learning for multi-property prediction. APL Mater. 8, 021409 (2021).
Misra, I., Shrivastava, A., Gupta, A. & Hebert, M. Cross-stitch networks for multi-task learning. In Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 3994–4003 (2016).
Veličković, P. et al. Graph Attention Networks. ICLR. https://doi.org/10.48550/arXiv.1710.10903 (2018).
Hamilton, W., Ying, Z. & Leskovec, J. Inductive representation learning on large graphs. Adv. Neural Inf. Process. Syst. 30, 1024–1034 (2017).
Xu, K., Hu, W., Leskovec, J. & Jegelka, S. How powerful are graph neural networks? In Proc. Int. Conf. Learn. Represent. (ICLR). https://doi.org/10.48550/arXiv.1810.00826 (2019).
Chen, M., Wei, Z., Huang, Z., Ding, B. & Li, Y. Simple and deep graph convolutional networks (GCNII). In Proc. Int. Conf. Mach. Learn. (ICML). https://doi.org/10.48550/arXiv.2007.02133 (2020).
Schütt, K. T., Sauceda, H. E., Kindermans, P.-J., Tkatchenko, A. & Müller, K.-R. SchNet – A deep learning architecture for molecules and materials. J. Chem. Phys. 148, 241722 (2018).
Thölke, P. & De Fabritiis, G. TorchMD-NET: Equivariant transformers for neural-network-based molecular potentials. Preprint at https://doi.org/10.48550/arXiv.2202.02541 (2022).
Unke, O. T. & Meuwly, M. PhysNet: A neural network for predicting energies, forces, dipole moments, and partial charges. J. Chem. Theory Comput. 15, 3678–3693 (2019).
Gasteiger, J., Groß, J. & Günnemann, S. Directional message passing for molecular graphs. In Proc. Int. Conf. Learn. Represent. (ICLR). https://doi.org/10.48550/arXiv.2003.03123 (2020).
Thölke, P. & Fabritiis, G. D. TorchMD-NET: Equivariant transformers for neural network-based molecular potentials. Mach. Learn. Sci. Technol. 3, 015022 (2022).
Kwak, B. et al. GeoT: a geometry-aware transformer for reliable molecular property prediction and chemically interpretable representation learning. ACS Omega 8, 39759–39769 (2023).
Rani, P. & Jindal, V. K. Designing band gap of graphene by B and N dopant atoms. RSC Adv. 3, 802–812 (2013).
Sheshanarayana, R. & You, F. Knowledge distillation for molecular property prediction: A scalability analysis. Adv. Sci. https://doi.org/10.1002/advs.202503271 (2025).
Chang, J. & Zhu, S. MGNN: Moment graph neural network for universal molecular potentials. npj Comput. Mater. 11, 55 (2025).
Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 30 (2017). https://doi.org/10.48550/arXiv.1706.03762.
Acknowledgements
This research is supported by the key project of science and technology research program of Chongqing Education Commission of China (KJZD-K202501109), the National Natural Science Foundation of China (U22A20434), Scientific research foundation of Ministry of Industry and Information Technology of the People's Republic of China (TC220A04A-43).
Author information
Authors and Affiliations
Contributions
Y.L. contributed to data curation, preprocessing, and feature engineering. M.C. performed model implementation, experiments, and result analysis. Q.Z. conceived and supervised the study, designed the methodology, and was responsible for project administration and funding acquisition. J.Z. assisted in algorithm development and experimental validation. C.Z. contributed to data interpretation and visualization. S.X. provided guidance on material science applications and contributed to the theoretical framework. Q.B. assisted with manuscript editing and literature review. All authors discussed the results, contributed to the writing of the manuscript, and approved the final version for submission.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Lu, Y., Chen, M., Zhang, Q. et al. Adaptive edge-aware graph convolutional with multi-task learning for simultaneous prediction of material properties. npj Comput Mater 12, 49 (2026). https://doi.org/10.1038/s41524-025-01917-7
Received:
Accepted:
Published:
Version of record:
DOI: https://doi.org/10.1038/s41524-025-01917-7