Systematic softening in universal machine learning interatomic potentials

Deng, Bowen; Choi, Yunyeong; Zhong, Peichen; Riebesell, Janosh; Anand, Shashwat; Li, Zhuohan; Jun, KyuJung; Persson, Kristin A.; Ceder, Gerbrand

doi:10.1038/s41524-024-01500-6

Article
Open access
Published: 10 January 2025

npj Computational Materials volume 11, Article number: 9 (2025)

Abstract

Machine learning interatomic potentials (MLIPs) have introduced a new paradigm for atomic simulations. Recent advancements have led to universal MLIPs (uMLIPs) that are pre-trained on diverse datasets, providing opportunities for universal force fields and foundational machine learning models. However, their performance in extrapolating to out-of-distribution complex atomic environments remains unclear. In this study, we highlight a consistent potential energy surface (PES) softening effect in three uMLIPs: M3GNet, CHGNet, and MACE-MP-0, which is characterized by energy and force underprediction in atomic-modeling benchmarks including surfaces, defects, solid-solution energetics, ion migration barriers, phonon vibration modes, and general high-energy states. The PES softening behavior originates primarily from the systematically underpredicted PES curvature, which derives from the biased sampling of near-equilibrium atomic arrangements in uMLIP pre-training datasets. Our findings suggest that a considerable fraction of uMLIP errors are highly systematic, and can therefore be efficiently corrected. We argue for the importance of a comprehensive materials dataset with improved PES sampling for next-generation foundational MLIPs.

Introduction

Artificial intelligence (AI) is increasingly shifting the paradigm of scientific discovery to accelerate research and solve real-world scientific challenges¹. While ab-initio quantum mechanical simulation methods, such as density functional theory (DFT), offer the theoretical foundation to investigate material and chemical science problems at the atomic scale, their computational demands limit their applicability in both spatial and temporal scales. Recent advancements in machine learning interatomic potentials (MLIPs)^2,3 have made it possible to scale quantum mechanical methods up to million-atom simulations of systems such as water, copper⁴, and biomolecules⁵.

Alongside improvements in atomic environment descriptors and graph neural networks that enhance the expressivity of MLIP models^3,6, universal machine learning interatomic potentials (uMLIPs) have demonstrated another avenue by taking advantage of pre-training on large and comprehensive material datasets^{7,8,9,10,11,12,13}. These uMLIPs enable out-of-box atomic modeling covering the entire periodic table as well as providing robust machine-learning foundations for fine-tuning downstream tasks. While uMLIPs hold considerable promise, a critical challenge lies in their ability to reliably generalize to complex and diverse chemical environments, particularly those that deviate significantly from the pre-training data distribution. Several recent benchmark efforts have tested the uMLIPs’ ability to identify stable materials¹⁴, surface energies¹⁵, lattice relaxations and vibrational properties¹⁶, etc. A systematic understanding of the ability of uMLIPs to extrapolate to common atomic-modeling tasks, especially those with atomic environments that are out of distribution (OOD), remains an open question with implications for their real-world applicability in material discovery and design.

In this work, we systematically investigate the extrapolative capabilities of three foundational uMLIPs, namely M3GNet⁷, CHGNet⁸, and MACE-MP-0¹⁰ (hereafter referred to as MACE), across a diverse suite of material modeling tasks, including surface energies, defect energies, solid-solution energetics, phonon vibrational modes, and ion migration barriers. Across all benchmark tests for all uMLIP models, our analysis shows consistent underpredictions of energies and forces. To quantify and explain these underpredictions, we investigate the behavior of uMLIPs in high-energy transition states and reveal a systematic potential energy surface (PES) softening behavior in the uMLIPs, as illustrated in Fig. 1. We attribute the PES softening issue to the biased sampling of near-ground-state configurations in the uMLIP pre-training datasets¹⁷, which primarily comprise DFT ionic relaxation trajectories near PES local energy minima. The uMLIPs trained predominantly on small energy and force labels suffer from distribution shifts and experience increased but systematic prediction errors in high-energy PES regions, which are important for the kinetics of rare events, such as ion migrations, and for the energy of defects with undercoordinated atoms, such as vacancies and surfaces.

**Fig. 1: Potential energy surface softening in uMLIPs.**

We demonstrate that this systematic PES softening can be effectively mitigated by fine-tuning with a minimal amount of data points. We find that a simple linear correction derived from a single DFT reference label is sufficient to remove much of the PES softening issue in a specific chemical system of interest, significantly enhancing the performance and robustness of uMLIPs for a given application. We rationalize this observation by arguing that a considerable amount of prediction errors in pretrained uMLIPs are highly systematic, and therefore can be efficiently corrected by modifying a limited fraction of the model parameters with only a small amount of data augmentation. Our work provides a theoretical foundation for the widely observed data-efficient performance boosts achieved by fine-tuning uMLIPs and highlights the advantage of atomic modeling with large and comprehensive foundational AI models.

Results

Machine learning interatomic potentials framework

MLIPs approximate the total energy of a system as a sum of atomic contributions, each dependent on the positions and chemical identities of the atoms in their local environment:

$$E=\mathop{\sum }\limits_{i}^{n}\phi ({\{{\overrightarrow{r}}_{j}\}}_{i},{\{{C}_{j}\}}_{i}),\quad {\overrightarrow{f}}_{i}=-\frac{\partial E}{\partial {\overrightarrow{r}}_{i}}$$

(1)

ϕ is a learnable function that maps the set of position vectors ${\{{\overrightarrow{r}}_{j}\}}_{i}$ and chemical species ${\{{C}_{j}\}}_{i}$ of the neighboring atoms j to the energy contribution of atom i. The force ${\overrightarrow{f}}_{i}$ acting on each atom is calculated as the derivative of the total energy with respect to its position. In the training process, the parameters of the MLIP model are optimized to minimize the discrepancy between the predicted energies and forces and the corresponding reference values from the DFT labels.
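The structure of Equation (1) can be illustrated with a minimal Python sketch. The learnable function ϕ is replaced here by a toy Morse-like pair term (an assumption made purely for illustration, with arbitrary parameters), and the forces are estimated by finite differences rather than by the automatic differentiation a real MLIP would use.

```python
import numpy as np

def atomic_energy(i, positions, cutoff=3.0):
    """Toy stand-in for phi: energy of atom i from neighbours within `cutoff`."""
    r = np.linalg.norm(positions - positions[i], axis=1)
    neighbours = (r > 0.0) & (r < cutoff)
    # Morse-like pair term; depth, width and equilibrium distance (1.5) are toy values.
    pair = (1.0 - np.exp(-(r[neighbours] - 1.5))) ** 2 - 1.0
    return 0.5 * pair.sum()  # each pair is shared between its two atoms

def total_energy(positions):
    """E = sum_i phi_i, as in Equation (1)."""
    return sum(atomic_energy(i, positions) for i in range(len(positions)))

def forces(positions, h=1e-5):
    """f_i = -dE/dr_i, estimated with central finite differences."""
    f = np.zeros_like(positions)
    for i in range(positions.shape[0]):
        for k in range(3):
            plus, minus = positions.copy(), positions.copy()
            plus[i, k] += h
            minus[i, k] -= h
            f[i, k] = -(total_energy(plus) - total_energy(minus)) / (2.0 * h)
    return f

# Hypothetical stretched dimer: the two atoms attract each other along x.
positions = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
f = forces(positions)
print(f)
```

Because the energy depends only on interatomic distances, the forces on the two atoms are equal and opposite, which is the translational symmetry that ϕ is required to respect.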

The design of the atomic environment descriptor function ϕ is crucial to developing accurate and efficient MLIPs. To capture the essential physics and chemistry of the system, ϕ should be informative and satisfy proper translational and rotational symmetries. This is typically achieved through the use of graph representations¹⁸, high-order interactions^6,7, the preservation of SE(3)/E(3)-equivariance using tensor products based on spherical harmonics^3,10, Fourier basis¹⁹, or Cartesian-coordinates-based atomic density expansion²⁰. Additionally, the incorporation of chemical information, such as charge²¹ or atomic magnetic moment⁸, has been shown to enhance the predictive power of MLIPs.

In addition, recent efforts have been made to pre-train MLIPs on large open-sourced materials datasets such as the Materials Project¹⁷, which primarily consists of DFT ionic relaxation trajectories of various compounds and elements across the periodic table. While initial benchmarks have shown the promising applicability of universal MLIPs in predicting bulk materials energetics^14,16, their performance and limitations in OOD atomic configurations require more benchmarking as the energy of these configurations is often directly relevant for practical materials behavior. The following sections present a systematic assessment of the uMLIPs’ ability to extrapolate to low-symmetry OOD atomic configurations that are crucial for atomic-modeling tasks.

Surface energies

Surface energies play an important role in determining the stability and morphology of materials, especially at the nano-scale where the surface-to-volume ratio is significant. Accurate prediction of surface energies is crucial for various applications such as catalysis²², corrosion²³, adhesion²⁴, nucleation²⁵, and thin film growth²⁶. In this section, we assess uMLIP’s performance in predicting surface energies, which are calculated as

$${\gamma }_{{\rm{surface}}}=\frac{{E}_{{\rm{slab}}}-{E}_{{\rm{bulk}}}}{2{A}_{{\rm{slab}}}},$$

(2)

where E_slab/E_bulk are the relaxed energies of the slab/bulk structures that can be obtained independently using DFT or MLIP methods in a large supercell approach. A_slab denotes the surface area of the slab.
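As a small worked example of Equation (2), the sketch below evaluates the surface energy from a slab energy and a bulk reference. It assumes that E_bulk is the bulk energy scaled to the same number of atoms as the slab; the function name and all numerical values are hypothetical.

```python
def surface_energy(e_slab, e_bulk_per_atom, n_slab_atoms, area):
    """Surface energy per Equation (2); the factor 2 counts both faces of the slab."""
    # Assumption: the bulk reference contains the same number of atoms as the slab.
    e_bulk = e_bulk_per_atom * n_slab_atoms
    return (e_slab - e_bulk) / (2.0 * area)

# Hypothetical inputs: energies in eV, area in Å^2.
gamma = surface_energy(e_slab=-98.0, e_bulk_per_atom=-5.0, n_slab_atoms=20, area=10.0)
print(f"{gamma:.3f} eV/Å^2")
```

Since the slab energy enters with a positive sign, an underpredicted energy penalty for the undercoordinated surface atoms translates directly into an underpredicted surface energy.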

The energies of 147 surfaces with multiple Miller indices of 29 elements and binary compounds are evaluated, including Si, Cu, Al₂O₃, LiF, ZnS, etc. The DFT and uMLIP calculation details are listed in the Methods section, and Supplementary Table 1 lists the full set of elements and compounds with their corresponding prediction errors. Figure 2a shows the uMLIP surface energies versus the DFT surface energies for the three uMLIPs tested, where MAE stands for the model’s mean absolute error. MACE exhibits relatively better performance compared to CHGNet and M3GNet, achieving an MAE of 0.032 eV/Å². All three uMLIPs consistently underestimate the surface energies compared to DFT, except for a few predictions made by MACE and M3GNet. The trend in our result is consistent with the recent evaluation of Focassio et al.¹⁵ on the surface energies of elemental crystals.

**Fig. 2: uMLIP performance on surfaces, defects, and solid solutions.**

Defect energies

We also analyze the accuracy of uMLIPs in calculating point defect energies, which is crucial for understanding a material’s vacancy formation²⁷, dopabilities²⁸, mechanical properties²⁹, and ionic mobilities³⁰. Specifically, we perform benchmarks for point defects including vacancies, interstitials, and anti-site defects. In metallic systems, the point defect energy can be calculated from the energy of a defect structure referenced to the corresponding perfect structure and the external chemical potential of the species added or removed

$${E}_{i}^{{\rm{point}}\,{\rm{defect}}}={E}_{i}^{{\rm{defect}}}-{E}^{{\rm{bulk}}}-\Sigma {\mu }_{i}\Delta {N}_{i},$$

(3)

where μ_i is the chemical potential of the species i forming the defect and ΔN_i is the number of atoms of i added (+ 1) or removed (− 1) at the defect. To avoid additional errors in the defect energy introduced by the equilibrium chemical potentials determined from the phase diagram, we used the energies of the pure elemental phases as μ_i for this benchmark section. This choice does not affect the benchmark, but only shifts the value of the point defect energy.
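Equation (3) can be written as a short helper, shown here as a sketch. The function name, the dictionary-based interface, and the numbers are hypothetical; the chemical potentials are taken as elemental reference energies per atom, as described above.

```python
def point_defect_energy(e_defect, e_bulk, mu, delta_n):
    """Point defect energy per Equation (3).

    mu:      chemical potential per atom for each species (eV)
    delta_n: atoms added (+1) or removed (-1) for each species
    """
    exchange = sum(mu[species] * dn for species, dn in delta_n.items())
    return e_defect - e_bulk - exchange

# Hypothetical Al vacancy: one Al atom is removed from a supercell.
e_vac = point_defect_energy(
    e_defect=-314.0,
    e_bulk=-320.0,
    mu={"Al": -5.0},
    delta_n={"Al": -1},
)
print(e_vac)
```

An anti-site defect would be described by two entries in `delta_n`, one with +1 and one with −1.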

Figure 2b presents a comparison between uMLIP and DFT defect energies for 129 point defects across 32 chemical systems, including AlNi, CaSn₃, Cu₃Au, NaPb₃, NaAg₄, etc. Calculation details are listed in the Methods section and the complete list of materials is provided in Supplementary Table 2. Interestingly, the uMLIP calculated defect energies are mostly underestimated, similar to the trend observed in the surface energies in Fig. 2a.

Solid-solution energetics

Thermodynamic modeling of solubility in solid-state systems such as metallic alloys³¹ and high-entropy ceramics³² requires accurate energetics to capture the dependence of the energy on substitutional arrangements^33,34. This dependence, relative to k_BT, determines the temperature scale at which mixing or order/disorder transitions occur³⁵. In this section, we use the mixing of Ca²⁺ and Mg²⁺ in the Ca_xMg_2−xO₂ rocksalt as an example to examine the ability of uMLIPs to predict the behavior of the solid solution. The end members of the system, MgO and CaO are both rocksalts and the phase diagram has been previously studied both experimentally³⁶ and computationally³⁷.

We explore different possible Ca-Mg cation arrangements in the rocksalt at various CaO-MgO ratios and evaluate the corresponding energies (see Methods). These 0 K formation energies are shown in Fig. 2c, where each point corresponds to the energy of a specific Ca-Mg cation arrangement at a given Ca fraction. The predicted formation energies from all uMLIPs are positive, consistent with the low-temperature immiscibility of CaO and MgO³⁷. We observe a systematic underprediction of the mixing energies and the energy difference between the uMLIPs and DFT at a specific Ca fraction. Among the uMLIPs, CHGNet’s predictions closely approximate those of DFT, followed by those of M3GNet and MACE. We note that an underprediction of the formation energy would lead to an underestimation of the solubilization temperature in phase diagram calculations and an overestimation of the solubility limits at a given temperature³⁵.
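The formation energies discussed here can be sketched under the conventional definition of a mixing energy relative to the two end members. That definition, the per-cation normalization, and the numbers below are assumptions for illustration and are not taken from the Methods.

```python
def mixing_energy(e_mixed, x_ca, e_cao, e_mgo):
    """Energy of a mixed arrangement relative to the end members.

    All energies are per cation (eV); x_ca is the Ca fraction on the cation
    sublattice. Positive values indicate a tendency to phase-separate.
    """
    return e_mixed - (x_ca * e_cao + (1.0 - x_ca) * e_mgo)

# Hypothetical energies for one Ca-Mg arrangement at 50% Ca.
e_mix = mixing_energy(e_mixed=-11.9, x_ca=0.5, e_cao=-12.0, e_mgo=-12.2)
print(f"{e_mix:.3f} eV per cation")
```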

Ion migration barriers

The migration barrier for an ion to move through a crystal structure forms the basis for evaluating the diffusion constants in a material and as such is critical to understanding its functional or processing behavior. An accurate description of ion mobility is directly relevant in various applications, such as lithium-ion conductors for battery technologies³⁸ and proton conductors for fuel cells³⁹. Because the migration barrier is determined by the extrapolation of the energy along the path between two stable sites, it is by definition also a poorly sampled configuration when uMLIPs are only fitted to local equilibrium configurations.

We employ uMLIPs and DFT to conduct a comprehensive assessment of 470 Mg-ion migration pathways in 110 distinct structures including oxides, halides, and sulfides⁴⁰. For all ion migration paths, we generate an initial guess of the minimum energy pathway based on the DFT charge density⁴¹ and subsequently evaluate it with the approximate nudged elastic band (ApproxNEB) method⁴² (see Methods). ApproxNEB is different from regular NEB in that it does not perform a relaxation of the pathway but solely evaluates the energy along the predefined trajectory⁴². Figure 3a presents the energy landscape of one Mg ion migration path in ${{{\rm{V}}}_{2}{{\rm{O}}}_{3}({{\rm{SO}}}_{4})}_{2}$ (Materials Project ID mp-28207), where the energies of the initial and final images have been referenced to 0. The kinetically resolved activation (KRA) migration barrier is defined as the highest energy along the reaction coordinate after this referencing, which represents the relevant migration barrier in kinetic theories⁴³. While all three uMLIPs are shown to capture the overall shape of the DFT energy along the path, we observe systematic energy underpredictions of uMLIPs resulting in underpredictions of KRA migration barriers. MACE achieves the best performance with a 0.34 eV MAE against DFT, followed by CHGNet (0.39 eV) and M3GNet (0.49 eV). The parity plot of uMLIP barriers vs. DFT barriers is provided in Supplementary Fig. 1 and shows that the majority of uMLIP barriers are underpredicted, similar to the result of the surface and defect benchmarks. Figure 3b presents the distribution of the energy barrier difference between uMLIPs and DFT, from which we observe that all three uMLIPs show negative shifts in barrier predictions.
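The barrier extraction described above can be sketched as follows. How the two end points are referenced to 0 is an assumption here: the sketch subtracts a straight line connecting the initial and final image energies before taking the maximum. The image energies are hypothetical.

```python
import numpy as np

def kra_barrier(image_energies):
    """Highest energy along the path after referencing both end points to 0."""
    e = np.asarray(image_energies, dtype=float)
    # Assumption: a linear baseline between the first and last image sets both to 0.
    baseline = np.linspace(e[0], e[-1], len(e))
    return float(np.max(e - baseline))

# Hypothetical image energies (eV) along one migration path.
e_dft = [0.0, 0.3, 0.8, 0.5, 0.2]
e_mlip = [0.0, 0.2, 0.5, 0.3, 0.1]

barrier_dft = kra_barrier(e_dft)
barrier_mlip = kra_barrier(e_mlip)
print(barrier_dft, barrier_mlip, barrier_mlip - barrier_dft)
```

A negative difference in the last printed value corresponds to the underpredicted barriers reported for the uMLIPs.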

**Fig. 3: Underpredicted ion migration barriers in DFT and uMLIPs.**

Phonon properties

Accurate descriptions of vibrational properties and phonon spectra are crucial for understanding a wide range of material characteristics, such as thermodynamic⁴⁴, mechanical⁴⁵, and thermal transport properties⁴⁶. Predicting phonon frequencies represents a stringent test of the MLIPs’ ability to capture the subtle energy and force landscape around equilibrium configurations. Compared to the previous modeling benchmarks, phonon properties assess the uMLIPs’ accuracies in PES regions that are closer to training distribution. In this section, we benchmark the uMLIPs’ performance on phonon frequencies by applying the finite displacement method⁴⁷ to calculate harmonic phonons.

Figure 4a shows an example of uMLIP and DFT calculated phonon frequency on CsF (Materials Project ID mp-1784), where the solid red lines represent DFT phonon frequencies without non-analytical corrections (NAC) taken from the PhononDB^48,49 and the dashed lines show uMLIP phonon frequencies. While the overall shapes of the phonon bands are generally well-captured by the uMLIPs, a systematic reduction of the vibrational frequencies (i.e., the frequency magnitude difference of the branches at a given wave vector) is observed across all models compared to the DFT reference, particularly for the optical modes predicted by M3GNet (blue dashed line). The reduced vibrational frequency is an indication that the forces described by uMLIPs are systematically lower than the DFT values.

**Fig. 4: Softened phonon vibration modes in uMLIPs.**

To quantify this softening behavior, we evaluate the ratio between the maximum phonon frequencies predicted by the uMLIPs and the corresponding DFT value for a diverse set of 229 materials (see Supplementary Table 3) from the PhononDB^48,49. The distribution of these ratios is shown in Fig. 4b, which demonstrates that the majority (>90%) of materials are found to be softened in uMLIPs compared to DFT, with the phonon frequency underpredicted. The result suggests that both the energy and force described by uMLIPs are softened for almost all chemical systems.
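The ratio used in this analysis is simple to compute once the phonon frequencies are available. The sketch below assumes the frequencies of each material are given as flat arrays over all sampled wave vectors and branches; the function names and values are hypothetical.

```python
import numpy as np

def max_frequency_ratio(freq_mlip, freq_dft):
    """Ratio of the maximum uMLIP phonon frequency to the maximum DFT frequency."""
    return float(np.max(freq_mlip) / np.max(freq_dft))

def fraction_softened(ratios):
    """Fraction of materials whose ratio is below 1, i.e. softened by the uMLIP."""
    return float(np.mean(np.asarray(ratios) < 1.0))

# Hypothetical frequencies (THz) for three materials: (uMLIP, DFT).
materials = [
    ([0.0, 2.1, 4.5, 9.0], [0.0, 2.3, 5.0, 10.0]),
    ([0.0, 3.0, 6.6], [0.0, 3.0, 6.0]),
    ([0.0, 1.0, 7.2], [0.0, 1.2, 8.0]),
]
ratios = [max_frequency_ratio(m, d) for m, d in materials]
print(ratios, fraction_softened(ratios))
```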

PES softening scale for high-energy states

By definition, a machine learning model with only random errors should have its prediction error distribution centered at 0. However, none of the three uMLIPs satisfies this criterion, either in OOD atomic configurations or in PES regions closer to equilibrium, such as those probed by phonons. These consistent underpredictions can come from two possible causes: (1) systematic underpredictions of energies and forces that soften the PES; (2) ionic relaxations that optimize the output energy towards lower values due to modified PES minima created by random errors. While the latter cause arises from random errors that are challenging to eliminate, the former cause arises from systematic errors that can be easily quantified by separating out relaxations and directly benchmarking uMLIPs against DFT at fixed checkpoints in the PES.

To quantify the extent of systematic softening in the uMLIP PES, we propose the softening scale parameter, which is calculated as the slope of a linear fit of uMLIP vs. DFT forces in a material. As an example, Fig. 5a shows a parity plot of uMLIP vs. DFT forces from sampled high-energy OOD atomic configurations of ${{{\rm{Li}}}_{6}{{\rm{Zn}}}_{2}{{\rm{In}}}_{2}({{\rm{IO}}}_{3})}_{16}$ (derived from Materials Project ID mp-973966). These OOD atomic configurations are sampled away from the energy minimum in the PES by applying high-temperature molecular dynamics (MD) simulations (see Methods). The corresponding forces of each sampled state are subsequently evaluated using static calculations with uMLIPs and DFT.

**Fig. 5: The PES softening scale from shifted force predictions.**

The systematic PES softening effect shows up in Fig. 5a as a clockwise tilt of the distribution away from the diagonal. The slope of this distribution, extractable by linear regression, is defined as the PES softening scale. In Fig. 5a, we provide the fitted slopes and force MAEs of the three uMLIPs. When the softening scale is 1, the MLIP’s force distribution aligns with the diagonal, indicating that the curvature of the MLIP-PES systematically agrees with DFT with only random errors present. A softening scale smaller than 1 indicates a systematic underprediction of energy and forces that leads to an overall smoother PES curvature, as illustrated in Fig. 1.
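The softening scale defined above can be computed with an ordinary least-squares fit. The sketch below uses synthetic forces with a built-in slope of 0.85 rather than real DFT or uMLIP data, and it assumes the fit includes an intercept, which the text does not specify.

```python
import numpy as np

def softening_scale(f_dft, f_mlip):
    """Slope of the least-squares line f_mlip ≈ slope * f_dft + intercept."""
    x = np.asarray(f_dft, dtype=float).ravel()   # all force components, flattened
    y = np.asarray(f_mlip, dtype=float).ravel()
    slope, _intercept = np.polyfit(x, y, 1)
    return float(slope)

# Synthetic example: uMLIP forces are 85% of the DFT forces plus random noise.
rng = np.random.default_rng(0)
f_dft = rng.normal(0.0, 1.0, size=(1000, 3))
f_mlip = 0.85 * f_dft + rng.normal(0.0, 0.05, size=f_dft.shape)

scale = softening_scale(f_dft, f_mlip)
print(f"softening scale = {scale:.3f}")
```

A value below 1, as recovered here, corresponds to the clockwise tilt in the parity plot.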

To investigate how broadly across chemistry the PES softening occurs, we collected 1000 different compounds from the WBM materials dataset by Wang et al.⁵⁰, which was generated by elemental substitution of Materials Project compounds and therefore contains only crystalline structures that are not included in the pre-training dataset of the three uMLIPs. For each of these compounds, 10 high-energy states away from the PES energy minimum are sampled with a 1000 K MD simulation, and the softening scale is extracted from a linear fit of uMLIP vs. DFT forces. Figure 5b presents the distribution of the PES softening scale for these 1000 WBM compounds, and shows that for the majority (>90%) of the compounds, the softening scale is smaller than 1 for all three uMLIPs we have tested. This result indicates that the systematic softening behavior is present across nearly all chemical systems in current uMLIP models.

Data-efficient fine-tuning

Within a local PES region of a specific chemical system, the softening issue appears as a tilted distribution of forces in the parity plot, as shown in Fig. 5a for mp-973966. Intuitively, one can rotate the distribution back to the diagonal to reset the softening scale to 1, thereby reducing the prediction error. In this scenario, we define cMAE as the linearly corrected mean absolute error if the uMLIP force distributions were rotated back to align with the diagonal. As shown in Fig. 5a, the cMAEs are considerably reduced from the original MAE, from 0.220/0.190/0.176 eV/Å to 0.184/0.162/0.155 eV/Å for M3GNet/CHGNet/MACE, respectively. This observation suggests that a considerable fraction of force errors from uMLIP are likely to be systematic and can be easily corrected locally to reduce force errors.

Mathematically, rotating the force distribution is equivalent to multiplying every force value by a scalar, which can be realized by multiplying the MLIP energy by a scalar term

$$\begin{array}{rcl}{E}^{{\rm{corr}}}&=&c* {\bf{MLIP}}(\{{\overrightarrow{r}}_{i}\},\{{C}_{i}\}),\\ {f}_{i}^{{\rm{corr}}}&=&-\frac{\partial {E}^{{\rm{corr}}}}{\partial {\overrightarrow{r}}_{i}}=c* {f}_{i}.\end{array}$$

(4)

It is noted that the above formulation is equivalent to fine-tuning an MLIP by fixing all model weights except a scalar linear layer, which essentially modifies only the scalar parameter c in Equation (4). Since only a scalar parameter requires modification, only a single label (1 force component) is needed for the training. Since the crystal cell typically consists of multiple atoms, with each atom carrying three force components, a single training structure already contains enough information for the proposed linear correction. In the left part of Fig. 6a, we show the result when pre-trained CHGNet is fine-tuned with an added hypothetical scalar linear layer (see Methods), trained on only a single high-energy configuration of mp-973966. The test forces, which originate from the same set of atomic arrangements in Fig. 5a, are labeled in orange and the training forces from the single additional configuration are labeled in red. The linear corrected CHGNet exhibits a softening scale of 0.965 and a force MAE of 0.166 eV/Å, improved from 0.859 and 0.190 eV/Å in the pre-trained CHGNet as shown in Fig. 5. The estimated cMAE is 0.162 eV/Å when the softening is corrected to 1, which is close to the force MAE of 0.166 eV/Å that is achieved by fine-tuning the scalar linear layer. Hence, a linear correction with one high-energy OOD configuration indeed operates as a rotation of the force distribution back to the diagonal, substantially eliminating the systematic softening error and considerably reducing the force MAE.
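The scalar correction of Equation (4) can be sketched on synthetic data. Instead of training a scalar linear layer by gradient descent as in the fine-tuning experiment, this sketch fits c in closed form by least squares through the origin, using the force components of a single configuration; that substitution and all data below are assumptions for illustration.

```python
import numpy as np

def fit_scalar_correction(f_dft_train, f_mlip_train):
    """Least-squares c minimizing |c * f_mlip - f_dft|^2 over all components."""
    x = np.asarray(f_mlip_train, dtype=float).ravel()
    y = np.asarray(f_dft_train, dtype=float).ravel()
    return float(x @ y / (x @ x))

def force_mae(f_dft, f_pred):
    """Mean absolute error over all force components."""
    return float(np.mean(np.abs(np.asarray(f_pred) - np.asarray(f_dft))))

rng = np.random.default_rng(1)

def synthetic_forces(n_atoms):
    """Hypothetical DFT forces and softened, noisy uMLIP forces (eV/Å)."""
    f_dft = rng.normal(0.0, 1.0, size=(n_atoms, 3))
    f_mlip = 0.85 * f_dft + rng.normal(0.0, 0.1, size=f_dft.shape)
    return f_dft, f_mlip

train_dft, train_mlip = synthetic_forces(30)    # one training configuration
test_dft, test_mlip = synthetic_forces(300)     # held-out test forces

c = fit_scalar_correction(train_dft, train_mlip)
mae_before = force_mae(test_dft, test_mlip)
mae_after = force_mae(test_dft, c * test_mlip)  # f_corr = c * f, as in Equation (4)
print(f"c = {c:.3f}, MAE {mae_before:.3f} -> {mae_after:.3f} eV/Å")
```

The remaining error after the correction comes from the random noise term, which a single scalar cannot remove; this plays the role of the cMAE in the text.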

**Fig. 6: Data efficient fine-tuning demonstrated by linear correction.**

We propose that the cMAE derived from the linear correction serves as an approximate lower bound for the expected error reduction from fine-tuning uMLIPs. In Supplementary Fig. 5, we show that the errors in materials modeling tasks, such as surface calculation, can be similarly reduced after a linear correction with one label. Consequently, the proposed linear correction serves as a baseline for fine-tuning error reduction in uMLIPs. In practice, a typical fine-tuning process involves hundreds to thousands of structure labels that can further reduce the MAE of the model. We tested fine-tuning the pretrained CHGNet by optimizing all model parameters with 10 training structures, and the resulting force parity plot is shown on the right of Fig. 6a. Compared to the linear correction with only one configuration, the right panel in Fig. 6a shows that a very small dataset of 10 training structures further reduces the MAE to 0.125 eV/Å, which supports treating the linear-corrected cMAE as a safe lower bound on the achievable error reduction. By statistically evaluating the distribution of force MAEs and cMAEs for the 1000 WBM structures, we present their fine-tuning error-reduction lower bounds in Fig. 6b. From the observed distribution, a considerable error reduction (~15%) can be achieved with a simple linear correction.

These results suggest a theoretical explanation for the commonly observed data-efficient performance boost that is achievable by fine-tuning foundational uMLIPs compared to training randomly initialized MLIPs. The data efficiency arises from the observation that a significant part of the MAEs in pre-trained uMLIPs are highly systematic, which can be efficiently amended by optimizing a fraction of model parameters with a small amount of data. The linear correction demonstrates the fine-tuning mechanism in the extreme case of one structure label and one trainable parameter. In practice, one doesn’t necessarily need to fine-tune only a linear layer since a realistic fine-tuning dataset is far larger and richer than a single structure label. In Supplementary Figs. 3 and 4, we present a comparison between the force error in the fine-tuned CHGNet models to those trained from scratch. The result demonstrates that the fine-tuning process can achieve significantly higher data efficiency compared to training MLIPs from scratch. Our theory of systematic error correction provides a mechanistic explanation of these advantages of foundational MLIPs.

Discussion

The design and discovery of novel materials raises the need for advanced simulation tools capable of efficiently and accurately describing the intricate details of atomic interactions. MLIPs offer a potential solution to bridge the gap between quantum mechanical accuracy and affordable computation cost by learning and emulating complex atomic interactions. Recent work on pre-training foundational MLIPs with comprehensive material datasets has opened up the possibility for out-of-box use of robust universal interatomic potentials^7,8,10,12,13.

Unlike DFT, MLIPs cannot by default be expected to perform well in a configurational space where they have not been trained. We therefore benchmark the performance of three uMLIPs for multiple modeling tasks including surfaces, defects, solid-solution energetics, phonon vibration modes, ion migration barriers, and more general high energy states. These states are under-represented in the widely-used pre-training dataset^7,8,17 that only consists of bulk crystalline materials. For the properties tested in this work, we observe a universal softening of the PES, characterized by the uMLIPs’ underprediction of energies and forces.

The uMLIP datasets are primarily drawn from Materials Project¹⁷ ionic relaxation trajectories and are therefore largely distributed around the energy minima of the PES. Consequently, the uMLIPs are exposed to a limited range of atomic configurations and force gradients, leading to difficulties in accurately capturing the energy landscapes and steep gradients associated with OOD states and processes like ion migrations and phase transformations.

We found similar signs of softening in the published literature, though less attention was dedicated to an in-depth examination of the softening issue. Pandey et al.⁵¹ and Bartel⁵² presented an extrapolation issue arising from a distribution shift when training a CGCNN¹⁸ energy predictor with ICSD data⁵³. The CGCNN model trained with only experimentally stable materials experienced a six-fold increase in prediction MAE when applied to hypothetical crystal structures in the Materials Project¹⁷. Furthermore, Google DeepMind’s GNoME uMLIP exhibited pronounced softening tendencies when trained on the M3GNet dataset⁷, as evidenced in the Supplementary Information of ref. ¹², similar to our observation in Fig. 5a. After being trained on the expanded dataset of 89 million structures, the softening issue in GNoME was shown to be mitigated but not fully eliminated, which is shown in Supplementary Figs. S34–S37 from ref. ¹². These examples underscore the universality of the PES softening issue across various models and datasets, highlighting the importance of the systematic benchmark and analysis undertaken by our study to address this challenge.

Another possible cause of underpredicted energies arises from modified PES minima. Even if the uMLIP were unbiased with only random errors, uMLIP ionic relaxation may further relax the atoms into positions that are at lower energy, resulting in underpredictions of the relaxed energies. To illustrate this, consider an unbiased uMLIP. When comparing the results of static energy calculations, the uMLIP energy predictions would be unbiased compared to the DFT energies. However, when ionic relaxations are performed with the uMLIP on these DFT-relaxed structures, the relaxation may further displace the atoms away from the DFT minimum and reduce the energy. As a result, the expected uMLIP relaxed energy may show a bias to be lower than the DFT energy, even when only random error is present. Compared to the systematic softening discussed in the current manuscript, these erroneous relaxations are much more challenging to resolve as multiple factors are involved: optimization algorithm, relaxation convergence criterion, PES training and validation error, etc. As most physical properties are determined by the energy difference of various configurations, this error arising from the “re-optimization” of the atomic position will affect both states from which the energy difference is derived. For example, in an NEB calculation, the re-optimization error can affect the initial state of the ion as well as the saddle point. Nonetheless, if the random error is larger in high-energy configurations, one expects the re-optimization error to be larger in the high-energy configurations, effectively showing up as softening.
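The re-optimization argument can be stated compactly. As a sketch, assume the uMLIP energy is $E_{\rm{ML}}(x)=E_{\rm{DFT}}(x)+\varepsilon(x)$ with a zero-mean random error $\varepsilon$, let $x_{\rm{D}}$ be the DFT-relaxed structure, and let $x_{\rm{M}}$ be the structure reached by relaxing with the uMLIP starting from $x_{\rm{D}}$ (these symbols are introduced here for illustration only). Because a relaxation can only lower the energy it optimizes,

$${E}_{{\rm{ML}}}({x}_{{\rm{M}}})\le {E}_{{\rm{ML}}}({x}_{{\rm{D}}})={E}_{{\rm{DFT}}}({x}_{{\rm{D}}})+\varepsilon ({x}_{{\rm{D}}})\quad \Rightarrow \quad {\mathbb{E}}\left[{E}_{{\rm{ML}}}({x}_{{\rm{M}}})\right]\le {E}_{{\rm{DFT}}}({x}_{{\rm{D}}}),$$

so under these assumptions the expected relaxed uMLIP energy lies at or below the DFT minimum even without any systematic bias.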

The observed limitations of current uMLIPs raise questions about the effect of model size and expressive capacity on their ability to capture the intricate details of the PES⁵⁴. The MACE model with 4.69 million parameters, which is around 11 times the size of CHGNet and 21 times the size of M3GNet, shows improved MAE and decreased softening compared to the smaller uMLIPs. The better performance of larger uMLIPs aligns with the previous study by Frey et al.⁵⁵ on the scaling of model performance as a function of MLIP capacity. The observed relationship between model capacity and performance prompts further inquiry into the extent to which the parameter size of current uMLIPs influences the PES softening issue, and whether the softening can be minimized by scaling to a larger, yet reasonable model size without expanding the dataset. In Supplementary Figs. 6 and 7, we show the distribution of softening scale and force MAEs for two additional uMLIPs: CHGNet-matgl and M3GNet-matgl, which were also pre-trained using the Materials Project database. The CHGNet-matgl with increased model size and M3GNet-matgl with enhanced sampling⁵⁶ demonstrate decreased softening effect and improved force predictions. Furthermore, previous investigations by Xu et al.⁵⁷ showed that the extrapolation behavior of neural networks tends to be linear, which aligns with our observation of underpredicted curvatures in the uMLIP PES. While the current work does not explicitly investigate the effect of model size and design, further studies could provide a better explanation of the number of model parameters needed to describe a universal potential energy surface.

Fortunately, we demonstrate that the softening issue can be effectively resolved by including a minimal amount of high-energy OOD training points in fine-tuning. Our result not only provides a guideline to avoid softening issues when applying uMLIPs to atomic modeling but, more importantly, offers an explanation for the frequently observed data-efficient fine-tuning of foundational MLIPs. Our result suggests that a significant portion of the MAE in uMLIPs is highly systematic and can therefore be efficiently corrected with a small amount of data. In addition to the robustness of uMLIPs that has been acknowledged as an advantage obtained from pre-training⁸,¹², our study elucidates another benefit of fine-tuning foundational MLIPs: a data-efficient systematic error correction that is unavailable when training a randomly initialized MLIP. Our study serves as a guideline for researchers attempting to fit interatomic potentials for their systems of interest.

In summary, our work presents an in-depth analysis of the softening effect of uMLIPs observed in a series of materials benchmarks, from which we provide guidelines for fine-tuning uMLIPs. Given the observed limitations of current uMLIPs, we advocate for an improved next-generation dataset for training foundational atomic models, and for more investigation into the role of model complexity. Despite significant efforts dedicated to model design and training strategies, less emphasis has been placed on constructing comprehensive and well-curated open-source materials datasets⁵⁸. Most current foundational models still rely on datasets that were not originally generated for machine learning purposes. Apart from diversifying the chemical space, our findings highlight the importance of ensuring a comprehensive sampling of the PES when generating a reliable MLIP dataset. We believe a next-generation foundational atomic dataset with improved sampling will be pivotal for the development of MLIPs and atomistic simulations.

Methods

uMLIP versions

Table 1 lists the details and versions of the uMLIPs tested.

Table 1 uMLIP Model Specifications

Materials modeling tasks

For surface energy calculations, stoichiometric and symmetric slabs are generated with Miller indices up to a maximum of 2 in three directions. A minimum slab thickness of 10 Å and a minimum vacuum length of 10 Å are used for DFT to ensure convergence of the surface energy⁵⁹. When relaxing the slab, the in-plane lattice vectors are fixed to their bulk values. The ionic relaxations are converged to a maximum interatomic force criterion of 0.05 eV/Å for all uMLIPs.
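As a rough illustration of how a surface energy follows from such slab calculations, the sketch below uses the standard expression for a symmetric, stoichiometric slab with two equivalent surfaces. The function names, argument names, and the per-atom force-norm convention for the convergence check are assumptions made here for illustration, not details given in the text.

```python
import numpy as np

def surface_energy(e_slab, n_slab_atoms, e_bulk_per_atom, area):
    """Surface energy in eV/Å^2 for a symmetric, stoichiometric slab.

    The slab exposes two equivalent surfaces of in-plane area `area` (Å^2),
    hence the factor of 2 in the denominator.
    """
    return (e_slab - n_slab_atoms * e_bulk_per_atom) / (2.0 * area)

def is_relaxed(forces, fmax=0.05):
    """True if the largest per-atom force norm (eV/Å) is below `fmax`.

    Using the per-atom norm (rather than the largest component) is an
    assumption about how the 0.05 eV/Å criterion is applied.
    """
    forces = np.asarray(forces, dtype=float)
    return bool(np.linalg.norm(forces, axis=1).max() < fmax)

# Made-up numbers: a 20-atom slab, bulk energy -5 eV/atom, 25 Å^2 surface cell.
print(surface_energy(e_slab=-95.0, n_slab_atoms=20, e_bulk_per_atom=-5.0, area=25.0))
```

With the same structures evaluated by DFT and by a uMLIP, a softened PES would typically show up as a smaller value of this quantity for the uMLIP.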

For defect energy calculations, defects in elemental phases as well as binary metallic compounds are considered. The defect structures are fully relaxed and referenced to the bulk energy. The off-stoichiometric defect energies (e.g., vacancy defects) are referenced to the chemical potential of the pure elemental phase, instead of any chemical potential corresponding to multi-phase equilibria in the phase diagram. This is done deliberately to avoid additional errors associated with calculating the phase diagram using the uMLIPs. For all uMLIPs, the ionic relaxations are converged to a maximum interatomic force of 0.05 eV/Å.
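The referencing scheme described above can be written as a short helper. This is a minimal sketch using the conventional neutral-defect formation energy, with chemical potentials taken as the energy per atom of each pure elemental phase; the function and argument names are hypothetical, and the numbers in the usage line are invented.

```python
def defect_formation_energy(e_defect, e_bulk, delta_n, mu_elemental):
    """Formation energy (eV) of a neutral defect.

    e_defect     : total energy of the relaxed defect supercell
    e_bulk       : total energy of the pristine supercell
    delta_n      : {element: atoms added (+) or removed (-)} relative to bulk
    mu_elemental : {element: energy per atom of the pure elemental phase}
    """
    exchange = sum(dn * mu_elemental[el] for el, dn in delta_n.items())
    return e_defect - e_bulk - exchange

# A single vacancy removes one atom, so delta_n = {"Al": -1}.
e_vac = defect_formation_energy(
    e_defect=-120.0, e_bulk=-125.0, delta_n={"Al": -1}, mu_elemental={"Al": -3.9}
)
print(round(e_vac, 3))
```

Fixing the chemical potential to the elemental phase means the same reference energy is used regardless of where the compound sits in the phase diagram, which is the simplification the text motivates.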

For solid-solution calculations in ${\rm Ca}_x{\rm Mg}_{2-x}{\rm O}_2$, we randomly select different Ca-Mg orderings (up to 52 configurations) at each Ca concentration and evaluate the energy of each configuration after ionic relaxation with DFT or the uMLIPs.

For phonon calculations, we use the phonopy workflow as implemented in atomate2⁶⁰, with relaxation convergence and supercell settings identical to those used in Batatia et al.¹⁰. The DFT reference data are taken from the PhononDB⁴⁸,⁴⁹. We restrict the benchmark to materials without magnetism or U-corrections. Moreover, we removed the non-analytic corrections (NAC) from the PBEsol phonons; these are derived from the Born effective charges, which are unavailable from uMLIPs because they have no concept of electronic structure. In practice, a future hybrid uMLIP-DFT workflow could perform a single DFT static calculation at the uMLIP-relaxed structure to obtain Born charges and apply non-analytic corrections post hoc to the uMLIP phonon spectrum. However, such a hybrid workflow, while necessary in practice, would not affect the results of this benchmark, which is concerned specifically with the ML-obtainable parts of the spectrum.

The ion migration barrier DFT data are collected from the work of Rutt et al.⁴⁰, in which the ApproxNEB algorithm⁴² was used to evaluate Mg²⁺ ion migration barriers. The key difference between ApproxNEB and regular NEB⁶¹ is that ApproxNEB relaxes each image along the migration path independently, while NEB relaxes the migration path collectively. In the ApproxNEB method, an initial guess of the ion migration path is interpolated based on the charge density of the host structure. The energies associated with the suggested image structures are calculated by a constrained relaxation that fixes the moving ion and the lattice vectors. The ApproxNEB method was shown to provide barriers within 20 meV of NEB and to reduce the computational time significantly for materials where the path is not too complex⁴².

The high-energy states are sampled by high-temperature molecular dynamics. The atomic configurations in Fig. 5a are sampled from a 1000 K ab-initio MD run, and the 1000 materials in Fig. 5b are selected from the WBM dataset⁵⁰ and sampled with CHGNet MD runs. For each selected structure, a 20 ps, 1000 K molecular dynamics simulation is performed in the constant number of particles, volume, and temperature (NVT) ensemble with the pre-trained CHGNet, and 10 structures are subsequently sampled from each MD trajectory⁵⁶. A +3% strain and a −3% strain are applied along the three lattice dimensions for 4 out of the 10 structures to sample strained configurations. All force MAEs and all fine-tuning use the three-dimensional force components rather than the absolute magnitudes of the forces.
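The distinction between component-wise and magnitude-based force errors, and the strain step, can be sketched as follows. The function names are hypothetical, and treating the ±3% strain as a uniform scaling of all three lattice vectors (with fractional coordinates held fixed) is an assumption about how the strain is applied.

```python
import numpy as np

def force_mae_components(f_pred, f_ref):
    """MAE over all Cartesian force components (the convention used here)."""
    diff = np.asarray(f_pred, dtype=float) - np.asarray(f_ref, dtype=float)
    return float(np.mean(np.abs(diff)))

def force_mae_magnitudes(f_pred, f_ref):
    """MAE over per-atom force magnitudes (shown only for contrast)."""
    n_pred = np.linalg.norm(np.asarray(f_pred, dtype=float), axis=1)
    n_ref = np.linalg.norm(np.asarray(f_ref, dtype=float), axis=1)
    return float(np.mean(np.abs(n_pred - n_ref)))

def apply_uniform_strain(lattice, strain):
    """Scale a 3x3 lattice matrix (rows are lattice vectors) by (1 + strain)."""
    return np.asarray(lattice, dtype=float) * (1.0 + strain)

# A force with the right magnitude but the wrong direction:
f_ref = [[1.0, 0.0, 0.0]]
f_pred = [[0.0, 1.0, 0.0]]
print(force_mae_components(f_pred, f_ref))  # nonzero: direction error is penalized
print(force_mae_magnitudes(f_pred, f_ref))  # zero: direction error is invisible

cubic = np.eye(3) * 10.0
print(apply_uniform_strain(cubic, +0.03)[0, 0], apply_uniform_strain(cubic, -0.03)[0, 0])
```

The example shows why the component-wise metric is the stricter choice: a magnitude-only metric cannot see errors in force direction.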

Fine-tuning

Every fine-tuning and linear correction experiment in the current manuscript is trained separately for each material system. For the fine-tuning of the CHGNet uMLIP, the models are trained with energy, force, and stress labels with 0.1-100-0.1 loss fractions under the mean squared error (MSE) loss criterion. The structures and labels are taken from DFT ab-initio MD trajectory data of ${\rm Li}_6{\rm Zn}_2{\rm In}_2({\rm IO}_3)_{16}$ (mp-973966) from Jun et al.⁶², where 100 structures are reserved for the test set, as shown by the orange points in Fig. 6a. The train-validation ratio is set to 9:1. As a result, 9 out of the 10 training structures in the right panel of Fig. 6a are actually used for gradient back-propagation. The Adam optimizer⁶³ is used with a learning rate of 1e-3 that cosine-decays to 1e-5 over 100 epochs. The model checkpoint with the best validation force MAE is used for test set predictions. For the model trained with only 1 structure, the last-epoch checkpoint is used instead.
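The training settings above can be sketched in a few lines. This is not the CHGNet trainer: the function names are hypothetical, the standard cosine-annealing form is assumed for the learning-rate decay, and the 0.1-100-0.1 fractions are assumed to apply to energy, force, and stress in that order.

```python
import math
import numpy as np

def cosine_lr(epoch, total_epochs=100, lr_max=1e-3, lr_min=1e-5):
    """Learning rate that decays from lr_max to lr_min along a half cosine."""
    t = min(epoch, total_epochs) / total_epochs
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))

def weighted_mse_loss(pred, ref, weights=None):
    """Weighted sum of per-property MSEs for energy, force, and stress.

    `pred` and `ref` are dicts of arrays keyed by "energy", "force", "stress".
    The default weights follow the assumed energy-force-stress order.
    """
    if weights is None:
        weights = {"energy": 0.1, "force": 100.0, "stress": 0.1}
    total = 0.0
    for key, w in weights.items():
        diff = np.asarray(pred[key], dtype=float) - np.asarray(ref[key], dtype=float)
        total += w * float(np.mean(diff**2))
    return total

for epoch in (0, 50, 100):
    print(epoch, f"{cosine_lr(epoch):.3e}")
```

The large force weight relative to energy and stress means the optimization is dominated by force errors, consistent with selecting checkpoints by validation force MAE.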

The linear correction of CHGNet is realized by adding a scalar linear layer before the energy prediction. The weight of the scalar linear layer is initialized to 1, so that it does not influence the energy prediction before being optimized. During the linear correction, all CHGNet model parameters are frozen except for the added scalar linear layer.
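To give a feel for what a one-parameter correction can do, the sketch below fits a single scale factor to forces by closed-form least squares. This is a stand-in, not the procedure described above: the paper trains a scalar layer inside CHGNet by gradient descent, whereas here the model is treated as a black box whose forces are rescaled, on the assumption that scaling the energy by a constant scales the forces by the same constant. All names are hypothetical and the data are synthetic.

```python
import numpy as np

def fit_force_scale(f_model, f_ref):
    """Least-squares scale s minimizing sum((s * f_model - f_ref)**2)."""
    f_model = np.asarray(f_model, dtype=float).ravel()
    f_ref = np.asarray(f_ref, dtype=float).ravel()
    return float(f_model @ f_ref / (f_model @ f_model))

# Synthetic example: a "softened" model that underpredicts all forces by 20%.
rng = np.random.default_rng(0)
f_ref = rng.normal(size=(5, 3))
f_model = 0.8 * f_ref

scale = fit_force_scale(f_model, f_ref)
print(f"fitted scale: {scale:.3f}")  # a scale above 1 stiffens the softened PES
```

A fitted scale above 1 indicates that the uncorrected model is too soft. With a purely systematic error, as in this synthetic case, a single parameter removes it entirely, which is the intuition behind the data efficiency of the correction.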

DFT calculations

DFT calculations were performed with the Vienna ab initio simulation package (VASP) using the projector-augmented wave method⁶⁴,⁶⁵. All calculation settings are generated using pymatgen MPRelaxSet to ensure all DFT results are compatible with Materials Project DFT calculations⁶⁶. All the calculations were converged to at least 10⁻⁵ eV in total energy for electronic steps and 0.02 eV/Å in interatomic forces for ionic steps.

Data availability

The dataset used to extract the softening scales of uMLIPs is available at https://doi.org/10.6084/m9.figshare.27307776⁶⁷.

References

Wang, H. et al. Scientific discovery in the age of artificial intelligence. Nature 620, 47–60 (2023).
Wang, H., Zhang, L., Han, J. & Weinan, E. Deepmd-kit: A deep learning package for many-body potential energy representation and molecular dynamics. Comput. Phys. Commun. 228, 178–184 (2018).
Batzner, S. et al. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nat. Commun. 13, 2453 (2022).
Guo, Z. et al. Extending the limit of molecular dynamics with ab initio accuracy to 10 billion atoms. In Proc. of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming 205–218 (2022).
Musaelian, A., Johansson, A., Batzner, S. & Kozinsky, B. Scaling the leading accuracy of deep equivariant models to biomolecular simulations of realistic size. In Proc. International Conference for High Performance Computing, Networking, Storage and Analysis. 1–12 (2023).
Gasteiger, J., Groß, J. & Günnemann, S. Directional message passing for molecular graphs. ICLR, arXiv preprint arXiv:2003.03123 (2020).
Chen, C. & Ong, S. P. A universal graph deep learning interatomic potential for the periodic table. Nat. Comput. Sci. 2, 718–728 (2022).
Deng, B. et al. Chgnet as a pretrained universal neural network potential for charge-informed atomistic modelling. Nat. Mach. Intell. 5, 1031–1041 (2023).
Choudhary, K. et al. Unified graph neural network force-field for the periodic table: solid state applications. Digit. Discov. 2, 346–355 (2023).
Batatia, I. et al. A foundation model for atomistic materials chemistry. arXiv preprint arXiv:2401.00096. https://arxiv.org/abs/2401.00096 (2023).
Takamoto, S. et al. Towards universal neural network potential for material discovery applicable to arbitrary combination of 45 elements. Nat. Commun. 13, 2991 (2022).
Merchant, A. et al. Scaling deep learning for materials discovery. Nature 1–6 (2023).
Zhang, D. et al. Pretraining of attention-based deep learning potential model for molecular simulation. npj Comput. Mater. 10, 94 (2024).
Riebesell, J. et al. Matbench discovery – an evaluation framework for machine learning crystal stability prediction. arXiv preprint arXiv:2308.14920. https://arxiv.org/abs/2308.14920 (2023).
Focassio, B., Freitas, L. P. M. & Schleder, G. R. Performance assessment of universal machine learning interatomic potentials: Challenges and directions for materials’ surfaces. ACS Appl. Mater. Interfaces. https://doi.org/10.1021/acsami.4c03815 (2024).
Yu, H., Giantomassi, M., Materzanini, G., Wang, J. & Rignanese, G.-M. Systematic assessment of various universal machine-learning interatomic potentials. Mater. Genome Eng. Adv. 2, 3 (2024).
Jain, A. et al. Commentary: The materials project: A materials genome approach to accelerating materials innovation. APL Mater. 1, 011002 (2013).
Xie, T. & Grossman, J. C. Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Phys. Rev. Lett. 120, 145301 (2018).
Luo, S., Chen, T. & Krishnapriyan, A. S. Enabling efficient equivariant operations in the fourier basis via gaunt tensor products. The Twelfth International Conference on Learning Representations. https://openreview.net/forum?id=mhyQXJ6JsK (2024).
Cheng, B. Cartesian atomic cluster expansion for machine learning interatomic potentials. npj Comput. Mater. 10, 157 (2024).
Ko, T. W., Finkler, J. A., Goedecker, S. & Behler, J. Accurate fourth-generation machine learning potentials by electrostatic embedding. J. Chem. Theory Comput. 19, 3567–3579 (2023).
Chanussot, L. et al. Open catalyst 2020 (OC20) dataset and community challenges. ACS Catal. 11, 6059–6072 (2021).
Obot, I., Macdonald, D. & Gasem, Z. Density functional theory (DFT) as a powerful tool for designing new organic corrosion inhibitors. part 1: An overview. Corros. Sci. 99, 1–30 (2015).
Han, Y. et al. Surface energies, adhesion energies, and exfoliation energies relevant to copper-graphene and copper-graphite systems. Surf. Sci. 685, 48–58 (2019).
Sun, W., Kitchaev, D. A., Kramer, D. & Ceder, G. Non-equilibrium crystallization pathways of manganese oxides in aqueous solution. Nat. Commun. 10, 573 (2019).
Fichthorn, K. A. & Scheffler, M. Island nucleation in thin-film epitaxy: A first-principles investigation. Phys. Rev. Lett. 84, 5371–5374 (2000).
Gurylev, V. & Perng, T. P. Defect engineering of zno: Review on oxygen and zinc vacancies. J. Eur. Ceram. Soc. 41, 4977–4996 (2021).
Broberg, D. et al. High-throughput calculations of charged point defect properties with semi-local density functional theory—performance benchmarks for materials screening applications. npj Comput. Mater. 9, 72 (2023).
Ahangari, M. G. et al. Effect of various defects on mechanical and electronic properties of zinc-oxide graphene-like structure: A DFT study. Vacuum 165, 26–34 (2019).
Kang, K. & Ceder, G. Factors that affect li mobility in layered lithium transition metal oxides. Phys. Rev. B 74, 094105 (2006).
van de Walle, A. Multicomponent multisublattice alloys, nonconfigurational entropy and other additions to the Alloy Theoretic Automated Toolkit. Calphad 33, 266–278 (2009).
Cation-disordered rocksalt-type high-entropy cathodes for Li-ion batteries. Nat. Mater. 20, 214–221 (2021).
Barroso-Luque, L. et al. Cluster expansions of multicomponent ionic materials: Formalism and methodology. Phys. Rev. B 106, 144202 (2022).
Zhong, P., Xie, F., Barroso-Luque, L., Huang, L. & Ceder, G. Modeling intercalation chemistry with multiredox reactions by sparse lattice models in disordered rocksalt cathodes. PRX Energy 2, 043005 (2023).
Ceder, G. A derivation of the ising model for the computation of phase diagrams. Comput. Mater. Sci. 1, 144–150 (1993).
Doman, R. C., Barr, J. B., McNally, R. N. & Alper, A. M. Phase equilibria in the system CaO—MgO. J. Am. Ceram. Soc. 46, 313–316 (1963).
Jung, I.-H., Decterov, S. A. & Pelton, A. D. Critical thermodynamic evaluation and optimization of the CaO–MgO–SiO2 system. J. Eur. Ceram. Soc. 25, 313–333 (2005).
Deng, Z., Radhakrishnan, B. & Ong, S. P. Rational composition optimization of the Lithium-Rich Li3OCl1–x Br x anti-perovskite superionic conductors. Chem. Mater. 27, 3749–3755 (2015).
Du, P. et al. Cooperative origin of proton pair diffusivity in yttrium substituted barium zirconate. Commun. Phys. 3, 200 (2020).
Rutt, A. et al. Expanding the material search space for multivalent cathodes. ACS Appl. Mater. Interfaces 14, 44367–44376 (2022).
Shen, J.-X., Horton, M. & Persson, K. A. A charge-density-based general cation insertion algorithm for generating new li-ion cathode materials. npj Comput. Mater. 6, 161 (2020).
Rong, Z., Kitchaev, D., Canepa, P., Huang, W. & Ceder, G. An efficient algorithm for finding the minimum energy path for cation migration in ionic materials. J. Chem. Phys. 145, 074112 (2016).
Ven, A. V. D., Ceder, G., Asta, M. & Tepesch, P. D. First-principles theory of ionic diffusion with nondilute carriers. Phys. Rev. B 64, 184307 (2001).
Walle, A. V. D. & Ceder, G. The effect of lattice vibrations on substitutional alloy thermodynamics. Rev. Mod. Phys. 74, 11–45 (2002).
Togo, A. & Tanaka, I. First principles phonon calculations in materials science. Scr. Mater. 108, 1–5 (2015).
Yue, S. et al. Phonon softening near topological phase transitions. Phys. Rev. B 102, 235428 (2020).
Parlinski, K., Li, Z. Q. & Kawazoe, Y. First-principles determination of the soft mode in cubic ZrO2. Phys. Rev. Lett. 78, 4063–4066 (1997).
Togo, A. First-principles Phonon calculations with Phonopy and Phono3py. J. Phys. Soc. Jpn. 92, 012001 (2023).
Togo, A., Chaput, L., Tadano, T. & Tanaka, I. Implementation strategies in phonopy and phono3py. J. Phys.: Condens. Matter 35, 353001 (2023).
Wang, H.-C., Botti, S. & Marques, M. A. L. Predicting stable crystalline compounds using chemical similarity. npj Comput. Mater. 7, 12 (2021).
Pandey, S., Qu, J., Stevanović, V., John, P. S. & Gorai, P. Predicting energy and stability of known and hypothetical crystals using graph neural network. Patterns 2, 100361 (2021).
Bartel, C. J. Data-centric approach to improve machine learning models for inorganic materials. Patterns 2, 100382 (2021).
Belsky, A., Hellenbrandt, M., Karen, V. L. & Luksch, P. New developments in the Inorganic Crystal Structure Database (ICSD): accessibility in support of materials research and design. Acta Crystallogr. Sect. B 58, 364–369 (2002).
Ko, T. W. & Ong, S. P. Recent advances and outstanding challenges for machine learning interatomic potentials. Nat. Comput. Sci. 1–3 (2023).
Frey, N. C. et al. Neural scaling of deep chemical models. Nat. Mach. Intell. 1–9 (2023).
Qi, J., Ko, T. W., Wood, B. C., Pham, T. A. & Ong, S. P. Robust training of machine learning interatomic potentials with dimensionality reduction and stratified sampling. npj Comput. Mater. 10, 43 (2024).
Xu, K. et al. How neural networks extrapolate: From feedforward to graph neural networks. arXiv preprint arXiv:2009.11848 (2020).
Barroso-Luque, L. et al. Open materials 2024 (omat24) inorganic materials dataset and models. arXiv preprint arXiv:2410.12771 (2024).
Sun, W. & Ceder, G. Efficient creation and convergence of surface slabs. Surf. Sci. 617, 53–59 (2013).
Ganose, A. et al. atomate2. https://github.com/materialsproject/atomate2 (2024).
Henkelman, G., Uberuaga, B. P. & Jónsson, H. A climbing image nudged elastic band method for finding saddle points and minimum energy paths. J. Chem. Phys. 113, 9901–9904 (2000).
Jun, K. et al. Lithium superionic conductors with corner-sharing frameworks. Nat. Mater. 21, 924–931 (2022).
Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
Kresse, G. & Furthmüller, J. Efficiency of ab-initio total energy calculations for metals and semiconductors using a plane-wave basis set. Comput. Mater. Sci. 6, 15–50 (1996).
Kresse, G. & Joubert, D. From ultrasoft pseudopotentials to the projector augmented-wave method. Phys. Rev. B 59, 1758–1775 (1999).
Ong, S. P. et al. Python materials genomics (pymatgen): A robust, open-source python library for materials analysis. Comput. Mater. Sci. 68, 314–319 (2013).
Deng, B. Wbm high energy states. figshare. Dataset (2024). https://figshare.com/articles/dataset/WBM_high_energy_states/27307776.

Acknowledgements

This work was funded by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences, Materials Sciences and Engineering Division under Contract No. DE-AC0205CH11231 (Materials Project program KC23MP). The work was also supported by the computational resources provided by the Extreme Science and Engineering Discovery Environment (XSEDE), supported by National Science Foundation grant number ACI1053575; the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility located at Lawrence Berkeley National Laboratory; and the Swift Cluster resource provided by the National Renewable Energy Laboratory (NREL). The authors would also like to thank Tsz Wai Ko and Yuanqi Du for helpful discussions.

Author information

Authors and Affiliations

Department of Materials Science and Engineering, University of California, Berkeley, CA, USA
Bowen Deng, Yunyeong Choi, Peichen Zhong, KyuJung Jun, Kristin A. Persson & Gerbrand Ceder
Materials Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Bowen Deng, Yunyeong Choi, Peichen Zhong, Shashwat Anand, Zhuohan Li, KyuJung Jun, Kristin A. Persson & Gerbrand Ceder
Cavendish Laboratory, University of Cambridge, Cambridge, UK
Janosh Riebesell

Authors

Bowen Deng
Yunyeong Choi
Peichen Zhong
Janosh Riebesell
Shashwat Anand
Zhuohan Li
KyuJung Jun
Kristin A. Persson
Gerbrand Ceder

Contributions

B.D. and G.C. conceived the initial idea. Z.L. and B.D. performed benchmarks on surface energies. S.A. and B.D. performed benchmarks on defect energies. P.Z. and B.D. performed benchmarks on solid solutions. Y.C. performed benchmarks on ion migration barriers. J.R. performed benchmarks on phonons. B.D. performed benchmarks on high-energy states. B.D. conceived and analyzed the softening scale and linear corrections. K.P. and G.C. offered insight and guidance throughout the project. All authors contributed to discussions and approved the paper.

Corresponding author

Correspondence to Gerbrand Ceder.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Deng, B., Choi, Y., Zhong, P. et al. Systematic softening in universal machine learning interatomic potentials. npj Comput Mater 11, 9 (2025). https://doi.org/10.1038/s41524-024-01500-6

Received: 23 May 2024
Accepted: 20 December 2024
Published: 10 January 2025
DOI: https://doi.org/10.1038/s41524-024-01500-6
