Abstract
There has been an ongoing race for the past several years to develop the best universal machine learning interatomic potential. This progress has led to increasingly accurate models for predicting energy, forces, and stresses, combining innovative architectures with big data. Here, we benchmark these models on their ability to predict harmonic phonon properties, which are critical for understanding the vibrational and thermal behavior of materials. Using around 10 000 ab initio phonon calculations, we evaluate model performance across various phonon-related parameters to test the universal applicability of these models. The results reveal that some models achieve high accuracy in predicting harmonic phonon properties. However, others still exhibit substantial inaccuracies, even if they excel in the prediction of the energy and the forces for materials close to dynamical equilibrium. These findings highlight the importance of considering phonon-related properties in the development of universal machine learning interatomic potentials.
Introduction
One of the most impactful applications of artificial intelligence methods to the field of materials science has been the introduction of machine learning interatomic potentials (MLIPs)1,2,3,4. These are by now capable of delivering energies and forces at the level of density functional theory (DFT), or beyond, at a computational cost that is often several orders of magnitude lower. As such, they are now accelerating or even replacing the expensive DFT calculations, truly enabling the in-silico design and development of complex materials.
Various representation methods for crystal structures (embedding techniques) have been proposed in the past years5,6,7,8,9. These methods encode crystal structure information into learnable features, thereby improving data efficiency for the models. Furthermore, new machine learning models and strategies were developed and improved. These advancements gained significant momentum with the introduction of message passing neural network frameworks10, which were later enhanced by incorporating continuous-filter convolutions for message passing11. Message passing addressed the issue of exponentially expanding descriptor sizes in earlier machine learning models, enabling the prediction of much larger and more complex systems.
The training of MLIPs has also been facilitated by the continuous accumulation of DFT calculations over the decades, and by the creation of comprehensive databases, such as the Materials Project12, the Open Quantum Materials Database13, Aflowlib14, Alexandria15, NOMAD16, etc. These databases contain materials with almost all chemical elements and in all types of crystal structures. They also provide a variety of computed properties, including total energies, forces, stresses, etc. not only for compounds at dynamical equilibrium, but also for geometry optimization or molecular dynamics paths.
Until recently, MLIPs were typically trained for specific chemical systems and were often limited to a narrow range of geometries and atomic arrangements. This paradigm shifted in 2019 with the introduction of the Materials Graph Network (MEGNet)17, a framework designed for universal machine learning in materials science. Universal MLIPs (uMLIPs) are foundational models capable of handling all chemistries and crystal structures. MEGNet already demonstrated relatively low prediction errors across a wide array of properties in both molecules and crystals. Its performance was significantly enhanced by incorporating atomic coordinates, lattice vectors in crystals, and 3-body interactions18, enabling uMLIPs to predict ground-state geometries with a mean absolute error of 0.035 eV/atom in the energy, when compared to DFT. Further advancements, such as the use of higher-order body messages, have resulted in models that are accurate, fast, and highly parallelizable19. Since then, there has been a surge of developments, with new and improved models being published at an almost monthly rate20,21,22,23,24.
In spite of the rapid progress of uMLIPs, challenges remain. Since these models are mostly trained and evaluated on existing datasets12,15,23, which contain mainly equilibrium or near-equilibrium geometries, they struggle to reproduce meta-stable or highly distorted structures25. To resolve this problem, further information on off-equilibrium structures from molecular dynamics results can be used26. Alternatively, by gradually distorting the optimized geometries, one can step away from the minima of the potential energy surface27. Models trained on such augmented datasets show superior performance at predicting equilibrium structures and energies27. Moreover, compared to those trained without off-equilibrium data, models trained with augmented datasets perform better at predicting the first derivatives of the energy27. While multiple evaluations of uMLIPs can be found in the literature28,29,30, direct phonon prediction capabilities have not been comprehensively characterized.
Here we benchmark seven uMLIP models, specifically M3GNet, CHGNet, MACE-MP-0, SevenNet-0, MatterSim-v1, ORB, and eqV2-M, for the calculation of phonon properties. These properties are obtained from the second derivatives (i.e., the curvature) of the potential energy surface, and therefore sample a small neighborhood around the dynamically stable minima. Phonons are extremely important in materials science, as they are fundamental in determining the free energy (and therefore thermodynamic stability), dynamical stability, thermal properties, etc. We note that all seven models are also featured in the Matbench Discovery leaderboard17,19,20,21,23,25,27,31 (ranked 12th, 11th, 10th, 8th, 3rd, 2nd, and 1st, respectively, at the time of writing).
M3GNet17 is one of the pioneering uMLIPs and still remains a key model in the field. It employs three-body interactions and incorporates atomic positions, enabling the calculation of forces through the automatic differentiation of the neural network. CHGNet23 is another of the earlier models, but it still demonstrates excellent performance while having one of the smallest architectures with just over 400 thousand parameters. MACE-MP-019 utilizes the atomic cluster expansion32 as a local descriptor, reducing the number of necessary message-passing steps while maintaining efficiency. SevenNet-021, built upon NequIP8, focuses on parallelizing the message-passing process. This approach preserves NequIP’s data efficiency, accuracy, and equivariant character. MatterSim-v131 builds upon M3GNet, leveraging active learning and efficient sampling across the chemical space. Its goal is to enhance the accuracy of energy and force predictions over a broader range of scenarios while maintaining a straightforward architecture that is easy to fine-tune. The ORB model20 combines the smooth overlap of atomic positions6 with a graph network simulator33. Finally, eqV2-M27 uses the model developed in ref. 22, which employs equivariant transformers to achieve higher-order equivariant representations. An important detail to note is that the ORB and eqV2-M models predict forces as a separate output rather than deriving them as energy gradients, as the other five models do.
Results
Dataset and its properties
To benchmark phonon properties we use the dataset developed in the MDR database34. This dataset includes around 10 000 non-magnetic semiconductors, covering a wide range of elements across the periodic table. Moreover, the phonon calculations were performed with VASP, ensuring a high degree of compatibility with the training sets used in the construction of the uMLIPs. Unfortunately, this phonon dataset was originally constructed with the Perdew-Burke-Ernzerhof (PBE) for solids (PBEsol)35 approximation to the exchange-correlation functional. This is certainly a very reasonable choice, as the PBEsol functional yields superior structural36,37 and phonon38 properties when compared to the standard PBE39. However, as all uMLIPs were trained on PBE data, a direct comparison to PBEsol phonons can be ambiguous. To mitigate this problem, we recalculated the entire phonon dataset from ref. 34 with the PBE functional (see Methods). In the following, we not only present comparisons of uMLIP calculations with PBE data, but we also include the difference between PBE and PBEsol. This gives us an estimate of the variability of the results as a function of the approximation to the exchange-correlation functional, which we use as an absolute scale to assess the quality of the uMLIPs.
As illustrated in Fig. 1a, the dataset contains mostly ternary and quaternary compounds. Additionally, we observe that the majority of the compounds belong to the monoclinic and orthorhombic crystal systems, followed by approximately equal proportions of trigonal and tetragonal systems. Cubic systems are less common, with hexagonal systems representing the smallest proportion. Ultimately, these characteristics are inherited from the Materials Project database12 and the Inorganic Crystal Structure Database (ICSD)40. Finally, triclinic systems are absent from the MDR database, likely because of the extra computational cost that arises from the reduced symmetry.
In Fig. 2 we plot the frequency of the chemical elements in the dataset. We can see that almost all the periodic table is well represented (with a few exceptions such as Tc, which is radioactive, or Eu and Gd, for which VASP has convergence problems). We also observe a significant abundance of structures containing oxygen. However, certain compounds, such as those containing Mo and W, as well as the magnetic 3d elements (from V to Ni), are underrepresented. These biases in the MDR database34 are also, to some extent, inherited from the Materials Project database12, but should not be relevant for the benchmark we present here. Although the dataset is dominated by oxides, the band gaps of the whole set still cover a large range, as illustrated in Fig. 1c.
Relative performances of uMLIPs
We start by discussing the errors in the geometry relaxations, as shown in Table 1 and Fig. 3. The “Failed” column in Table 1 indicates for how many systems a model failed to converge the forces to below 0.005 eV/Å. We can see that the CHGNet and MatterSim-v1 models appear to be the most reliable, with 0.09% and 0.10% unconverged structures, respectively. The M3GNet, SevenNet-0 and MACE-MP-0 models have a similar number of unconverged structures, while the ORB and eqV2-M models exhibit a much larger failure rate. The most unreliable model for this dataset is eqV2-M, for which 0.85% of the structural calculations were unable to converge. In general, there are two main reasons behind the failures: either the geometry optimization path explored regions of the potential energy surface where the uMLIP yielded unphysical forces, or there were high-frequency errors in the forces that prevented the relaxation algorithm from converging to the required precision. This latter reason is behind the very large failure rate for the two models where the forces are not the exact derivatives of the energy. CHGNet shows a notably higher error in energy predictions, which is expected given that we did not apply the energy correction procedure typically used during CHGNet’s training.
Looking at Fig. 3 we see that, as expected, PBEsol leads to a contraction of the unit cell, correcting the underbinding that is typical of the PBE approximation. The large majority of the systems show a difference between the PBE and PBEsol volume per atom between 0 and −2 Å³/atom. All uMLIPs exhibit MAE(V) that are smaller than the mean absolute difference between PBE and PBEsol. Among them, the eqV2-M model emerges as the most accurate, closely followed by ORB. Indeed, these two uMLIPs show remarkable performances for the vast majority of the compounds in the dataset, with errors that are quite small in both absolute and relative terms. MatterSim-v1 and SevenNet-0 show solid performances, although with mean errors four times larger than the two best models. Finally, M3GNet, MACE-MP-0, and CHGNet have wider error distributions, with MAE in the range of 0.4–0.5 Å³/atom. These results confirm that both eqV2-M and ORB are the best models for geometry optimization, and that they can already be used to essentially replace DFT calculations for this task.
We now turn our attention to phonon-related properties. We chose to look at the maximum phonon frequency (reported in Kelvin, with 1 K = 0.695 cm⁻¹), the phonon density of states (DOS), the average of the sound velocity over the 3 acoustic branches, the vibrational entropy, the Helmholtz free energy, and the heat capacity at constant volume, the last three calculated at a temperature of 300 K. The maximum phonon frequency allows us to detect systematic errors in the prediction of the curvature of the potential energy surface, which is especially important as it is well known that some uMLIPs have the tendency to yield too soft phonons. The phonon DOS provides information regarding the general prediction of phonon modes with respect to frequency, while the sound velocity helps identify errors in the acoustic branches in the vicinity of Γ. It should be noted that for the phonon DOS we remove values below 0.1 states/THz. The vibrational entropy and the Helmholtz free energy are important properties as they are essential to determine thermodynamic stability and phase diagrams as a function of temperature. Finally, the heat capacity is an important thermal property that can be directly measured experimentally.
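The temperature unit used for the frequencies follows from equating the phonon energy with a thermal energy, ħω = k_B T. A minimal sketch of the conversion is given here; the function names are illustrative and not taken from the code released with this work.

```python
# Physical constants (SI, CODATA 2018 exact values).
H_PLANCK = 6.62607015e-34    # Planck constant, J s
K_BOLTZMANN = 1.380649e-23   # Boltzmann constant, J / K
C_LIGHT_CM = 2.99792458e10   # speed of light, cm / s


def kelvin_to_wavenumber(t_kelvin: float) -> float:
    """Convert a phonon frequency expressed as a temperature (K) to cm^-1."""
    return t_kelvin * K_BOLTZMANN / (H_PLANCK * C_LIGHT_CM)


def kelvin_to_thz(t_kelvin: float) -> float:
    """Convert a phonon frequency expressed as a temperature (K) to THz."""
    return t_kelvin * K_BOLTZMANN / H_PLANCK * 1e-12


if __name__ == "__main__":
    print(f"1 K = {kelvin_to_wavenumber(1.0):.3f} cm^-1")
    print(f"1 K = {kelvin_to_thz(1.0):.5f} THz")
```

With this convention, a maximum frequency of 1000 K corresponds to roughly 695 cm⁻¹.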
We note that the maximum phonon frequency was calculated from the values at the q-points commensurate with the supercell matrix, whereas the DOS and thermodynamic properties were obtained on a denser q-grid by applying Fourier interpolation (see Methods). However, as the q-grids are consistent across DFT and uMLIP calculations, the interpolation error should be systematic and should not affect the benchmark.
We aggregated the errors for all models in Table 2 and in Fig. 4. We first notice that the deviation between the PBE and PBEsol results is small but not negligible. This observation reinforces the necessity of using a consistent functional between the training and benchmarking stages. The difference between PBE and PBEsol exhibits a rather narrow distribution in all 6 properties, especially when compared to the MAE of most of the uMLIPs. There are also systematic differences: for example, the maximum phonon frequencies in PBEsol are higher than those with PBE, which can be understood by the contraction of the cell and subsequent hardening of the force constants. PBEsol also leads to larger values of the free energy (on average of the order of 10 kJ/mol), and to smaller values of the entropy and the heat capacity.
Based on the errors we can roughly classify the seven models into three categories. The first contains ORB and eqV2-M, which have very large errors in phonon-related properties (see Fig. 4). In fact, phonon frequencies are grossly underestimated, and are often even imaginary as we will see in the following.
In the second category we have, in increasing order of accuracy, M3GNet, CHGNet, MACE-MP-0 and SevenNet-0 (see Fig. 4). The errors of these models are on average considerably larger than the difference between PBE and PBEsol. Moreover, they all exhibit systematic errors, underestimating the phonon frequencies and the free energy, and overestimating the entropy and the heat capacity. Of the four models, the most accurate is clearly SevenNet-0, while the older M3GNet and CHGNet show the largest errors. In spite of the difference in topologies, these four models are all trained on the same dataset, so it is not surprising that their results are somewhat similar. This again demonstrates that, for developing a uMLIP, the training data is at least as important as the representation of the crystal structure or the topology of the model.
Finally, MatterSim-v1 stands out as the most accurate uMLIP for the calculation of phonons. Not only does it not exhibit any strong systematic error, with all distributions essentially centered at zero, but the dispersion of the errors is also extremely small, leading to values of MAE considerably smaller than the difference between PBE and PBEsol. This indicates that MatterSim-v1 can be used to calculate phonon properties of semiconductors with an accuracy comparable to DFT codes, although at a very small fraction of the computational cost. It is very interesting to note that although MatterSim-v1 is based upon the simple M3GNet, its performance exceeds that of much more complicated models such as SevenNet-0 or eqV2-M that are based on equivariant networks. The key in this case is the scalability of M3GNet, which allows for an increase in the number of parameters and the efficient use of larger amounts of training data.
To have a better understanding of the general behavior of the uMLIPs, we plot in Fig. 5 the distribution of the maximum frequencies predicted. Most compounds have maximum frequencies in the range of 500–2000 K, with a few containing very light elements going up to 5500 K. The softening of the phonon frequencies by M3GNet, CHGNet, MACE-MP-0 and SevenNet-0 is evident, in particular for the first two. ORB and eqV2-M, on the other hand, exhibit completely distorted distributions peaking at zero, showing that the force constants obtained with these models are unphysical.
Another important performance metric is dynamical stability, a crucial stability descriptor utilized by many high-throughput searches of inorganic materials41,42,43,44. A compound is dynamically stable when it is in a true minimum of the potential energy surface and not in a maximum or a saddle point. In practice, it is assured by the absence of imaginary phonon frequencies in the spectrum. Unfortunately, it is well known that numerical inaccuracies often lead to small imaginary frequencies close to the Γ-point. To avoid this problem, we consider a structure to be dynamically stable if frequencies are all real across the Brillouin zone except at Γ where we allow the three acoustic modes to have small imaginary frequencies (with a threshold of −50 K). This criterion was applied to all q-points commensurate with the supercell matrix (but not to the interpolated q-points).
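The stability criterion described above can be sketched as a small function. This is an illustration under stated assumptions, not the code used in this work: imaginary frequencies are represented as negative numbers (a common convention in phonon codes), q-points are given in reduced coordinates, the three lowest modes at Γ are taken to be the acoustic ones, and the function name is hypothetical.

```python
import numpy as np


def is_dynamically_stable(qpoints, frequencies, gamma_threshold=-50.0, q_tol=1e-8):
    """Apply the stability criterion to frequencies given in K.

    qpoints:     (n_q, 3) reduced coordinates of the commensurate q-points.
    frequencies: (n_q, n_modes) frequencies in K, negative meaning imaginary.
    """
    qpoints = np.asarray(qpoints, dtype=float)
    frequencies = np.asarray(frequencies, dtype=float)
    for q, freqs in zip(qpoints, frequencies):
        freqs = np.sort(freqs)
        # In reduced coordinates any integer vector is equivalent to Gamma.
        at_gamma = np.allclose(q - np.round(q), 0.0, atol=q_tol)
        if at_gamma:
            # The three lowest (acoustic) modes may dip down to the threshold;
            # all remaining modes at Gamma must be real.
            if np.any(freqs[:3] < gamma_threshold) or np.any(freqs[3:] < 0.0):
                return False
        elif np.any(freqs < 0.0):
            return False
    return True
```

For example, a compound whose acoustic modes at Γ sit at −10 K is still classified as stable, whereas a single mode at −1 K at a zone-boundary q-point marks it as unstable.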
The elements of the confusion matrix, when compared to the PBE, are listed in Table 3. Most compounds that are stable in the PBE are also stable in PBEsol, and vice-versa, with the differences coming mostly from the difficulty associated with small imaginary frequencies as mentioned above. MatterSim-v1 and MACE-MP-0 are the most reliable with a percentage of true positives at 95%. M3GNet, SevenNet-0 and CHGNet are somewhat less accurate, especially regarding the percentage of true positives. Finally, the eqV2-M and ORB models perform very poorly, with more than 80% of the unstable systems being false negatives.
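As an illustration of how such a confusion matrix can be tallied, the sketch below counts the four outcomes from two lists of stability labels. It assumes that "positive" means dynamically stable and that percentages are taken with respect to the total number of compounds; both are assumptions of this sketch, and the function names are hypothetical.

```python
def stability_confusion(reference_stable, predicted_stable):
    """Count TP, FP, TN and FN, taking 'stable' as the positive class.

    reference_stable: booleans from the reference calculation (here PBE).
    predicted_stable: booleans from the model under test.
    """
    if len(reference_stable) != len(predicted_stable):
        raise ValueError("inputs must have the same length")
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for ref, pred in zip(reference_stable, predicted_stable):
        if pred and ref:
            counts["TP"] += 1
        elif pred and not ref:
            counts["FP"] += 1
        elif not pred and not ref:
            counts["TN"] += 1
        else:
            counts["FN"] += 1
    return counts


def as_percentages(counts):
    """Express each entry as a percentage of all compounds."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no compounds to evaluate")
    return {key: 100.0 * value / total for key, value in counts.items()}
```

In this convention a false negative is a compound that the reference finds stable but the model predicts to be unstable.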
Discussion
We created a dataset that includes phonon properties of almost 10,000 semiconductors obtained with DFT. These calculations were performed with the PBE approximation, the same approximation employed in the datasets used for the training of uMLIPs. This allows us to benchmark, without ambiguities, phonon properties calculated with uMLIPs.
As far as the equilibrium geometry is concerned, ORB and eqV2-M are extremely accurate and convincingly outperform all other models. This can be understood from the fact that the models output both the energy and the forces, and are trained on a very large dataset, leading to very small errors at equilibrium. Regarding phonons, however, the situation is completely different. ORB and eqV2-M yield very low quality phonons, often imaginary. We believe that the reason for this problem is that these models are non-conservative. In fact, contrary to all other models, in ORB and eqV2-M the forces are not calculated by performing the derivative of the energy with respect to the atomic positions, but are output directly by the network. This avoids the costly computational step of evaluating the derivatives through back-propagation, and the extra freedom allows for a more accurate prediction of energy and forces. Unfortunately, it also leads to inevitable errors, especially for the small displacements required for the calculation of phonons. This problematic behavior has also been reported and analyzed in ref. 45. The problem can be alleviated, but far from resolved, by using larger displacements in the frozen-phonon workflow. Of course, this can lead to further problems, such as the overestimation of the anharmonic contributions.
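One way to see why non-conservative forces are problematic for phonons is that force constants derived from an energy form a symmetric matrix (the Hessian), whereas directly predicted forces carry no such guarantee. The toy example below is only an illustration of this point: the two-dimensional model, the matrices K and A, and all function names are invented for the sketch and have no relation to the actual ORB or eqV2-M architectures.

```python
import numpy as np

K = np.array([[2.0, 0.5], [0.5, 1.0]])    # symmetric Hessian of a toy energy
A = np.array([[0.0, 0.3], [-0.3, 0.0]])   # antisymmetric part: not a gradient


def conservative_forces(x):
    """F = -dE/dx for the toy energy E = 0.5 * x^T K x."""
    return -K @ x


def direct_forces(x):
    """Toy 'direct' force output with a small non-gradient component."""
    return -(K + A) @ x


def force_constants(force_fn, x0, delta=1e-3):
    """Phi[i, j] = -dF_i/dx_j estimated by central finite differences."""
    n = len(x0)
    phi = np.zeros((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = delta
        phi[:, j] = -(force_fn(x0 + step) - force_fn(x0 - step)) / (2.0 * delta)
    return phi


def asymmetry(phi):
    """Largest violation of Phi[i, j] == Phi[j, i]."""
    return float(np.max(np.abs(phi - phi.T)))


if __name__ == "__main__":
    x0 = np.zeros(2)
    print("conservative:", asymmetry(force_constants(conservative_forces, x0)))
    print("direct:      ", asymmetry(force_constants(direct_forces, x0)))
```

In the conservative case the asymmetry vanishes up to rounding, while the direct-force toy model produces force constants that no potential energy surface could generate.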
Phonon properties calculated with MatterSim-v1, and to a lesser extent SevenNet-0, are of very high quality. Other models fare somewhere in between, exhibiting both a larger dispersion of the errors, and systematic deviations with respect to the reference PBE values.
We should note that not only the accuracy of the models, but also their computational efficiency, should be taken into account when choosing a uMLIP for a specific application. Of the models tested here, M3GNet is by far the fastest, running more efficiently on a single CPU core than any of the other models on a full GPU. At the other extreme we have eqV2-M and MACE-MP-0, convincingly the slowest of the pack, while the rest of the models fall in between.
Our benchmark highlights the importance of considering specific optimization goals for individual metrics and understanding the trade-offs involved. Furthermore, it shows that uMLIPs are ready to be used not only for the calculation of geometries and energies, but also of response properties, that are essential for a variety of material applications. We hope our critical assessment of phonon properties will guide future training efforts and encourage the use of our dataset to further develop uMLIPs.
Methods
Ab initio dataset
To recalculate the MDR dataset34 with the PBE39 exchange-correlation functional, we used the code VASP46,47. We kept all parameters consistent with the MDR dataset, with the exception of the approximation to the exchange-correlation functional, which was changed from PBEsol to PBE. We followed the same workflow as MDR, but before the stringent geometry relaxation we applied a pre-relaxation step with energy and force convergence criteria of 10⁻⁷ eV/cell and 10⁻⁵ eV/Å, respectively. For the stringent relaxation step, in accordance with the MDR calculations, we used tighter energy and force convergence criteria of 10⁻⁸ eV/cell and 10⁻⁸ eV/Å, respectively. Next, the force constants were obtained by applying the finite displacement method as implemented in the PHONOPY python package48,49.
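To illustrate the idea behind the finite displacement method, the sketch below applies it to a toy one-dimensional chain with nearest-neighbour harmonic springs. It does not use PHONOPY and is not the workflow of this work; the chain model, the displacement of 0.01, and the function names are assumptions made for illustration. One atom of a periodic supercell is displaced by ±δ, the forces on all atoms give one column of the force constants, and a Fourier sum over the supercell yields the frequencies at the commensurate q-points.

```python
import numpy as np


def chain_forces(u, k=1.0):
    """Forces on a periodic 1D chain with nearest-neighbour harmonic springs."""
    return k * (np.roll(u, 1) + np.roll(u, -1) - 2.0 * u)


def force_constants_column(force_fn, n_atoms, delta=0.01):
    """Phi[i, 0] = -dF_i/du_0 from two displaced supercells (+delta, -delta)."""
    u = np.zeros(n_atoms)
    u[0] = delta
    f_plus = force_fn(u)
    u[0] = -delta
    f_minus = force_fn(u)
    return -(f_plus - f_minus) / (2.0 * delta)


def commensurate_frequencies(phi_col, mass=1.0, a=1.0):
    """Frequencies at the q-points commensurate with the supercell.

    Negative values denote imaginary frequencies.
    """
    n = len(phi_col)
    q = 2.0 * np.pi * np.arange(n) / (n * a)
    r = a * np.arange(n)
    # Dynamical "matrix" (a scalar per q for one atom per cell).
    dyn = (phi_col[None, :] * np.exp(-1j * q[:, None] * r[None, :])).sum(axis=1) / mass
    eig = dyn.real
    return q, np.sign(eig) * np.sqrt(np.abs(eig))


if __name__ == "__main__":
    phi = force_constants_column(chain_forces, n_atoms=8)
    q, omega = commensurate_frequencies(phi)
    print(np.round(omega, 4))
```

For this model the result can be checked against the analytic dispersion ω(q) = 2 √(k/m) |sin(qa/2)|.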
uMLIP evaluation
For all the uMLIP models, we perform the geometry relaxation and force set calculations starting from the PBE geometry using the Atomic Simulation Environment (ASE)50. To keep the space group symmetry of the PBE structure, we employ the ASE symmetrizer FrechetCellFilter. The structure optimization is done using the fast inertial relaxation engine (FIRE)51, with the force convergence criterion set to 0.005 eV/Å for all models.
To calculate the thermodynamic properties, i.e. the vibrational entropy, the Helmholtz free energy, and the heat capacity, the phonon density of states is obtained by Fourier interpolation from the coarse calculated q-grid onto a denser 20 × 20 × 20 grid, as in the MDR database. We set a temperature of 300 K to compute the thermodynamic properties.
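For reference, the harmonic expressions behind these thermodynamic properties can be sketched as follows. With x = ħω/(k_B T), each mode contributes F = k_B T [x/2 + ln(1 − e^(−x))], S = k_B [x/(e^x − 1) − ln(1 − e^(−x))] and C_V = k_B x² e^x/(e^x − 1)². The sketch sums these over modes on a weighted q-grid, which is equivalent to integrating over the DOS; the function name, the input layout, and the choice of J/mol units are assumptions of this illustration, and frequencies are taken in Kelvin so that x = θ/T.

```python
import numpy as np

R_GAS = 8.314462618  # molar gas constant N_A * k_B, J / (mol K)


def harmonic_thermo(theta, temperature=300.0, weights=None):
    """Harmonic free energy, entropy and heat capacity per mole of cells.

    theta:   (n_q, n_modes) phonon frequencies in K; all must be positive,
             so imaginary modes and zero-frequency acoustic modes at Gamma
             have to be filtered out by the caller.
    weights: optional (n_q,) q-point weights summing to one.
    """
    theta = np.asarray(theta, dtype=float)
    if np.any(theta <= 0.0):
        raise ValueError("all frequencies must be positive")
    if weights is None:
        weights = np.full(theta.shape[0], 1.0 / theta.shape[0])
    w = np.asarray(weights, dtype=float)[:, None]

    x = theta / temperature
    log_term = np.log1p(-np.exp(-x))                   # ln(1 - e^-x)
    f_mode = temperature * (0.5 * x + log_term)        # F / k_B, in K
    s_mode = x / np.expm1(x) - log_term                # S / k_B
    c_mode = x**2 * np.exp(x) / np.expm1(x) ** 2       # C_V / k_B

    return {
        "free_energy": R_GAS * float(np.sum(w * f_mode)),    # J / mol
        "entropy": R_GAS * float(np.sum(w * s_mode)),        # J / (mol K)
        "heat_capacity": R_GAS * float(np.sum(w * c_mode)),  # J / (mol K)
    }
```

Two limits are useful sanity checks: for θ much smaller than T each mode contributes the classical value R to the heat capacity, and for θ much larger than T the free energy reduces to the zero-point energy R θ/2.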
The phonon density of states has been calculated using the same grid as that employed for the thermal properties. Values below 0.1 states/THz in the PBE reference and in the model prediction were removed. The sound velocity is calculated using group velocities near the Γ point, which are obtained using small q-vectors oriented along each axis (x, y, and z). For each phonon branch, we extract the directional component of the group velocity corresponding to the axis along which the q-vector was oriented, specifically the xx, yy, and zz components. We then calculate the average of these directional components across all acoustic branches to obtain the average sound velocity.
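The averaging of the sound velocity can be sketched with a simple finite-difference estimate. This is an illustration only: `omega_fn` is a hypothetical callable returning all phonon frequencies at a Cartesian q-vector, the slope is taken as ω(δq)/δq because acoustic branches vanish at Γ, and the three lowest branches at the small q-vector are assumed to be the acoustic ones. The group velocities in this work were obtained with the phonon code itself rather than with this shortcut.

```python
import numpy as np


def average_sound_velocity(omega_fn, dq=1e-3, n_acoustic=3):
    """Average slope of the acoustic branches along x, y and z near Gamma.

    omega_fn: callable mapping a Cartesian q-vector (length 3) to an
              iterable of frequencies; units follow those of omega_fn.
    """
    velocities = []
    for axis in range(3):
        q = np.zeros(3)
        q[axis] = dq
        branches = np.sort(np.asarray(omega_fn(q), dtype=float))[:n_acoustic]
        # Acoustic branches vanish at Gamma, so the slope along this axis
        # is approximated by omega(dq) / dq.
        velocities.extend(branches / dq)
    return float(np.mean(velocities))


if __name__ == "__main__":
    def toy_dispersion(q):
        """Two transverse branches (v = 1), one longitudinal (v = 2), one optical."""
        qn = np.linalg.norm(q)
        return [1.0 * qn, 1.0 * qn, 2.0 * qn, 5.0 + qn]

    print(average_sound_velocity(toy_dispersion))
```

For the isotropic toy dispersion in the usage example the average is (1 + 1 + 2)/3, independent of direction.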
All models considered in this paper are open source. In Table 4 we list their training set sizes, data sources, and the number of parameters.
Data availability
The phonon dataset is available in Alexandria which can be accessed and/or downloaded from https://alexandria.icams.rub.de/ under the terms of the Creative Commons Attribution 4.0 License.
Code availability
All code developed in this work is freely available at https://github.com/hyllios/utils/tree/main/.
References
Behler, J. Perspective: Machine learning potentials for atomistic simulations. J. Chem. Phys. 145, 170901 (2016).
Graser, J., Kauwe, S. K. & Sparks, T. D. Machine learning and energy minimization approaches for crystal structure predictions: A review and new horizons. Chem. Mater. 30, 3601–3612 (2018).
Schmidt, J., Marques, M. R. G., Botti, S. & Marques, M. A. L. Recent advances and applications of machine learning in solid-state materials science. npj Comput. Mater. 5, 83 (2019).
Unke, O. T. et al. Machine learning force fields. Chem. Rev. 121, 10142–10186 (2021).
Behler, J. & Parrinello, M. Generalized neural-network representation of high-dimensional potential-energy surfaces. Phys. Rev. Lett. 98, 146401 (2007).
Bartók, A. P., Kondor, R. & Csányi, G. On representing chemical environments. Phys. Rev. B 87, 184115 (2013).
Gasteiger, J., Groß, J. & Günnemann, S. Directional message passing for molecular graphs. In International Conference on Learning Representations (2020).
Batzner, S. et al. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nat. Commun. 13, 2453 (2022).
Gastegger, M., Schütt, K. T. & Müller, K.-R. Machine learning of solvent effects on molecular spectra and reactions. Chem. Sci. 12, 11473–11483 (2021).
Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O. & Dahl, G. E. Neural message passing for quantum chemistry. In Precup, D. & Teh, Y. W. (eds.) Proceedings of the 34th International Conference on Machine Learning, vol. 70 of Proceedings of Machine Learning Research, 1263–1272 (PMLR, 2017).
Schütt, K. et al. SchNet: A continuous-filter convolutional neural network for modeling quantum interactions. In Guyon, I. et al. (eds.) Advances in Neural Information Processing Systems, vol. 30 (Curran Associates, Inc., 2017).
Jain, A. et al. Commentary: The Materials Project: A materials genome approach to accelerating materials innovation. APL Mater. 1, 011002 (2013).
Kirklin, S. et al. The open quantum materials database (OQMD): assessing the accuracy of DFT formation energies. npj Comput. Mater. 1, 15010 (2015).
Curtarolo, S. et al. AFLOWLIB.ORG: A distributed materials properties repository from high-throughput ab initio calculations. Comput. Mater. Sci. 58, 227–235 (2012).
Schmidt, J. et al. Improving machine-learning models in materials science through large datasets. Mater. Today Phys. 48, 101560 (2024).
Scheidgen, M. et al. NOMAD: A distributed web-based platform for managing materials science research data. J. Open Source Softw. 8, 5388 (2023).
Chen, C., Ye, W., Zuo, Y., Zheng, C. & Ong, S. P. Graph networks as a universal machine learning framework for molecules and crystals. Chem. Mater. 31, 3564–3572 (2019).
Chen, C. & Ong, S. P. A universal graph deep learning interatomic potential for the periodic table. Nat. Comput. Sci. 2, 718–728 (2022).
Batatia, I., Kovacs, D. P., Simm, G., Ortner, C. & Csanyi, G. MACE: Higher order equivariant message passing neural networks for fast and accurate force fields. In Koyejo, S. et al. (eds.) Advances in Neural Information Processing Systems, vol. 35, 11423–11436 (Curran Associates, Inc., 2022).
Neumann, M. et al. Orb: A fast, scalable neural network potential. Preprint at https://arxiv.org/abs/2410.22570 (2024).
Park, Y., Kim, J., Hwang, S. & Han, S. Scalable parallel algorithm for graph neural network interatomic potentials in molecular dynamics simulations. J. Chem. Theory Comput. 20, 4857–4868 (2024).
Liao, Y.-L., Wood, B. M., Das, A. & Smidt, T. EquiformerV2: Improved equivariant transformer for scaling to higher-degree representations. In The Twelfth International Conference on Learning Representations (2024).
Deng, B. et al. CHGNet as a pretrained universal neural network potential for charge-informed atomistic modelling. Nat. Mach. Intell. 5, 1031–1041 (2023).
Choudhary, K. & DeCost, B. Atomistic line graph neural network for improved materials property predictions. npj Comput. Mater. 7, 185 (2021).
Riebesell, J. et al. Matbench discovery – a framework to evaluate machine learning crystal stability predictions. Preprint at https://arxiv.org/abs/2308.14920v3 (2024).
Liao, Y.-L., Smidt, T., Shuaibi, M. & Das, A. Generalizing denoising to non-equilibrium structures improves equivariant force fields. Preprint at https://arxiv.org/abs/2403.09549 (2024).
Barroso-Luque, L. et al. Open materials 2024 (OMat24) inorganic materials dataset and models. Preprint at https://arxiv.org/abs/2410.12771 (2024).
Focassio, B., M. Freitas, L. P. & Schleder, G. R. Performance assessment of universal machine learning interatomic potentials: Challenges and directions for materials’ surfaces. ACS Appl. Mater. Interfaces 17, 13111 (2024).
Póta, B., Ahlawat, P., Csányi, G. & Simoncelli, M. Thermal conductivity predictions with foundation atomistic models. Preprint at https://arxiv.org/abs/2408.00755 (2024).
Yu, H., Giantomassi, M., Materzanini, G., Wang, J. & Rignanese, G.-M. Systematic assessment of various universal machine-learning interatomic potentials. Mater. Genome Eng. Adv. 2, e58 (2024).
Yang, H. et al. Mattersim: A deep learning atomistic model across elements, temperatures and pressures. Preprint at https://arxiv.org/abs/2405.04967 (2024).
Drautz, R. Atomic cluster expansion for accurate and transferable interatomic potentials. Phys. Rev. B 99, 014104 (2019).
Sanchez-Gonzalez, A. et al. Learning to simulate complex physics with graph networks. Preprint at https://arxiv.org/abs/2002.09405 (2020).
National Institute for Materials Science Japan. MDR phonon calculation database. https://mdr.nims.go.jp/collections/8g84ms862?locale=en. Accessed: November 04, 2024.
Perdew, J. P. et al. Restoring the density-gradient expansion for exchange in solids and surfaces. Phys. Rev. Lett. 100, 136406 (2008).
Csonka, G. I. et al. Assessing the performance of recent density functionals for bulk solids. Phys. Rev. B 79, 155107 (2009).
Hussein, R., Schmidt, J., Barros, T., Marques, M. A. L. & Botti, S. Machine-learning correction to density-functional crystal structure optimization. MRS Bull. 47, 765–771 (2022).
He, L. et al. Accuracy of generalized gradient approximation functionals for density-functional perturbation theory calculations. Phys. Rev. B 89, 064305 (2014).
Perdew, J. P., Burke, K. & Ernzerhof, M. Generalized gradient approximation made simple. Phys. Rev. Lett. 77, 3865–3868 (1996).
Bergerhoff, G., Hundt, R., Sievers, R. & Brown, I. D. The inorganic crystal structure data base. J. Chem. Inf. Model. 23, 66–69 (1983).
Haastrup, S. et al. The computational 2D materials database: high-throughput modeling and discovery of atomically thin crystals. 2D Mater. 5, 042002 (2018).
Zhu, Z. et al. A high-throughput framework for lattice dynamics. npj Comput. Mater. 10, 258 (2024).
Choudhary, K. et al. High-throughput density functional perturbation theory and machine learning predictions of infrared, piezoelectric, and dielectric responses. npj Comput. Mater. 6, 64 (2020).
Cerqueira, T. F. T., Sanna, A. & Marques, M. A. L. Sampling the materials space for conventional superconducting compounds. Adv. Mater. 36, 2307085 (2023).
Bigi, F., Langer, M. & Ceriotti, M. The dark side of the forces: assessing non-conservative force models for atomistic machine learning. Preprint at https://arxiv.org/abs/2408.00755 (2024).
Kresse, G. & Furthmüller, J. Efficiency of ab-initio total energy calculations for metals and semiconductors using a plane-wave basis set. Comput. Mater. Sci. 6, 15–50 (1996).
Kresse, G. & Furthmüller, J. Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set. Phys. Rev. B 54, 11169–11186 (1996).
Togo, A. First-principles phonon calculations with Phonopy and Phono3py. J. Phys. Soc. Jpn. 92, 012001 (2023).
Togo, A., Chaput, L. & Tanaka, I. Distributions of phonon lifetimes in Brillouin zones. Phys. Rev. B 91, 094306 (2015).
Hjorth Larsen, A. et al. The atomic simulation environment – a Python library for working with atoms. J. Phys.: Condens. Matter 29, 273002 (2017).
Bitzek, E., Koskinen, P., Gähler, F., Moseler, M. & Gumbsch, P. Structural relaxation made simple. Phys. Rev. Lett. 97, 170201 (2006).
Acknowledgements
A.L. and M.A.L.M. acknowledge funding from the Horizon Europe MSCA Doctoral network grant n.101073486, EUSpecLab, funded by the European Union. S.B. and D.S. acknowledge financial support from the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) through the project BO4280/11-1. H.C.W and M.A.L.M would like to thank the NHR Center PC2 for providing computing time on the Noctua 2 supercomputers.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Author information
Contributions
A.L. and M.A.L.M developed the high-throughput workflow; A.L., D.S., and M.A.L.M performed the response calculations; A.L. and M.A.L.M. performed the machine learning validations; H.-C. W., S.B., and M.A.L.M directed the research. All authors participated equally in the interpretation of the results and in the writing of the manuscript.
Ethics declarations
Competing interests
S.B. declares to be an Associate Editor for npj Computational Materials. This role has not influenced the peer review or editorial process for this manuscript, which has been handled independently according to the journal’s standard procedures for manuscripts with editorial board member authorship.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Loew, A., Sun, D., Wang, HC. et al. Universal machine learning interatomic potentials are ready for phonons. npj Comput Mater 11, 178 (2025). https://doi.org/10.1038/s41524-025-01650-1