Abstract
Understanding the anharmonic phonon properties of crystal compounds—such as phonon lifetimes and thermal conductivities—is essential for investigating and optimizing their thermal transport behaviors. These properties also impact optical, electronic, and magnetic characteristics through interactions between phonons and other quasiparticles and fields. In this study, we develop an automated first-principles workflow to calculate anharmonic phonon properties and build a comprehensive database encompassing more than 6500 inorganic compounds. Utilizing this dataset, we train a graph neural network model to predict thermal conductivity values and spectra from structural parameters, demonstrating a scaling law in which prediction accuracy improves with increasing training data size. High-throughput screening with the model enables the identification of materials exhibiting extreme thermal conductivities—both high and low. The resulting database offers valuable insights into the anharmonic behavior of phonons, thereby accelerating the design and development of advanced functional materials.
Introduction
In recent years, the integration of traditional materials science approaches, rooted in fundamental principles, with data-driven methodologies (collectively known as Materials Informatics, MI) has rapidly advanced, leading to significant breakthroughs in the development of materials for batteries1,2,3, catalysts4, magnetic systems5, and beyond. For inorganic materials, large-scale computational databases have served as the backbone of MI efforts, including the Materials Project (2013)6,7,8 with data on 170,000 materials, OQMD (2013)9,10 with 1.2 million materials, and AFLOW (2014)11 with 3.5 million materials. More recently, a series of emerging databases have expanded this landscape, such as a database dedicated to \({Fm}\bar{3}m\) cubic structures with over 200,000 entries, the Carolina Materials Database (2020)12,13, DeepMind’s GNoME containing 40 million novel crystal structures (2024)14, and Meta’s OMat24 with 1.1 billion density functional theory (DFT) calculation entries (2024)15. However, these databases primarily focus on crystal structures and properties derived from relatively straightforward calculations, such as electronic band structures and band gaps.
In contrast, databases centered on lattice thermal properties, which dominate heat transport in non-metallic materials, remain relatively scarce. Existing simulation-based resources largely provide harmonic phonon properties or lattice thermal conductivity estimates based on approximations—for example, Phonondb16 offers harmonic properties for ~10,000 materials, and AFLOW employs the quasiharmonic Debye approximation17. Experiment-based databases, such as Starrydata18 and AtomWork19, compile thermal conductivity and thermoelectric data from the literature. However, these data are significantly influenced by extrinsic factors such as grain size20,21, carrier density, composition22,23, impurities24,25, defects, strain26,27,28,29, and uncertainty of the measurement30. Such factors are often undocumented and difficult to control, posing challenges for reliable predictive modeling. Therefore, a first-principles-based database of anharmonic phonon properties is essential for accurately capturing intrinsic thermal behavior, including phonon lifetimes and thermal conductivity, without relying on empirical assumptions.
Complementing these efforts, a team at Microsoft has recently developed an extensive database of anharmonic phonon properties for ~246,000 materials31, using machine learning potentials32. While this represents a significant step forward in materials research, the material space covered by that database is limited to relatively simple systems because high thermal conductivity was the target property: specifically, binary compounds with up to four atoms per primitive cell and ternary compounds composed of group 13–16 elements with up to seven atoms per primitive cell. Additionally, machine learning potentials are trained on data derived from first-principles calculations, so their ability to discover entirely new materials may be limited. There therefore remains a need for a first-principles-based database that spans both simple and structurally complex materials.
A first-principles database of anharmonic phonon properties is valuable not only for predicting thermal behaviors of materials but also for understanding a wide range of other material properties. Phonons interact with various particles and excitations, such as electrons33, magnons34,35, photons36,37, plasmons38, and polaritons39,40, thereby affecting mechanical, electrical, electronic, optical, and magnetic properties. This highlights the importance of detailed phonon-property datasets that comprehensively capture the vibrational properties of solids, in particular anharmonic phonon properties obtained from theoretical calculations with consistent computational approaches and parameters. Such a database will offer critical insights into diverse material behaviors and accelerate the discovery of novel functional materials.
First-principles approaches for calculating anharmonic phonon properties in condensed materials have been actively pursued for many years, triggered by the development of computational methods using DFT around 2010 (refs. 41,42,43). In standard first-principles phonon analysis, three-phonon scattering rates are evaluated via quantum perturbative theories under the relaxation time approximation44,45,46,47 to solve the Boltzmann transport equation (BTE)48. This approach has been widely applied and has become a rigorous, foundational numerical tool for understanding and predicting thermal transport in materials. Building on this framework, a variety of methods have been developed or integrated into computational packages to enhance the accuracy of phonon property calculations, particularly for systems with extreme thermal transport behaviors. Iterative46,49,50 and direct51,52 solutions to the BTE offer an improved treatment of phonon-phonon interactions by distinguishing momentum-conserving normal processes from resistive Umklapp processes, whereas the relaxation time approximation treats both types of scattering as resistive. Furthermore, four-phonon interactions53,54 in non-metallic systems have been shown to play a significant role in determining their thermal transport behaviors.
At finite temperatures, phonon renormalization modifies harmonic force constants, a process that can be accounted for using first-order self-consistent phonon theory55,56,57 and its improved variant incorporating the bubble self-energy corrections58. The phonon gas model, which treats phonons as heat-carrying particles that scatter and propagate like molecules in a gas, is extended by the unified phonon theory, also known as the Wigner heat transport formulation59, which provides a framework for analyzing phonon transport in both the particle (Peierls transport) and wave (coherent transport) pictures.
In addition to phonon-phonon interactions, other scattering mechanisms and intrinsic factors can also play a significant role in thermal transport. Electron-phonon interactions can be accurately analyzed using first-principles methods60,61,62. Weak and strong impurity scatterings can be effectively treated using the perturbative24 or T-matrix approaches63,64, respectively. Additionally, intrinsic structural fluctuations at finite temperatures, particularly in complex compounds, can be captured through a combination of cluster expansion and Monte Carlo simulations22,23,65. Although this current study employs a fundamental approach based on three-phonon interactions within the relaxation time approximation, the resulting data provide a solid foundation for advanced calculations including more complex scattering effects.
With the advancement of computational methods, the development of thermofunctional materials has been accelerated through the integration of informatics and data science. Early studies in this field employed high-throughput calculations with simplified models to identify materials with Peierls lattice thermal conductivities \({\kappa }_{{\rm{p}}}\approx 1.0\,[{\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}]\)66, and Bayesian optimization techniques were used to discover materials with \({\kappa }_{{\rm{p}}} < 0.5\,[{\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}]\)67. However, access to anharmonic phonon property data remains limited. To circumvent this, researchers have used harmonic phonon properties68 and other material descriptors69, focusing on specific materials such as half-Heuslers70 and chalcogenides71, and have developed thermal conductivity databases based on approximations72, including the Callaway model73 and minimum thermal conductivity model74.
In parallel, various techniques have emerged to estimate higher-order force constants at a practical computational cost as the number of displacement patterns required by the finite-displacement method increases rapidly with the order of the force constants. Approaches such as compressive sensing56,75, projector-based methods for constructing orthonormal basis sets76, and machine learning potentials77,78 have been explored. Furthermore, fine-tuned models79 derived from foundation models80 have demonstrated improved accuracy. In addition to force constants, the analysis of high-order anharmonic phonon properties—such as four-phonon scattering and phonon renormalization—remains computationally intensive81. To address this, machine learning approaches have been introduced, including transfer learning to estimate four-phonon scattering rates using three-phonon scattering data82.
Driven by this need for a first-principles-based anharmonic phonon property database and building on recent advancements in phonon analysis, we developed an automated computational framework for first-principles phonon calculations that streamlines the workflow and reduces computational complexity. Using this framework, we constructed a large-scale database comprising anharmonic phonon properties for over 6500 materials, systematically capturing phonon transport characteristics across a wide range of material classes. Leveraging this dataset, we applied machine learning techniques to predict key anharmonic phonon properties, including Peierls lattice thermal conductivity and its spectral distribution. This integrated approach not only deepens our understanding of anharmonic phonon behavior but also accelerates the data-driven discovery of novel functional materials across various application domains.
Results
Automation of anharmonic phonon analysis
We developed automation software named “auto-kappa” (https://github.com/masato1122/auto-kappa) for performing first-principles anharmonic phonon calculations. Given the complexity of phonon analysis, the software automatically addresses key challenges, including precise structure optimization to minimize residual stress and procedures to eliminate imaginary frequencies associated with unstable phonon modes. Specifically, these include structure optimization using an equation of state and increasing the supercell size for force calculations. The automated workflow for anharmonic phonon calculations is summarized in Fig. 1a, with detailed computational procedures described in the Methods section. While remarkable efforts have been made to automate similar processes for analyzing anharmonic phonon properties83,84,85, several challenges were still encountered in the high-throughput calculations of this study. These included the need for automatic adjustment of VASP and ALAMODE parameters (e.g., the cutoff length for force constants and the treatment of the non-analytical correction), job parameters (e.g., the number of parallel processes and the type of parallelization), and the complexity of obtaining relaxed structures, as illustrated in Fig. 1a. To overcome these challenges, we enhanced auto-kappa to resolve them automatically. Using the developed software, we have calculated the Peierls lattice thermal conductivity (\({\kappa }_{{\rm{p}}}\)) based on the relaxation time approximation as well as the coherence lattice thermal conductivity (\({\kappa }_{{\rm{c}}}\)). Although the software includes an implementation of the self-consistent phonon approach to account for phonon renormalization, the dataset used in this study was generated using the conventional method based on three-phonon interactions within the relaxation time approximation.
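The idea of removing imaginary frequencies by enlarging the supercell can be illustrated with a minimal sketch. This is not the auto-kappa implementation: the function `relax_until_stable`, the callback `min_frequency`, the tolerance, and the list of supercell sizes are all hypothetical, and imaginary frequencies are assumed to be reported as negative numbers, as is common in phonon codes.

```python
from typing import Callable, Optional, Sequence


def relax_until_stable(
    min_frequency: Callable[[int], float],
    supercell_atoms: Sequence[int],
    tolerance: float = -1e-3,
) -> Optional[int]:
    """Return the first supercell size without imaginary phonon modes.

    min_frequency(n) is a hypothetical callback that runs the force and
    phonon calculation for a supercell of n atoms and returns the lowest
    phonon frequency (negative values stand for imaginary frequencies).
    """
    for n_atoms in supercell_atoms:
        if min_frequency(n_atoms) >= tolerance:
            return n_atoms  # stable: keep this supercell
    return None  # still unstable: keep harmonic data only


def toy_min_frequency(n_atoms: int) -> float:
    """Toy stand-in for a real calculation (illustrative numbers only)."""
    return -0.5 + n_atoms / 100.0


chosen = relax_until_stable(toy_min_frequency, [32, 64, 128, 200])
```

In a real workflow the callback would wrap the first-principles and phonon calculations, and a `None` result would correspond to the materials for which only harmonic data are stored.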
a Automated workflow implemented in the developed software, auto-kappa. b Example output generated by auto-kappa for rock salt NaCl (mp-22862). The results include phonon dispersion with participation ratio and DOS, representative atomic distances for force constants (FCs), temperature- and grain-size-dependent thermal conductivity, mode-dependent phonon scattering rates and lifetimes, spectral and cumulative thermal conductivity as functions of mean free path and frequency, and Grüneisen parameters. In addition, a computational time chart, thermodynamic properties, and various text files—such as displacement–force datasets, force constants, and input/output scripts for simulations—are generated.
Using the developed software, we constructed Phonix, a database of anharmonic phonon interactions, comprising 6641 materials and 7342 data points, including 701 duplicate material entries. The name Phonix highlights its broader scope, extending beyond the phonon–phonon interactions examined here to future coverage of interactions with diverse quasiparticles and nanostructures. The database includes input files, intermediate data, output results, and generated figures—as illustrated in Fig. 1b. These comprise: phonon dispersion with participation ratio, density of states (DOS), the relationship between force constants and their representative atomic distance (maximum distance among corresponding atoms for the anharmonic case), temperature- and grain-size-dependent thermal conductivity, mode-dependent phonon scattering rates and lifetimes, cumulative and spectral thermal conductivity as a function of mean free path and frequency, Grüneisen parameters, thermodynamic properties (temperature-dependent specific heat, entropy, and internal and free energies), harmonic and anharmonic force constants, displacement-force datasets used for calculating the force constants, and input/output scripts for first-principles (VASP86) and phonon (ALAMODE45) calculations. Naturally, for materials exhibiting imaginary frequencies, only harmonic property data are included. The target materials in this study consist of all entries from the Phonondb dataset (version 2018-04-07), comprising 10,034 materials, and non-metallic, non-magnetic materials from the Materials Project (version 2022.10.28), comprising 11,418 materials after excluding overlaps with Phonondb. In total, the dataset includes 21,452 unique materials. 
Although the full phonon analysis has not yet been completed for every material—primarily due to the high computational cost associated with rigorous structural optimization and the use of larger supercells (see Methods for details)—we have successfully calculated anharmonic phonon properties for over 6500 materials. While we have also obtained a significantly larger set of harmonic phonon properties, including those for materials with unstable phonon modes with imaginary frequencies, this study focuses exclusively on the anharmonic phonon properties, which represent the more compelling aspect of our database. The complete database will be made available on ARIM-mdx87. We would also like to emphasize that the database released with this paper represents only the first version, and we are continuously working to improve both the quality and quantity of the data.
Database analysis
First, we analyzed the crystal structures of the materials for which anharmonic phonon properties were computed. As shown in Fig. 2a, the dataset encompasses a wide range of materials. Among the most populated space groups, as shown at the top of Fig. 2a, space group 14 includes quartz-like structures such as SiO₂; space group 62 includes the anatase phase of TiO₂, commonly used as a photocatalyst; space group 166 contains well-known topological insulators and thermoelectric materials like Bi₂Te₃ and Bi₂Se₃; and space group 225 comprises rock salt structures such as NaCl, PbTe, and PbSe.
a Distribution of space groups and crystal systems (top), and the number of atoms in the primitive cell (bottom) for the crystal structures in the database. b Relationship between lattice thermal conductivity (\({\kappa }_{{\rm{lat}}}\)) and volume per atom, along with the distribution of \({\kappa }_{{\rm{lat}}}\) at 300 K. c Comparison of the Peierls (\({\kappa }_{{\rm{p}}}\)) and coherence (\({\kappa }_{{\rm{c}}}\)) contributions to \({\kappa }_{{\rm{lat}}}\). Solid and dotted lines represent \({\kappa }_{c}={\kappa }_{p}\) and \({\kappa }_{c}=0.1\times {\kappa }_{p}\), respectively.
Although the current dataset is limited to non-metallic and non-magnetic materials, it is not constrained by the size of the primitive cell, as shown in the bottom panel of Fig. 2a. Some materials include more than 100 atoms, with the maximum reaching 160 atoms. Among these, five out of seven materials with the highest atom counts belong to space group 62. However, most materials in the database contain fewer than 30 atoms, with half containing fewer than 16 atoms.
Regarding elemental diversity, the Phonix materials contain elements from a broad range of groups, as shown in Supplementary Fig. S1. While transition metals appear less frequently—likely due to the exclusion of magnetic materials in the current version of Phonix—and group 18 elements are present only as single-element systems, all elements from periods 1 to 6 and groups 1 to 17 of the periodic table, except for Po and At, are represented in the Phonix materials. The broad diversity in space groups and structural complexity highlights the versatility of the database as a platform for exploring and developing a wide spectrum of inorganic materials. Notably, only 287 out of 6641 crystal structures (4.3%) satisfy the search criteria employed by the Microsoft database (MatterK31), while the specific materials contained in their database are not publicly accessible. We believe that both types of databases play complementary and essential roles: databases based on first-principles calculations are crucial for expanding our knowledge toward unexplored materials, while those based on machine learning potentials are important for interpolating data within the known materials space.
Subsequently, the distribution of thermal conductivity was analyzed. Throughout this paper, we used the thermal conductivity at 300 K obtained using the densest q-mesh in auto-kappa (1500 q-points⋅Å³/atom) for all discussions. Lattice thermal conductivity (\({\kappa }_{{\rm{lat}}}\)) generally decreases with increasing volume per atom (\({V}_{{\rm{atom}}}\))67. According to Phonix, \({\kappa }_{{\rm{lat}}}\), including both the Peierls (\({\kappa }_{{\rm{p}}}\)) and coherence (\({\kappa }_{{\rm{c}}}\)) contributions, at 300 K exhibited the following relationship: \({\log }_{10}({\kappa }_{{\rm{lat}}})\propto \alpha {\log }_{10}({V}_{{\rm{atom}}})\), where the coefficient \(\alpha\) was estimated to be −1.89, as illustrated in Fig. 2b. The average \({\kappa }_{{\rm{lat}}}\) at 300 K was 2.4 \({\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\), as shown in Fig. 2b. Half of the materials exhibited \({\kappa }_{{\rm{lat}}}\) (300 K) values between 0.95 and 6.3 \({\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\), while 95% fell within the range of 0.15 to 39 \({\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\). Among the high-thermal-conductivity (high-κ) materials, 0.17% (11 materials) exhibited \({\kappa }_{\mathrm{lat}} > 1000\,{\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\), 0.38% (25) exceeded 500 \({\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\), and 1.02% (67) exceeded 200 \({\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\), as listed in Supplementary Table S1. In the list of calculated materials exhibiting \({\kappa }_{{\rm{lat}}} > 200\,{\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\), shown in Supplementary Fig. S2, the largest group (28 out of 67) consisted of polymorphs of carbon or SiC.
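A log-log relationship of this kind can be estimated by ordinary least squares on the logarithms. The sketch below is only an illustration under stated assumptions: the data are synthetic placeholders generated around a slope of −1.89 (they are not Phonix values), and the helper name `fit_power_law` is hypothetical. The same snippet shows how quartiles of the distribution can be read off with `numpy.percentile`.

```python
import numpy as np


def fit_power_law(v_atom, kappa_lat):
    """Least-squares fit of log10(kappa) = alpha * log10(V_atom) + beta."""
    x = np.log10(np.asarray(v_atom, dtype=float))
    y = np.log10(np.asarray(kappa_lat, dtype=float))
    alpha, beta = np.polyfit(x, y, deg=1)
    return alpha, beta


# Synthetic placeholder data, NOT values from the database.
rng = np.random.default_rng(0)
v_atom = rng.uniform(8.0, 60.0, size=500)        # volume per atom (A^3)
scatter = 10.0 ** rng.normal(0.0, 0.1, size=500)  # log-normal scatter
kappa_lat = 1.0e3 * v_atom ** (-1.89) * scatter   # W/m/K

alpha, beta = fit_power_law(v_atom, kappa_lat)

# Quartiles of the conductivity distribution (interquartile range = middle 50%).
q25, q50, q75 = np.percentile(kappa_lat, [25, 50, 75])
```

Fitting in logarithmic space weights all decades of \({\kappa }_{{\rm{lat}}}\) equally, which is why a few very high-κ entries do not dominate the slope.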
Meanwhile, to the best of our knowledge, the following materials, including their polymorphs, have not been synthesized experimentally and have rarely been discussed as high-κ materials: triclinic Hg(BiS2)2 (ID: mp-554921, space group: 12, \({\kappa }_{p,\{{xx},{yy},{zz}\}}=292\), 2.5, and 943 \({\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\)), cubic HC (mp-1079612, 199, \({\kappa }_{p,{ave}}=306\,{\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\)), cubic BiB (mp-1006880, 216, \({\kappa }_{p,{ave}}=235\,{\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\)), and trigonal CsHoS2 (mp-505158, 166, \({\kappa }_{p,\{{zz},{xx}\left({yy}\right)\}}=22\) and 657 \({\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\)). It is also noteworthy that the triclinic Hg(BiS2)2 and trigonal CsHoS2 exhibit highly anisotropic heat conduction with \({\kappa }_{p,{zz}}/{\kappa }_{p,{yy}}=392\) and \({\kappa }_{p,{xx}\left({yy}\right)}/{\kappa }_{p,{zz}}=30\), respectively. The anisotropy ratio of Hg(BiS2)2 is comparable to, or even exceeds, that of graphite88. On the other hand, among low-κ materials, 0.23% (15 materials) exhibited \({\kappa }_{{\rm{lat}}} < 0.1\,{\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\) (see Supplementary Fig. S3), 15% (966) exhibited \({\kappa }_{{\rm{lat}}} < 0.5\,{\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\), 28% (1815) exhibited \({\kappa }_{{\rm{lat}}} < 1.0\,{\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\), and 71% (4685) exhibited \({\kappa }_{{\rm{lat}}} < 5.0\,{\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\). Considering that finding materials with \({\kappa }_{{\rm{p}}}\approx 0.5\,{\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\) was challenging in pioneering studies67, the obtained dataset provides a significant amount of information on low-κ materials. While phonon renormalization and four-phonon scattering should be considered for accurately calculating small \({\kappa }_{{\rm{lat}}}\), this analysis suggests that identifying low-κ materials may be relatively easier than finding high-κ materials, which remains a greater challenge.
Moreover, it is insightful to compare the Peierls and coherent contributions to the total lattice thermal conductivity. In most materials, particularly high-κ materials, the coherent contribution is smaller than the Peierls contribution or sometimes even negligible. However, we observed that a considerable number of materials exhibited a significant coherent contribution: \({\kappa }_{{\rm{c}}}\ge {\kappa }_{{\rm{p}}}\) in 8.4% of materials (purple regions in the top and bottom panels of Fig. 2c, bounded by solid lines), and \({\kappa }_{{\rm{c}}}\ge 0.1\times {\kappa }_{{\rm{p}}}\) in about half of the dataset (50%; bluish regions, bounded by dotted lines). While the relative contribution of the coherent component is known to have a significant effect when the Peierls contribution is small, a large \({\kappa }_{{\rm{c}}}\) was obtained for SiC polymorphs, which are located in the top-right corner (\({\kappa }_{{\rm{p}}}\approx 200\) to 500 and \({\kappa }_{{\rm{c}}} > 10\,{\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\)) of the bottom panel of Fig. 2c. Although the relative contribution of \({\kappa }_{{\rm{c}}}\) remains small compared to the Peierls conductivity, it is interesting that high-κ materials such as SiC89,90,91 may exhibit a large coherent phonon conductivity (\(> 10\,{\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\) and up to 60 \({\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\) at 300 K). Since SiC has more than 200 polymorphs92, and some of them contain a substantial number of atoms (>50), the densely packed phonon branches resulting from the large number of atoms lead to a large \({\kappa }_{{\rm{c}}}\), as shown in Supplementary Fig. S4. The developed database contains 15 polymorphs of SiC, among which Si36C36 exhibits the highest \({\kappa }_{{\rm{c}}}\) of 65 \({\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\), while its \({\kappa }_{{\rm{p}}}\) reaches \({\kappa }_{p,{xx}\left({yy}\right)}=305\) and \({\kappa }_{p,{zz}}=11\) \({\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\).
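The two thresholds used in this comparison, \({\kappa }_{{\rm{c}}}\ge {\kappa }_{{\rm{p}}}\) and \({\kappa }_{{\rm{c}}}\ge 0.1\times {\kappa }_{{\rm{p}}}\), translate directly into a counting exercise. The following minimal sketch uses four made-up entries rather than database values, and the function name `coherence_fractions` is hypothetical.

```python
import numpy as np


def coherence_fractions(kappa_p, kappa_c):
    """Fractions of materials with kappa_c >= kappa_p and kappa_c >= 0.1 * kappa_p."""
    kappa_p = np.asarray(kappa_p, dtype=float)
    kappa_c = np.asarray(kappa_c, dtype=float)
    dominant = float(np.mean(kappa_c >= kappa_p))           # solid-line criterion
    significant = float(np.mean(kappa_c >= 0.1 * kappa_p))  # dotted-line criterion
    return dominant, significant


# Made-up placeholder values in W/m/K (not taken from the database).
kappa_p = [300.0, 5.0, 0.4, 0.2]
kappa_c = [12.0, 0.2, 0.1, 0.3]
frac_dominant, frac_significant = coherence_fractions(kappa_p, kappa_c)
```

The first placeholder entry mimics the SiC-like case discussed above: its \({\kappa }_{{\rm{c}}}\) is large in absolute terms but fails both relative criteria because \({\kappa }_{{\rm{p}}}\) is much larger.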
Computational accuracy
To assess computational validity, we compared the results obtained in this study with those in Phonondb16 as well as with experimental thermal conductivity data. As shown in Supplementary Fig. S5, the phonon dispersions calculated in this work exhibit excellent agreement with those reported in Phonondb. The remaining discrepancies are likely attributable to differences in the relaxed structures, particularly the lattice constants. Overall, this comparison further supports the reliability of both datasets. In addition to harmonic properties, an anharmonic phonon property, namely the lattice thermal conductivity at room temperature, was compared with experimental data for 103 single-crystal compounds. Although the calculated values deviate from experiment for certain materials, they show good overall agreement with the experimental results, as shown in Supplementary Fig. S6. To further reduce the discrepancies between computational and experimental values, additional factors should be considered in the simulations, including spin–orbit interaction, long-range interactions93,94,95, and four-phonon scattering. In our experience, these discrepancies could also be reduced by employing machine-learned surrogate models, particularly in cases with substantial deviations.
Computational accuracy may be limited for materials with high lattice thermal conductivity and should be interpreted with caution, as the study prioritized generating a large dataset under constrained computational resources. The automated calculations occasionally produce excessively high thermal conductivity values—exceeding several thousand \({\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\)—which appear to be unrealistic at this point. These overestimations typically arise from flat phonon bands or acoustic branches. In some instances, phonon modes on flat optical branches exhibit abnormally long lifetimes, while in others, low-frequency acoustic modes display either excessively long lifetimes or unusually high group velocities, as illustrated in Supplementary Fig. S7. To achieve more accurate thermal conductivity estimates, larger supercell sizes (up to 200 atoms) and/or denser q-point meshes are required. Another crucial factor is the inclusion of four-phonon interactions, which are expected to reduce the overestimated phonon lifetimes. Although the direct calculation of four-phonon scattering rates is computationally demanding, employing machine learning techniques to predict their effects82 represents a promising future direction for enhancing the database. In the subsequent machine learning analysis of anharmonic phonon properties, such implausible data have been excluded. Details regarding the computational accuracy of first-principles phonon analysis—including the effects of supercell size and the methods used to obtain force constants—are provided in Supplementary Section VIII.
Deep learning scaling law for anharmonic phonon properties
Using the database developed in this study, we conducted machine learning predictions for anharmonic phonon properties and investigated how prediction accuracy scales with data size14,96,97,98,99. Our database enables the machine learning prediction of spectral thermal conductivity, not merely scalar values such as \({\kappa }_{{\rm{lat}}}\) at room temperature (300 K). Since modal lattice thermal conductivity depends on mean free path (MFP) and phonon frequency, predicting spectral thermal conductivity is essential for evaluating the effects of nanostructuring100,101 and interactions with other particles and excitations, including electrons33, photons36,37, and magnons34,35. Here, we demonstrate predictions for Peierls thermal conductivity (\({\kappa }_{{\rm{p}}}\)) and cumulative Peierls thermal conductivity (\({\kappa }_{{\rm{cumul}}}\)) as functions of MFP (Λ) at 300 K. Additional examples of spectral thermal conductivity predictions as functions of frequency and the maximum phonon frequency are provided in the Supplementary Information (see Supplementary Fig. S10 and related discussion).
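To make the target quantity concrete, the cumulative thermal conductivity at a given MFP Λ can be understood as the sum of modal contributions from all phonon modes whose MFP does not exceed Λ. The sketch below assumes this definition and assumes that the normalized curve is obtained by dividing by the total \({\kappa }_{{\rm{p}}}\); the modal values are placeholders and the function name `cumulative_kappa` is hypothetical.

```python
import numpy as np


def cumulative_kappa(mfp, kappa_mode, mfp_grid):
    """Sum modal conductivities over modes with MFP <= each grid value."""
    mfp = np.asarray(mfp, dtype=float)
    kappa_mode = np.asarray(kappa_mode, dtype=float)
    order = np.argsort(mfp)
    sorted_mfp = mfp[order]
    # Running total in order of increasing MFP, with a leading zero for "no modes".
    running = np.concatenate(([0.0], np.cumsum(kappa_mode[order])))
    # Number of modes with MFP <= each grid point.
    counts = np.searchsorted(sorted_mfp, mfp_grid, side="right")
    return running[counts]


# Placeholder modal data: MFP in nm, modal conductivity in W/m/K.
mfp = np.array([2.0, 15.0, 40.0, 300.0])
kappa_mode = np.array([0.5, 1.5, 2.0, 1.0])
grid = np.array([1.0, 10.0, 100.0, 1000.0])

kappa_cumul = cumulative_kappa(mfp, kappa_mode, grid)
kappa_norm = kappa_cumul / kappa_mode.sum()  # assumed normalization by total kappa_p
```

Under this normalization the curve rises monotonically from 0 to 1, so the learning target has a bounded range regardless of the absolute conductivity of the material.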
In this study, we employed the crystal graph convolutional neural network (CGCNN)102 to predict scalar quantities, such as thermal conductivity, and the Euclidean neural network (e3nn)103 to predict spectral functions. In CGCNN, atoms are represented by node features composed of one-hot encodings of nine atomic properties, including group number, period number, and electronegativity, while interatomic distances are encoded as discretized edge features. In e3nn, atomic species are represented by 118-dimensional mass-weighted one-hot vectors, and interatomic relations are described using relative position vectors. The e3nn framework incorporates the SE(3)-Transformer104—a state-of-the-art architecture for three-dimensional point clouds and graphs—which is equivariant under continuous 3D roto-translations and rigorously accounts for structural symmetries, including mirror (O(3)) and rotational (SO(3)) symmetries, both of which are crucial for phonon analysis. This method has recently been applied to the prediction of complex phonon properties, including DOS105 and phonon dispersion106,107. Further methodological details are provided in the Methods section.
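The two input encodings described above can be illustrated with plain NumPy. This is a sketch under assumptions, not the code used in this study: the mass-weighted one-hot vector places the atomic mass at the index of the atomic number, and the distance encoding uses a Gaussian basis on a regular grid in the spirit of the original CGCNN; the grid range, number of centers, width, and function names are illustrative choices.

```python
import numpy as np


def mass_weighted_one_hot(atomic_number, mass, n_species=118):
    """118-dimensional vector that is zero except for the mass at index Z - 1."""
    vec = np.zeros(n_species)
    vec[atomic_number - 1] = mass
    return vec


def gaussian_distance_features(distances, d_min=0.0, d_max=8.0, n_centers=41):
    """Expand each interatomic distance on a grid of Gaussian basis functions."""
    centers = np.linspace(d_min, d_max, n_centers)
    width = centers[1] - centers[0]  # illustrative choice: width = grid spacing
    d = np.asarray(distances, dtype=float)[:, None]
    return np.exp(-((d - centers) ** 2) / width**2)


# Example: Na (Z = 11, mass 22.99 u) and a Na-Cl distance of about 2.8 angstrom.
na_feature = mass_weighted_one_hot(11, 22.99)
edge_feature = gaussian_distance_features([2.8])
```

Each distance is thus turned into a smooth, fixed-length vector that peaks at the grid point closest to the actual bond length, which is one common way to obtain discretized edge features.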
By performing machine learning predictions for \({\kappa }_{{\rm{p}}}\) and normalized \({\kappa }_{{\rm{cumul}}}\) (\({\kappa }_{{\rm{cumul}}}^{{\rm{norm}}}(\Lambda )\)) at 300 K using various training dataset sizes (\({N}_{{\rm{train}}}\)), we observed clear scaling behavior with respect to data size, as shown in the left panels of Fig. 3a, b. These results clearly demonstrate the enhancement in prediction accuracy enabled by our database. The relationship between mean absolute error (MAE) and \({N}_{{\rm{train}}}\) was fitted using the empirical formula97: \(\left({\rm{error}}\right)={\left({N}_{c}/{N}_{{\rm{train}}}\right)}^{\alpha }\,({N}_{c},\,\alpha > 0)\), where \({N}_{c}\) is a constant and \(\alpha\) is the scaling factor indicating how effectively increased data improves predictive accuracy. The scaling factors were 0.17 for \({\kappa }_{p}\) and 0.14 for \({\kappa }_{{\rm{cumul}}}\), as shown in Fig. 3a, b, and ranged from 0.075 to 0.28 for other properties, as illustrated in Supplementary Fig. S6. These values are comparable to those for large language models (0.095)97 and force prediction tasks in crystalline materials (0.21)14 (see Supplementary Fig. S6e). As the database continues to expand, the predictive accuracy of surrogate models for large-scale materials screening is expected to improve further. For example, according to the fitted scaling law, the MAE for \({\log }_{10}{\kappa }_{{\rm{p}}}\) is expected to decrease to 0.15 as the training dataset size approaches \(2.5\times {10}^{5}\). Nevertheless, brute-force calculations of anharmonic phonon properties for \({10}^{5}\)-order materials, particularly including higher-order effects such as four-phonon scattering and phonon renormalization, remain impractical. Therefore, further expansion of the database will require machine learning-based acceleration methods, such as machine learning potentials31,77, to facilitate the efficient evaluation of phonon properties76,82,108.
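Because the empirical formula is a pure power law, taking logarithms gives \(\mathrm{ln}({\rm{error}})=\alpha \,\mathrm{ln}{N}_{c}-\alpha \,\mathrm{ln}{N}_{{\rm{train}}}\), so \(\alpha\) and \({N}_{c}\) follow from a straight-line fit. The sketch below demonstrates this on synthetic points generated from placeholder parameters (\(\alpha =0.17\), \({N}_{c}=2.3\)); these are not the fitted values or data of this study, and the function names are hypothetical.

```python
import numpy as np


def predict_error(n_train, alpha, n_c):
    """Scaling law: error = (N_c / N_train) ** alpha."""
    return (n_c / np.asarray(n_train, dtype=float)) ** alpha


def fit_scaling_law(n_train, error):
    """Fit alpha and N_c by linear regression of ln(error) on ln(N_train)."""
    x = np.log(np.asarray(n_train, dtype=float))
    y = np.log(np.asarray(error, dtype=float))
    slope, intercept = np.polyfit(x, y, deg=1)  # y = -alpha * x + alpha * ln(N_c)
    alpha = -slope
    n_c = float(np.exp(intercept / alpha))
    return float(alpha), n_c


# Synthetic, noise-free points from placeholder parameters (illustration only).
n_train = np.array([100, 200, 400, 800, 1600, 3200])
error = predict_error(n_train, alpha=0.17, n_c=2.3)

alpha_fit, n_c_fit = fit_scaling_law(n_train, error)

# Extrapolation of the fitted law to a much larger training set.
error_large = float(predict_error(2.5e5, alpha_fit, n_c_fit))
```

With real ensemble results the points scatter around the line, so the extrapolated error should be read together with the confidence intervals of the fit rather than as a single number.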
a Peierls thermal conductivity (\({\kappa }_{{\rm{p}}}\)) and (b, c) normalized cumulative Peierls thermal conductivity (\({\kappa }_{{\rm{cumul}}}^{{\rm{norm}}}\)) were predicted using graph neural networks. The left panels in (a) and (b) show the reduction of mean absolute error (MAE) with increasing data size, demonstrating clear scaling behavior. MAEs were evaluated using \({\log }_{10}{\kappa }_{{\rm{p}}}\) and \({\kappa }_{{\rm{cumul}}}^{{\rm{norm}}}\), respectively. The fitted scaling curve is shown as a grey line, with the corresponding equation displayed at the bottom of each panel. Error bars represent the 90% confidence interval based on 50 ensembles. The right panels in (a) and (b) show prediction examples at different data sizes, selected based on MAE values closest to the ensemble average. In panel (a), blue, green, and red markers represent training, validation, and test data, respectively. In panel (b), colored lines indicate predicted results, while grey lines show data from first-principles calculations. c Prediction results for \({\kappa }_{{\rm{cumul}}}^{{\rm{norm}}}\) using the entire dataset (\({N}_{\mathrm{all}}\approx 5000\)). The left panel presents the MAE distribution (dotted line) and its cumulative sum (solid line), color-coded by quartile. The right panels display multiple examples of predicted \({\kappa }_{{\rm{cumul}}}^{{\rm{norm}}}\) curves; colored lines indicate predictions, and black lines represent reference calculations.
The right panels in Fig. 3a, b show representative test cases selected from 50 ensembles for each data size, chosen as those with MAE values closest to the average for the corresponding condition. For instance, when \({N}_{\mathrm{all}}\approx 1000\), where \({N}_{{\rm{all}}}\) denotes the total number of data points used for training, validation, and testing, the average MAE of \({\log }_{10}{\kappa }_{{\rm{p}}}\) was 0.37, as shown in the left panel of Fig. 3a. The middle panel on the right side of Fig. 3a displays a representative case with an MAE of 0.378. The prediction results in the right panel of Fig. 3a clearly demonstrate that the predicted data points cluster more closely around the parity line as \({N}_{{\rm{all}}}\) increases. Similarly, the right panel of Fig. 3b shows that the fluctuations in the predicted curve are reduced with increasing \({N}_{{\rm{all}}}\), and the predicted trend aligns more closely with the first-principles results (grey line) for larger datasets.
The strong predictive performance for \({\kappa }_{{\rm{cumul}}}\) is emphasized in Fig. 3c. As shown in the left panel, 50% (75%) of the test data yielded an MAE for \({\log }_{10}{\kappa }_{{\rm{cumul}}}^{{\rm{norm}}}\) below 0.05 (0.08). This panel illustrates the MAE distribution, while the right panels provide prediction examples for individual materials. In the right panels, 50% of the predicted curves exhibit excellent agreement (green and blue regions) with the first-principles results (black line), and a further 25% demonstrate good agreement (orange region), bringing the total to 75%. Even for the final group, where MAE exceeds 0.08 (red region), the initial value of \({\kappa }_{{\rm{cumul}}}\) (i.e., the \({\kappa }_{{\rm{p}}}\) contribution from phonons with MFPs shorter than 1 nm) shows a discrepancy, but the MFP range where \({\kappa }_{{\rm{cumul}}}\) begins to increase remains reasonably well predicted.
Screening using the Phonix database
Using prediction models developed from our database, we screened materials with high and low thermal conductivity from the GNoME database14, which contains 381,000 novel crystal structures. The Peierls thermal conductivity (\({\kappa }_{{\rm{p}}}\)) for all materials was evaluated as the average of 20 ensemble predictions. Magnetic materials, including those containing transition metals, were included in the screening. Although magnetic effects can affect lattice thermal conductivity in three-dimensional systems with Curie temperatures close to room temperature109, they are generally secondary to phonon–phonon scattering because of the abundance and strength of phonon–phonon interactions95,110. Each model was trained on 3000 anharmonic phonon data points, divided into 2400 for training, 300 for validation, and 300 for testing. Following the screening, phonon properties, including \({\kappa }_{{\rm{p}}}\), were computed for 169 selected materials (148 with the highest \({\kappa }_{{\rm{p}}}\) and 21 with the lowest) using the auto-kappa workflow.
An analysis of the validation results for the screened materials revealed several insights regarding prediction accuracy, as shown in Fig. 4a. The predicted \({\kappa }_{{\rm{p}}}\) values for low-thermal-conductivity materials in the GNoME database showed accuracy comparable to that of the full dataset (MAE: 0.27 for \({\log }_{10}{\kappa }_{{\rm{p}}}\)), with low variability in the predictions, as illustrated in Fig. 3a. In contrast, the prediction accuracy for high-κp materials was notably lower (MAE: 0.68), and the predictions exhibited greater variability. Although definitive conclusions are limited by the relatively small number of computed data points, these results suggest that high-κ predictions are more challenging. From a machine learning standpoint, this difficulty likely stems from the simpler structural characteristics of high-κ materials, which typically contain fewer atoms and atomic species in their primitive cells. Consequently, these materials offer less structural information for learning compared to low-κ materials, which often have complex frameworks, such as skutterudites and clathrates23,111,112. Predicting material properties from such sparse structural information is inherently more difficult. From a physical perspective, accurately estimating high-κ values demands rigorous treatment of anharmonic phonon interactions and highly converged computational parameters, such as dense q-point meshes, since even small errors in force constants can significantly impact the results. Nonetheless, the predicted candidates remain promising for high-κ applications.
a Parity plot comparing predicted and calculated values of Peierls thermal conductivity (\({\kappa }_{{\rm{p}}}\)). Blue and red markers represent materials predicted to exhibit high and low thermal conductivity, respectively, using models trained on the constructed Phonix database. Error bars indicate the 90% confidence interval from 20 ensemble predictions. The solid line denotes the parity line. b and (c) display135 crystal structures with \({\kappa }_{{\rm{p}}}^{3{\rm{ph}}} > 200\,{\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\) and the four lowest-\(\kappa\) structures, respectively. For each material, the chemical formula, space group number (in parentheses), and GNoME database ID are provided. d and (e) present phonon properties of hexagonal NpPH and trigonal Cs6Rb2SnPbI12, which exhibit the highest (\({\kappa }_{{\rm{lat}}}^{3{\rm{ph}}(3+4{\rm{ph}})}\approx 280\,(80)\,{\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\)) and lowest (\({\kappa }_{{\rm{lat}}}\approx 0.15\,{\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\)) lattice thermal conductivities (\({\kappa }_{{\rm{lat}}}={\kappa }_{{\rm{p}}}+{\kappa }_{{\rm{c}}}\)), respectively. In the case of high-κ materials, both three-phonon (3 ph) and four-phonon (4 ph) scattering were taken into account. The panels include phonon dispersion, total and partial DOS, phonon lifetime (\(\tau\)), spectral (green) and cumulative (blue) Peierls thermal conductivity for each, as well as labels such as the chemical formula, space group (in parentheses), material ID, and lattice thermal conductivities (\({\kappa }_{p}\) and \({\kappa }_{c}\)) along different directions in units of \({\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\). While the maximum phonon frequency of the high-\(\kappa\) material in (d) exceeds 1000 \({{\rm{cm}}}^{-1}\), properties are shown up to 400 \({{\rm{cm}}}^{-1}\). Full-range phonon properties are available in Supplementary Fig. S11. Spectral and cumulative thermal conductivity are normalized by the maximum and total Peierls conductivities, respectively.
By screening materials with high and low κ, we identified three compounds with \({\kappa }_{{\rm{lat}}}^{3{\rm{ph}}} > 200\,{\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\) and nine with \({\kappa }_{{\rm{lat}}}^{3{\rm{ph}}} < 0.2\,{\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\), as shown in Supplementary Fig. S7, where the superscript “3 ph” denotes three-phonon scattering. Among the predicted materials, the highest and lowest calculated lattice thermal conductivities (\({\kappa }_{{\rm{lat}}}^{3{\rm{ph}}}={\kappa }_{{\rm{p}}}^{3{\rm{ph}}}+{\kappa }_{{\rm{c}}}^{3{\rm{ph}}}\)) were 284 \({\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\) for the \({xx}\) and \({yy}\) components of hexagonal NpPH, and 0.14 \({\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\) for trigonal Cs6Rb2SnPbI12, respectively; for the latter, \(\left({\kappa }_{{\rm{p}},\{{xx}/{yy}\}},{\kappa }_{{\rm{c}}}\right)=(0.031,\,0.11)\) \({\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\). Although we did not find materials that surpassed known record values, the results highlight the potential for future discovery of record-breaking compounds. Importantly, the identified candidates offer valuable insights into the structural and compositional characteristics of both high- and low-κ materials. Discovering materials at the extremes of thermal conductivity is inherently challenging, as machine learning models typically excel at interpolation but struggle with extrapolation113,114,115. Therefore, further advancement in automated high-throughput calculations will be critical for identifying such extreme materials in future studies.
In the three-phonon calculations, high thermal conductivity values (\(\gtrsim 200\,{\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\)) were observed in hydrogen-containing hexagonal ternary compounds belonging to space group 194 (\(P{6}_{3}/{mmc}\)), such as NpPH (\({\kappa }_{p,{zz}}^{3{\rm{ph}}}=172\), \({\kappa }_{p,{xx}/{yy}}^{3{\rm{ph}}}=277\), \({\kappa }_{c}^{3{\rm{ph}}}=6.9\,{\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\)), PaPH (\({\kappa }_{p,{zz}}^{3{\rm{ph}}}=173\), \({\kappa }_{p,{xx}/{yy}}^{3{\rm{ph}}}=264\), \({\kappa }_{c}=0.0037\,{\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\)), and PuHS (\({\kappa }_{p,{xx}/{yy}/{zz}}=216\), \({\kappa }_{c}=0.012\,{\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\)), as shown in Fig. 4b and Supplementary Fig. S11a. When four-phonon scattering is taken into account, the thermal conductivity is reduced to \({\kappa }_{p,{xx}/{yy}}^{3+4{\rm{ph}}}=78\), 59, and 51 \({\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\) for NpPH, PaPH, and PuHS, respectively. The origin of their relatively high thermal conductivity nevertheless remains to be elucidated. These materials are characterized by heavy atoms surrounded by light atoms, including hydrogen. The phonon dispersion and DOS in Fig. 4d clearly show that phonon modes associated with heavy atoms (Np) and those associated with light atoms (P and H) are completely separated into different frequency ranges: modes of heavy atoms appear at low frequencies (\(< 200\,{{\rm{cm}}}^{-1}\)) and those of light atoms appear at high frequencies. This complete separation of phonon modes by different atomic species in energy space is expected to suppress anharmonic interactions between phonon modes within their respective frequency ranges, similar to other high-κ materials such as BAs116,117. Consequently, the phonon lifetimes of acoustic modes primarily composed of heavy atoms remain long, contributing dominantly to the overall heat transport, as shown in the last two panels of Fig. 4d.
In contrast, the crystal structures of low-κ materials are significantly more complex, as illustrated by the examples in Fig. 4c such as Cs6Rb2SnPbI12 (\({\kappa }_{p,{zz}}=0.049\), \({\kappa }_{p,{xx}/{yy}}=0.032\), \({\kappa }_{c}=0.11\,{\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\)), CsAgS6 (\({\kappa }_{p,{xx}/{yy}/{zz}}=0.013\), \({\kappa }_{c}=0.141\,{\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\)), K3AgSe13 (\({\kappa }_{p,{xx}/{zz}}=0.030\), \({\kappa }_{p,{yy}}=0.048\), \({\kappa }_{c}=0.17\,{\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\)), and Cs6K2SnPbI12 (\({\kappa }_{p,{zz}}=0.057\), \({\kappa }_{p,{xx}/{yy}}=0.039\), \({\kappa }_{c}=0.11\,{\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\)). Notably, six of the nine discovered low-κ materials contain cesium; the cesium compound α-CsPbBr₃ is known for its intrinsically low thermal conductivity51. In these low-κ materials, phonon modes formed by a mixture of atomic species are distributed across a wide frequency range, as illustrated in Fig. 4e and Supplementary Fig. S11b, in stark contrast to the more localized mode behavior seen in high-κ materials. Although several attempts have been made to synthesize related materials, including actinide hydrides118,119,120, the compounds identified in this screening, particularly those with high \({\kappa }_{{\rm{lat}}}\), may present significant challenges for experimental synthesis. Nevertheless, the above discussion provides concrete insight into the synthesis of highly thermally conductive materials. For example, realizing similar phenomena with transition metals, rather than actinides, could enable high thermal conductivity in compounds that are more amenable to experimental synthesis.
Discussion
In conclusion, we developed an automated software package, auto-kappa, and constructed a large-scale first-principles database for anharmonic phonon interactions (Phonix), encompassing more than 6500 materials with diverse crystal structures. Using this database, we demonstrated a clear scaling law linking dataset size to predictive performance for key anharmonic phonon properties, including lattice and spectral thermal conductivities. Furthermore, by screening a vast crystal structure database, we identified promising candidates for both high and low thermal conductivity applications. Although future improvements—such as the inclusion of higher-order anharmonic effects like four-phonon scattering and phonon renormalization—are necessary for more accurate assessments, this study establishes a strong foundation for data-driven discovery of thermofunctional materials with wide-ranging technological relevance, including applications in superconductivity, spintronics, and beyond.
Methods
Automated workflow for anharmonic phonon calculations
Phonon calculations based on first-principles methods involve a considerably more complex workflow than typical calculations of total energy, electronic band structures, or electronic conductivity within the constant relaxation time approximation. To facilitate the construction of an anharmonic phonon property database, we developed auto-kappa, a Python-based automation software for first-principles analysis of anharmonic phonon properties. Auto-kappa streamlines the intricate workflow—illustrated in Fig. 1a—for computing anharmonic phonon properties by integrating the Vienna Ab Initio Simulation Package (VASP)86 for electronic structure calculations and the phonon analysis software ALAMODE45.
Through automated calculations, the auto-kappa software utilizes various existing libraries and packages in addition to VASP (≥6.3.2) and ALAMODE (versions 1.4–1.5). Crystal structures were handled using the Atomic Simulation Environment (ASE)121 (≥3.22) and Pymatgen7 (≥2023.8.10). Symmetry operations were performed using Spglib122 (≥2.3.1), Pymatgen7, and modules from Phonopy52,123 (≥2.20). VASP calculations, including input file generation and job submission, were managed using ASE and the Custodian package7 (≥2023.10.9). The phonon dispersion path was determined using the SeeK-path library122,124.
The integration of various libraries—such as those listed above—enables researchers to perform first-principles phonon calculations with significantly reduced manual effort. Using auto-kappa, the database was generated through the following procedure, which follows the workflow illustrated in Fig. 1a.
Step 1: Symmetry analysis of the crystal structure
The primitive, conventional, and supercells of the input crystal structure were first determined. The conventional cell was selected to have a compact shape while maintaining resemblance to a regular hexahedron. The supercell was then generated from the conventional cell, with a target of maximizing the number of atoms (up to a limit of 150 atoms) while maintaining geometric similarity to a regular hexahedron. The resulting supercell was used for force calculations required for both harmonic and cubic force constants—steps iv and vi, respectively. However, when imaginary frequencies appeared, larger supercells were employed specifically for the harmonic force constant calculations.
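The supercell selection described above can be sketched as a small search over diagonal multipliers. This is only an illustration under stated assumptions, not the auto-kappa implementation: the function name `choose_supercell`, the edge-length-ratio tolerance `tol`, and the restriction to diagonal expansions are all hypothetical, and cell angles are ignored.

```python
import itertools
import numpy as np

def choose_supercell(cell, n_atoms_unit, max_atoms=150, tol=1.3, max_mult=8):
    """Return diagonal multipliers (n1, n2, n3) for a near-cubic supercell.

    Assumption: "similar to a regular hexahedron" is approximated by requiring
    the ratio of longest to shortest supercell edge to stay below `tol`.
    """
    lengths = np.linalg.norm(np.asarray(cell, dtype=float), axis=1)
    candidates = []
    for mult in itertools.product(range(1, max_mult + 1), repeat=3):
        n_atoms = n_atoms_unit * mult[0] * mult[1] * mult[2]
        if n_atoms > max_atoms:
            continue
        edges = lengths * np.array(mult)
        candidates.append((edges.max() / edges.min(), n_atoms, mult))
    if not candidates:
        raise ValueError("unit cell already exceeds max_atoms")
    near_cubic = [c for c in candidates if c[0] <= tol]
    if near_cubic:
        # Largest atom count first, then the smallest edge-length ratio.
        return max(near_cubic, key=lambda c: (c[1], -c[0]))[2]
    # Fall back to the most isotropic candidate if none meets the tolerance.
    return min(candidates, key=lambda c: c[0])[2]
```

For a cubic 8-atom conventional cell this sketch selects a 2 × 2 × 2 expansion (64 atoms), because 3 × 3 × 3 would exceed the 150-atom limit and mixed expansions break the near-cubic shape.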
Step 2: Structure optimization
The accurate calculation of atomic forces using supercells in a later step is crucial for obtaining a reliable phonon analysis. Therefore, the shape and atomic positions in the crystal structure were carefully optimized through a rigorous procedure. Although both primitive and conventional cells can be used for this purpose, we chose the conventional cell to ensure consistency in the basis wavefunctions with those used in the supercell-based phonon calculations. While the primitive cell offers computational efficiency and better symmetry preservation, the conventional cell provides a more consistent basis set across all simulation steps.
The structure optimization was performed in three steps: two successive full relaxations—allowing for optimization of both the cell shape/volume and atomic positions—followed by a final atomic relaxation with the cell shape and volume fixed. Because changes in the cell can affect the optimal basis set of wavefunctions, performing two full relaxations helps mitigate the impact of basis fluctuations. Once the cell shape and size were determined, the atomic positions were further relaxed in a single-step calculation.
Step 3: Calculation of Born effective charges
The Born effective charges were calculated using a first-principles approach to apply non-analytical corrections in subsequent phonon analyses. For harmonic phonon properties, such as phonon dispersion and DOS, the non-analytic correction was initially applied using the mixed-space approach125. This correction primarily affects the splitting between longitudinal optical (LO) and transverse optical (TO) modes (LO–TO splitting), but in some cases, it also influences the phonon stability of certain materials. When imaginary phonon frequencies were observed, the method for applying the non-analytic correction was modified—first by using the damping method126 and, if necessary, switching to the Ewald method127.
Step 4: Calculation of harmonic force constants
Harmonic interatomic force constants were calculated using the finite-displacement method (also known as the brute-force method), in which atomic displacement patterns were generated in a supercell, and the resulting atomic forces were computed for each pattern. For these calculations, a single atom was displaced within the supercell, and the displacement patterns were determined based on crystal symmetry. The number of displacement patterns required for harmonic force constants is relatively small compared to those needed for higher-order force constants, allowing the finite-displacement method to be directly applied. The displacement magnitude was set to a small value (0.01 Å) to minimize the influence of anharmonic effects. Harmonic force constants were then obtained using a least-squares fitting procedure. If the fitting error exceeded 10%, the data were excluded from the analyses presented in this paper. No cutoff distance was imposed on the harmonic force constants in order to account for all possible atomic interactions within the supercell.
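As a minimal sketch of the least-squares step, the harmonic relation \(F_i=-\sum_j \Phi_{ij}u_j\) can be fitted directly from displacement-force pairs. The toy three-degree-of-freedom system and the function name `fit_harmonic_fc` are assumptions for illustration; the actual workflow relies on ALAMODE, which works with symmetry-reduced force constants rather than the dense matrix used here.

```python
import numpy as np

def fit_harmonic_fc(displacements, forces):
    """Least-squares estimate of a dense harmonic force-constant matrix.

    Rows of `displacements` and `forces` are displacement patterns; each row
    satisfies forces = -displacements @ phi for a symmetric phi.
    Returns phi and the relative fitting error.
    """
    u = np.asarray(displacements, dtype=float)
    f = np.asarray(forces, dtype=float)
    phi, *_ = np.linalg.lstsq(u, -f, rcond=None)  # solves u @ phi = -f
    rel_error = np.linalg.norm(f + u @ phi) / np.linalg.norm(f)
    return phi, rel_error

# Toy data: one degree of freedom displaced by 0.01 Angstrom per pattern.
phi_true = np.array([[2.0, -1.0, -1.0], [-1.0, 2.0, -1.0], [-1.0, -1.0, 2.0]])
patterns = 0.01 * np.eye(3)
toy_forces = -patterns @ phi_true
phi_fit, error = fit_harmonic_fc(patterns, toy_forces)
```

In this noise-free toy case the relative error is essentially zero; in the workflow described above, a value above 10% would lead to exclusion of the material.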
To ensure accurate force calculations within the first-principles framework, it is important to evaluate the nonlocal part of the pseudopotential in reciprocal space rather than in real space. While using projector operators in real space can reduce computational cost for large supercells, it introduces aliasing errors due to wavefunction projection. Therefore, in our developed software, projector operators are consistently evaluated in reciprocal space by setting ‘LREAL=FALSE’ in the VASP calculations.
Step 5: Analysis for harmonic phonon properties
Using harmonic force constants, harmonic phonon properties, including phonon dispersion and DOS, were calculated. As described in the section on the Born effective charge, different approaches were applied to include non-analytic corrections when necessary to eliminate imaginary frequencies. For the DOS calculation, the reciprocal space mesh density for the phonon wavevector (q-mesh) was set to 1500 q-points per reciprocal atom (q-points·Å³/atom). For example, the q-mesh for diamond-structured silicon was set to 21 × 21 × 21.
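The text does not spell out how the mesh density is converted into a q-mesh. One reading that reproduces the quoted silicon mesh is to multiply the density by the reciprocal-cell volume (including the \({(2\pi )}^{3}\) factor) and distribute the points inversely to the real-space cell lengths. The sketch below encodes that assumption; the function name `mesh_from_density` and the lattice constant of 5.43 Å for the silicon primitive cell are illustrative choices, not values taken from the workflow.

```python
import numpy as np

def mesh_from_density(cell, density=1500.0):
    """Assumed conversion from a mesh density to a q-mesh (see lead-in)."""
    cell = np.asarray(cell, dtype=float)
    lengths = np.linalg.norm(cell, axis=1)
    recip_volume = (2.0 * np.pi) ** 3 / abs(np.linalg.det(cell))
    n_total = density * recip_volume            # target number of q-points
    scale = (n_total * lengths.prod()) ** (1.0 / 3.0)
    return tuple(max(1, int(scale / length)) for length in lengths)

# Primitive fcc cell of diamond-structured silicon (a = 5.43 Angstrom assumed).
a = 5.43
si_primitive = 0.5 * a * np.array([[0.0, 1.0, 1.0],
                                   [1.0, 0.0, 1.0],
                                   [1.0, 1.0, 0.0]])
si_mesh = mesh_from_density(si_primitive)
```

Under these assumptions `si_mesh` evaluates to (21, 21, 21), in agreement with the silicon example given above.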
Step 6: Calculation of cubic force constants
If the structure exhibited no imaginary frequencies, the calculation of cubic force constants was performed following the harmonic phonon property analysis. To obtain cubic force constants, the finite-displacement method typically requires a significantly larger number of displacement patterns (on average, ~100 times more than those needed for harmonic force constants). Therefore, a cutoff distance was imposed for the cubic force constants, which was set to the larger of 4.3 Å and the third-shortest interatomic distance. Additionally, the finite-displacement and least-squares methods were used only when the number of required displacement patterns was below a predefined threshold (set to 100 patterns); otherwise, the least absolute shrinkage and selection operator (LASSO) regression128 was employed to estimate cubic force constants from randomly generated displacement patterns. The harmonic force constants were fixed to the values obtained from the previous calculation (step iv) during the LASSO regression. If the fitting error for the least-squares method or the residual force for the LASSO regression exceeded 10%, the data were excluded from the discussion, as was done for harmonic force constants.
The number of generated random displacement patterns was determined using the formula \({N}_{{\rm{pattern}}}^{{\rm{rand}}}=\alpha {N}_{{\rm{FC}}3}/{N}_{{\rm{atom}}}^{{\rm{sc}}}\), where \({N}_{{\rm{FC}}3}\) is the number of unique cubic force constants, \({N}_{{\rm{atom}}}^{{\rm{sc}}}\) is the number of atoms in the supercell, and \(\alpha\) is a coefficient greater than \(1/3\); in this study, it was set to \(1.0\). To generate a random displacement pattern, a random displacement was applied to each atom. The displacement magnitude for cubic calculations was set to 0.01 or 0.03 Å per atom for both the finite-displacement method and the LASSO approach, which is larger than the value used for harmonic calculations.
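The lower bound \(\alpha > 1/3\) presumably reflects that each displacement pattern supplies \(3{N}_{{\rm{atom}}}^{{\rm{sc}}}\) force components, so that \(3\alpha {N}_{{\rm{FC}}3}\) equations are available for \({N}_{{\rm{FC}}3}\) unknowns. A minimal sketch of the formula follows; the function name and the rounding up to an integer are assumptions.

```python
import math

def n_random_patterns(n_fc3, n_atoms_supercell, alpha=1.0):
    """Number of random displacement patterns, alpha * N_FC3 / N_atom^sc."""
    if alpha <= 1.0 / 3.0:
        raise ValueError("alpha must exceed 1/3 to avoid an underdetermined fit")
    # Rounding up to a whole number of patterns is an assumption.
    return math.ceil(alpha * n_fc3 / n_atoms_supercell)

# Example: 3000 unique cubic force constants in a 100-atom supercell.
n_patterns = n_random_patterns(3000, 100)
n_equations = 3 * 100 * n_patterns   # force components available for fitting
```

With \(\alpha =1.0\) this hypothetical case yields 30 patterns and 9000 force components, three times the number of unknowns.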
Step 7: Analysis for anharmonic phonon properties
Using the cubic force constants obtained in the previous step, we analyzed anharmonic phonon properties. To assess convergence with respect to the q-mesh size, the q-mesh density was varied from 500 to 1000 to 1500 q-points·Å³/atom. The effect of three-phonon scattering was estimated by solving the phonon Boltzmann transport equation under the relaxation time approximation. Phonon scattering by natural isotopes was also considered and incorporated using Matthiessen’s rule. Finally, various anharmonic phonon properties were obtained, including mode-dependent lifetimes; spectral and cumulative thermal conductivities (\({\kappa }_{{\rm{spec}}}\) and \({\kappa }_{{\rm{cumul}}}\)) as functions of frequency and mean free path; and temperature-dependent thermal conductivities for both Peierls (\({\kappa }_{{\rm{p}}}\)) and coherence (\({\kappa }_{{\rm{c}}}\))59 contributions, as illustrated in Fig. 1b. For details, please refer to Section I of the Supplementary Information.
Step 8: Strict structure optimization
If imaginary frequencies were observed in the harmonic phonon analysis during process (iv), a strict structural optimization was performed. In this step, the volume of the crystal structure was modified by applying hydrostatic strain, and the corresponding structural energies were calculated. After evaluating energies at different volumes, the Birch-Murnaghan equation of state129,130 was used to determine the volume that minimized the structural energy. Once the newly optimized structure was obtained, the procedure was restarted from process (iii).
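The equation-of-state step can be illustrated with a compact fit. The third-order Birch-Murnaghan energy is exactly a cubic polynomial in \(V^{-2/3}\), so a polynomial fit in that variable is used here in place of a nonlinear fit. This is a sketch under that substitution, with hypothetical function names and synthetic data; it is not the routine used in auto-kappa.

```python
import numpy as np

def birch_murnaghan(v, e0, v0, b0, b0p):
    """Third-order Birch-Murnaghan energy as a function of volume."""
    eta = (v0 / v) ** (2.0 / 3.0)
    return e0 + 9.0 * v0 * b0 / 16.0 * (
        (eta - 1.0) ** 3 * b0p + (eta - 1.0) ** 2 * (6.0 - 4.0 * eta)
    )

def fit_equilibrium_volume(volumes, energies):
    """Volume minimizing a cubic fit of E against x = V**(-2/3)."""
    x = np.asarray(volumes, dtype=float) ** (-2.0 / 3.0)
    x0, span = x.mean(), x.max() - x.min()
    t = (x - x0) / span                      # rescaled for a well-conditioned fit
    poly = np.poly1d(np.polyfit(t, np.asarray(energies, dtype=float), 3))
    for root in poly.deriv().roots:
        inside = t.min() <= root.real <= t.max()
        if abs(root.imag) < 1e-12 and inside and poly.deriv(2)(root.real) > 0:
            return (x0 + span * root.real) ** (-1.5)
    raise ValueError("no energy minimum inside the sampled volume range")

# Synthetic energies around an assumed equilibrium volume of 40 cubic Angstrom.
volumes = np.linspace(36.0, 44.0, 9)
energies = birch_murnaghan(volumes, -10.0, 40.0, 0.6, 4.5)
v_min = fit_equilibrium_volume(volumes, energies)
```

The volumes would in practice come from the hydrostatically strained structures and the energies from the corresponding first-principles calculations.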
Step 9: Use of larger supercell for harmonic force constants
If the strictly optimized structure still exhibited imaginary frequencies, a larger supercell was used for calculating harmonic force constants. The maximum limit for this second harmonic force constant analysis was set to 200 atoms—an increase of 50 atoms from the original setting. If this step successfully eliminated imaginary frequencies, cubic force constants were then calculated. While a larger supercell was used for harmonic force constants in this case, the original supercell size (fewer than 150 atoms) was retained for estimating cubic force constants. The harmonic force constants obtained using the original supercell were kept fixed during the estimation of cubic force constants.
Step 10: Phonon renormalization
The process for phonon renormalization using self-consistent phonon (SCP) theory55,56 was also implemented in auto-kappa, although this process was not performed in the present study. Using the SCP approach, temperature-dependent effective harmonic force constants can be calculated by incorporating the effects of phonon renormalization due to the fourth-order potential. Phonon renormalization can eliminate imaginary frequencies in certain cases56,112, and should also be considered for accurately estimating low thermal conductivity.
Parameters for first-principles calculations
For all first-principles simulations described above, the following conditions were applied. The k-mesh was determined by \({N}_{i}=\max [1,\mathrm{int}({l}_{k}\cdot |{{\bf{b}}}_{i}|)]\), following the method recommended by VASP. Here, \({l}_{k}\) is a length scale that determines the number of subdivisions along each reciprocal lattice direction and is set to 20 Å, and \({{\bf{b}}}_{i}\) is the reciprocal lattice vector along the i direction (\(i={k}_{x},\,{k}_{y},{k}_{z}\)). The Γ-centered scheme was used to generate the k-mesh. The Perdew-Burke-Ernzerhof exchange-correlation functional revised for solids (PBEsol)131 with the projector augmented wave (PAW) potential132,133 was employed. The cutoff energy for VASP calculations was set to 1.3 times the recommended value provided in the VASP pseudopotential files.
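The k-mesh rule above translates directly into a few lines of code. The sketch assumes that \({{\bf{b}}}_{i}\) follows the convention without the \(2\pi\) factor (\({{\bf{a}}}_{i}\cdot {{\bf{b}}}_{j}={\delta }_{{ij}}\)), which is our reading of the VASP convention; the function name `kmesh` is hypothetical.

```python
import numpy as np

def kmesh(cell, l_k=20.0):
    """Subdivisions N_i = max(1, int(l_k * |b_i|)) for each reciprocal vector."""
    cell = np.asarray(cell, dtype=float)
    # Rows are reciprocal vectors b_i with a_i . b_j = delta_ij (no 2*pi factor).
    recip = np.linalg.inv(cell).T
    return tuple(max(1, int(l_k * np.linalg.norm(b))) for b in recip)

# Orthorhombic example cell with edges 4.1, 6.3, and 25.0 Angstrom.
example_mesh = kmesh(np.diag([4.1, 6.3, 25.0]))
```

For this example cell the long 25 Å axis receives a single subdivision, while the shorter axes receive four and three.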
Machine learning prediction of phonon properties
We employed the crystal graph convolutional neural network (CGCNN)102 to predict the Peierls conductivity (\({\kappa }_{{\rm{p}}}\)) and the graph neural network based on the Euclidean neural network (e3nn)103,105 to predict spectral functions and cumulative Peierls conductivity (\({\kappa }_{{\rm{cumul}}}\)) as a function of the phonon mean free path (\(\Lambda\)). In both graph neural network approaches, nodes and edges correspond to atoms and bonds within the crystal, respectively.
The node descriptors in CGCNN consist of one-hot encodings of nine atomic properties, including group number, period number, electronegativity, and covalent radius, as also described in the main text. In contrast, the e3nn approach employs a simpler node descriptor: a 118-dimensional mass-weighted one-hot encoding based solely on atomic species and their masses. For edge descriptors, CGCNN utilizes a 10-dimensional encoding based on interatomic distances categorized into discrete intervals, whereas e3nn encodes edges using full three-dimensional relative position vectors between neighboring atoms, explicitly capturing both geometric and directional information. The cutoff bond lengths were set to 6.0 Å and 4.3 Å for CGCNN and e3nn, respectively.
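Two of these descriptors are simple enough to sketch. The uniform bin edges for the distance encoding and both function names are assumptions; the text specifies only the dimensionalities and cutoffs.

```python
import numpy as np

def distance_one_hot(distance, cutoff=6.0, n_bins=10):
    """One-hot edge descriptor over equal-width distance intervals (assumed)."""
    vec = np.zeros(n_bins)
    if 0.0 <= distance < cutoff:
        index = min(int(distance / cutoff * n_bins), n_bins - 1)
        vec[index] = 1.0
    return vec

def mass_weighted_one_hot(atomic_number, mass, n_species=118):
    """118-dimensional node descriptor: atomic mass at the species index."""
    vec = np.zeros(n_species)
    vec[atomic_number - 1] = mass
    return vec

edge = distance_one_hot(2.5)               # a bond of 2.5 Angstrom
node = mass_weighted_one_hot(14, 28.0855)  # silicon
```

Bonds longer than the cutoff map to an all-zero vector in this sketch; in a graph construction they would simply not be added as edges.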
Both graph neural networks employ multiple convolutional layers to update atomic features by aggregating local atomic environments. In CGCNN, three graph convolutional layers sequentially update node features using information from up to 12 nearest neighbors. A pooling layer aggregates atomic-level features into a global crystal representation, which is subsequently mapped to scalar material properties through fully connected layers. The e3nn approach utilizes convolutional layers constructed from spherical harmonics and learnable radial basis functions, designed to ensure equivariance under rotations, translations, and inversions. The network typically includes two equivariant convolutional layers followed by gated nonlinearity blocks tailored for tensorial data. After convolution and activation, atomic features are aggregated to form a global descriptor, which is directly mapped to continuous spectral functions, namely the cumulative (\({\kappa }_{{\rm{cumul}}}^{{\rm{norm}}}\)) and spectral (\({\kappa }_{{\rm{spec}}}^{{\rm{norm}}}\)) thermal conductivities.
The neural networks were trained using the Adam optimizer134. For CGCNN, the learning rate was set to 0.0001, and early stopping was applied with a patience of 50 epochs. While the prediction performance of CGCNN was relatively insensitive to hyperparameter choices, the hyperparameters for the e3nn approach—particularly the learning rate—were carefully tuned. The initial learning rate was set to \(5.0/{N}_{{\rm{all}}}\) and decayed by a factor of 0.95 per epoch until it reached a minimum of \(1.5/{N}_{{\rm{all}}}\), where \({N}_{{\rm{all}}}\) denotes the total number of data points, including training, validation, and test sets. Early stopping was applied with a patience of 100 epochs during e3nn training.
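The e3nn learning-rate schedule can be written as a closed-form function of the epoch. The function name is hypothetical, and the sketch assumes the decay is applied once per epoch starting from epoch 0.

```python
def e3nn_learning_rate(epoch, n_all, lr_init=5.0, lr_min=1.5, decay=0.95):
    """Exponentially decayed learning rate with a floor, both scaled by 1/N_all."""
    return max(lr_init / n_all * decay ** epoch, lr_min / n_all)

# Example schedule for a dataset of 5000 entries.
schedule = [e3nn_learning_rate(epoch, 5000) for epoch in range(30)]
```

Since \(0.95^{24} < 0.3 < 0.95^{23}\), the floor of \(1.5/{N}_{{\rm{all}}}\) is reached at epoch 24 under this reading.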
In both cases, the simulation dataset was split into training (80%), validation (10%), and test (10%) sets based on materials, as the Phonix database contains duplicated material entries. This material-based splitting was adopted to prevent data leakage in model training. The training data were used to develop the prediction model, while the validation data were used to tune hyperparameters and prevent overfitting. The test data were employed to evaluate the prediction error. The size of the simulation dataset was varied from 100 to the full dataset (~5000 samples), and 20 ensembles were generated to assess the fluctuation in prediction performance. Log scaling and normalization were applied to the target values for \({\kappa }_{{\rm{p}}}\) and \({\kappa }_{{\rm{cumul}}}(\Lambda )\), respectively. Therefore, if the absolute value of \({\kappa }_{{\rm{cumul}}}(\Lambda )\) is required, it can be reconstructed by combining the two predictions.
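A material-based split of this kind can be sketched as follows. The function name, the use of a material identifier string per entry, and the seeded shuffle are illustrative assumptions; the fractions are applied to unique materials, so the entry counts only approximate 80/10/10.

```python
import random
from collections import defaultdict

def split_by_material(material_ids, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split entry indices so that no material appears in more than one subset."""
    groups = defaultdict(list)
    for index, material in enumerate(material_ids):
        groups[material].append(index)
    materials = sorted(groups)
    random.Random(seed).shuffle(materials)
    n_train = int(fractions[0] * len(materials))
    n_val = int(fractions[1] * len(materials))
    parts = (
        materials[:n_train],
        materials[n_train:n_train + n_val],
        materials[n_train + n_val:],
    )
    # Expand each group of materials back into entry indices.
    return tuple([i for m in part for i in groups[m]] for part in parts)

# Toy dataset: 20 materials, each entered twice.
ids = ["material-%d" % (i // 2) for i in range(40)]
train_idx, val_idx, test_idx = split_by_material(ids)
```

Because duplicated entries of a material always stay together, a model cannot be evaluated on a structure it has effectively already seen during training.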
For the prediction of \({\kappa }_{{\rm{cumul}}}\), the data were prepared over a range from 1 nm to 100 µm, sampled at 51 logarithmically spaced points. The performance of the prediction model was evaluated using the mean absolute error (MAE). The MAE for each material was computed as \(|{\kappa }_{{\rm{p}}}^{{\rm{calc}}}-{\kappa }_{{\rm{p}}}^{{\rm{pred}}}|\) for \({\kappa }_{{\rm{p}}}\), and as \({\sum }_{\varLambda }|{\kappa }_{{\rm{cumul}}}^{{\rm{calc}}}\left(\varLambda \right)-{\kappa }_{{\rm{cumul}}}^{{\rm{pred}}}(\varLambda )|\) for \({\kappa }_{{\rm{cumul}}}(\varLambda )\), where the superscripts “calc” and “pred” refer to the calculated and predicted values, respectively. The final MAE was obtained by averaging over the entire test dataset. After calculating the MAE for various training data sizes (\({N}_{{\rm{train}}}\)), the scaling law was determined by fitting the relationship using the function (\({\rm{MAE}})={\left({N}_{{\rm{c}}}/{N}_{{\rm{train}}}\right)}^{\alpha }\,({N}_{c},\,\alpha > 0)\), where \({N}_{{\rm{c}}}\) is a constant and \(\alpha\) is the scaling factor indicating how efficiently increasing the data size improves prediction accuracy.
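Taking logarithms of the scaling law gives \(\log ({\rm{MAE}})=\alpha \log {N}_{{\rm{c}}}-\alpha \log {N}_{{\rm{train}}}\), a straight line in log-log space. The sketch below fits that line by linear regression, which is a simplification and may differ from the fitting procedure actually used; the function names and the synthetic data are assumptions.

```python
import numpy as np

def fit_scaling_law(n_train, mae):
    """Fit MAE = (N_c / N_train)**alpha by linear regression in log-log space."""
    slope, intercept = np.polyfit(np.log(n_train), np.log(mae), 1)
    alpha = -slope
    n_c = np.exp(intercept / alpha)
    return n_c, alpha

def predict_mae(n_train, n_c, alpha):
    """Evaluate the fitted scaling law at a given training-set size."""
    return (n_c / np.asarray(n_train, dtype=float)) ** alpha

# Synthetic, noise-free data generated from arbitrary parameters.
sizes = np.array([100.0, 300.0, 1000.0, 3000.0, 5000.0])
maes = predict_mae(sizes, 3.5, 0.17)
n_c_fit, alpha_fit = fit_scaling_law(sizes, maes)
extrapolated = predict_mae(2.5e5, n_c_fit, alpha_fit)
```

Once \({N}_{{\rm{c}}}\) and \(\alpha\) are known, the same function can be evaluated at larger \({N}_{{\rm{train}}}\) to estimate the benefit of expanding the database.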
For data curation, we removed data with i) excessively high thermal conductivity (>2000 \({\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\)), ii) both a large phonon gap (>10 \({\rm{c}}{{\rm{m}}}^{-1}\)) and high thermal conductivity (>500 \({\rm{W}}{{\rm{m}}}^{-1}{{\rm{K}}}^{-1}\)), and iii) large fitting errors in the harmonic and cubic force constants (>10%) for the analysis using the e3nn model. For the analysis using the CGCNN model, only the first criterion was applied. This difference in the applied criteria explains why the numbers of available data points differ between the two models (7308 for the CGCNN model and 7244 for the e3nn model), as shown in Fig. 3 and Supplementary Fig. S8. The second criterion was applied, as high thermal conductivity is often suppressed by four-phonon scattering.
Data availability
The dataset used for machine learning prediction, along with the Python scripts employed in this study, is available in the GitHub repository at https://github.com/masato1122/phonon_e3nn. Phonix—a database for anharmonic phonon interactions—will be made available on ARIM-mdx at https://phonix-db.org.
Code availability
Software for the automated calculation of anharmonic phonon properties (auto-kappa), as well as for the machine learning prediction of these properties, will be made available in the GitHub repository at https://github.com/masato1122/auto-kappa.
References
Nishijima, M. et al. Accelerated discovery of cathode materials with prolonged cycle life for lithium-ion battery. Nat. Commun. 5, 4553 (2014).
Ling, C. A review of the recent progress in battery informatics. npj Comput. Mater. 8, 33 (2022).
Wang, Y. et al. Design principles for solid-state lithium superionic conductors. Nat. Mater. 14, 1026–1031 (2015).
Zavyalova, U., Holena, M., Schlögl, R. & Baerns, M. Statistical analysis of past catalytic data on oxidative methane coupling for new insights into the composition of high-performance catalysts. ChemCatChem 3, 1935–1947 (2011).
Kusne, A. G. et al. On-the-fly machine-learning for high-throughput experiments: search for rare-earth-free permanent magnets. Sci. Rep. 4, 6367 (2014).
Jain, A. et al. Commentary: the Materials Project: a materials genome approach to accelerating materials innovation. APL Mater 1, 011002 (2013).
Ong, S. P. et al. Python Materials Genomics (pymatgen): a robust, open-source python library for materials analysis. Comput. Mater. Sci. 68, 314–319 (2013).
Ong, S. P. et al. The Materials Application Programming Interface (API): a simple, flexible and efficient API for materials data based on REpresentational State Transfer (REST) principles. Comput. Mater. Sci. 97, 209–215 (2015).
Saal, J. E., Kirklin, S., Aykol, M., Meredig, B. & Wolverton, C. Materials design and discovery with high-throughput density functional theory: the open quantum materials database (OQMD). JOM 65, 1501–1509 (2013).
Kirklin, S. et al. The Open Quantum Materials Database (OQMD): assessing the accuracy of DFT formation energies. npj Comput. Mater. 1, 15010 (2015).
Taylor, R. H. et al. A RESTful API for exchanging materials data in the AFLOWLIB.org consortium. Comput. Mater. Sci. 93, 178–192 (2014).
Dan, Y. et al. Generative adversarial networks (GAN) based efficient sampling of chemical composition space for inverse design of inorganic materials. npj Comput. Mater. 6, 84 (2020).
Zhao, Y. et al. High-throughput discovery of novel cubic crystal materials using deep generative neural networks. Adv. Sci. 8, 2100566 (2021).
Merchant, A. et al. Scaling deep learning for materials discovery. Nature 624, 80–85 (2023).
Barroso-Luque, L. et al. Open Materials 2024 (OMat24) inorganic materials dataset and models. arXiv https://doi.org/10.48550/arxiv.2410.12771 (2024).
Togo, A. Phonondb. https://github.com/atztogo/phonondb.
Toher, C. et al. High-throughput computational screening of thermal conductivity, Debye temperature, and Grüneisen parameter using a quasiharmonic Debye model. Phys. Rev. B 90, 174107 (2014).
Katsura, Y. et al. Data-driven analysis of electron relaxation times in PbTe-type thermoelectric materials. Sci. Technol. Adv. Mater. 20, 511–520 (2019).
Xu, Y., Yamazaki, M. & Villars, P. Inorganic materials database for exploring the nature of material. Jpn. J. Appl. Phys. 50, 11RH02 (2011).
Poudel, B. et al. High-thermoelectric performance of nanostructured bismuth antimony Telluride bulk alloys. Science 320, 634–638 (2008).
Miura, A., Zhou, S. & Nozaki, T. Crystalline–amorphous silicon nanocomposites with reduced thermal conductivity for bulk thermoelectrics. ACS Appl. Mater. Interfaces 7, 13484–13489 (2015).
Ångqvist, M. & Erhart, P. Understanding chemical ordering in intermetallic clathrates from atomic scale simulations. Chem. Mater. 29, 7554–7562 (2017).
Ohnishi, M. et al. Enhancing the thermoelectric performance of Si-based clathrates via carrier optimization considering finite temperature effects. Chem. Mater. 36, 10595–10604 (2024).
Tamura, S. Isotope scattering of dispersive phonons in Ge. Phys. Rev. B 27, 858–866 (1983).
Protik, N. H. & Draxl, C. Beyond the Tamura model of phonon-isotope scattering. Phys. Rev. B 109, 165201 (2024).
Ohnishi, M., Shiga, T. & Shiomi, J. Effects of defects on thermoelectric properties of carbon nanotubes. Phys. Rev. B 95, 155405 (2017).
Yamawaki, M., Ohnishi, M., Ju, S. & Shiomi, J. Multifunctional structural design of graphene thermoelectrics by Bayesian optimization. Sci. Adv 4, eaar4192 (2018).
Ohnishi, M. & Shiomi, J. Strain-induced band modulation of thermal phonons in carbon nanotubes. Phys. Rev. B 104, 014306 (2021).
Kodama, T. et al. Modulation of thermal and thermoelectric transport in individual carbon nanotubes by fullerene encapsulation. Nat. Mater. 16, 892–897 (2017).
Heremans, J. P. & Martin, J. Thermoelectric measurements. Nat. Mater. 23, 18–19 (2024).
Li, J. et al. Probing the limit of heat transfer in inorganic crystals with deep learning. arXiv https://doi.org/10.48550/arxiv.2503.11568 (2025).
Yang, H. et al. MatterSim: A deep learning atomistic model across elements, temperatures and pressures. arXiv https://doi.org/10.48550/arxiv.2405.04967 (2024).
Ziman, J. M. Electrons and Phonons: The Theory of Transport Phenomena in Solids (Oxford University Press, 2001) https://doi.org/10.1093/acprof:oso/9780198507796.001.0001.
Uchida, K. et al. Observation of the spin Seebeck effect. Nature 455, 778–781 (2008).
Maekawa, S., Valenzuela, S. O., Saitoh, E. & Kimura, T. Spin Current (Oxford University Press, 2012) https://doi.org/10.1093/acprof:oso/9780199600380.001.0001.
Huang, K. & Rhys, A. Theory of light absorption and non-radiative transitions in F-centres. Proc. R. Soc. Lond. Ser. A Math. Phys. Sci 204, 406–423 (1950).
Liang, F. et al. Multiphonon-assisted lasing beyond the fluorescence spectrum. Nat. Phys. 18, 1312–1316 (2022).
Törmä, P. & Barnes, W. L. Strong coupling between surface plasmon polaritons and emitters: a review. Rep. Prog. Phys. 78, 013901 (2015).
Yang, F., Sambles, J. R. & Bradberry, G. W. Long-range surface modes supported by thin films. Phys. Rev. B 44, 5855–5872 (1991).
Chen, D.-Z. A., Narayanaswamy, A. & Chen, G. Surface phonon-polariton mediated thermal conductivity enhancement of amorphous thin films. Phys. Rev. B 72, 155435 (2005).
Broido, D. A., Malorny, M., Birner, G., Mingo, N. & Stewart, D. A. Intrinsic lattice thermal conductivity of semiconductors from first principles. Appl. Phys. Lett. 91, 231922 (2007).
Esfarjani, K. & Stokes, H. T. Method to extract anharmonic force constants from first principles calculations. Phys. Rev. B 77, 144112 (2008).
Esfarjani, K., Chen, G. & Stokes, H. T. Heat transport in silicon from first-principles calculations. Phys. Rev. B 84, 085204 (2011).
Togo, A., Chaput, L. & Tanaka, I. Distributions of phonon lifetimes in Brillouin zones. Phys. Rev. B 91, 094306 (2015).
Tadano, T., Gohda, Y. & Tsuneyuki, S. Anharmonic force constants extracted from first-principles molecular dynamics: applications to heat transfer simulations. J. Phys. Condens. Matter 26, 225402 (2014).
Li, W., Carrete, J., Katcho, N. A. & Mingo, N. ShengBTE: a solver of the Boltzmann transport equation for phonons. Comput. Phys. Commun. 185, 1747–1758 (2014).
Esfarjani, K. et al. ALATDYN: a set of Anharmonic LATtice DYNamics codes to compute thermodynamic and thermal transport properties of crystalline solids. Comput. Phys. Commun. 312, 109575 (2025).
McGaughey, A. J. H., Jain, A., Kim, H.-Y. & Fu, B. Phonon properties and thermal conductivity from first principles, lattice dynamics, and the Boltzmann transport equation. J. Appl. Phys. 125, 011101 (2019).
Omini, M. & Sparavigna, A. An iterative approach to the phonon Boltzmann equation in the theory of thermal conductivity. Phys. B Condens. Matter 212, 101–112 (1995).
Ward, A., Broido, D. A., Stewart, D. A. & Deinzer, G. Ab initio theory of the lattice thermal conductivity in diamond. Phys. Rev. B 80, 125203 (2009).
Chaput, L. Direct solution to the linearized phonon Boltzmann equation. Phys. Rev. Lett. 110, 265506 (2013).
Togo, A., Chaput, L., Tadano, T. & Tanaka, I. Implementation strategies in phonopy and phono3py. J. Phys. Condens. Matter 35, 353001 (2023).
Feng, T. & Ruan, X. Quantum mechanical prediction of four-phonon scattering rates and reduced thermal conductivity of solids. Phys. Rev. B 93, 045202 (2016).
Feng, T., Lindsay, L. & Ruan, X. Four-phonon scattering significantly reduces intrinsic thermal conductivity of solids. Phys. Rev. B 96, 161201 (2017).
Werthamer, N. R. Self-consistent phonon formulation of anharmonic lattice dynamics. Phys. Rev. B 1, 572–581 (1970).
Tadano, T. & Tsuneyuki, S. Self-consistent phonon calculations of lattice dynamical properties in cubic SrTiO3 with first-principles anharmonic force constants. Phys. Rev. B 92, 054301 (2015).
Eriksson, F., Fransson, E. & Erhart, P. The hiphive package for the extraction of high-order force constants by machine learning. Adv. Theory Simul. 2, (2019).
Tadano, T. & Saidi, W. A. First-principles phonon quasiparticle theory applied to a strongly anharmonic halide perovskite. Phys. Rev. Lett. 129, 185901 (2022).
Simoncelli, M., Marzari, N. & Mauri, F. Unified theory of thermal transport in crystals and glasses. Nat. Phys. 15, 809–813 (2019).
Zhou, J. et al. Ab initio optimization of phonon drag effect for lower-temperature thermoelectric energy conversion. Proc. Natl. Acad. Sci. USA 112, 14777–14782 (2015).
Liao, B. et al. Significant reduction of lattice thermal conductivity by the electron-phonon interaction in silicon with high carrier concentrations: a first-principles study. Phys. Rev. Lett. 114, 115901 (2015).
Cepellotti, A., Coulter, J., Johansson, A., Fedorova, N. S. & Kozinsky, B. Phoebe: a high-performance framework for solving phonon and electron Boltzmann transport equations. J. Phys.: Mater. 5, 035003 (2022).
Mingo, N., Esfarjani, K., Broido, D. A. & Stewart, D. A. Cluster scattering effects on phonon conduction in graphene. Phys. Rev. B 81, 045408 (2010).
Katcho, N. A., Carrete, J., Li, W. & Mingo, N. Effect of nitrogen and vacancy defects on the thermal conductivity of diamond: an ab initio Green’s function approach. Phys. Rev. B 90, 094117 (2014).
Ångqvist, M. et al. ICET – a Python library for constructing and sampling alloy cluster expansions. Adv. Theory Simul. 2, 1900015 (2019).
Carrete, J., Li, W., Mingo, N., Wang, S. & Curtarolo, S. Finding unprecedentedly low-thermal-conductivity half-Heusler semiconductors via high-throughput materials modeling. Phys. Rev. X 4, 011019 (2014).
Seko, A. et al. Prediction of low-thermal-conductivity compounds with first-principles anharmonic lattice-dynamics calculations and Bayesian optimization. Phys. Rev. Lett. 115, 205901 (2015).
Ju, S. et al. Exploring diamondlike lattice thermal conductivity crystals via feature-based transfer learning. Phys. Rev. Mater. 5, 053801 (2021).
Qin, G. et al. Predicting lattice thermal conductivity from fundamental material properties using machine learning techniques. J. Mater. Chem. A 11, 5801–5810 (2023).
Miyazaki, H. et al. Machine learning based prediction of lattice thermal conductivity for half-Heusler compounds using atomic information. Sci. Rep. 11, 13410 (2021).
Zhu, T. et al. Charting lattice thermal conductivity for inorganic crystals and discovering rare earth chalcogenides for thermoelectrics. Energy Environ. Sci. 14, 3559–3566 (2021).
Yan, J. et al. Material descriptors for predicting thermoelectric performance. Energy Environ. Sci. 8, 983–994 (2014).
Callaway, J. Model for lattice thermal conductivity at low temperatures. Phys. Rev. 113, 1046–1051 (1958).
Cahill, D. G., Watson, S. K. & Pohl, R. O. Lower limit to the thermal conductivity of disordered crystals. Phys. Rev. B 46, 6131–6140 (1992).
Zhou, F., Nielson, W., Xia, Y. & Ozoliņš, V. Lattice anharmonicity and thermal conductivity from compressive sensing of first-principles calculations. Phys. Rev. Lett. 113, 185501 (2014).
Seko, A. & Togo, A. Projector-based efficient estimation of force constants. Phys. Rev. B 110, 214302 (2024).
Togo, A. & Seko, A. On-the-fly training of polynomial machine learning potentials in computing lattice thermal conductivity. J. Chem. Phys. 160, 211001 (2024).
Chen, C. & Ong, S. P. A universal graph deep learning interatomic potential for the periodic table. Nat. Comput. Sci. 2, 718–728 (2022).
Simoncelli, M., Marzari, N. & Mauri, F. Wigner formulation of thermal transport in solids. Phys. Rev. X 12, 041011 (2022).
Póta, B., Ahlawat, P., Csányi, G. & Simoncelli, M. Thermal conductivity predictions with foundation atomistic models. arXiv https://doi.org/10.48550/arxiv.2408.00755 (2024).
Kielar, S. et al. Anomalous lattice thermal conductivity increase with temperature in cubic GeTe correlated with strengthening of second-nearest neighbor bonds. Nat. Commun. 15, 6981 (2024).
Guo, Z. et al. Fast and accurate machine learning prediction of phonon scattering rates and lattice thermal conductivity. npj Comput. Mater. 9, 95 (2023).
Plata, J. J., Posligua, V., Márquez, A. M., Sanz, J. F. & Grau-Crespo, R. Charting the lattice thermal conductivities of I–III–VI2 chalcopyrite semiconductors. Chem. Mater. 34, 2833–2841 (2022).
Xia, Y. et al. High-throughput study of lattice thermal conductivity in binary rocksalt and zinc blende compounds including higher-order anharmonicity. Phys Rev X 10, 041029 (2020).
Li, Z., Lee, H., Wolverton, C. & Xia, Y. High-throughput computational framework for high-order anharmonic thermal transport in cubic and tetragonal crystals. npj Comput. Mater. 12, 51 (2026).
Kresse, G. & Furthmüller, J. Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set. Phys. Rev. B 54, 11169–11186 (1996).
Hanai, M. et al. ARIM-mdx data system: towards a nationwide data platform for materials science. IEEE Int. Conf. Big Data 00, 2326–2333 (2024).
Slack, G. A. Anisotropic thermal conductivity of pyrolytic graphite. Phys. Rev. 127, 694–701 (1962).
Protik, N. H. et al. Phonon thermal transport in 2H, 4H and 6H silicon carbide from first principles. Mater. Today Phys 1, 31–38 (2017).
Cheng, Z. et al. High thermal conductivity in wafer-scale cubic silicon carbide crystals. Nat. Commun. 13, 7201 (2022).
Zheng, Q. et al. Thermal conductivity of GaN, 71GaN, and SiC from 150 K to 850 K. Phys. Rev. Mater. 3, 014601 (2019).
Fisher, G. R. & Barnes, P. Towards a unified view of polytypism in silicon carbide. Philos. Mag. Part B 61, 217–236 (1990).
Zhang, Y., Ke, X., Chen, C., Yang, J. & Kent, P. R. C. Thermodynamic properties of PbTe, PbSe, and PbS: First-principles study. Phys. Rev. B 80, 024304 (2009).
Tian, Z. et al. Phonon conduction in PbSe, PbTe, and PbTe1−xSex from first-principles calculations. Phys. Rev. B 85, 184303 (2012).
Ju, S., Shiga, T., Feng, L. & Shiomi, J. Revisiting PbTe to identify how thermal conductivity is really limited. Phys. Rev. B 97, 184305 (2018).
Hestness, J. et al. Deep Learning Scaling is Predictable, Empirically. arXiv https://doi.org/10.48550/arxiv.1712.00409 (2017).
Kaplan, J. et al. Scaling laws for neural language models. arXiv https://doi.org/10.48550/arxiv.2001.08361 (2020).
Minami, S. et al. Scaling law of Sim2Real transfer learning in expanding computational materials databases for real-world predictions. arXiv https://doi.org/10.48550/arxiv.2408.04042 (2024).
Mikami, H. et al. Machine learning and knowledge discovery in databases. In European Conference, ECML PKDD 2022, Proceedings, part III, 477–492, https://inspirehep.net/literature/2818406 (2023).
Ohnishi, M. & Shiomi, J. Towards ultimate impedance of phonon transport by nanostructure interface. APL Mater 7, 013102 (2019).
Qian, X., Zhou, J. & Chen, G. Phonon-engineered extreme thermal conductivity materials. Nat. Mater. 20, 1188–1202 (2021).
Xie, T. & Grossman, J. C. Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Phys. Rev. Lett. 120, 145301 (2018).
Geiger, M. & Smidt, T. e3nn: Euclidean Neural Networks. arXiv https://doi.org/10.48550/arxiv.2207.09453 (2022).
Fuchs, F. B., Worrall, D. E., Fischer, V. & Welling, M. SE(3)-transformers: 3D roto-translation equivariant attention networks. arXiv https://doi.org/10.48550/arxiv.2006.10503 (2020).
Chen, Z. et al. Direct prediction of phonon density of states with euclidean neural networks. Adv. Sci. 8, 2004214 (2021).
Okabe, R. et al. Virtual node graph neural network for full phonon prediction. Nat. Comput. Sci. 4, 522–531 (2024).
Fang, S., Geiger, M., Checkelsky, J. G. & Smidt, T. Phonon predictions with E(3)-equivariant graph neural networks. arXiv https://doi.org/10.48550/arxiv.2403.11347 (2024).
Srivastava, Y. & Jain, A. Accelerating prediction of phonon thermal conductivity by an order of magnitude through machine learning assisted extraction of anharmonic force constants. Phys. Rev. B 110, 165202 (2024).
Zhang, F. et al. Room-temperature magnetic thermal switching by suppressing phonon-magnon scattering. Phys. Rev. B 109, 184411 (2024).
Shao, H. et al. Phonon transport in Cu2GeSe3: effects of spin-orbit coupling and higher-order phonon-phonon scattering. Phys. Rev. B 107, 085202 (2023).
Tadano, T., Gohda, Y. & Tsuneyuki, S. Impact of rattlers on thermal conductivity of a thermoelectric clathrate: a first-principles study. Phys. Rev. Lett. 114, 095501 (2015).
Ohnishi, M., Tadano, T., Tsuneyuki, S. & Shiomi, J. Anharmonic phonon renormalization and thermal transport in the type-I Ba8Ga16Sn30 clathrate from first principles. Phys. Rev. B 106, 024303 (2022).
Meredig, B. et al. Can machine learning identify the next high-temperature superconductor? Examining extrapolation performance for materials discovery. Mol. Syst. Des. Eng. 3, 819–825 (2018).
Xu, K. et al. How neural networks extrapolate: from feedforward to graph neural networks. arXiv https://doi.org/10.48550/arxiv.2009.11848 (2020).
Noda, K., Wakiuchi, A., Hayashi, Y. & Yoshida, R. Advancing extrapolative predictions of material properties through learning to learn. arXiv https://doi.org/10.48550/arxiv.2404.08657 (2024).
Lindsay, L., Broido, D. A. & Reinecke, T. L. First-principles determination of ultrahigh thermal conductivity of boron arsenide: a competitor for diamond? Phys. Rev. Lett. 111, 025901 (2013).
Qin, G., Xu, J., Wang, H., Qin, Z. & Hu, M. Activated lone-pair electrons lead to low lattice thermal conductivity: a case study of boron arsenide. J. Phys. Chem. Lett. 14, 139–147 (2023).
Semenok, D. V. et al. Superconductivity at 161 K in thorium hydride ThH10: Synthesis and properties. Mater. Today 33, 36–44 (2020).
Cort, B., Ward, J. W., Vigil, F. A. & Haire, R. G. Resistivity studies of cubic americium hydrides from 20 to 300 K. J. Alloy. Compd. 224, 237–240 (1995).
Cendrowski-Guillaume, S. M., Lance, M., Nierlich, M., Vigner, J. & Ephritikhine, M. New actinide hydrogen transition metal compounds. Synthesis of [K(C12H24O6)][(η-C5Me5)2(Cl)UH6Re(PPh3)2] and the crystal structure of its benzene solvate. J. Chem. Soc. Chem. Commun 0, 1655–1656 (1994).
Larsen, A. H. et al. The atomic simulation environment—a Python library for working with atoms. J. Phys. Condens. Matter 29, 273002 (2017).
Togo, A., Shinohara, K. & Tanaka, I. Spglib: a software library for crystal symmetry search. Sci. Technol. Adv. Mater. Methods 4, 2384822 (2024).
Togo, A. First-principles phonon calculations with phonopy and phono3py. J. Phys. Soc. Jpn. 92, 012001 (2023).
Hinuma, Y., Pizzi, G., Kumagai, Y., Oba, F. & Tanaka, I. Band structure diagram paths based on crystallography. Comput. Mater. Sci. 128, 140–184 (2017).
Wang, Y. et al. A mixed-space approach to first-principles calculations of phonon frequencies for polar materials. J. Phys. Condens. Matter 22, 202201 (2010).
Parlinski, K., Li, Z. Q. & Kawazoe, Y. Parlinski, Li, and Kawazoe Reply. Phys. Rev. Lett. 81, 3298–3298 (1998).
Gonze, X. & Lee, C. Dynamical matrices, Born effective charges, dielectric permittivity tensors, and interatomic force constants from density-functional perturbation theory. Phys. Rev. B 55, 10355–10368 (1997).
Zou, H. The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 101, 1418–1429 (2006).
Birch, F. Finite elastic strain of cubic crystals. Phys. Rev. 71, 809–824 (1947).
Murnaghan, F. D. The compressibility of media under extreme pressures. Proc. Natl. Acad. Sci. 30, 244–247 (1944).
Perdew, J. P. et al. Restoring the density-gradient expansion for exchange in solids and surfaces. Phys. Rev. Lett. 100, 136406 (2008).
Blöchl, P. E. Projector augmented-wave method. Phys. Rev. B 50, 17953–17979 (1994).
Kresse, G. & Joubert, D. From ultrasoft pseudopotentials to the projector augmented-wave method. Phys. Rev. B 59, 1758–1775 (1999).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. arXiv https://doi.org/10.48550/arxiv.1412.6980 (2014).
Momma, K. & Izumi, F. VESTA 3 for three-dimensional visualization of crystal, volumetric and morphology data. J. Appl. Crystallogr. 44, 1272–1276 (2011).
Acknowledgements
The authors thank C. Dames and Y. Sun for co-organizing the Workshop “Thermal Transport, Materials Informatics, and Quantum Computing” supported by the National Science Foundation (NSF) and the Japan Science and Technology Agency (JST), where this project was conceptualized. The authors also thank C. Wolverton, A. Togo, K. Esfarjani, and M. Kawamura for fruitful discussions. Numerical calculations were performed using the following supercomputers through the HPCI System Research Project (Project IDs: hp220151, jh230065, and hp240194): Grand Chariot at the Information Initiative Center, Hokkaido University; OCTOPUS and SQUID at the D3 Center, Osaka University; Oakbridge-CX and Wisteria/BDEC-01 at the Supercomputing Division, Information Technology Center, The University of Tokyo; and AOBA-B at the Cyberscience Center, Tohoku University. Additional resources were provided by the Supercomputer Center, Institute for Solid State Physics, The University of Tokyo, and MASAMUNE-IMR at the Center for Computational Materials Science, Institute for Materials Research, Tohoku University. This work was partially supported by CREST Grants No. JPMJCR21O2 and No. JPMJCR19I2 from the Japan Science and Technology Agency (JST), JSPS KAKENHI Grants No. 22H04950 and No. 24K07354 from the Japan Society for the Promotion of Science (JSPS), and a grant-in-aid from the Thermal and Electric Energy Technology Foundation. K.H. acknowledges funding from the MAT-GDT Program at A*STAR via the AME Programmatic Fund by the Agency for Science, Technology and Research under Grant No. M24N4b0034. L.L. acknowledges support for vibrational property calculations and database discussions from the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences, Material Sciences and Engineering Division. T.D. acknowledges financial support from the National Natural Science Foundation of China (Grant No. 62204218) and the Zhejiang Provincial Natural Science Foundation of China (No. LJXSZ26A040002), and computational resources from the National Supercomputer Center in Tianjin. P.T. acknowledges financial support from the Catalan Government through the funding grant ACCIÓ-Eurecat (Project TRAÇA SMART-MAT).
Author information
Contributions
The project was conceptualized by T.L. and J.S. (together with Chris Dames and Ying Sun), and managed by M.O. and J.S. M.O., T.T., T.D., P.T., Z.X., Z.W., and M.M. contributed to code development. M.O., T.D., P.T., Z.X., H.Z., W.N., Z.W., and M.M. generated phonon property data through automated calculations. M.O., R.Y., and J.S. contributed to data analysis. M.O., M.H., Z.W., T.S., R.Y., and J.S. contributed to the machine learning and database construction. M.O., Z.W., and J.S. wrote the original manuscript, and all authors contributed to revising the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Ohnishi, M., Deng, T., Torres, P. et al. Database and deep-learning scalability of anharmonic phonon properties by automated brute-force first-principles calculations. npj Comput Mater 12, 150 (2026). https://doi.org/10.1038/s41524-026-02033-w
DOI: https://doi.org/10.1038/s41524-026-02033-w






