Introduction

Si/SiO2 interfaces are ubiquitous in semiconductor manufacturing, including metal-oxide-semiconductor field-effect transistors1 and nanowire and nanodot transistors. The formation of SiO2 layers involves charge transfer during the oxidation of Si substrates2. Additionally, siliceous materials, including clay minerals and cement, which comprise Si, O, and SiO2 components, are governed by interactions that entail similar charge transfer. Abundant theoretical research has focused on understanding Si, its oxides, the formation of SiO2 multilayered structures, early oxidation rates, and amorphization of oxide layers3,4,5,6,7,8. These studies predominantly relied on electronic density functional theory (DFT)9,10,11 and nonflexible classical potentials12,13. However, simulating large, multi-component systems that involve charge transfer and hetero-interfaces is challenging within these frameworks. An ideal modeling approach should explicitly or implicitly capture charge transfer without compromising accuracy or incurring prohibitively large computational costs. Existing charge equilibration potentials like ReaxFF14,15 and COMB16,17,18, while capable of describing chemical interactions during MD simulations, tend to have a limited ability to describe the mechanical properties of materials15,17,19 unless special reparametrization is applied.

Recent advances, such as linear-scaling DFT11,20 and machine learning (ML) force fields (e.g., the Gaussian approximation potentials (GAP)21 and artificial neural networks22,23), lift the limitations of traditional methods. ML force fields have demonstrated high accuracy in modeling Si24,25,26,27 and many other elements28,29,30,31,32. Similar progress has been made in improving interatomic interaction descriptions in Si oxides33,34,35,36 and metal oxides37,38,39,40. However, jointly describing compounds and their constituents using ML force fields presents challenges due to the disjointed configurational space of multi-phase forms and the need to handle charge transfer. ML force fields41,42,43,44,45 combine a descriptor with a regression procedure to encode geometry and ab initio properties, usually omitting explicit electronic structure. A previous study on modeling SiO2 using the moment tensor potential (MTP) suggests that incorporating additional reference data is preferable to adding explicit charge equilibration for long-range interactions33.

The novelty of this article is an MTP46,47 that jointly describes interatomic interactions in SiO2 and its constituents (Si and O), enabling the representation of multiple charge states. The developed MTP for Si/O/SiO2 systems is parameterized using an ab initio database containing diverse crystal structures, point defects, extended defects, and disordered structures. This MTP is then used in molecular statics (MS) and molecular dynamics (MD) simulations to investigate the crystalline, interfacial, amorphous, and liquid states of Si and SiO2. These test simulations indicate that the MTP can provide a unified description of these disjoint systems.

Results

Our analysis encompasses both MS and MD simulations. The MS results include cohesive energies, lattice constants, elastic constants, defect formation energies, interface relaxation, and linear and planar defects. The MD results cover vacancy diffusion coefficients, the melting point, interfaces, and the liquid and amorphous structures of Si and SiO2. The outcomes of these runs are compared against the reference method (DFT) and against results obtained with semi-classical potentials.

Cohesive energy, elastic constant, point defects and extended defects

First, the equations of state of various Si and SiO2 polymorphs are presented as cohesive energy versus lattice parameter; see the Methods section for calculation details. The cohesive energies of molecular oxygen, both O2 and O3, are also provided. As shown in Fig. 1, the MTP closely reproduces the cohesive energies of the reference states. Table 1 compares lattice parameters predicted by the MTP, COMB, and ReaxFF models with benchmark and experimental data. The MTP predictions agree closely with both the benchmark and the experimental data.

Fig. 1: Bond and angle energy of oxygen molecules, and equation of state of silicon and silica polymorphs.

a Bond and angle energy of dioxygen and ozone as calculated using MTP (lines) and DFT (dots). b–d Energy-volume relationships in crystalline SiO2 and Si polymorphs calculated using the MTP (lines) and DFT (dots). 3D Si polymorphs (b), 3D (c), and 2D (d) SiO2 structures were considered. Details of these crystal structures are available in the Supplementary Information. Agreement between MTP and DFT is excellent. Insets: illustrations of selected polymorph crystal structures.

Table 1 Comparison of lattice constants predicted by MTP, ReaxFF, and COMB models against experimental data and DFT calculations

The second-order elastic constants and bulk modulus are determined using finite differences, as detailed in the Methods section. Table 2 provides the relative root mean square error (RRMSE) of the elastic constants with respect to the DFT benchmark and experimental data. The MTP model yields a lower RRMSE than the ReaxFF and COMB potentials and competes closely with the van Beest-Kramer-van Santen (BKS)48 potential. Moreover, other semi-empirical models reported in ref. 24 show higher errors than the MTP model. The bulk modulus values for various silicon and silica polymorphs are given in Table 3. The MTP predictions closely align with the reference methods and experimental values, even though the training set did not include the deformation of certain polymorphs. Our potential also accurately predicts the elastic constants of amorphous silica, even though amorphous configurations were not included in the training set. In testing the MTP, we also considered point defects such as vacancies, divacancies, and self-interstitials. The RRMSE values for these defects are reported in Table 4. Once again, the MTP leads to smaller relative errors than the ReaxFF potential. Since most potentials were not specifically parameterized for the oxygen system alone, our comparison was limited to ReaxFF. We also examined the I4 compact cluster49 within the Si crystal, which was not included in our training set. All atoms within the cluster are four-coordinated, and the cluster contains five-, six-, and seven-membered atomic rings. The bond lengths and bond angles of this cluster were calculated from relaxed structures obtained with DFT, MTP, Stillinger-Weber (SW), ReaxFF, and COMB, as illustrated in Fig. 2.
No dangling bonds were observed with any of the potentials except COMB, which failed to reproduce the I4 structure. For the formation energy of the cluster, the MTP prediction was within 14% of the reference value, while the SW and ReaxFF potentials showed errors of up to 27% and 47%, respectively. Among the models assessed, the MTP showed the best agreement with DFT for both bond lengths and bond angles. As this is a perfectly coordinated tetra-interstitial, we also tested 3- and 5-fold coordinated interstitials, namely di-, tri-, and tetra-interstitials, as shown in Fig. 2. Although these defects are not included in the training set, the MTP agrees better with the benchmark than ReaxFF, as detailed in Table 4. The MTP also outperforms ReaxFF in describing the vacancy formation energy in silica polymorphs (Table 4). Again, no SiO2 point-defect configurations were included in our training set.
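As an illustration of the finite-difference idea mentioned above, the following minimal sketch estimates a bulk modulus from an energy-volume curve using B = V d²E/dV² and a central difference. It is not the procedure from the Methods section: the function name `bulk_modulus_fd`, the harmonic toy energy curve, and all numerical values are hypothetical and serve only to show the arithmetic.

```python
def bulk_modulus_fd(energy, v0, dv):
    """Estimate B = V * d2E/dV2 at volume v0 by a central finite difference.

    `energy` is any callable returning the total energy at a given volume.
    """
    d2e = (energy(v0 + dv) - 2.0 * energy(v0) + energy(v0 - dv)) / dv**2
    return v0 * d2e


# Toy harmonic energy-volume curve (hypothetical numbers, not from the paper).
V0 = 20.0  # equilibrium volume per atom, in Å^3
B0 = 0.6   # bulk modulus built into the toy curve, in eV/Å^3


def toy_energy(v):
    """Harmonic E(V) whose curvature at V0 corresponds to a bulk modulus B0."""
    return -5.0 + 0.5 * B0 * (v - V0) ** 2 / V0


EV_PER_A3_TO_GPA = 160.21766  # 1 eV/Å^3 expressed in GPa

b_est = bulk_modulus_fd(toy_energy, V0, dv=0.01)
print(f"B = {b_est:.4f} eV/Å^3 = {b_est * EV_PER_A3_TO_GPA:.1f} GPa")
```

In practice `toy_energy` would be replaced by a call that evaluates the potential (or DFT) at a rescaled cell, and the step `dv` would need a convergence check.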

Table 2 Comparison of force-field predictions of elastic constants for silicon crystal and α-quartz
Table 3 Exploring bulk modulus in silicon and silica polymorphs: calculations using experimental lattice parameters input in LAMMPS code
Table 4 Comparative analysis of defect formation energies in silicon crystals and silica polymorphs using DFT, MTP, and ReaxFF results
Fig. 2: Point defects in silicon crystal.

A relaxed, perfectly coordinated four-interstitial cluster in crystalline silicon is depicted in b. a, c The MTP leads to bond lengths and angles (labeled in b) in better agreement with DFT than the SW, ReaxFF and COMB models. Two other interstitial clusters with coordination defects are also presented: a four-interstitial cluster (d) and a di-interstitial (e). All the interstitial atoms are colored orange. In d and e, the interstitial bonds are also colored orange. Their formation energies are given in Table 4, where configurations b, d, and e correspond to Si-I4C2, Si-I4C1, and Si-I2, respectively. The notations I1, I2, I3, and I4, as well as α, β, γ, δ, and ϵ, represent bond lengths and bond angles within the cluster, respectively. These interstitial configurations were not included in the training set.

The static migration barrier of the vacancy was determined using the nudged elastic band (NEB) method50. The migration barrier profiles obtained with DFT, MTP, and SW are depicted in Fig. 3b. The MTP profile agrees closely with the DFT reference profile, whereas the SW potential does not capture it with similar accuracy: the relative error in barrier height is 15.0% for MTP and 73.1% for SW. These results indicate that the MTP can better mimic the reference method when studying point defects within larger systems, as also reported in refs. 28,51,52. The activation barriers for mono-vacancy hopping were further investigated using MD simulations at temperatures ranging from 1000 K to 1650 K. The simulation details are provided in the Methods section, and the mean square displacements are provided in Supplementary Fig. S4. As reported in Fig. 3a, the activation energy for mono-vacancy hopping is 0.31 eV for MTP and 0.41 eV for SW. These values are close to the static activation barriers computed at 0 K, so the MTP model behaves in a physically plausible fashion. As observed in Fig. 3a, b, the barrier obtained from MD simulation is 0.32 eV, whereas the static migration barrier stands at 0.31 eV. It is noteworthy that NEB configurations, as well as ab initio molecular dynamics (AIMD) configurations containing vacancies, were not included in the training set. We also investigated extended defects such as generalized stacking faults and dislocations in the Si crystal. Generalized stacking faults are planar defects closely linked to slip. In turn, the behavior of dislocations and their core properties are particularly important for understanding plasticity.
The Methods section provides a detailed description of our models for generalized stacking faults and dislocations, as well as the calculations involved. In Fig. 4a, b, the excess energy per unit area, also known as a γ-line, is presented. The γ-line was calculated using the benchmark method, the MTP model, and the SW53 and Tersoff (TS)54,55 semi-classical potentials. As shown in Fig. 4, the MTP is in good agreement with the benchmark results, whereas SW and TS agree less well. In Fig. 5, the dislocation core structures are presented. The core structures predicted by the MTP model agree nearly perfectly with those predicted by DFT, as reported in ref. 56. A distinctive feature of our potential is the direct relaxation of the C1 core structure to the C2 structure, commonly referred to as the double-period reconstruction of the C1 core. In past studies, the C2 core structure was often manually reconstructed from the relaxed C1 core56,57; our potential eliminates the need for manual reconstruction by obtaining the C2 structure directly. To obtain the relaxed structure and energy of the C1 configuration, a snapshot is selected from the relaxation steps that ultimately lead to the C2 structure. Our investigation also found that the C2 core is the most stable configuration, consistent with previous reports56,57. In addition to Fig. 5, the core structures are depicted in Supplementary Fig. S10.
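The activation energy for vacancy hopping quoted above is the kind of quantity usually extracted from an Arrhenius fit of temperature-dependent diffusion coefficients, D(T) = D0 exp(-Ea / kB T). The sketch below shows such a fit as a linear regression of ln D against 1/T. It is an assumption about the general procedure, not the authors' script; the function `arrhenius_fit`, the prefactor, and the synthetic data points are hypothetical.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_fit(temperatures, diffusivities):
    """Least-squares fit of ln D = ln D0 - Ea / (kB T); returns (Ea, D0)."""
    x = [1.0 / t for t in temperatures]
    y = [math.log(d) for d in diffusivities]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return -slope * K_B, math.exp(intercept)


# Synthetic diffusivities generated from an assumed Ea and D0 (illustrative only).
EA_TRUE = 0.31    # eV
D0_TRUE = 1.0e-3  # cm^2/s, hypothetical prefactor
temps = [1000.0, 1150.0, 1300.0, 1450.0, 1650.0]
diffs = [D0_TRUE * math.exp(-EA_TRUE / (K_B * t)) for t in temps]

ea_fit, d0_fit = arrhenius_fit(temps, diffs)
print(f"Ea = {ea_fit:.3f} eV, D0 = {d0_fit:.2e} cm^2/s")
```

With real MD data, each diffusivity would first be obtained from the slope of the mean square displacement, and the scatter of the points around the fitted line gives a sense of the uncertainty on Ea.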

Fig. 3: Diffusion of point defects in silicon crystal.

a Temperature dependence of vacancy diffusion coefficients simulated using the MTP and SW. b NEB-based mono-vacancy jump barrier. DFT, MTP and SW are compared. AIMD configurations containing vacancies, along with NEB configurations, were excluded from the training set. Insets: illustration of vacancy position before and after the jump within the silicon crystal.

Fig. 4: Planar defect in silicon crystal.

γ-lines on the (111) plane as predicted by DFT, MTP, TS and SW. The shuffle (S) and glide (G) cuts are illustrated in the inset. a The MTP provides a good description of the γ-line associated with the shuffle cut and b a near-perfect description of the γ-line associated with the glide cut. Insets (i), (ii), and (iii) represent the bulk structure, the relaxed shuffle (S) structure, and the relaxed glide (G) structure, respectively.

Fig. 5: Line defect in silicon crystal.

Positions and relaxed structures of [110] screw dislocation cores in Si obtained with the MTP potential. The system contained 14,400 atoms and was oriented such that the x, y, and z directions coincide with [11-2], [111], and [110], respectively. The dislocation core structures are in good agreement with DFT core structures reported in the literature56. The core types are labeled A, B, C1, and C2. The red mark indicates the position of the dislocation core. Dislocation configurations were excluded from the training dataset.

Coexistence simulation

We determined the solid-liquid coexistence temperature of silicon using the solid-liquid interface method described in ref. 58. Our MTP potential predicts the silicon melting point to be 1485 ± 5 K, which is ~0.5% lower than the benchmark DFT-GGA value of 1492 K59. Note that both the MTP and the benchmark values are ~12% lower than the experimental melting point of 1687 K60. Our database initially lacked solid-liquid interface data, so we integrated a few AIMD configurations gathered around the experimental melting point into our unified training set. It is also important to acknowledge the influence of the exchange-correlation (XC) functional on melting behavior, as discussed in refs. 59,61. The ability of the MTP to replicate the melting point of the GGA XC functional reflects its high-quality description of the liquid structure, as elaborated in the following section. For details on our coexistence simulation approach, please refer to Supplementary Fig. S9.

Silicon slab energy

We compare the surface energies predicted by the MTP against experimental data and DFT references; slab data were not included in the training set. Experimental surface energies for Si(100), Si(110), and Si(111) are reported to be 2.1, 1.5, and 1.2 J/m² (ref. 62), respectively, while our DFT values are 2.1, 1.8, and 1.6 J/m². The MTP predicts 2.0, 1.3, and 1.2 J/m², respectively. Although a large discrepancy between MTP and DFT is observed, especially for Si(110) with relative errors of up to 25%, the MTP-predicted surface energies closely match the experimental values.
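The comparison above can be made explicit with a few lines of arithmetic. The sketch below computes the unsigned relative deviation of the MTP surface energies from the DFT and experimental values, using only the rounded numbers quoted in this section; the dictionary layout and the helper `relative_error` are illustrative choices, not part of the original work.

```python
# Surface energies in J/m^2, copied from the rounded values quoted in the text.
surface_energy = {
    "Si(100)": {"exp": 2.1, "dft": 2.1, "mtp": 2.0},
    "Si(110)": {"exp": 1.5, "dft": 1.8, "mtp": 1.3},
    "Si(111)": {"exp": 1.2, "dft": 1.6, "mtp": 1.2},
}


def relative_error(predicted, reference):
    """Unsigned relative deviation of `predicted` from `reference`."""
    return abs(predicted - reference) / abs(reference)


errors = {
    facet: {
        "vs_dft": relative_error(v["mtp"], v["dft"]),
        "vs_exp": relative_error(v["mtp"], v["exp"]),
    }
    for facet, v in surface_energy.items()
}

for facet, err in errors.items():
    print(f"{facet}: {100 * err['vs_dft']:.1f}% vs DFT, {100 * err['vs_exp']:.1f}% vs experiment")
```

Because the inputs are rounded to one decimal place, the resulting percentages are only indicative and may differ by a few points from values computed with unrounded data.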

Disordered structures

We also considered an extensive set of disordered Si and SiO2 structures. We compared structures generated using ab initio MD, MTP-MD, and semi-empirical MD. All three cases were subjected to identical MD simulation conditions, except for amorphous silica, where the MTP simulation time was shorter than for the other potentials. For comparison, we chose the Vashishta (VA)63, Munetoh (TS)64, BKS48, and Sundararaman (SHK1 and SHK2)65 models. Details of the ab initio and classical MD simulations are provided in the Methods section. To analyze the disordered structures, we used both the pair correlation functions and the bond angle distribution functions.
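For readers unfamiliar with the pair correlation function used in this analysis, the following sketch computes a total g(r) for a single-species configuration in a cubic periodic box with the minimum-image convention. It is a generic textbook construction, not the analysis code used for the figures; the function `radial_distribution` and the simple cubic test lattice are hypothetical, and partial (species-resolved) functions such as Si–O would need an extra species filter.

```python
import math


def radial_distribution(positions, box, r_max, n_bins):
    """Return (bin centres, g(r)) for atoms in a cubic periodic box of edge `box`.

    r_max should not exceed box / 2 for the minimum-image convention to hold.
    """
    n = len(positions)
    dr = r_max / n_bins
    counts = [0] * n_bins
    for i in range(n - 1):
        for j in range(i + 1, n):
            d2 = 0.0
            for a in range(3):
                d = positions[i][a] - positions[j][a]
                d -= box * round(d / box)  # minimum-image convention
                d2 += d * d
            r = math.sqrt(d2)
            if r < r_max:
                counts[int(r / dr)] += 2  # each pair contributes to both atoms
    rho = n / box**3
    centres, g = [], []
    for k in range(n_bins):
        r_lo, r_hi = k * dr, (k + 1) * dr
        shell = 4.0 / 3.0 * math.pi * (r_hi**3 - r_lo**3)
        centres.append(0.5 * (r_lo + r_hi))
        g.append(counts[k] / (n * rho * shell))  # normalise by the ideal-gas count
    return centres, g


# Simple cubic test lattice (arbitrary units), used only to exercise the function.
A = 2.0
CELLS = 4
lattice = [(A * i, A * j, A * k)
           for i in range(CELLS) for j in range(CELLS) for k in range(CELLS)]

r, g = radial_distribution(lattice, box=A * CELLS, r_max=3.9, n_bins=13)
first_peak = next(rk for rk, gk in zip(r, g) if gk > 0.0)
print(f"first non-zero g(r) bin centred at r = {first_peak:.2f}")
```

The double loop scales quadratically with the number of atoms, which is acceptable for boxes of a few hundred atoms such as those discussed here; larger systems would normally use a neighbor list.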

Liquid and amorphous silicon

As observed from the radial distribution function shown in Fig. 6a, the MTP describes the structure of liquid Si with high precision. In contrast, the other potentials lead to a shifted first-neighbor peak compared to the reference (DFT) results. Additionally, the peak height is overestimated, primarily by the EDIP and Tersoff potentials. The angular distribution function (ADF) in Fig. 6b also demonstrates excellent agreement between the MTP model and DFT data. For amorphous Si, the MTP model accurately describes the structural features in agreement with experimental data. Conversely, semi-classical potentials like SW and Tersoff fail to replicate the experimental radial distribution profile. As shown in Fig. 6c, experimental g(r) and MTP-MD lead to nearly identical first-neighbor peaks (2.36 Å) and second-neighbor peaks (3.89 Å). Additionally, the MTP model exhibits better agreement with the experimental bond angle distribution centered around 108.6°66. Although the bond angle distributions of semi-empirical models like SW and EDIP are closer to the experimental values, they exhibit angle distributions below 90°, as shown in Fig. 6d. Note that the ab initio cooling and equilibration trajectory was not included in the training set of the MTP model, which suggests the MTP is fairly general. This can be attributed to the fact that the training dataset encompasses a variety of configurations within the disordered Si systems. Overall, the MTP model demonstrates a level of accuracy comparable to that of DFT and experiment when describing the structural features and bonding characteristics of disordered Si.

Fig. 6: Liquid and amorphous silicon.

Disordered structures of Si simulated using DFT, MTP, and semi-empirical models. a Radial and b angular distribution functions of liquid Si (3370 K, 64 atoms); c Radial and d angular distribution functions of amorphous Si (300 K, 1000 atoms). The amorphous distribution functions are compared against experimental data from Exp A112,113 and Exp B66.

Liquid and amorphous silica

Figure 7 illustrates results pertaining to liquid SiO2, including pair distribution functions (PDF) and ADFs. The MTP is in better agreement with DFT than the other potentials. We notice both quantitative and qualitative differences between the semi-empirical potentials and DFT, except for the Si–O pair correlation function. In this case, the semi-empirical potentials only overestimate the height of the first peak, which is located around 1.62 Å67,68,69. This value is consistent with the experimental Si–O bond length observed in liquid SiO2, indicating a strong chemical interaction between the Si–O pairs. Both the O–O and Si–Si pair correlation functions given by the other models exhibit a shift in the first peak, while there is a strong quantitative and qualitative match between the MTP-based and the DFT-based structures. Furthermore, the BKS-based Si–O–Si ADF, along with those of other semi-empirical models, does not match the DFT-based ADF, whereas the MTP-based ADF does. At 3600 K, using our 96-atom simulation box, the average Si–O–Si angle between two SiO4 tetrahedra is 134.5° for DFT, 131.4° for MTP, 146.0° for BKS, 143.6° for TS, 141.8° for VA, 142.6° for SHK1, and 140.4° for SHK2. Our DFT value aligns closely with literature values of approximately 136.0°70 and 135.0°71. The MTP thus closely resembles the reference method, whereas the other models align with experimental values for the Si–O–Si angle in amorphous silica, which range between 140° and 152°72,73,74. This likely stems from the semi-empirical models having been fitted with consideration of experimental properties; most of them were optimized on mixed ab initio-experimental data. We then varied the temperature of liquid silica from 2500 K to 3500 K using a 648-atom box and recorded the Si–O–Si angle, as depicted in Fig. 7d.
We have found that the Si–O–Si angle in liquid silica changes with temperature, consistent with the findings reported in ref. 75. As the temperature decreases, the angle between tetrahedral interconnections increases, contributing to network relaxation. It is likely that the relaxation of the network at room temperature upon cooling is primarily attributed to variations in bond lengths and angles, given that the network structure of silica liquid does not qualitatively change between 3500 K and 300 K.
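The Si–O–Si angle discussed above is the angle at a bridging oxygen between its two silicon neighbors. A minimal sketch of that geometric calculation follows; the function `bond_angle`, the non-periodic coordinates, and the way the test triplet is constructed from the quoted 1.62 Å bond length and 131.4° angle are illustrative assumptions, and a production analysis would also need neighbor detection and periodic boundary handling.

```python
import math


def bond_angle(center, a, b):
    """Angle a-center-b in degrees, e.g. Si-O-Si with the bridging O as `center`."""
    u = [a[i] - center[i] for i in range(3)]
    v = [b[i] - center[i] for i in range(3)]
    dot = sum(ui * vi for ui, vi in zip(u, v))
    norm_u = math.sqrt(sum(ui * ui for ui in u))
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    # Clamp to [-1, 1] to guard against rounding just outside the acos domain.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))


# Build one Si-O-Si triplet with a prescribed angle (values taken from the text
# purely as an example; coordinates are non-periodic and in Å).
BOND = 1.62
TARGET = 131.4
half = math.radians(TARGET / 2.0)
oxygen = (0.0, 0.0, 0.0)
si_1 = (BOND * math.sin(half), BOND * math.cos(half), 0.0)
si_2 = (-BOND * math.sin(half), BOND * math.cos(half), 0.0)

angle = bond_angle(oxygen, si_1, si_2)
print(f"Si-O-Si angle = {angle:.1f} degrees")
```

Averaging this quantity over all bridging oxygens and all frames of a trajectory gives the mean angle, and a histogram of the same values gives the angular distribution function.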

Fig. 7: Liquid silica.

a Si-Si, b O-O and c Si–O correlation functions g(r) and f Si-O-Si, e O-Si-O, g Si-Si-Si, and h O-O-O partial bond angle distribution function for liquid SiO2 at 3600 K simulated using DFT, MTP, BKS, TS, VA, SHK1 and SHK2 with a 96-atom simulation box. d Distribution of the Si–O–Si angle against temperature using the MTP potential.

Furthermore, ab initio MD, MTP-MD, BKS-MD and the other models lead to a within-tetrahedra O–Si–O bond angle distribution centered around 109°, which is nearly equal to the experimental bond angle67,68,69. However, the BKS and VA potentials overestimate the average probability at 109°, while the TS potential underestimates this probability. The SHK1, SHK2, and MTP potentials match the benchmark. The DFT, MTP, and all other models except TS lead to very similar O–O–O ADFs. However, when it comes to the Si–Si–Si angle, qualitative and quantitative discrepancies are observed between BKS-generated structures and DFT-generated structures, while MTP-generated structures match the benchmark. There is also a notable discrepancy between the structure predicted by the other models and that of the benchmark.

The partial PDFs of vitreous SiO2 are illustrated in Fig. 8a–c. The comparison is made against the PDF computed from experimental data using the reverse Monte Carlo (RMC) method76. The MTP potential agrees qualitatively with the RMC data, as our PDF profiles match those obtained from RMC. For example, only the MTP accurately reproduces the RMC profile for Si–Si, as the second peak around 5 Å is not reproduced by the other models, including BKS. Additionally, while the other models fail to reproduce the height of the Si–O RMC PDF, the MTP potential shows a good match. To further analyze the structure, we also computed the most important ADFs, as well as the Si–O bond length distribution function. For O–Si–O (Fig. 8c), all the profiles are similar but vary in height and are centered around the experimental value of 109.0°. For the Si–O–Si angle (Fig. 8d), both the MTP and BKS show similar profiles centered between the experimental values of 144° and 152°. The average Si–O–Si angle is 150° for BKS and 145.5° for the MTP. As the average value of the same angle predicted by the MTP in liquid silica at 3600 K is ~131.4°, this confirms that the Si–O–Si angle varies with temperature. For all the potentials, the Si–O bond length distributions are centered between 1.60 and 1.66 Å. The average bond length for the MTP potential is 1.63 Å, which is close to the experimental value of 1.62 Å. Note that these structural properties may vary slightly depending on the cooling rate employed. Despite the high cooling rate, no major coordination defects were observed. This indicates that the configuration has been well equilibrated at 3000 K, resulting in the establishment of a strong network.
Note that the ab initio MD trajectory, which encompasses both the cooling and equilibration stages of the amorphous structure preparation, was not included in the MTP training set, which is an indicator of the MTP’s generalization capability. Overall, the MTP potential demonstrates a remarkable improvement in accurately describing the structure of disordered SiO2 compared to well-established potentials such as the BKS potential and others. The MTP model captures the essential structural features of the disordered systems with greater precision, resulting in a better agreement with experimental observations67,68,69,76.

Fig. 8: Amorphous silica.

Partial radial distribution functions (a–c) in vitreous silica were obtained using various potentials, including MTP, BKS, TS, VA, SHK1, and SHK2. The vitreous systems, consisting of 648 atoms, were equilibrated at 300 K. These partial radial distribution functions are compared with experimental data obtained using reverse Monte Carlo (RMC)76. Additionally, the angular distribution functions (d, e) and bond length distribution function (f) were analyzed and compared with experimental data from multiple sources: Exp A67, Exp B73, Exp C114, and Exp D115.

Phonon dispersion

We calculated the phonon dispersion of c-Si and α-quartz, as illustrated in Fig. 9a, b, respectively. The MTP model exhibits very good agreement with the reference method (DFT), except at the higher frequencies of α-quartz. Once again, the corresponding frozen-phonon configurations were not explicitly included in the training set.

Fig. 9: Phonon dispersions.

Phonon dispersions of a c-Si and b α-quartz computed using DFT and MTP.

Si–SiO2 interface

The Si–SiO2 interface, a cornerstone of semiconductor physics and materials science, plays a fundamental role in device fabrication and significantly impacts device performance. Exploring heterostructures involving Si–SiO2 interfaces opens avenues for novel functionalities and applications in microelectronics and beyond. Given its importance, capturing the structure and dynamics of the Si–SiO2 interface is paramount for a potential model of the Si–O system. To this end, we employed various models, encompassing crystalline Si slabs with different orientations, namely Si(100), Si(110), and Si(111). We used α-quartz, β-cristobalite, and other polymorphs in constructing the Si–SiO2 interface. For a detailed explanation of our construction scheme, please refer to the Methods section. To assess the suitability of the potential for modeling the Si/SiO2 interface, we examined interfaces using both O-terminated and Si-terminated SiO2 slabs. For each Si/SiO2 crystalline interface configuration, we performed force and energy minimization, allowing the atomic positions and the simulation box size to change simultaneously. Following geometry optimization, MD simulations were conducted for 50 ps at 300 K in the NPT ensemble. Our potential stabilized the majority of the interfaces in both static and dynamic runs, with the extent of stabilization depending on the orientation of the Si slab and the termination of the quartz slab. For Si(110) in contact with either a Si- or O-terminated quartz slab, our potential successfully stabilizes and describes the dynamics of the resulting interfaces, whether symmetric or non-symmetric. For Si(100) and Si(111), however, our potential only successfully describes symmetric interfaces built from Si-terminated quartz slabs.
These combinations of termination and orientation were not considered in the training set, showcasing the remarkable generalization ability of our interaction potential. It is worth noting that defects, such as silicon dangling bonds or over-coordinated oxygen atoms, are observed at the interface following both minimization and dynamic runs, as depicted in Fig. 10. As noted in ref. 77, the presence of the dangling bonds is a natural occurrence and constitutes a typical aspect of interface defects. Such anomalies are commonplace and anticipated in these interfaces, owing to the inherent lattice mismatch between the involved materials. Usually, interfacial defects are passivated or special construction schemes are adopted to eliminate them. However, our study does not aim to create defect-free interfaces. Instead, our goal is to evaluate the potential’s capability to manage complex heterostructures with varying bonding types not encountered during the training process. To further validate our potential for silicon and silica interfaces, we compare the interfacial energies of small models computed using MTP and DFT. Detailed descriptions of these small models are provided in the Methods section. Our results (shown in Fig. 11) exhibit good correlation between DFT and MTP models. Importantly, configurations generated by the MTP potential through relaxation and MD simulations converged easily, typically requiring fewer than 50 iterations of single-point energy calculations of DFT. This highlights the reliability of interfacial configurations generated by the MTP potential. These findings demonstrate the capability of MTP potentials to effectively investigate heterosystems containing Si–SiO2 interfaces. The relaxed small models of the Si–SiO2 interfaces are presented in the supporting information (Figs. S11 and S12).

Fig. 10: Interface structure of Si–SiO2.

Geometry optimization and MD equilibration of 50 ps simulation: Si(010)/α-Quartz (001) (top) and Si(110)/α-Quartz (001) (bottom) a, c Relaxed at 0 K, b, d Annealed at 300 K.

Fig. 11: Interface energy of Si–SiO2.

Comparing interface energies of silicon and various silica polymorphs, including amorphous silica, using MTP potential and DFT. The orange line indicates the diagonal y = x, corresponding to a perfect correlation.

Discussion

In this work, we have parametrized an ML potential that can implicitly capture and describe different charge states. Additionally, this ML potential is able to describe disjoint zones and hetero-zones of the configurational space of SiO2 and its constituent elements, Si and oxygen. Its description of various phenomena, including point defects, diffusion in Si crystals, extended defects, the liquid phase of Si and SiO2, and the amorphous phase of Si and SiO2, either rivals or outperforms existing potentials. The potential exhibits very good agreement with experimental data in challenging configurational zones, such as the amorphous state, even though these configurations were not included in the training data. In many scenarios, such as disordered phases (liquid, amorphous), the potential achieved a near-perfect match with the reference method, using exactly the same (very short) simulation time and conditions. Even with longer simulation times and larger systems, the semi-empirical models do not reach a level of accuracy similar to the reference method. Furthermore, the potential displays an intriguing capability regarding dislocation behavior: it transitioned the C1 structure to the C2 structure without the need for manual reconstruction. While there is a growing consensus that ML potentials can effectively serve as surrogates for DFT in terms of accuracy and speed, generalization, including charge state modeling, remains a challenging task. This study provides evidence that reaction coordinates are sufficient to implicitly capture charge transfer or charge states involved in chemical reactions. Aside from potentials that explicitly consider charge transfer, separate ML potentials are usually developed for each individual chemical element or compound for which a database can be constructed by DFT.
Our approach suggests this is not necessary; compounds and their elemental constituents can be trained jointly. Indeed, joint parametrization, where parameters are derived simultaneously for silicon, silica, and oxygen, offers several advantages. By employing a single potential to describe the interactions between atoms in both silicon and silica, the computational model becomes more streamlined and easier to manage. The unified potential can save time and resources by avoiding the need to recalibrate the model parameters for each of the materials involved. The unification idea is also important for some areas of application, such as interfacial modeling for electronic devices, energy storage and conversion, and surface coatings and tribology. Here, the reference data for the ML potential must include both the individual materials in contact and the boundary region. In addition, as chemical reactions can occur in MD simulations, good modeling of a multi-component material or complex system under certain conditions requires joint parametrization of the considered system and its constituents. For instance, oxygen aggregation was observed in high-temperature MD simulations, as noted in ref. 34 (supporting information), which led the authors of that study to include oxygen molecules in their training set. Our preliminary study pertained to a semiconductor and its oxide. Whether this approach can be generalized to other elements and mixtures, including multi-component alloys and compounds, remains to be seen. To ensure a well-implemented unified interaction potential, several other aspects need to be explored in the future.
Given that the compound and its constituents are not located within the same zone of the configurational space, achieving an accurate, efficient, low-cost, and general unified potential for both the material and its oxides with limited data may require adjustments to the underlying mathematical model, the fitting procedure, and the database sampling methods (including active learning). These adjustments could help attain the same level of accuracy at a more affordable computational cost, comparable to that of potentials parameterized for a single compound. Likewise, while this study achieved a joint description of Si and SiO2 using the MTP framework, it is likely that other currently developed ML potential frameworks would lead to a similar result. Alongside these questions, our future aim is to extend this work by incorporating hydrogen in order to model silica gels.

Methods

Ab initio calculations

The database was constructed using DFT, as implemented in the Quantum ESPRESSO78 package. The exchange-correlation potential was treated using the generalized gradient approximation of Perdew-Burke-Ernzerhof (GGA-PBE)79. Projector augmented waves (PAW)80 were employed. Kinetic energy cutoffs of 884 eV for Si and 1224 eV for both SiO2 and oxygen were chosen. In all calculations, the Brillouin zone was sampled using the Monkhorst-Pack scheme81. A different k-point grid was used for each polymorph, including an 8 × 8 × 8 grid for the ordinary phases of Si and an 11 × 11 × 11 grid for SiO2. Only the gamma point was used for oxygen molecules.

ML model: the MTP

In this work, the MTP46 was chosen as the ML model. The MTP is a multi-component potential. In a previous comparative study82, it demonstrated a favorable trade-off between accuracy and computational speed across a range of modeling problems. The model derives its name from its tensorial representation of atomic coordinates and uses linear regression to determine the local atomic energy. The total energy of a given atomic configuration is then the sum of these local atomic energy contributions:

$${E}_{Total}=\mathop{\sum }\limits_{i=1}^{n}{E}_{i}=\mathop{\sum }\limits_{i=1}^{n}{V}_{local}({\zeta }_{i})$$
(1)

The argument ζi is a tuple ζi = (rij, τi, τj) containing the relative coordinates rij and the atomic types τi, τj. Here, Vlocal is approximated by the contributions from within a sphere of cutoff radius Rc = 5.7 Å, beyond which the central atom no longer feels any interaction. In practice, within the MTP framework, the expansion of the atomic energy Vlocal into basis functions Bβ serves as the foundation for linear regression.

$${V}_{local}=\sum _{\beta }{c}_{\beta }{B}_{\beta }$$
(2)
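As a minimal sketch of Eqs. (1) and (2), the snippet below evaluates the total energy as a sum of per-atom linear expansions. The function names and numerical values are illustrative assumptions and are not taken from the MTP code; in practice the basis values Bβ are obtained from the contracted descriptors.

```python
from typing import Sequence


def local_energy(basis_values: Sequence[float],
                 coefficients: Sequence[float]) -> float:
    """Eq. (2): V_local = sum_beta c_beta * B_beta for one atomic environment."""
    if len(basis_values) != len(coefficients):
        raise ValueError("one coefficient is needed per basis function")
    return sum(c * b for c, b in zip(coefficients, basis_values))


def total_energy(per_atom_basis: Sequence[Sequence[float]],
                 coefficients: Sequence[float]) -> float:
    """Eq. (1): E_total is the sum of the local energies of all atoms."""
    return sum(local_energy(b, coefficients) for b in per_atom_basis)


# Hypothetical inputs: two atoms, two basis functions per atom.
coefficients = [2.0, 0.5]
per_atom_basis = [[1.0, 2.0], [0.5, -1.0]]
energy = total_energy(per_atom_basis, coefficients)
```

Because the energy is linear in the coefficients cβ, fitting them for fixed basis values reduces to a linear least-squares problem.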

Since the potential energy function Vlocal is smooth, the force acting on an atom k at position rk in a given configuration xq can be calculated by taking the gradient of the total energy.

$${F}_{k}({x}_{q})=-{\nabla }_{{r}_{k}}{E}_{Total}({x}_{q})=-\sum _{i}\frac{\partial {V}_{local}({\zeta }_{i})}{\partial {r}_{k}({x}_{q})}$$
(3)
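The relation in Eq. (3) can be checked numerically for any smooth energy function. The sketch below uses a toy pair energy, which is an assumption made purely for illustration and is not the MTP, and obtains the forces by central finite differences of the total energy.

```python
import math


def pair_energy(positions):
    """Toy smooth energy (NOT the MTP): sum over pairs of (r - 1)^2."""
    energy = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            r = math.dist(positions[i], positions[j])
            energy += (r - 1.0) ** 2
    return energy


def numerical_forces(energy_fn, positions, h=1e-5):
    """F_k = -dE/dr_k, estimated with central finite differences."""
    forces = []
    for k in range(len(positions)):
        force_k = []
        for axis in range(3):
            plus = [list(p) for p in positions]
            minus = [list(p) for p in positions]
            plus[k][axis] += h
            minus[k][axis] -= h
            force_k.append(-(energy_fn(plus) - energy_fn(minus)) / (2.0 * h))
        forces.append(force_k)
    return forces


# Two atoms separated by 2 length units along x (hypothetical configuration).
positions = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
forces = numerical_forces(pair_energy, positions)
```

For this toy energy the two atoms attract each other with equal and opposite forces, as expected for an energy that depends only on relative coordinates.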

The virial stress within an atomic configuration xq of volume Ω can be expressed as follows.

$${\sigma }_{ij}({x}_{q})=\frac{1}{2{{\Omega }}}\sum _{k\in {{\Omega }}}\sum _{l\in {{\Omega }}}({x}_{i}^{(l)}-{x}_{i}^{(k)}){F}_{j}^{(kl)}$$
(4)

The functions Bβ in equation 2 are obtained through the contraction of the descriptors. In the MTP model, the descriptors are formed by tensors of atomic coordinates weighted by radial functions. These descriptors consider both the radial distribution and the angular distribution of the neighborhood surrounding each atom. By incorporating information from both the radial and angular aspects, the descriptors capture the local atomic environment in a more comprehensive manner, enabling a more accurate representation of the atomic energy within the MTP framework.

$$M_{\mu,\nu}(r_{ij}, \tau_i, \tau_j) = \sum\limits_jf_\mu(|r_{ij}|,\tau_i, \tau_j) \mathop{\underbrace{r_{ij}\otimes\cdots\cdots\cdots\otimes r_{ij}}}\limits_{{\nu\,{\rm{times}}}}$$
(5)

The radial function fμ is further expanded using radial basis functions Q(α), with the fitting parameters \({c}_{\mu ,{\tau }_{i},{\tau }_{j}}^{(\alpha )}\) as expansion coefficients. This expansion allows for a more flexible and accurate representation of the radial dependence of the atomic interactions.

$${f}_{\mu }(| {r}_{ij}| ,{\tau }_{i},{\tau }_{j})=\sum _{\alpha }{c}_{\mu ,{\tau }_{i},{\tau }_{j}}^{(\alpha )}{Q}^{(\alpha )}(| {r}_{ij}| )$$
(6)
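To make Eqs. (5) and (6) concrete, the following sketch builds the moment tensors of rank ν = 0, 1, and 2 for a single central atom. The Chebyshev form of the radial basis, the lower bound R_MIN, and all coefficient values are assumptions chosen for illustration; only the cutoff of 5.7 Å comes from the text.

```python
import math

R_CUT = 5.7  # Å, cutoff radius quoted in the text
R_MIN = 1.0  # Å, hypothetical lower bound of the radial basis


def chebyshev(alpha, x):
    """Chebyshev polynomial T_alpha(x) from the three-term recurrence."""
    if alpha == 0:
        return 1.0
    t_prev, t_curr = 1.0, x
    for _ in range(alpha - 1):
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
    return t_curr


def radial_basis(alpha, r):
    """Q^(alpha)(r): assumed Chebyshev form that vanishes smoothly at R_CUT."""
    if r >= R_CUT:
        return 0.0
    x = (2.0 * r - (R_MIN + R_CUT)) / (R_CUT - R_MIN)
    return chebyshev(alpha, x) * (R_CUT - r) ** 2


def radial_function(r, coeffs):
    """Eq. (6): f_mu(r) = sum_alpha c^(alpha) * Q^(alpha)(r)."""
    return sum(c * radial_basis(a, r) for a, c in enumerate(coeffs))


def moment_tensor(neighbors, coeffs_by_pair, center_type, nu):
    """Eq. (5) for nu in {0, 1, 2}; neighbors is a list of (r_ij, tau_j)."""
    if nu not in (0, 1, 2):
        raise ValueError("this sketch only handles nu = 0, 1, 2")
    scalar, vector = 0.0, [0.0] * 3
    matrix = [[0.0] * 3 for _ in range(3)]
    for r_ij, tau_j in neighbors:
        r = math.sqrt(sum(x * x for x in r_ij))
        f = radial_function(r, coeffs_by_pair[(center_type, tau_j)])
        scalar += f
        for a in range(3):
            vector[a] += f * r_ij[a]
            for b in range(3):
                matrix[a][b] += f * r_ij[a] * r_ij[b]
    return (scalar, vector, matrix)[nu]


# Hypothetical environment of a Si atom: one O neighbor inside the cutoff
# and one Si neighbor outside it (which therefore contributes nothing).
coeffs_by_pair = {("Si", "O"): [1.0], ("Si", "Si"): [1.0]}
neighbors = [((2.0, 0.0, 0.0), "O"), ((0.0, 6.0, 0.0), "Si")]
m0 = moment_tensor(neighbors, coeffs_by_pair, "Si", 0)
m1 = moment_tensor(neighbors, coeffs_by_pair, "Si", 1)
m2 = moment_tensor(neighbors, coeffs_by_pair, "Si", 2)
```

The rank-0 tensor carries purely radial information, while the higher ranks encode the angular distribution of the neighbors; the basis functions Bβ are scalar contractions of such tensors.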

The model parameters \(\Theta =({c}_{\beta },{c}_{\mu ,{\tau }_{i},{\tau }_{j}}^{(\alpha )})\) are determined by minimizing the cost function given in Equation (7).

Data curation and optimization

We acquired ab initio data using established methods and databases from prior research. The database construction involved two methodologies, namely manual processing24,82,83 and active learning84, as explained in detail in the supporting information. Specifically, we referred to the dataset created for the GAP for silicon24, the comparative study82, and the active learning techniques detailed in ref. 84. For liquid silica, we adopted the temperature range (1000 K–5000 K) of previous databases designed for neural network interatomic potentials (NNIP), which extends beyond the boiling point of silica up to 5000 K (ref. 36). While NNIPs involve a very large number of adjustable parameters (typically tens of thousands), which allows them to jointly describe a large number of off-equilibrium configurations, accommodating such deviations from equilibrium is challenging within the MTP framework because of its limited number of parameters. Effective MTP training, therefore, relies on carefully selecting the training set. Our final training dataset, herein referred to as the unified training set, was constructed through a two-step process: curation and subsequent optimization.

To enhance the quality of our training set and to properly assess the error of the test set, down-selection was applied. To begin, we sorted our full database into smaller subsets, as elaborated in Tabs. S3 through S6 in the supporting information. Within each subset, we then apply a filtering strategy referred to herein as the “train-remove-train” approach. We first train while monitoring for significant reductions in the energy and force errors associated with each configuration as we incrementally raise the MTP level by one unit (we focus on levels 08 to 14 for deformation and defects, and on levels 16 to 18 for disordered structures). Next, we analyze the error reduction between two or three consecutive levels. If a significant decrease is not observed, we eliminate configurations based on factors such as:

  1. Total energy: configurations with total energies similar to, yet atomic coordinates different from, others in the training set are removed, as are those with fluctuating total energies but atomic positions nearly identical to others in the training set. These configurations originated from the relaxation of molecules, single-point calculations of unrelaxed defects, manually constructed unrelaxed jump paths, strained configurations in which lattice parameters or vectors were strained without corresponding adjustments to the atomic positions, and embedded dimers.

  2. Contributions from smearing: these arise primarily from the extensive use of high-temperature AIMD simulations. Silica configurations generated from AIMD simulations and containing more than 36 atoms are excluded if their smearing contribution to the total energy is greater than zero.

  3. Minimum interatomic distances (this criterion complements the total energy criterion): if multiple configurations from the same batch exhibit similar minimum distances, some are removed. This technique was mostly used for oxygen molecules. For example, we check the interatomic distance in the batch of relaxed O3 molecules. We also discard embedded dimer configurations by comparing their minimum interatomic distances with those of the AIMD configurations. If the minimum interatomic distances are equal, we choose the AIMD configuration over the embedded dimer configuration.

  4. Polymorphism: these configurations are derived from deformations following thermal expansion. Most of them have been accurately computed and were included in the training database. However, some polymorphs arising from displacive phase transformations, as well as polymorphs sharing the same lattice system with identical coordination numbers, were excluded. In the case of a displacive phase transformation, we retain the parent crystal and exclude the child crystal. When dealing with two polymorphs that share the same lattice system and coordination number, we typically choose one of them.

After removing these undesired configurations, we reinitiate the training process incrementally, following a pattern akin to the first stage. This “train-remove-train” process is iterated until we attain a high level of confidence in the cleanliness of the subset, i.e., all configurations included in the subset are associated with training errors that decrease as the MTP level increases. In the subsequent curation phase, we examine the possibility of extracting an even smaller subset from each previously cleaned subset84. To achieve this, we use the “select-add” command embedded in the MTP code, as described in ref. 47. We applied the “select-add” command to every cleaned subset.
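The monitoring step of the “train-remove-train” loop can be sketched as follows. This is a hypothetical illustration rather than the procedure implemented in the MTP code: the data layout, the function name, and the 5% threshold are all assumptions, and the flagged configurations would still need to be examined against the four criteria listed above before removal.

```python
def flag_stagnant(errors_by_level, min_relative_drop=0.05):
    """Return ids of configurations whose error fails to decrease.

    errors_by_level maps a configuration id to its training errors at
    two or three consecutive MTP levels, listed in ascending level order.
    min_relative_drop is a hypothetical threshold for a "significant"
    decrease between the first and the last level.
    """
    flagged = []
    for config_id, errors in errors_by_level.items():
        first, last = errors[0], errors[-1]
        if first <= 0.0:
            continue  # nothing left to improve for this configuration
        relative_drop = (first - last) / first
        if relative_drop < min_relative_drop:
            flagged.append(config_id)
    return flagged


# Made-up per-configuration errors at three consecutive levels.
errors = {
    "a": [10.0, 6.0, 3.0],  # clear decrease, kept
    "b": [8.0, 8.1, 7.9],   # stagnant, flagged
    "c": [5.0, 5.5, 6.0],   # increasing, flagged
}
candidates = flag_stagnant(errors)
```

Iterating this check after each removal mirrors the stopping condition described in the text, namely that every remaining configuration shows a training error that decreases with the MTP level.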

Training set

In implementing the unified potential, we selected a range of configurations from the optimized and curated database, as outlined in the Data curation and optimization section. The test set was chosen from the same optimized database in a way that eliminates any overlap between the two sets. While the test set encompasses all properties or types of configurations represented in the extensive database, the training set includes only certain configuration types. This strategy aims to ensure the portability of the interaction potential. Note that the database contains configurations of substantial size, with up to 1000 atoms. However, we constrained the maximum box size in the training set to 36 atoms, except for the interfaces set, where a few configurations contain 80 atoms. Consequently, the number of atoms in the training set’s boxes ranges from 1 to 80. Conversely, the test set contains configurations up to the maximum size found in the database. The specific types of configurations represented in the training set are detailed in Table 5. By restricting the training set to specific configuration types, we aim to assess the generalizability of our potential by simulating properties not present in the training set. Within the current implementation of the MTP potential, we do not employ a validation set in the manner typically used to estimate overfitting or underfitting during the training of neural network models. Instead, we use an offline test set for this purpose. To gauge the potential’s portability more comprehensively, we perform MD simulations for properties that were not included in the training set.

Table 5 Final refined and optimized training dataset derived from extensive uncurated database

Training and validation

The cost function, as described by Equation (7), was minimized using the Broyden-Fletcher-Goldfarb-Shanno algorithm, a quasi-Newton optimization method implemented in the MTP framework. Fundamentally, training the MTP model with atomic configurations entails finding the parameter set {Θ} by solving the minimization problem presented in Equation (7). We trained on the refined unified training sets detailed in Tab. S7 (supporting information) and Table 5. First, MTP potentials with between 300 and 1600 parameters, corresponding to levels 18 to 26, were trained on set 1 (Tab. S7 and Fig. S1a) as part of preliminary work, including optimization, training mode, and testing. The preliminary work is also presented in the supporting information (Figs. S5–S8). At this stage, two training modes were employed: vibration mode and structures weighting mode. Specifically, the potential resulting from the vibration mode was used as input for the structures weighting mode training. The training process was iterated until a desirable level of accuracy was achieved. As outlined in the supporting information, we assessed the validation error using the resulting potentials. We employed two independent validation sets, denoted as Validation 1 and Validation 2, whose atom distributions are shown in the supplementary Fig. S2 and Fig. S3, respectively. Validation 1 was randomly selected concurrently with the training set from the curated database. Validation 2, on the other hand, consisted of the AIMD cooling and equilibration trajectories at 300 K. These validation sets were used as part of the preliminary work. For the final implementation, we used level 28. Based on the preliminary work, we opted for the structures weighting mode. The two-step training mode, as applied to large-size configurations in set 1 (Tab. S7 and Fig. S1a), was deemed unnecessary. 
Given that our final training set (Table 5) comprises small-cell configurations, we exclusively employed the structures weighting mode. We used the resulting potential to conduct both static calculations and MD simulations, with the outcomes presented in the main text.

$$\begin{array}{ll}\mathop{\sum }\limits_{i=1}^{n}&\left[{w}_{e}{({E}^{mtp}({x}^{(i)},\Theta )-{E}^{qm}({x}^{(i)}))}^{2}+\right.\\ &{w}_{f}\mathop{\sum }\limits_{j=1}^{{N}_{a}({x}^{(i)})}| {F}_{j}^{mtp}({x}^{(i)},\Theta )-{F}_{j}^{qm}({x}^{(i)}){| }^{2}\\ &\left.+{w}_{\sigma }| {\sigma }^{mtp}({x}^{(i)},\Theta )-{\sigma }^{qm}({x}^{(i)}){| }^{2}\right]\,\to \,min\end{array}$$
(7)

Here, Eqm, Fqm, and σqm denote the values of the energy, forces, and stress computed by the quantum mechanical approach (DFT), while Emtp, Fmtp, and σmtp represent the corresponding values obtained from the MTP model. The relative weights we, wf, and wσ indicate the importance of the energy, the forces, and the stress in the optimization procedure.
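A minimal sketch of the cost function in Eq. (7) is given below. The dictionary layout, the weight values, and the use of six Voigt components for the stress are assumptions made for illustration; the actual minimization is carried out by the MTP code with the BFGS algorithm.

```python
def mtp_cost(configs, w_e=1.0, w_f=0.01, w_s=0.001):
    """Weighted sum of squared energy, force, and stress errors, Eq. (7).

    Each configuration is a dict holding model ("mtp") and reference
    ("qm") values. The default weights are hypothetical.
    """
    total = 0.0
    for cfg in configs:
        energy_err = (cfg["e_mtp"] - cfg["e_qm"]) ** 2
        # Sum over atoms of |F_mtp - F_qm|^2.
        force_err = sum(
            sum((a - b) ** 2 for a, b in zip(f_mtp, f_qm))
            for f_mtp, f_qm in zip(cfg["f_mtp"], cfg["f_qm"])
        )
        # Stress stored as six Voigt components (assumed convention).
        stress_err = sum(
            (a - b) ** 2 for a, b in zip(cfg["s_mtp"], cfg["s_qm"])
        )
        total += w_e * energy_err + w_f * force_err + w_s * stress_err
    return total


# One made-up configuration containing a single atom.
configs = [{
    "e_mtp": -10.0, "e_qm": -10.1,
    "f_mtp": [[0.1, 0.0, 0.0]], "f_qm": [[0.0, 0.0, 0.0]],
    "s_mtp": [0.0] * 6, "s_qm": [0.0] * 6,
}]
cost = mtp_cost(configs)
```

Changing the ratio of the weights shifts the balance of the fit between energies, forces, and stresses without changing the form of the objective.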

Static calculations

This section summarizes the mathematical procedures used to determine the static properties presented in the article.

For a compound with the formula XlYmZn, we calculate the cohesive energy as follows:

$${E}_{coh}={E}_{{X}_{l}{Y}_{m}{Z}_{n}}-(l{E}_{X}+m{E}_{Y}+n{E}_{Z}).$$
(8)

Here, \({E}_{{X}_{l}{Y}_{m}{Z}_{n}}\) represents the energy of the supercell of the compound, while EX, EY, and EZ correspond to the energies of the isolated atoms. The subscripts l, m, and n indicate the number of X, Y, and Z atoms present in the building block XlYmZn of the material. Because the number of atoms in the primitive cell of each polymorph differs from that of the standard structural configuration, we normalize the cohesive energy by dividing it by the number of atoms present in the regular phase.
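Eq. (8) amounts to subtracting the isolated-atom energies from the energy of the compound. The sketch below illustrates this with made-up energies, which are not results from this work; the per-atom normalization shown is one simple choice and is an assumption.

```python
def cohesive_energy(e_compound, counts, atom_energies):
    """Eq. (8): E_coh = E_compound - sum over elements of n_X * E_X."""
    reference = sum(n * atom_energies[element] for element, n in counts.items())
    return e_compound - reference


def cohesive_energy_per_atom(e_compound, counts, atom_energies):
    """E_coh divided by the number of atoms in the cell (assumed normalization)."""
    n_atoms = sum(counts.values())
    return cohesive_energy(e_compound, counts, atom_energies) / n_atoms


# Made-up energies in eV for one SiO2 formula unit and for isolated atoms.
counts = {"Si": 1, "O": 2}
atom_energies = {"Si": -1.0, "O": -2.0}
e_coh = cohesive_energy(-30.0, counts, atom_energies)
e_coh_atom = cohesive_energy_per_atom(-30.0, counts, atom_energies)
```

With this sign convention, a bound compound has a negative cohesive energy.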

Point-defect formation properties, such as the vacancy formation energy (\({E}_{v}^{f}\)), were calculated using the following equation:

$${E}_{v}^{f}={E}_{{N}_{0}-1}-\frac{{N}_{0}-1}{{N}_{0}}* {E}_{{N}_{0}}.$$
(9)

For interstitial formation energy \({E}_{i}^{f}\), we used:

$${E}_{i}^{f}={E}_{{N}_{0}+1}-\frac{{N}_{0}+1}{{N}_{0}}* {E}_{{N}_{0}}.$$
(10)

In equations (9) and (10), N0 and \({E}_{{N}_{0}}\) correspond to the number of atoms and total energy of a perfect supercell.
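Eqs. (9) and (10) rescale the energy of the perfect supercell to the number of atoms in the defective one. A short sketch, using hypothetical supercell energies rather than values from this work, is:

```python
def vacancy_formation_energy(e_defect, e_perfect, n_perfect):
    """Eq. (9): E_v^f = E_(N0-1) - (N0 - 1)/N0 * E_N0."""
    return e_defect - (n_perfect - 1) / n_perfect * e_perfect


def interstitial_formation_energy(e_defect, e_perfect, n_perfect):
    """Eq. (10): E_i^f = E_(N0+1) - (N0 + 1)/N0 * E_N0."""
    return e_defect - (n_perfect + 1) / n_perfect * e_perfect


# Made-up energies in eV for a 64-atom perfect cell at -5 eV per atom.
n0 = 64
e_perfect = -320.0
e_vac = vacancy_formation_energy(-311.4, e_perfect, n0)
e_int = interstitial_formation_energy(-321.3, e_perfect, n0)
```

Both expressions assume an elemental crystal, in which the energy per atom of the perfect cell serves as the chemical potential of the removed or added atom.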

In SiO2 polymorphs, oxygen vacancies were considered in the neutral charge state. Their formation energy was therefore calculated using:

$${E}_{f}={E}_{vac}-{E}_{bulk}+{\mu }_{O}.$$
(11)

In this equation, Evac and Ebulk represent the energy of the supercell containing the oxygen vacancy and the energy of the bulk supercell, respectively. The oxygen chemical potential μO is defined as half of the energy of a dioxygen molecule (\({\mu }_{O}=1/2\ast {{\rm{E}}}_{{O}_{2}}\)).

The equilibrium bulk modulus, which corresponds to the curvature of the energy-volume curve at its minimum, was derived from the second-order elastic constants85. We calculate the elastic stiffness constants Cij using the central finite-difference formula:

$${C}_{ij}=\frac{{P}_{i}^{(+{\varepsilon }_{j})}-{P}_{i}^{(-{\varepsilon }_{j})}}{2* {\varepsilon }_{j}}.$$
(12)

where \({P}_{i}^{(+{\varepsilon }_{j})}\) is the ith component of the stress tensor when the configuration is strained only by the jth component (εj) of the strain vector (\(\overrightarrow{\varepsilon }\)). After a directional or isotropic deformation is applied, the atomic positions are relaxed while the overall box size remains fixed. We compute the generalized stacking fault energy (γ(u)) by incrementally shifting the upper half of the crystal along the slip direction and evaluating the energy difference per unit area (A) of the fault plane.

$$\gamma (u)=\frac{E(u)-{E}_{o}}{A}.$$
(13)

where Eo represents the energy of the perfect crystal, while E(u) denotes the energy of the supercell with the fault vector u, which is directly proportional to the Burgers vector (b). The surface energy is calculated using the following expression:

$$\gamma =\frac{{E}_{slab}-N{E}_{bulk}}{2A}.$$
(14)

In this context, A refers to the area of the slab, N represents the number of atoms in the slab, while Eslab and Ebulk denote the total energy of the slab and the bulk energy per atom, respectively.
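Returning to the central-difference formula of Eq. (12), the sketch below recovers the stiffness constants from a stress function and then evaluates the bulk modulus of a cubic crystal as B = (C11 + 2C12)/3. The linear-elastic toy stress function and its constants are assumptions standing in for the relaxed MTP or DFT stress calculation, and stresses are taken as positive in tension.

```python
def elastic_constant(stress_fn, i, j, eps=1e-3):
    """Eq. (12): C_ij from stresses at strains +eps and -eps (Voigt, 0-based)."""
    plus = [0.0] * 6
    minus = [0.0] * 6
    plus[j] = eps
    minus[j] = -eps
    return (stress_fn(plus)[i] - stress_fn(minus)[i]) / (2.0 * eps)


def toy_cubic_stress(strain, c11=160.0, c12=60.0, c44=80.0):
    """Hypothetical linear-elastic cubic crystal; constants in GPa are made up."""
    stress = [0.0] * 6
    for i in range(3):
        stress[i] = sum((c11 if i == j else c12) * strain[j] for j in range(3))
    for i in range(3, 6):
        stress[i] = c44 * strain[i]
    return stress


c11 = elastic_constant(toy_cubic_stress, 0, 0)
c12 = elastic_constant(toy_cubic_stress, 0, 1)
c44 = elastic_constant(toy_cubic_stress, 3, 3)
bulk_modulus = (c11 + 2.0 * c12) / 3.0  # valid for cubic symmetry
```

In an actual calculation, each call to the stress function would involve straining the box, relaxing the atomic positions at fixed box size, and reading off the stress tensor.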

Si–SiO2 Interface construction

Previous reports indicate that defects are commonly observed at the Si–SiO2 interface due to the imperfect matching of the two materials. To avoid considerable lattice mismatch, we proceed as follows. First, we rotate the alpha-quartz structure to obtain a tetragonal configuration. Next, we duplicate both the silicon crystal and the alpha-quartz structure, ensuring that the lattice dimensions perpendicular to the interface direction closely match. This approach enables us to apply only a small strain (<2%) to the lattice vectors before forming the interface. The lattice mismatch α can be defined as the relative difference in lattice parameters between two crystalline materials, often expressed as a percentage or in terms of the absolute difference in lattice constants along specific crystallographic directions:

$$\alpha =\frac{n* {L}_{1}-m* {L}_{2}}{n* {L}_{1}+m* {L}_{2}}.$$
(15)

The lattice duplication factors are represented by the integers n and m; L1 and L2 denote the lattice parameters of the two materials along a given direction. Symmetric and asymmetric interfaces were constructed for both oxygen-terminated and silicon-terminated quartz slabs, incorporating Si (100), Si (110), and Si (111) slab orientations. Our objective is not solely to construct a flawless representation of a naturally occurring or real-world interface, but rather to explore the versatility of the potential. We then estimate the interface energy using:

$$\gamma =\frac{{E}_{S}-({n}_{Si{O}_{2}}* {E}_{Si{O}_{2}}+{m}_{Si}* {E}_{Si})}{A}.$$
(16)

Here, A represents the area of the interface, and \({{\rm{n}}}_{Si{O}_{2}}\) and mSi represent the number of formula units of SiO2 and Si in the interface system. ES is the energy of the supercell containing the interface. The terms \({{\rm{E}}}_{Si{O}_{2}}\) and ESi correspond to the energy of silica and silicon per formula unit, respectively. Because the duplicated models are too large for energy calculations via DFT, smaller superlattices were also constructed, involving Si (100) interfaced with α-quartz, β-cristobalite, α-cristobalite, β-tridymite, and amorphous silica. This facilitated comparison between results obtained using MTP potentials and those from DFT. Each simulation box contains two distinct interfaces. Initially, these small models were relaxed at 0 K using MTP potentials. Subsequently, MTP-driven MD simulations were performed at temperatures of 300 K, 500 K, 800 K, and 1200 K for 100 ps, and configurations were selected from the trajectories. The energies of these selected configurations, as well as of the relaxed configurations, were then computed using DFT-based single-point energy calculations.
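The choice of duplication factors in Eq. (15) can be illustrated with a brute-force search over small integers. The lattice parameters and the search limit below are illustrative assumptions and are not the values used to build the interfaces in this work.

```python
def lattice_mismatch(n, l1, m, l2):
    """Eq. (15): alpha = (n*L1 - m*L2) / (n*L1 + m*L2)."""
    return (n * l1 - m * l2) / (n * l1 + m * l2)


def best_duplication(l1, l2, max_repeat=10):
    """Return (n, m, alpha) with the smallest |alpha| for 1 <= n, m <= max_repeat."""
    best = None
    for n in range(1, max_repeat + 1):
        for m in range(1, max_repeat + 1):
            alpha = lattice_mismatch(n, l1, m, l2)
            if best is None or abs(alpha) < abs(best[2]):
                best = (n, m, alpha)
    return best


# Illustrative in-plane lattice parameters in Å (assumed, not from the paper).
l_si, l_quartz = 5.43, 4.91
n, m, alpha = best_duplication(l_si, l_quartz)
```

In practice the search limit is set by the largest supercell that remains affordable, since larger duplication factors reduce the mismatch at the cost of more atoms.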

MD simulation

AIMD simulations were carried out with Quantum ESPRESSO using the parameters described in the “Ab initio calculations” section. The integration timestep was set to 1 fs for Si and 2 fs for SiO2. The ionic temperature during the simulations was controlled using velocity rescaling.

Force-field MD simulations were performed using the large-scale atomic/molecular massively parallel simulator (LAMMPS) software package86. While it is impractical to perfectly replicate the Quantum ESPRESSO MD settings within LAMMPS, we aimed to make them as close as possible. To generate disordered structures, we employed the velocity rescaling thermostat to control the temperature during the simulations. The time integration used the same timesteps as in the AIMD simulations. For studying point-defect diffusion in Si and self-diffusion in SiO2, the Nosé-Hoover thermostat87 was employed. The latter simulations were performed using a timestep of 1 fs, and the damping parameter was set to 100 fs.
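The idea behind velocity rescaling can be sketched in a few lines. This is a simplified illustration in reduced units (kB = 1 by default) that counts 3N degrees of freedom; it is not the implementation used by Quantum ESPRESSO or LAMMPS, and the function names are hypothetical.

```python
import math


def kinetic_temperature(masses, velocities, k_b=1.0):
    """T = sum(m * v^2) / (3 * N * k_B), assuming 3N degrees of freedom."""
    twice_kinetic = sum(
        m * sum(v * v for v in vel) for m, vel in zip(masses, velocities)
    )
    return twice_kinetic / (3 * len(masses) * k_b)


def rescale_velocities(masses, velocities, target_t, k_b=1.0):
    """Scale all velocities so that the kinetic temperature equals target_t."""
    current_t = kinetic_temperature(masses, velocities, k_b)
    if current_t == 0.0:
        raise ValueError("cannot rescale a system with zero kinetic energy")
    scale = math.sqrt(target_t / current_t)
    return [[scale * v for v in vel] for vel in velocities]


# Two particles with made-up masses and velocities in reduced units.
masses = [1.0, 2.0]
velocities = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
t_before = kinetic_temperature(masses, velocities)
rescaled = rescale_velocities(masses, velocities, target_t=2.0)
t_after = kinetic_temperature(masses, rescaled)
```

Unlike the Nosé-Hoover thermostat used for the diffusion simulations, plain rescaling enforces the target temperature directly and has no damping parameter.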