Introduction

Ga2O3 has emerged as a vital next-generation wide-bandgap semiconductor that has stimulated enormous research and application interest over the last decade1,2. Its desirable physical properties, such as a wide bandgap (4.8–5.3 eV), a large critical electric field (~8 MV cm−1), a small electron effective mass (0.27–0.28 m0), and transparency deep into the ultraviolet (UV) range, make Ga2O3 highly promising, especially for high-power electronics3,4, solar-blind UV detectors5,6,7, high-temperature gas sensing8,9, and low-dimensional devices10,11,12.

A particular difficulty for precise engineering of Ga2O3 largely arises from the complex polymorphic nature of this material. Five different polymorphs of Ga2O3, namely α, β, γ, δ and κ, are analogous to those of Al2O3. In addition, there are somewhat inconsistent descriptions of a possible ϵ-phase13,14,15,16. The latest experimental finding14 indicates that the ϵ-phase consists of three κ-phase domains connected with 120° pseudo-hexagonal symmetry. Therefore, for consistency, we will refer to this phase herein as the κ-phase of the space group Pna21.

The most stable Ga2O3 phase at room temperature and ambient pressure is the monoclinic β-phase. Some metastable phases, such as the κ- and α-phases (the second and third most stable phases, respectively), can be synthesized under high-temperature/high-pressure conditions17,18 or by carefully controlled thin-film growth methods19,20. Moreover, some properties of these metastable Ga2O3 phases have been reported to be superior to those of the β-phase. For example, the hexagonal α-phase has an even wider bandgap (5.25–5.3 eV21) than the β-phase, and it can be grown heteroepitaxially on c-plane sapphire substrates with higher crystalline quality than the β-phase22. In turn, the κ-phase has demonstrated ferroelectric properties23,24,25 favorable for generating a high-density two-dimensional electron gas in power devices. Therefore, precise control over Ga2O3 polymorphs (phase engineering) is of great research interest and under active experimental scrutiny. The formation of strain-induced metastable phases has also been observed, not only during low-temperature heteroepitaxial growth on sapphire substrates26,27, but also under high-fluence, strongly focused ion-beam irradiation28,29. However, subsequent annealing may relieve the strain and trigger a solid-state phase transition back to the most stable β-phase27,30.

Extensive experimental studies of Ga2O3 have thus far been complemented only by computational modeling of small Ga2O3 atomic systems (fewer than 10³ atoms) using computationally expensive ab initio calculations. Recent density functional theory (DFT) studies of Ga2O3 report accurate structural and electronic properties of the perfect bulk phases16,31,32,33, formation energies of point defects34,35, and surfaces36,37. However, studying Ga2O3 phase engineering, which involves solid-state phase transitions in systems of thousands or tens of thousands of atoms with coexisting polymorphs, inevitably requires large-scale classical atomistic modeling with molecular dynamics (MD) methods. MD simulations have been hindered by the high complexity of the Ga2O3 polymorphs and the lack of reliable interatomic potential (force field) models. Fortunately, with the rapid advance of machine-learning (ML) interatomic potentials38,39, accurate large-scale MD simulations of the Ga2O3 system are now possible.

Among the many existing ML algorithms, one particularly well-established branch is the Gaussian approximation potential (GAP), based on the framework of Gaussian process regression40. The techniques of database construction, active/iterative training, and potential validation have been extensively studied and verified for single-element systems41,42,43 and oxides44,45. An ML-GAP46 and a deep neural-network potential47 have been developed independently for the perfect bulk β-Ga2O3 system, with accurate predictions of its thermal properties. An ML-GAP was also recently developed for studying the thermal properties of the amorphous Ga2O3 system48. In our previous study, an ML-GAP was developed to describe two-dimensional phases of Ga2O3 with high accuracy, and it successfully revealed specific kinetic pathways leading to phase transitions49. To date, however, few ML interatomic potentials have the generality required to describe the wide range of complex Ga2O3 bulk polymorphs and disordered structures. Therefore, to meet the needs of current experimental applications and to extend the computational toolbox to larger spatiotemporal domains, here we develop two generalized ML-GAPs that describe the β, κ, α, δ, and γ phases simultaneously. Moreover, we extend their applicability to disordered, dispersed, and high-energy physical processes. This work opens up fundamental possibilities for the computational exploration of large-scale dynamical processes in complex Ga2O3 systems.

Results and discussion

Generalized database and two GAP formalisms for Ga2O3 system

The fidelity of data-driven ML interatomic potentials relies heavily on the consistency and generality of the input data. In our case, consistency is guaranteed by well-converged, accurate GGA-DFT calculations (see Section Methods and Supplementary Note 1), and generality is ensured by a wide range of Ga-O structures, as illustrated in Fig. 1, which gives a representative overview of the training database and its structures. In total, the selected database contains 1630 configurations with 108,411 atomic environments. The potential energies, atomic forces, and virial stresses are stored for training the ML-GAPs. As shown in Fig. 1, we classify the atomic configurations by their degree of physical (and/or chemical) ordering, from the perfect crystalline β-phase, the global energy minimum, to disordered high-energy structures.

Fig. 1: An overview of the DFT database.

The central part of the donut chart shows the fraction of atoms belonging to each configuration type relative to the total number of atoms in the database (108,411). Representative structures are shown for each configuration type. In panels (i) and (ii), the colored polyhedra within the crystalline structures show the 4-fold (blue), 5-fold (purple), and 6-fold (green) Ga sites. Note that isolated Ga and O atoms (not shown here) are also included in the database as global references for the potential energies.

The four constituent parts of the database are shown in the respective panels of Fig. 1: (i) vital crystalline Ga2O3 polymorphs; (ii) non-stoichiometric GaOx, including pure-Ga phases; (iii) disordered bulk structures (amorphous) and melted (liquid) phases; and (iv) widely spread structures from random structure search43, structures from active training44, and dispersed gaseous configurations. Part (i), with its high-symmetry configurations, is essential for modeling the Ga2O3 polymorphs accurately, and part (ii) is necessary to keep the overall chemical potential correct. These two parts of the database are therefore constructed and selected manually. In part (i), we include the five experimentally identified polymorphs and extend the data search to three computationally predicted Ga2O3 phases, namely, the Pmc21 and \(P\overline{1}\) phases from ref. 50 and the hex* phase from ref. 51. In part (ii), in addition to the experimentally known pure-Ga phases, five computationally predicted GaOx lattices are selected from the Materials Project52 database. In contrast, parts (iii) and (iv), consisting of low-symmetry, high-energy, and dispersed configurations, are generated stochastically to cover a wide range of atomic environments, with diversity taking priority over accuracy. Another important aspect is that highly strained configurations are explicitly included in the database. For parts (i) and (ii), all bulk lattices are uniformly compressed and stretched, with the internal atomic positions relaxed to the local minima under strain, and additional ab initio MD (AIMD) runs create randomized local atomic environments. For the disordered bulk structures in parts (iii) and (iv), such as the amorphous, melted, and random-structure-search cells, the random cells are sampled at different densities; e.g., the densities of the amorphous cells range from 3.8 to 5.4 g cm−3 in seven uniform steps.
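The uniform compression/stretching and density sampling described above can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions, not the workflow used to build the database: the helper name `isotropic_strain_series` is hypothetical, internal positions stay at fixed fractional coordinates (the subsequent DFT relaxation and AIMD steps are not shown), and reading "seven uniform steps" as seven evenly spaced density values is an assumption.

```python
import numpy as np

def isotropic_strain_series(cell, frac_positions, strains):
    """Yield (strain, strained cell, Cartesian positions) for each linear strain.

    cell: (3, 3) array whose rows are the lattice vectors (angstrom).
    frac_positions: (N, 3) fractional coordinates. They are kept fixed here;
    in a real workflow the internal positions would then be relaxed with DFT.
    """
    cell = np.asarray(cell, dtype=float)
    frac = np.asarray(frac_positions, dtype=float)
    for eps in strains:
        strained = (1.0 + eps) * cell   # scale a, b and c by the same factor
        yield eps, strained, frac @ strained

# Hypothetical density grid for amorphous cells (g/cm^3): seven evenly
# spaced values from 3.8 to 5.4; the number of points is an assumption.
amorphous_densities = np.linspace(3.8, 5.4, 7)

# Example: a toy cubic cell strained from -3% to +3%.
toy_cell = 5.0 * np.eye(3)
toy_frac = np.array([[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]])
series = list(isotropic_strain_series(toy_cell, toy_frac, np.linspace(-0.03, 0.03, 7)))
```

Scaling all lattice vectors by the same factor keeps the cell shape, so each strained copy differs from the reference only in volume.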

Although the database is consistently calculated with GGA-DFT at the same level of ab initio accuracy, any ML algorithm is still prone to overfitting, even with a diverse database. To achieve high accuracy for all five Ga2O3 polymorphs and still retain a smooth interpolation between the more disordered structures, one essential part of our ML training is a set of expected errors (σ) used for regularization38. These σ values are chosen manually through systematic tests, based on an in-depth understanding of the physical nature of the atomic configurations. In general, small σ values (0.002 eV for the potential energy and 0.01 eV Å−1 for force components) are set for the vital crystalline polymorphs, while larger values (0.0035–0.01 eV and 0.035–0.1 eV Å−1) are used for non-stoichiometric GaOx and disordered configurations. Moreover, to enable the application of the developed potentials to ion irradiation simulations, we pay particular attention to the short-range interactions that occur in high-energy collision cascades (see Supplementary Note 4). We achieve an accurate description of interatomic repulsion by employing a set of external repulsive pair potentials explicitly fitted at short range to all-electron ab initio data53. These repulsive potentials are included when training the GAP, so that only the difference between them and GGA-DFT has to be machine-learned. A well-behaved potential energy landscape, from near-equilibrium to highly repulsive interatomic distances, is ensured by also including high-energy structures in the GGA-DFT training database54. More detailed information on the database construction and the GAP training process is provided in Supplementary Notes 2, 3, and 4.
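The role of the expected errors σ can be illustrated with a toy Gaussian process regression in which every training point carries its own σ, added as σ² to the diagonal of the covariance matrix. This is a one-dimensional sketch, not the GAP/QUIP implementation: the squared-exponential kernel, its hyperparameters, the synthetic cos(x) data and the function names are all illustrative assumptions.

```python
import numpy as np

def sq_exp_kernel(x1, x2, delta=1.0, theta=0.5):
    """Squared-exponential covariance between two sets of 1D inputs."""
    d = x1[:, None] - x2[None, :]
    return delta**2 * np.exp(-0.5 * (d / theta) ** 2)

def gpr_predict(x_train, y_train, sigma, x_test, delta=1.0, theta=0.5):
    """GP posterior mean with a separate expected error sigma[i] per training point.

    sigma**2 is added to the diagonal of the covariance matrix, so a small
    sigma pins the fit to that observation and a large sigma relaxes it.
    """
    K = sq_exp_kernel(x_train, x_train, delta, theta)
    K += np.diag(np.asarray(sigma, dtype=float) ** 2)
    weights = np.linalg.solve(K, y_train)
    return sq_exp_kernel(x_test, x_train, delta, theta) @ weights

# Toy data: the first three points mimic "crystalline" data with a tight
# sigma, the last three mimic "disordered" data with a loose sigma.
x = np.arange(6.0)
y = np.cos(x)
sigma = np.array([0.002, 0.002, 0.002, 0.1, 0.1, 0.1])
residual = y - gpr_predict(x, y, sigma, x)
```

Points given the tight σ are reproduced to within about 10⁻⁵, whereas points with the loose σ deviate by several 10⁻³; this is the trade-off between accuracy and smoothness that the σ values control.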

One major advance in our ML-GAPs is the choice of local-atomic-environment descriptors. In the GAP framework, the total machine-learned energy can be composed of multiple terms using different descriptors38. The commonly used combination of a two-body (2b) descriptor and the high-dimensional smooth overlap of atomic positions (SOAP) descriptor distinguishes different atomic configurations with high sensitivity and leads to high accuracy55,56. However, the computational cost of soapGAP is fairly high, which limits the spatiotemporal domain accessible to MD simulations (with reasonable computational effort) to the order of 10⁴ atoms and 1 ns. To extend MD simulations to millions of atoms and tens of nanoseconds, we also employ the recently developed framework of tabulated low-dimensional GAPs (tabGAP)57. In tabGAP, only low-dimensional descriptors are combined (2b + 3b + EAM). Here, 3b refers to the three-body cluster descriptor58, and EAM is a descriptor corresponding to the pairwise-contributed density used in embedded atom method potentials57 (see Supplementary Note 3 for details). The low dimensionality of the descriptors ultimately limits the flexibility, and hence also the accuracy, of the interatomic potential. The advantage, however, is that low dimensionality allows the machine-learned energy predictions to be tabulated onto one- and three-dimensional grids, so that the energy can be computed with efficient cubic-spline interpolation rather than with the Gaussian process regression of GAP. This tabulation yields a speedup of two orders of magnitude.
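The idea behind tabGAP, evaluating a low-dimensional model once on a grid and interpolating afterwards, can be illustrated in one dimension. In this sketch a Morse-like pair curve with made-up parameters stands in for a machine-learned 2b term, and piecewise cubic Hermite interpolation from tabulated values and derivatives stands in for the cubic splines used by tabGAP; the 3D grids for the 3b and EAM terms are not shown, and all names are hypothetical.

```python
import numpy as np

# Hypothetical stand-in for an expensive machine-learned pair term:
# a Morse-like curve with made-up parameters (eV, 1/angstrom, angstrom).
D, A, R0 = 1.5, 1.8, 1.9

def pair_energy(r):
    e = np.exp(-A * (r - R0))
    return D * (1.0 - e) ** 2 - D

def pair_energy_deriv(r):
    e = np.exp(-A * (r - R0))
    return 2.0 * D * A * (1.0 - e) * e

def tabulate(r_min, r_max, n):
    """Evaluate the 'expensive' model once on a uniform grid."""
    grid = np.linspace(r_min, r_max, n)
    return grid, pair_energy(grid), pair_energy_deriv(grid)

def interpolate(grid, values, derivs, r):
    """Piecewise cubic Hermite interpolation on a uniform grid."""
    r = np.asarray(r, dtype=float)
    h = grid[1] - grid[0]
    i = np.clip(((r - grid[0]) / h).astype(int), 0, len(grid) - 2)
    t = (r - grid[i]) / h
    h00 = 2 * t**3 - 3 * t**2 + 1      # Hermite basis functions on [0, 1]
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return (h00 * values[i] + h10 * h * derivs[i]
            + h01 * values[i + 1] + h11 * h * derivs[i + 1])

grid, vals, ders = tabulate(1.2, 6.0, 400)
r_test = np.linspace(1.2, 6.0, 5001)
max_err = np.max(np.abs(interpolate(grid, vals, ders, r_test) - pair_energy(r_test)))
```

For this smooth test function, 400 grid points keep the interpolation error below 10⁻⁵ eV, and each evaluation costs a fixed handful of arithmetic operations, independent of how expensive the original model is.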

With these strategies, we propose two versions of ML-GAPs: (1) a soapGAP that is very accurate but computationally slow (~8 × 10² MD steps per (s·atom)), and (2) a tabGAP that is less accurate than the soapGAP but much more computationally efficient (~3.2 × 10⁵ MD steps per (s·atom)). The former is suitable for detailed medium-scale MD simulations, and the latter, which is 400 times faster than the soapGAP (more details in Supplementary Note 7), is advisable for large-scale or long-time MD simulations, including high-energy collision cascade simulations.

The validation of the soapGAP and tabGAP is shown in Fig. 2, where both potential energies and force components are plotted against the reference DFT data. The potential energies predicted by the soapGAP closely follow the DFT calculations over the entire range, from the global minimum (Ep(β-Ga2O3) = −5.969 eV per atom) to −4.5 eV per atom (well above the average energies of the high-temperature liquid phases). All points lie along the solid diagonal line in Fig. 2a, within the two dashed lines marking the reference deviation of ±0.1 eV per atom, indicating good agreement with the DFT data. The tabGAP-predicted points follow the diagonal line closely from −5.969 to −5.750 eV per atom, but several groups of points deviate from the line at higher energies. Indeed, the soapGAP yields better accuracy and precision than the tabGAP, as further confirmed by the standard deviation of the energy error distribution in Fig. 2b, where σtabGAP ≈ 2.4σsoapGAP. This is expected, because the high-dimensional SOAP descriptors can better distinguish small differences in atomic environments, especially element-specific differences. Nevertheless, the tabGAP shows good accuracy for the important crystalline configurations. We note that one particular difficulty for the tabGAP is the non-stoichiometric GaOx phases, which appear as the groups of outlying points outside the ±0.1 eV per atom reference zone around −5.0 eV per atom in Fig. 2a. However, this issue is not critical for the general utility of the tabGAP, as demonstrated later. More detailed notes on the GAP-predicted non-stoichiometric GaOx phases are included in Supplementary Note 5. The validation of the force components shows a comparable trend for the soapGAP and tabGAP (Fig. 2c, d). The tabGAP-predicted points are slightly more scattered than the soapGAP-predicted ones, with a very small number of points lying far outside the ±2.0 eV Å−1 reference zone. The standard deviations of the force error distributions are σtabGAP ≈ 3.33σsoapGAP, similar to the energy errors.
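Error statistics of the kind summarized in Fig. 2b, d can be computed as in the sketch below. We assume the Gaussian is fitted by maximum likelihood, for which μ and σ are simply the sample mean and (population) standard deviation; the histograms may instead have been fitted directly. The function name and the synthetic numbers are illustrative only.

```python
import numpy as np

def error_statistics(predicted, reference, band):
    """Summarize prediction errors the way a validation histogram does.

    Returns the mean and standard deviation of a maximum-likelihood Gaussian
    fit (identical to the sample mean and population standard deviation),
    the RMSE, and the fraction of errors inside +/- band.
    """
    err = np.asarray(predicted, dtype=float) - np.asarray(reference, dtype=float)
    mu = err.mean()
    sigma = err.std()
    rmse = np.sqrt(np.mean(err**2))
    inside = np.mean(np.abs(err) <= band)
    return mu, sigma, rmse, inside

# Synthetic example (not data from this work): energies in eV per atom.
ref = np.array([-5.969, -5.90, -5.75, -5.40])
pred = ref + np.array([0.002, -0.002, 0.002, -0.002])
mu, sigma, rmse, inside = error_statistics(pred, ref, band=0.1)
```

A ratio such as σtabGAP/σsoapGAP is then the quotient of the σ values returned for the two potentials on the same validation set.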

Fig. 2: The validation of the soapGAP and tabGAP.

Scatter plots of (a) energies and (c) force components versus DFT data, and (b, d) the corresponding histograms of the probability densities of the errors. Note that the validated ranges of energies and forces in (a) and (c) are two orders of magnitude larger than the spans of the errors in (b) and (d). The error distributions are fitted with Gaussian distributions, with the means (μ) and standard deviations (σ) listed in (b) and (d). This validation covers all the important solid-bulk configurations, i.e., crystalline Ga2O3, non-stoichiometric GaOx, amorphous, and random-structure-search structures, with 1202 energies and 205,830 force components in total.

We note that both the soapGAP and tabGAP errors are two orders of magnitude smaller than the validation span for both potential energies and force components. These small, balanced errors and, more importantly, physically correct predictions were achieved through many iterations of benchmark training and testing. For the two ML-GAPs, our primary aim is to achieve good accuracy for the Ga2O3 polymorphs while retaining physically reasonable predictions over a wide region of the configuration space. The raw energy and force errors say little about the physically important properties. For example, the energy differences between some of the polymorphs are very small, so even a small energy error could produce an incorrect stability order of the crystal phases, such as a false global minimum other than the β-phase. Therefore, in the next sections we continue testing the soapGAP and tabGAP in more realistic physical scenarios.

Five individual polymorphs

We begin demonstrating the generalized accuracy of our interatomic potentials by discussing the GAP-predicted structural properties of the five Ga2O3 polymorphs. The ground-state lattice parameters of the five polymorphs are listed in Table 1. Strikingly, for these diverse configurations with different symmetries, both the soapGAP and tabGAP agree closely with DFT, with the largest deviation from the DFT values being only 0.35% (the tabGAP-predicted a parameter of the κ-phase).

Table 1 Lattice parameters (Å) and angle β (°) for the five polymorphs predicted by GGA-DFT, soapGAP, and tabGAP.

This close agreement of the structural properties also extends to strained systems, as shown in Fig. 3a. The energies are plotted against the atomic volumes under isotropic strain, with the internal atomic positions relaxed to the corresponding local minima. The soapGAP- and tabGAP-predicted energies follow the harmonic curves well, with a small energy drift of 2–3 meV from the DFT curves, over a wide range of volumetric strain from 91.27% to 109.27% (lattice strain from −3% to 3%). This also means that the GAP-predicted bulk moduli of these polymorphs follow the trend of the DFT data with acceptable deviation (see Supplementary Note 6). The order of phase stability (the reverse order of the zero-strain potential energies) is β > κ > α > δ > γ, the same as in the recent accurate hybrid-functional DFT data of ref. 16 and in the GGA-DFT data of this work. We now focus on an intriguing β/κ/α/δ energy-degenerate (±8 meV per atom) region (the shaded box in Fig. 3a), which has attracted much attention in a number of recent studies18,27,29,59,60. This region is reported to be a thermodynamically balanced point of the α → β/κ → β solid-state phase transitions that occur when the Ga2O3 system is under a pressure of 24–44 GPa. Here, we note that the energy-volume curve of the δ-phase lies close to the β/κ/α curves in this region, so it is included in the comparison. As shown in Fig. 3b–d, the soapGAP accurately reproduces the β/κ/α balanced point with a marginal relative energy shift of 2 meV per atom, and the fine relative energy-volume balance is almost identical to that of the DFT curves. The tabGAP-predicted energy drifts are larger than those of the soapGAP; however, the overall energy-degenerate region lies within a small energy-volume range of ΔE = ±4 meV per atom and ΔV = ±0.05 Å³ per atom. Given that the average atomic kinetic energy at 300 K is ~38.8 meV per atom, the tabGAP-predicted ΔE–ΔV region is a good approximation of the energy-degenerate point.
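Three numbers in the preceding paragraph can be checked with short calculations: lattice strains of ±3% give volume ratios of 0.97³ ≈ 0.9127 and 1.03³ ≈ 1.0927; (3/2)k_BT at 300 K is ≈ 38.8 meV; and a bulk modulus follows from the curvature of a harmonic E(V) curve, B = V₀ d²E/dV². The sketch below uses a plain quadratic fit on synthetic data (V₀ = 10 Å³ per atom and B = 200 GPa are made-up values); the bulk moduli in Supplementary Note 6 may have been obtained with a different equation of state, such as Birch–Murnaghan.

```python
import numpy as np

EV_PER_A3_IN_GPA = 160.2176634   # 1 eV/angstrom^3 expressed in GPa
K_B_EV = 8.617333262e-5          # Boltzmann constant in eV/K

def volume_ratio(linear_strain):
    """An isotropic lattice strain eps scales the volume by (1 + eps)**3."""
    return (1.0 + linear_strain) ** 3

def harmonic_bulk_modulus(volumes, energies):
    """Fit E(V) = c2*V**2 + c1*V + c0; return (V0, B in GPa), B = V0 * d2E/dV2."""
    c2, c1, _ = np.polyfit(volumes, energies, 2)
    v0 = -c1 / (2.0 * c2)
    return v0, 2.0 * c2 * v0 * EV_PER_A3_IN_GPA

def mean_kinetic_energy(temperature_k):
    """Classical mean kinetic energy per atom, (3/2) k_B T, in eV."""
    return 1.5 * K_B_EV * temperature_k

# Synthetic harmonic curve with V0 = 10 A^3/atom and B = 200 GPa (made-up values).
strains = np.linspace(-0.03, 0.03, 7)
volumes = 10.0 * volume_ratio(strains)
b_ev = 200.0 / EV_PER_A3_IN_GPA
energies = -5.969 + b_ev / (2 * 10.0) * (volumes - 10.0) ** 2
v0_fit, b_fit = harmonic_bulk_modulus(volumes, energies)
```

The quadratic form is the simplest choice consistent with the "harmonic curves" described above; a physical equation of state would be preferable for strains much larger than ±3%.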

Fig. 3: Five individual polymorphs.

Left panel: Comparison of the DFT, soapGAP, and tabGAP energies for the five experimentally identified β/κ/α/δ/γ polymorphs. In the overall plot (a), the gray shaded region corresponds to the energy-crossing region, which is magnified and compared closely in (b–d). The energy difference between the DFT and soapGAP data is less than 2 meV per atom. Right panel: RDFs, g(r), from NPT-MD simulations at 900 K and 0 bar for the five Ga2O3 polymorphs (e–i). The DFT AIMD simulations are run with small 120–160-atom cells, so the cutoff distances of the AIMD g(r) are set by the shortest side length of the corresponding cells.

One essential aim of our interatomic potentials is to attain ab initio accuracy on time and length scales that cannot be accessed with DFT. Therefore, in Fig. 3 we further show the radial distribution functions (RDFs), g(r), of the five polymorphs from isothermal-isobaric (NPT) MD at 900 K and 0 bar. The soapGAP and tabGAP MD simulations are run with 1000–2000-atom cells, while the AIMD simulations are run with 120–160-atom cells. As shown in Fig. 3e–i, both the soapGAP and tabGAP MD runs yield RDFs almost indistinguishable from those of the AIMD runs, with only a marginal scaling mismatch at pair distances beyond 4 Å. For all five polymorphs, the first RDF peak at ~2 Å corresponds to the average Ga-O bond length, whereas the longer-range peaks vary with the symmetry of the oxygen stacking and the arrangement of the occupied Ga sites.
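For reference, g(r) for an orthorhombic periodic cell can be computed as below. This is a generic O(N²) NumPy sketch, not the analysis code used here, and the function name `rdf` is hypothetical. The minimum-image convention restricts r to at most half of the shortest cell side, which is why the g(r) of the small AIMD cells is cut off earlier than that of the 1000–2000-atom GAP cells.

```python
import numpy as np

def rdf(positions, box, r_max, n_bins=100):
    """Radial distribution function g(r) for an orthorhombic periodic box.

    Uses the minimum-image convention, so r_max must not exceed half the
    shortest box side.
    """
    pos = np.asarray(positions, dtype=float)
    box = np.asarray(box, dtype=float)
    if r_max > 0.5 * box.min():
        raise ValueError("r_max exceeds half of the shortest box length")
    n = len(pos)
    d = pos[:, None, :] - pos[None, :, :]
    d -= box * np.round(d / box)                               # minimum image
    r = np.linalg.norm(d, axis=-1)[np.triu_indices(n, k=1)]   # unique pairs
    counts, edges = np.histogram(r, bins=n_bins, range=(0.0, r_max))
    shell_vol = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    density = n / np.prod(box)
    # Each unique pair is counted once, hence the factor 2 / N.
    g = 2.0 * counts / (n * density * shell_vol)
    centers = 0.5 * (edges[1:] + edges[:-1])
    return centers, g
```

Passing only the positions of one element (e.g. only Ga) gives that element's same-species partial RDF; cross-species Ga-O partials need a variant that pairs the two position sets.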

To assess the ability of our potentials to predict thermal properties of the different Ga2O3 polymorphs, we calculate the phonon dispersion curves of the β/κ/α/δ polymorphs with both potentials and compare them with results obtained directly from DFT. The results are shown in Fig. 4. Compared with the β-Ga2O3 phonon dispersion calculated with the β-Ga2O3-specific ML-GAP46, our general-purpose soapGAP reproduces most of the analytical phonon branches with only slightly larger deviations (Fig. 4a). As expected, the tabGAP-predicted phonon dispersion curves deviate more from the DFT curves; however, all results closely follow the expected trends and lie well within the expected overall phonon band, with one exception: a deviation in the lowest acoustic branch at the Y point (Fig. 4e). A similar trend is seen for the other polymorphs. Another notable deviation occurs for the δ-phase (Fig. 4d), where the soapGAP-predicted lowest acoustic branch lies above the DFT one. Nevertheless, the soapGAP maps the overall phonon dispersion curves more accurately than the tabGAP. Given that the phonon dispersion and thermal properties of the Ga2O3 polymorphs are highly complex and anisotropic, more specifically trained potentials with specialized databases are the suitable choice when higher accuracy is needed for this purpose.

Fig. 4: Phonon dispersion of the five polymorphs.

a–h Phonon dispersion curves of the β/κ/α/δ-Ga2O3 phases compared with GGA-PBE-DFT calculations. The black, blue, and orange bands are from DFT, soapGAP, and tabGAP, respectively. The non-analytical-term correction for the longitudinal optical (LO)-transverse optical (TO) splitting at the Γ point in an ionic solid could be included using DFT-calculated Born effective charges; it is not included here, for a consistent comparison. i–l The corresponding β/κ/α/δ unit cells (upper) and first Brillouin zones of the reciprocal lattices, labeled with the high-symmetry k-points (lower).

As the last test of the properties of specific polymorphs, we run NPT MD simulations at zero pressure and different temperatures to obtain thermal expansion curves. As shown in Fig. 5, both the soapGAP and the tabGAP predict the lowest expansion rate for the β-phase, but they predict different orders for the remaining four polymorphs. The absolute thermal expansion of the β-, κ-, and δ-phases is very similar for the soapGAP and tabGAP. The main differences occur for the α- and γ-phases: the tabGAP slightly overestimates the thermal expansion of the α-phase and underestimates that of the γ-phase. Moreover, γ-Ga2O3 at elevated temperatures should be treated with special caution, as both of its thermal expansion curves deviate from a parabolic form under these conditions. This is likely because the γ-phase is a disordered, defective spinel lattice. Its average atomic volume is close to that of the β-phase (Fig. 3), and the oxygen sub-lattice follows face-centered-cubic (fcc) stacking in both the β- and γ-phases, which may explain why the less flexible tabGAP cannot distinguish the thermal expansion of these two phases. Experimental data on the thermal expansion of the Ga2O3 polymorphs are limited; however, first-principles calculations61 give the same order as the soapGAP results shown here.
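The thermal-expansion analysis can be sketched as a parabolic fit of the NPT-averaged volume V(T), from which the volumetric expansion coefficient α_V = (1/V) dV/dT follows. The function name and the linear synthetic data below are our own illustrative assumptions; deviations from the parabolic form, such as those noted for the γ-phase, would show up as systematic residuals of this fit.

```python
import numpy as np

def volumetric_expansion_coefficient(temperatures, volumes, t_eval):
    """alpha_V = (1/V) dV/dT from a parabolic fit V(T) = c2*T**2 + c1*T + c0."""
    c2, c1, c0 = np.polyfit(temperatures, volumes, 2)
    v = c2 * t_eval**2 + c1 * t_eval + c0
    return (2.0 * c2 * t_eval + c1) / v

# Synthetic NPT averages (made-up numbers), atomic volume in A^3/atom.
temps = np.array([300.0, 600.0, 900.0, 1200.0, 1500.0])
vols = 10.0 * (1.0 + 2.0e-5 * (temps - 300.0))
alpha_300 = volumetric_expansion_coefficient(temps, vols, 300.0)
```

In practice each volume would be a time average over an equilibrated NPT run, and its standard deviation over the run gives error bars like those in Fig. 5.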

Fig. 5: Thermal expansion.

The order of thermal expansion for the five polymorphs predicted by (a) soapGAP: β < α, κ, γ < δ; and (b) tabGAP: β ≈ γ < κ < δ < α. The order predicted by the soapGAP agrees well with the first-principles calculations in ref. 61 (the four dotted lines for β/κ/α/δ in panel (a), with the same colors as the soapGAP lines). The main differences between the soapGAP and tabGAP concern the α- and γ-phases. The error bars represent the standard deviations arising from temperature and pressure fluctuations.

Disordered liquid and amorphous structures

For general-purpose interatomic potentials, the ability to reproduce disordered structures with ab initio accuracy is vital for complex high-temperature applications. Liquid and amorphous Ga2O3 phases are of great interest for experimental studies and applications, such as edge-defined film-fed growth and thermally driven crystallization.

An amorphous phase is defined physically as a material state with short-range order but no long-range order. For the amorphous Ga2O3 system specifically, the short-range order is dominated by the highly ionic nature of the Ga-O bond (the Pauling ionicity of Ga2O3 is ~0.49; ref. 62). The short-range order can therefore be viewed as high-symmetry, localized tetrahedral (4-fold) and octahedral (6-fold) Ga sites, with possible over- and under-coordinated sites. The long-range order of a given Ga2O3 system, on the other hand, relies on the synergetic ordering of the Ga and O sub-lattices. The O sub-lattices in the crystalline polymorphs follow close-packed stacking orders, e.g., fcc for β, 4H-ABCB for κ, and hcp for α. The Ga sub-lattice further determines the ratio of 4-fold to 6-fold sites in the different polymorphs15. Therefore, depending on the ordering of the Ga/O sub-lattices, a Ga2O3 system can be classified into three types: (i) perfect polymorphs; (ii) defective cells; and (iii) disordered cells. Types (i) and (iii) follow conventional definitions, while type (ii) is defined here specifically as a Ga2O3 system with a long-range-ordered O sub-lattice in close-packed stacking and a rather random, low-symmetry Ga sub-lattice. This aspect is discussed in detail in Section Liquid-solid phase transition.
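Two quantities used above lend themselves to a quick computation. Pauling's fractional ionic character, 1 − exp[−(χ_O − χ_Ga)²/4], evaluated with the standard Pauling electronegativities (3.44 for O, 1.81 for Ga) gives ≈ 0.485, consistent with the ~0.49 from ref. 62, which may have used a different route. Ga coordination (4-fold, 6-fold, or over/under-coordinated) can be counted with a distance cutoff; the 2.4 Å value is borrowed from the first RDF valley used for Fig. 6, and the helper names are hypothetical.

```python
import numpy as np

def pauling_ionicity(chi_a, chi_b):
    """Pauling's fractional ionic character: 1 - exp(-(chi_a - chi_b)**2 / 4)."""
    return 1.0 - np.exp(-((chi_a - chi_b) ** 2) / 4.0)

def ga_coordination(ga_pos, o_pos, box, cutoff=2.4):
    """Number of O atoms within `cutoff` (angstrom) of each Ga in an orthorhombic periodic box."""
    box = np.asarray(box, dtype=float)
    d = np.asarray(ga_pos, float)[:, None, :] - np.asarray(o_pos, float)[None, :, :]
    d -= box * np.round(d / box)          # minimum-image convention
    return (np.linalg.norm(d, axis=-1) < cutoff).sum(axis=1)

# Pauling electronegativities of O (3.44) and Ga (1.81).
ionicity = pauling_ionicity(3.44, 1.81)

# Toy check: one Ga with six O neighbours at 2.0 A (octahedron) plus one distant O.
box = np.array([10.0, 10.0, 10.0])
ga = np.array([[5.0, 5.0, 5.0]])
octa = np.array([[7, 5, 5], [3, 5, 5], [5, 7, 5], [5, 3, 5], [5, 5, 7], [5, 5, 3]], float)
o = np.vstack([octa, [[0.0, 0.0, 0.0]]])
coordination = ga_coordination(ga, o, box)
```

A histogram of the returned coordination numbers then gives the fractions of 4-, 5-, and 6-fold Ga sites in a configuration.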

In the following, we analyze the performance of the soapGAP and tabGAP for homogeneous disordered liquid and amorphous Ga2O3 structures. We perform these MD simulations in the canonical (NVT) ensemble at 2200 K, using a 10,000-atom cubic cell with an edge length of 50.5 Å for tabGAP, a 1250-atom cubic cell with an edge length of 25.25 Å for soapGAP, and a 160-atom cubic cell with an edge length of 12.725 Å for AIMD. All three cells have the same density of 4.84 g cm−3, matching the experimentally reported density of liquid Ga2O3 at 2123 K (ref. 63). Amorphous Ga2O3 has previously been simulated over a wide range of densities, 3.9–5.3 g cm−3 (ref. 64). Representative snapshots of the three cells are shown in Fig. 6a–c.
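The cubic cell sizes follow from the atom count and the target density. The sketch below (a hypothetical helper using standard atomic weights) reproduces the edge lengths quoted above to within a few hundredths of an ångström at 4.84 g cm−3.

```python
AVOGADRO = 6.02214076e23        # 1/mol
M_GA, M_O = 69.723, 15.999      # standard atomic weights, g/mol

def ga2o3_cubic_edge(n_atoms, density_g_cm3):
    """Edge length (angstrom) of a cubic Ga2O3 cell with n_atoms at the given density."""
    if n_atoms % 5:
        raise ValueError("a stoichiometric Ga2O3 cell needs a multiple of 5 atoms")
    formula_units = n_atoms // 5
    mass_g = formula_units * (2 * M_GA + 3 * M_O) / AVOGADRO
    volume_a3 = mass_g / density_g_cm3 * 1e24     # 1 cm^3 = 1e24 A^3
    return volume_a3 ** (1.0 / 3.0)

for n in (160, 1250, 10_000):
    print(n, round(ga2o3_cubic_edge(n, 4.84), 2))
```

The same helper gives the cell sizes for any density in the amorphous sampling range.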

Fig. 6: Disordered liquid and amorphous structures.

Left panel: Three liquid/amorphous cells with consistent density of 4.84 g cm−3 used for (a) AIMD, (b) soapGAP and (c) tabGAP runs. Middle panel: RDFs and PRDFs of the (d, e) liquid and (f, g) amorphous configurations. Right panel: Bond angle distributions (h, i) of the Ga-O bonds with the cutoff distance of 2.4 Å where the first valley of the RDF lies.

As shown in Fig. 6d–i, we validate the soapGAP and tabGAP for both liquid and amorphous structures by analyzing the RDFs, partial RDFs (PRDFs), and bond-angle distributions. Both the soapGAP and tabGAP describe the liquid structures in good agreement with the AIMD reference. In the RDFs, the short-range fingerprint of the liquid structure is the ratio of the first peak to the second peak, and the element-pair-resolved PRDFs reveal whether these peaks have contributions from more than one type of bond. As clearly seen in Fig. 6e, the first peaks of the liquid RDFs (Fig. 6d) consist entirely of Ga-O bonds, indicating no chemical segregation in the liquid phase. The first peak of the tabGAP RDF is slightly lower than those of the AIMD and soapGAP RDFs, indicating a slightly lower coordination of Ga atoms in the tabGAP liquid. The second peak is composed of the merged first-nearest-neighbor O-O and Ga-Ga peaks, which again agree very well for all three cells. Moreover, Fig. 6h shows that the Ga-O bond-angle distributions agree very closely for all three methods. These comparisons indicate that both the soapGAP and tabGAP accurately capture the short-range Ga-O interactions within the first RDF peak.
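A bond-angle distribution like those in Fig. 6h, i can be computed as sketched below. The caption does not state which atom sits at the vertex, so taking the O-Ga-O angle at each Ga, with the 2.4 Å Ga-O cutoff, is our assumption; the function name is hypothetical.

```python
import numpy as np

def o_ga_o_angles(ga_pos, o_pos, box, cutoff=2.4):
    """All O-Ga-O angles (degrees) for O atoms within `cutoff` of each Ga (orthorhombic PBC)."""
    box = np.asarray(box, dtype=float)
    o_pos = np.asarray(o_pos, dtype=float)
    angles = []
    for ga in np.asarray(ga_pos, dtype=float):
        d = o_pos - ga
        d -= box * np.round(d / box)             # minimum-image Ga -> O vectors
        nb = d[np.linalg.norm(d, axis=1) < cutoff]
        for i in range(len(nb)):
            for j in range(i + 1, len(nb)):
                cos_ang = nb[i] @ nb[j] / (np.linalg.norm(nb[i]) * np.linalg.norm(nb[j]))
                angles.append(np.degrees(np.arccos(np.clip(cos_ang, -1.0, 1.0))))
    return np.array(angles)

# Toy check: an ideal GaO6 octahedron with 2.0 A bonds.
box = np.array([10.0, 10.0, 10.0])
ga = np.array([[5.0, 5.0, 5.0]])
octa = np.array([[7, 5, 5], [3, 5, 5], [5, 7, 5], [5, 3, 5], [5, 5, 7], [5, 5, 3]], float)
angles = o_ga_o_angles(ga, octa, box)
hist, edges = np.histogram(angles, bins=90, range=(0.0, 180.0), density=True)
```

An ideal octahedron gives twelve 90° and three 180° angles, while a tetrahedral site would contribute angles near 109.5°; the measured distribution is a mixture of such signatures broadened by disorder.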

The amorphous NVT MD simulations are conducted at 300 K and 1 bar pressure by quenching the corresponding liquid systems. The quenching rates are 170, 34, and 3.4 K ps−1 for the AIMD, soapGAP, and tabGAP runs, respectively, and the analyses are conducted on additional 300-K NVT simulations after quenching. As shown in Fig. 6f, g, the first RDF peaks of the amorphous structures are higher and narrower than those of the liquid phase, with nearly zero valleys between the first and second peaks (note the different scales of Fig. 6d and f). Very shallow third peaks appear around 4.5 Å, corresponding to the second shells of the Ga-O PRDFs. The overall agreement of all three methods is again good. Only minor differences are observed in the heights of the first RDF peaks for the soapGAP and tabGAP and in the shifted maxima of the bond-angle distributions (Fig. 6i). These two differences indicate a small deviation in the short-range Ga-O bonding configurations. However, key trends in the bond-angle distributions, such as the vanishing signal around 60–70° and the lower shoulder at 150°, are predicted exactly as in the AIMD results. Fine details of the distributions should be compared with caution, since the quenching rates of the three amorphous runs differ by a factor of fifty. It is highly probable that some slow atomic movements are not captured in AIMD, which runs on a ~10-ps time scale. We note that our soapGAP-predicted bond-angle distribution is similar to the results obtained with the soapGAP-relaxed cell of the same density reported in ref. 48. Furthermore, a test shows that our potentials lead to topological ring-size statistics similar to those reported in ref. 48, which merits future systematic investigation. We therefore conclude that both the soapGAP and tabGAP reproduce the important features of the disordered Ga2O3 system with high accuracy. This encourages us to further apply the interatomic potentials to the complex heterogeneous liquid-solid interface system.
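Assuming each quench is a linear ramp from the 2200 K liquid down to 300 K (the start temperature and the ramp shape are our assumptions), the quoted rates translate into the quench durations computed below; the function name is hypothetical.

```python
def quench_duration_ps(t_start_k, t_end_k, rate_k_per_ps):
    """Duration of a linear temperature ramp from t_start_k down to t_end_k."""
    return (t_start_k - t_end_k) / rate_k_per_ps

# Assumed ramp: from the 2200 K liquid down to 300 K.
durations = {name: quench_duration_ps(2200.0, 300.0, rate)
             for name, rate in (("AIMD", 170.0), ("soapGAP", 34.0), ("tabGAP", 3.4))}
for name, ps in durations.items():
    print(f"{name}: {ps:.1f} ps")
```

This gives roughly 11 ps for AIMD, 56 ps for soapGAP and 559 ps for tabGAP, which illustrates why the short AIMD quench can miss slow atomic rearrangements.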

Liquid-solid phase transition

We now use the developed interatomic potentials to examine the liquid-solid (β) phase transition, which is crucial for the synthesis of Ga2O3 via liquid-phase growth methods65,66 and must be understood fundamentally at the atomic level.

For the tabGAP simulation, we vertically join a perfect 5760-atom β-phase slab with a (100) top surface and a pre-thermalized, disordered 5760-atom Ga2O3 cell. Periodic boundary conditions are applied in all directions, as illustrated in Fig. 7a. In this way, we simulate two liquid-solid interfaces simultaneously in one simulation cell of size ~36 × 36 × 100 Å³. The two interfacial regions are first relaxed to a local energy minimum to remove overlapping atoms. We then run NPT MD at 1500 K and 1 bar for 10 ns. The evolution of the atomic structure, correlated with the change in potential energy per atom, is shown in Fig. 7a, b. Intriguingly, we can distinguish three stages of the phase transition, labeled in Fig. 7b as slow transition, fast transition and only Ga migration. The initial slow transition is characterized by heterogeneous nucleation of the ordered phase within the liquid near the interface, followed by the spontaneous self-assembly of O into fcc stacking, while Ga tends to occupy tetrahedral and octahedral sites randomly. This slow transition propagates from the liquid-solid interface at a rate of ~7 Å ns−1.

Fig. 7: Liquid-solid phase transition simulation running with the tabGAP at 1500 K.

a Evolution of atomic configurations viewed along the [010] direction. O and Ga atoms are shown in red and brown, respectively. b Potential energy per atom: three distinct regimes are identified. c Distribution of atomic displacement magnitudes (between the 50-ps and 3000-ps frames). An unusual group of low-mobility O atoms is revealed; it corresponds to the rapid formation of fcc-stacked O atoms in the defective region of the interface, as illustrated in the inset atomic configurations.

After 3000 ps, the liquid layer has thinned to ~14 Å, the critical thickness at which the fast transition begins. A distinct first-order transition completes within 100 ps and results in a completely ordered O sublattice and a defective Ga arrangement. The subsequent process involves only the migration of Ga atoms, as seen in the snapshots at 6000, 8000, and 10,000 ps in Fig. 7a, where the defective Ga atoms gradually recover the perfect β-phase and move collectively as a defect cluster.
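In Fig. 7b the three regimes are identified by inspecting the potential-energy trace. One hypothetical way to locate the fast-transition window automatically is to slide a fixed-width window (here 100 ps, the duration quoted above) along the trace and pick the window with the largest energy drop. The helper below is an illustrative sketch that assumes uniformly sampled and already smoothed data; it is not the procedure used in this work.

```python
import numpy as np


def find_fast_transition(t_ps, energy, window_ps=100.0):
    """Locate the fixed-width window with the largest potential-energy drop.

    Hypothetical helper: assumes uniformly spaced samples of an (already smoothed)
    per-atom potential-energy trace. Returns (t_start, t_end, drop).
    """
    t = np.asarray(t_ps, dtype=float)
    e = np.asarray(energy, dtype=float)
    dt = t[1] - t[0]
    w = max(1, int(round(window_ps / dt)))  # window width in samples
    drops = e[:-w] - e[w:]                  # E(t) - E(t + window); positive = energy decrease
    i = int(np.argmax(drops))
    return t[i], t[i + w], drops[i]


# Synthetic trace (eV/atom, arbitrary offset): slow drift, a 100-ps drop at 3000 ps, then a plateau
t = np.arange(0.0, 10001.0, 10.0)
e = np.where(t <= 3000, -2e-6 * t,
             np.where(t <= 3100, -0.006 - 0.02 * (t - 3000) / 100,
                      -0.026 - 1e-7 * (t - 3100)))
start, end, drop = find_fast_transition(t, e)
```

On real NPT data, thermal fluctuations of the per-atom energy would first need to be averaged out (for example with a moving average), otherwise noise can produce spurious maxima in the slow regime.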

For a quantitative analysis of the slow transition process, we plot the distribution of atomic displacements between the simulation snapshots at 50 ps and 3000 ps. The histogram includes all initially liquid atoms, as shown in Fig. 7c. At large magnitudes (5–40 Å), the displacement distributions are fairly uniform and similar for O and Ga atoms, suggesting that large fractions of Ga and O atoms have similar mobility. However, a peculiar peak appears at 2 Å in the O distribution, indicating a highly constrained group of O atoms. By tracking all O atom trajectories, we identify this group and find that it corresponds exactly to the initial formation of the defective layer at the interface. As shown in the inset of Fig. 7c, these O atoms quickly align into fcc stacking (dashed green lines guide the eye) and remain locally constrained throughout the whole transition process. Rather counter-intuitively, this solidification-induced confinement does not occur for the Ga atoms: their displacement distribution spreads evenly over a wide range of values, indicating that Ga mobility is less constrained by the solidification process. In view of the entire process, however, this is expected, since the subsequent recovery of the defective layer proceeds only via the movements of mobile Ga atoms.
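A displacement histogram like Fig. 7c requires unwrapped coordinates: between the 50-ps and 3000-ps frames many atoms travel 5–40 Å, more than half of the ~36 Å lateral box length, so a single minimum-image difference between the two frames would fold large displacements back into the box. The sketch below unwraps a trajectory by accumulating minimum-image steps between consecutive saved frames (valid only if no atom moves more than half a box length between frames) and then histograms the displacement magnitudes of a selected subset of atoms. The function names, the 1-Å bins, and the 3-Å threshold used to flag low-mobility atoms are assumptions for illustration.

```python
import numpy as np


def unwrap(frames, box):
    """Unwrap a (T, N, 3) trajectory of wrapped coordinates in an orthorhombic box."""
    frames = np.asarray(frames, dtype=float)
    box = np.asarray(box, dtype=float)
    steps = np.diff(frames, axis=0)
    steps -= box * np.round(steps / box)  # minimum-image step between consecutive frames
    return np.concatenate([frames[:1], frames[0] + np.cumsum(steps, axis=0)], axis=0)


def displacement_histogram(unwrapped, i0, i1, mask, bin_width=1.0, max_disp=42.0):
    """Displacement magnitudes of the masked atoms between frames i0 and i1, plus histogram."""
    disp = np.linalg.norm(unwrapped[i1][mask] - unwrapped[i0][mask], axis=1)
    edges = np.arange(0.0, max_disp + bin_width, bin_width)
    hist, edges = np.histogram(disp, bins=edges)
    return disp, hist, edges


# Toy trajectory: atom 0 drifts +4 Å per frame in x and crosses the boundary; atom 1 stays put
box = np.array([10.0, 10.0, 10.0])
frames = np.array([
    [[8.0, 0.0, 0.0], [5.0, 5.0, 5.0]],
    [[2.0, 0.0, 0.0], [5.0, 5.0, 5.0]],
    [[6.0, 0.0, 0.0], [5.0, 5.0, 5.0]],
])
traj = unwrap(frames, box)
selection = np.array([True, True])         # hypothetical mask, e.g. initially liquid O atoms
disp, hist, edges = displacement_histogram(traj, 0, 2, selection)
low_mobility = np.flatnonzero(disp < 3.0)  # hypothetical threshold for "constrained" atoms
```

Tracking the atoms returned in `low_mobility` back through the trajectory is one way to check, as done in the text, whether a constrained group coincides with the interfacial defective layer.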

As a final remark, we note that although the phase transition simulations were performed with tabGAP, the same promotion of fcc stacking and local confinement of the interfacial O atoms is also observed in soapGAP simulations. In principle, this curious phenomenon could be verified and studied experimentally with in situ scanning transmission electron microscopy or other atomic-resolution imaging techniques.

Discussion

Ga2O3 is emerging as a promising semiconductor for industrial applications, but its structural complexity, with many polymorphs, makes it an extremely challenging material to model in large-scale atomistic simulations. We have developed two versions of generalized ML-GAP interatomic potentials for the Ga2O3 system, soapGAP and tabGAP, which offer different balances between computational speed and accuracy. Our results demonstrate that both potentials describe the five polymorphs (β, κ, α, δ, and γ) as well as the amorphous and liquid phases with high accuracy. The simulation of the liquid-solid phase transition reveals the fast formation of constrained fcc-stacked O atoms in the interfacial defective layer, followed by the slow migration of Ga atoms. The Ga2O3 structure database developed in this work can readily be transferred and used as input data for other ML frameworks. More broadly, our interatomic potentials, together with the intensive ongoing experimental investigations, will enable atom-level design of Ga2O3-based applications.

Methods

The DFT calculations were performed using the Vienna Ab initio Simulation Package (VASP)67, employing the projector augmented-wave (PAW) method68 with 13 (3d104s24p1) and 6 (2s22p4) valence electrons for Ga and O, respectively. The Perdew-Burke-Ernzerhof (PBE) version69 of the generalized gradient approximation was used as the exchange-correlation functional. The electronic states were expanded in a plane-wave basis set with an energy cutoff of 700 eV. The Brillouin zone was sampled with Γ-centered k-point grids with a maximum spacing of 0.15 Å−1, equivalent to a dense 3 × 12 × 6 grid for the monoclinic 12.461 × 3.086 × 5.879 Å unit cell. Gaussian smearing with a width of 0.03 eV was used to describe the partial occupancies of the electronic states. The detailed convergence tests for the plane-wave energy cutoff and the k-point grid are given in Supplementary Note 1. We chose 10−6 eV and 5 × 10−3 eV Å−1 as the convergence criteria for the electronic (energy) and ionic (force) optimizations, respectively. We note that high accuracy and consistency of the sampled energies and forces are necessary to guarantee the energy consistency of the DFT database. This consistency is essential for constructing smooth potential energy surfaces with GAP using both the 2b + SOAP and the 2b + 3b + EAM descriptors. For speed, the low-dimensional 2b + 3b + EAM GAP is tabulated by mapping the energy predictions of each term onto suitable grids. The pair term (2b) is tabulated as a function of distance, the three-body term (3b) on a grid of \([r_{ij}, r_{ik}, \cos\theta_{ijk}]\) points, and the EAM term becomes a traditional EAM potential file in which the pairwise density is tabulated as a function of distance and the embedding energy as a function of the EAM density.
The final energy and forces are evaluated with cubic-spline interpolation for each term (one-dimensional splines for the pair and EAM terms, and a three-dimensional spline for the three-body term). More details can be found in our previous works57,70 and in Supplementary Note 3. The ML-GAPs were trained with the QUIP package40,71. The phonon dispersion calculations were performed using the Phonopy package72. The ring-statistics test was performed with the R.I.N.G.S. package73.
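The tabulate-then-interpolate idea can be illustrated for the one-dimensional terms. The sketch below replaces the fitted GAP predictions with smooth analytic stand-ins (a Morse-like pair term, an exponential EAM density, and a square-root embedding function, all hypothetical), tabulates each on a uniform grid, and evaluates energies and derivatives with natural cubic splines. The three-dimensional spline for the three-body term, the actual grid spacings and boundary conditions, and any file format used by tabGAP are not reproduced here.

```python
import numpy as np


def spline_second_derivs(x, y):
    """Second derivatives M_i of the natural cubic spline through (x_i, y_i).

    Solves h_{i-1} M_{i-1} + 2(h_{i-1} + h_i) M_i + h_i M_{i+1}
         = 6[(y_{i+1} - y_i)/h_i - (y_i - y_{i-1})/h_{i-1}], with M_0 = M_{n-1} = 0.
    A dense solve is used for clarity; a banded solver would be used in practice.
    """
    n = len(x)
    h = np.diff(x)
    A = np.zeros((n, n))
    rhs = np.zeros(n)
    A[0, 0] = A[-1, -1] = 1.0
    for i in range(1, n - 1):
        A[i, i - 1] = h[i - 1]
        A[i, i] = 2.0 * (h[i - 1] + h[i])
        A[i, i + 1] = h[i]
        rhs[i] = 6.0 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
    return np.linalg.solve(A, rhs)


def spline_eval(x, y, M, xq):
    """Spline value and first derivative at query points xq inside [x_0, x_{n-1}]."""
    xq = np.asarray(xq, dtype=float)
    i = np.clip(np.searchsorted(x, xq) - 1, 0, len(x) - 2)  # interval index
    h = x[i + 1] - x[i]
    a = x[i + 1] - xq
    b = xq - x[i]
    ca = y[i] / h - M[i] * h / 6.0
    cb = y[i + 1] / h - M[i + 1] * h / 6.0
    val = (M[i] * a**3 + M[i + 1] * b**3) / (6.0 * h) + ca * a + cb * b
    der = (M[i + 1] * b**2 - M[i] * a**2) / (2.0 * h) - ca + cb
    return val, der


class Tabulated1D:
    """Tabulate f once on a uniform grid, then evaluate it by cubic-spline interpolation."""

    def __init__(self, f, lo, hi, n):
        self.x = np.linspace(lo, hi, n)
        self.y = f(self.x)
        self.M = spline_second_derivs(self.x, self.y)

    def __call__(self, q):
        return spline_eval(self.x, self.y, self.M, q)[0]

    def deriv(self, q):
        return spline_eval(self.x, self.y, self.M, q)[1]


# Hypothetical smooth stand-ins for the 2b and EAM term predictions (NOT the fitted Ga2O3 model)
def pair_exact(r):
    return np.exp(-3.0 * (r - 1.9)) - 2.0 * np.exp(-1.5 * (r - 1.9))


def density_exact(r):
    return np.exp(-r)


def embed_exact(rho):
    return -np.sqrt(rho)


def total_energy(pos, pair, dens, embed):
    """E = sum_{i<j} V(r_ij) + sum_i F(rho_i), rho_i = sum_{j != i} phi(r_ij); no PBC or cutoff."""
    pos = np.asarray(pos, dtype=float)
    n = len(pos)
    d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    e_pair = np.sum(pair(d[np.triu_indices(n, k=1)]))
    off_diag = ~np.eye(n, dtype=bool)
    rho = np.array([np.sum(dens(d[i][off_diag[i]])) for i in range(n)])
    return e_pair + np.sum(embed(rho))


pair_tab = Tabulated1D(pair_exact, 1.0, 8.0, 701)      # distance grid, Å
dens_tab = Tabulated1D(density_exact, 1.0, 8.0, 701)
embed_tab = Tabulated1D(embed_exact, 0.01, 2.0, 1000)  # EAM-density grid

cluster = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 2.2, 0.0], [0.0, 0.0, 2.5]]
e_exact = total_energy(cluster, pair_exact, density_exact, embed_exact)
e_tab = total_energy(cluster, pair_tab, dens_tab, embed_tab)
r = np.linspace(1.5, 7.5, 200)
force_tab = -pair_tab.deriv(r)  # pair force along the bond, F = -dV/dr
```

Because the spline is evaluated analytically inside each interval, forces come from the same piecewise polynomial as the energy, which keeps energy and forces consistent during MD; the cost per evaluation is independent of how expensive the original term was to compute.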