Introduction

Iron oxides play a crucial role in the modern world. In particular, iron(III) oxide (Fe\(_2\)O\(_3\)) has risen to a prominent position among other minerals as a valuable industrial asset, with applications ranging from renewable energy1 to electromagnetic devices2,3. More broadly, iron oxides such as magnetite (Fe\(_3\)O\(_4\)) and maghemite (\(\gamma\)-Fe\(_2\)O\(_3\)) can serve as key components in the magnetic separation of labeled cells, targeted drug delivery, tumor treatment via hyperthermia, and as contrast-enhancing agents for magnetic resonance imaging4,5,6.

Hematite (\(\alpha\)-Fe\(_2\)O\(_3\)) is likely the most significant of all Fe\(_2\)O\(_3\) polymorphs, not only because of its widespread use across industrial sectors but also because of its substantial economic impact on the multi-billion-dollar steel industry. As the primary iron ore in steel production, hematite is indispensable for global infrastructure development7. Beyond metallurgy, it plays an important role in environmental applications, such as the degradation of organic pollutants and water purification8,9,10. It is also a key material in emerging technologies, serving as a basis for renewable hydrogen fuel generation, more efficient solar panels, and lithium-ion batteries11,12.

Although computational simulations based on empirical interatomic potentials are important tools for studying the structural properties of a wide range of materials, they still face significant challenges in certain scenarios. One major limitation affecting the reliability of classical molecular dynamics (MD) results is model transferability. Potential models are typically fitted to experimental data or quantum mechanical simulations of particular substances under specific conditions; they may therefore fail to produce accurate results when applied to the same substances in different environments. First-principles methods, on the other hand, are more accurate, but finite-size effects and limited simulation times restrict their ability to realistically mimic real-world conditions. Machine learning interatomic potentials (MLIPs) can bridge this gap, enabling, at least in principle, large-scale molecular dynamics simulations with accuracy comparable to ab initio calculations13,14. Recent work by Bienvenu et al.15 took a step in this direction by developing an MLIP for several oxides. Nonetheless, a correct description of surface energies remains essential, as it is crucial for describing the interaction of iron oxides with other materials.

In this work, we introduce an MLIP trained on accurate density functional theory (DFT) data for use in MD simulations of bulk hematite, its surfaces, and other iron oxides. We benchmark various properties predicted by our model against DFT calculations and experimental measurements, and compare them with results obtained from existing empirical potentials reported in the literature.

Potential training

The main advantage of machine learning potentials is that they can, in principle, fit any functional form: they adapt to describe the interactions between atoms and do not need to conform to predetermined functions, as classical potentials do. They can reach an accuracy comparable to that of their training data, enabling ab initio-like precision at a fraction of the computational cost13. Here, we have employed message-passing graph neural networks (MPGNNs)16,17 to optimize an interatomic potential for \(\alpha\)-Fe\(_2\)O\(_3\) based on DFT data (energies, forces, and stresses). First, the configuration phase space was sampled in two complementary ways: by molecular dynamics simulations with a classical potential at various temperatures and pressures, and by applying strains together with random and thermal displacements to the equilibrium geometry. This procedure quickly generated geometries representative of the thermodynamic conditions of interest and of the elastic properties of the material. The energies, forces, and stresses of these configurations were then calculated using DFT, and finally an MPGNN-based interatomic potential was trained to reproduce these DFT data.

Configuration space sampling

Fig. 1

Atomic structures of (a) hematite (\(\alpha\)-Fe\(_2\)O\(_3\)), (b) magnetite (Fe\(_3\)O\(_4\)), and (c) maghemite (\(\gamma\)-Fe\(_2\)O\(_3\)). The maghemite structure was generated using the Supercell program to properly describe the distribution of cation vacancies along the c-axis18. Panels (d) and (e) show the hematite surfaces: (d) the {0001} surface, with side and top views (upper and lower images, respectively), and (e) the {01\(\bar{1}\)2} surface, also displayed in side (upper) and top (lower) views.

The quality of the data set is critically important for ML models. Ideally, the geometries should span the configuration space visited during simulations at the thermodynamic conditions of interest, keeping the data set as small as possible while including as much information as possible. Configurations for a hematite supercell containing 120 atoms (\(2\times 2\times 1\) repetitions of the standard unit cell, shown in Fig. 1) were generated via MD simulations using the Clay empirical potential19,20 and the LAMMPS code21. The NPT ensemble was employed with the Nosé-Hoover barostat22,23, at temperatures from 100 to 2000 K and pressures from 1 to 100 atm. This system was simulated for 100 ns, with configuration snapshots taken every 100 ps. Furthermore, a set of 1,530 configurations of a single standard unit cell (30 atoms) was generated by applying random atomic displacements (with standard deviations of 0.01, 0.02, and 0.03 Å) together with strains (\(0\%\), \(\pm 1\%\), and \(\pm 3\%\)) along the x, y, z, xy, xz, yz, and xyz directions of the unit cell lattice vectors. The same procedure was used to generate another 1,020 configurations for a unit cell of the (001) surface (-Fe\(_2\)O\(_3\) termination) with 10 layers (53 atoms), and 520 configurations of the (110) surface with 7 layers (210 atoms). Finally, we applied random thermal displacements to a single unit cell of bulk hematite to sample 334 configurations representative of temperatures ranging from \(T=300\) K to \(T=2100\) K.
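The strain-plus-displacement sampling described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' script; the function name is hypothetical, lattice vectors are stored as rows of `cell`, and we assume that a label such as "xy" applies the same normal strain along both x and y (the text does not specify whether xy, xz, and yz denote biaxial or shear strains).

```python
import itertools

import numpy as np

# Cartesian axis index for each direction label (assumed convention).
AXIS = {"x": 0, "y": 1, "z": 2}


def strained_displaced_configs(cell, positions, strains, directions, sigmas, seed=0):
    """Yield (cell, positions) pairs combining a normal strain with random displacements.

    cell: (3, 3) array with lattice vectors as rows (Å).
    positions: (n_atoms, 3) Cartesian coordinates (Å).
    strains: fractional strains, e.g. [0.0, 0.01, -0.01, 0.03, -0.03].
    directions: labels such as "x", "xy", "xyz"; every axis in a label
        receives the same normal strain (an assumption, see text).
    sigmas: standard deviations (Å) of the Gaussian atomic displacements.
    """
    rng = np.random.default_rng(seed)
    frac = positions @ np.linalg.inv(cell)  # fractional coordinates
    for eps, direction, sigma in itertools.product(strains, directions, sigmas):
        deform = np.eye(3)
        for label in direction:
            deform[AXIS[label], AXIS[label]] += eps
        new_cell = cell @ deform  # strain the lattice vectors
        # Atoms follow the homogeneous strain, then receive random Gaussian kicks.
        new_pos = frac @ new_cell + rng.normal(0.0, sigma, size=positions.shape)
        yield new_cell, new_pos
```

With the grids quoted in the text (five strains, seven directions, three standard deviations), one pass yields 105 configurations; the text does not state how these combinations were repeated to reach 1,530 configurations.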

DFT data

The data set was generated by DFT calculations on the previously sampled configurations. We used the SIESTA code24,25 with a double-zeta polarized (DZP) basis set, a 7\(\times\)7\(\times\)5 k-point grid, and a real-space integration grid with an equivalent plane-wave cutoff of 300 Ry. Additionally, we applied a grid-cell sampling technique, averaging the forces calculated over real-space grids displaced by half of the grid spacing, corresponding to a Monkhorst-Pack-like sampling of 2\(\times\)2\(\times\)2. The calculations were performed with the PBE exchange-correlation functional26. A Hubbard correction of \(U=4.35\) eV was applied to the Fe d levels, chosen to yield a band gap of 2.2 eV and an atomic magnetization of 4.25 \(\mu _B\) per Fe atom27. The set of calculated energies, forces, and stresses was randomly split into training (85.6%), validation (4.4%), and test (10%) sets.
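A random split with these fractions can be sketched as below. The function name and the fixed seed are illustrative assumptions; the split in the original workflow may have been performed differently (for example, inside the training code).

```python
import numpy as np


def random_split(n_configs, fractions=(0.856, 0.044, 0.10), seed=0):
    """Randomly partition configuration indices into train/validation/test sets."""
    if not np.isclose(sum(fractions), 1.0):
        raise ValueError("fractions must sum to 1")
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_configs)  # shuffled indices 0..n_configs-1
    n_train = int(round(fractions[0] * n_configs))
    n_valid = int(round(fractions[1] * n_configs))
    train = idx[:n_train]
    valid = idx[n_train:n_train + n_valid]
    test = idx[n_train + n_valid:]  # the remainder goes to the test set
    return train, valid, test
```

Shuffling before slicing ensures that configurations from the different sampling strategies (MD snapshots, strained cells, surfaces, thermal displacements) are mixed across the three sets.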

Training

Graph neural networks naturally represent connected topological structures such as atoms in molecules and materials. The nodes of the graph encode the atomic states, while the interactions between nearby atoms, that is, the local chemical environment, are incorporated by exchanging information between neighboring atoms (graph nodes) through message passing. Here, we used the MACE code16,17 to train a committee of four MPGNN interatomic potentials for hematite with 64 channels, equivariant features up to \(L=1\), and 2 message-passing layers (effectively doubling the receptive field of the model), for a maximum of 300 epochs. Using a committee (ensemble average) improves the accuracy of the final predictions through error cancellation, and the standard deviation of the individual model predictions provides a quantitative measure of model quality and of extrapolation risk. The RMS errors of the committee for energies, forces, and stresses were 1.1 meV/atom, 29 meV/Å, and 0.59 meV/Å\(^{3}\), respectively. See the Supplemental Material for additional details.
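The committee statistics described above reduce to a mean and a standard deviation over model predictions. The minimal NumPy sketch below assumes that the four models' predictions (energies, forces, or stresses) for the same configurations are stacked along the first axis; it uses the population standard deviation (`ddof=0`), which is our choice, and does not call MACE itself.

```python
import numpy as np


def committee_prediction(predictions):
    """Combine the predictions of an ensemble of models.

    predictions: array of shape (n_models, ...), e.g. energies or forces.
    Returns the committee mean and the per-entry standard deviation across
    models; a large deviation flags configurations where the models disagree.
    """
    predictions = np.asarray(predictions, dtype=float)
    return predictions.mean(axis=0), predictions.std(axis=0)


def rmse(pred, ref):
    """Root-mean-square error between predictions and reference (DFT) data."""
    diff = np.asarray(pred, dtype=float) - np.asarray(ref, dtype=float)
    return float(np.sqrt(np.mean(diff**2)))
```

The committee RMSE reported in the text corresponds to applying `rmse` to the committee mean and the DFT test-set values.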

It is important to point out that the trained MLIP can only be as accurate as the data it was trained on; i.e., all errors intrinsic to the DFT calculations, such as the typical overestimation of lattice constants and underestimation of bulk moduli when using DFT+U with GGA xc-functionals28, are inherited by the trained potential.

Potential validation

We benchmarked our trained potential against experimental results and classical simulation data for hematite, generated using several established empirical force fields, namely: a Core-Shell model (CS)29, Clay19,20, Tersoff30,31, Interface (IFF)32,33, and Reax34,35. The Supplemental Material provides detailed specifics on the LAMMPS implementation for each of these potentials.

We compare the lattice constants, elastic properties, surface energies, and vibrational frequencies obtained with our ML potential (henceforth called Fe-MLIP) against those from the selected empirical potentials, ab initio calculations, and experiments. To consolidate the existing experimental measurements into a single value for each property, we use the bootstrap median of the reported results as the literature consensus. We applied this procedure to the lattice constants, the bulk, Young, and shear moduli, and derived properties such as compressibility. For the elastic tensor constants \(C_{ij}\), very few results have been reported; in this case, we therefore take the results of Liebermann36 as reference. See the Supplemental Material for more details. For each calculated quantity \(x_i\), we compute its relative deviation from the corresponding experimental result \(x_\text {exp}\) as \(\sigma _i = (x_i-x_\text {exp})/x_\text {exp}\), and also report its average relative deviation, namely

$$\begin{aligned} \langle \sigma \rangle = \frac{1}{n}\sum _{i=1}^n |\sigma _i|, \end{aligned}$$
(1)

where n is the number of calculated parameters.
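The consensus value and Eq. (1) can be sketched as follows. This is an illustrative NumPy implementation under our own assumptions: the bootstrap draws `n_resamples` resamples with replacement and reports the mean and spread of the resampled medians; the exact resampling settings used for the literature consensus are given in the Supplemental Material and may differ. All function names are hypothetical.

```python
import numpy as np


def bootstrap_median(values, n_resamples=10000, seed=0):
    """Bootstrap estimate of the median of reported experimental values.

    Returns the mean of the bootstrap medians and their standard deviation.
    """
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    samples = rng.choice(values, size=(n_resamples, values.size), replace=True)
    medians = np.median(samples, axis=1)
    return float(medians.mean()), float(medians.std())


def relative_deviations(calc, exp):
    """sigma_i = (x_i - x_exp) / x_exp for each property."""
    calc = np.asarray(calc, dtype=float)
    exp = np.asarray(exp, dtype=float)
    return (calc - exp) / exp


def average_relative_deviation(calc, exp):
    """Eq. (1): mean of |sigma_i| over the n calculated parameters."""
    return float(np.mean(np.abs(relative_deviations(calc, exp))))
```

Because Eq. (1) averages absolute values, over- and underestimates do not cancel, whereas the signed \(\sigma_i\) retain the direction of each deviation.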

To benchmark the vibrational properties of the potentials, we calculated phonon frequencies at the \(\Gamma\) point using the Phonopy code37. The irreducible representation of each mode was used to select the Raman- and IR-active modes. For hematite, Raman-active modes belong to the E\(_g\) and A\(_{1g}\) irreducible representations, and IR-active modes to E\(_u\) and A\(_{2u}\)38,39.

Additionally, we assessed the thermal expansion properties of our potential. The volume-temperature profiles of hematite calculated with DFT+U and the Fe-MLIP (see the SI for methodology) are compared in Fig. 2 with the experimental results of Nešković et al.40 and Gorton et al.41. The DFT V(T) curve agrees well with experiment, especially with that of Nešković et al.40, up to the Néel temperature, \(T_{N\acute{e}el}=950\) K42, at which hematite undergoes a phase transition to a paramagnetic state. The Fe-MLIP reproduces the DFT reference up to the Debye temperature, \(T_{Debye}\approx 488\) K (taken as the average of the results in Refs.43,44,45, which range from 324 K to 641 K). Above \(T_{Debye}\), the Fe-MLIP V(T) is almost linear in temperature (see Fig. S3). This behavior indicates that, although high-temperature configurations and structures under different stresses were included in the training set, they do not fully capture the volume dependence of the phonon frequencies for \(T>T_{Debye}\).

Fig. 2

Volume expansion of hematite with temperature calculated with DFT and our Fe-MLIP, compared with the experimental data of Nešković et al.40 and Gorton et al.41.

The volumetric thermal expansion coefficient (\(\alpha _V\)) was also calculated using MD. This approach includes full anharmonic effects, but it is computationally much more demanding. We therefore calculated \(\alpha _V\) only at \(T = 298\) K for bulk hematite, using a \(3 \times 3 \times 1\) supercell in the NPT ensemble at 1.0 bar. Systems were simulated at temperatures from 200 to 400 K, with 100 ps equilibration and 100 ps production phases. \(\alpha _V\) was determined via linear regression of the volume-temperature data:

$$\begin{aligned} \alpha _V = \frac{1}{V_0} \left( \frac{\partial V}{\partial T} \right) _P \end{aligned}$$
(2)

where \(V_0\) is the reference volume at 298 K.
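A minimal sketch of the regression in Eq. (2), assuming the NPT production averages are available as arrays of temperatures and mean volumes. Here \(V_0\) is evaluated from the fitted line at 298 K, which is our assumption; using the directly simulated average volume at 298 K would be an equally reasonable choice. The function name is hypothetical.

```python
import numpy as np


def volumetric_expansion_coefficient(temperatures, volumes, t_ref=298.0):
    """alpha_V = (1/V0) (dV/dT)_P from a linear fit of V(T), Eq. (2).

    temperatures: simulation temperatures (K).
    volumes: average NPT volumes at those temperatures (any volume unit).
    t_ref: reference temperature at which V0 is evaluated on the fitted line.
    """
    slope, intercept = np.polyfit(temperatures, volumes, 1)  # dV/dT and offset
    v0 = slope * t_ref + intercept
    return slope / v0  # units of 1/K
```

A linear fit is adequate over the narrow 200 to 400 K window, where V(T) is close to linear; over wider ranges a higher-order fit would be needed to obtain a temperature-dependent \(\alpha_V\).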

As shown in Table 1, the Fe-MLIP provides the best agreement with experiment, followed by the CS model. The Clay and IFF potentials overestimate the expansion and Reax substantially underestimates it, whereas the Tersoff potential fails qualitatively, predicting an unphysical negative expansion (not shown).

Table 1 Volumetric thermal expansion coefficient from MD at \(T = 298\) K and deviation from experimental value41.

Results

Structural properties of Hematite

Table 2 Lattice constants obtained from our classical and DFT simulations. The lattice parameters a, b, and c are given in Å. Relative deviations from experimental values46 are color-coded. The color bar for each row is scaled by the maximum absolute deviation from the experimental result, with red indicating a positive deviation and blue a negative one. The average deviation, \(\langle \sigma \rangle\), was calculated using Eq. (1) and is expressed as a percentage.

The most stable structure of hematite is trigonal, with space group \(R\bar{3}c\); its cell is shown in Fig. 1(a). From X-ray experiments46, the lattice parameters are \(a = 5.038 \pm 0.002\) Å and \(c = 13.772 \pm 0.012\) Å, giving \(c/a = 2.733 \pm 0.015\). Overall, results from classical simulations show good agreement with experimental data, as can be seen in Table 2. Most calculated values deviate by less than a few percent from experiment, which is expected given that lattice constants are typically primary targets in empirical potential parametrization. Notably, the simplest potentials, IFF and Clay, are the most accurate in reproducing the experimental lattice parameters, closely matching X-ray diffraction data. The Fe-MLIP and DFT calculations also achieve good agreement, slightly overestimating the lattice parameters a and c by less than \(2.5\%\). From this, we infer that the differences between our ML model and experiment stem mainly from the training data set, as the PBE xc-functional systematically overestimates the lattice parameters of solids. Conversely, CS, Tersoff, and Reax display progressively larger deviations, in this order. For an overview of all calculated deviations, we refer the reader to the Supplemental Material.

We now turn to the elastic properties of hematite. To this end, we calculated the elastic stiffness tensor \(\mathbf{C}\), whose elements \(C_{ij}\) are given in Voigt notation, and a number of elastic quantities derived from them (see the Supplemental Material for the underlying theory). In short, to calculate \(C_{ij}\), we applied strains from \(-0.5\%\) to \(0.5\%\) to the relaxed structure, remaining within the linear stress-strain regime. The \(C_{ij}\) values were then determined from the slopes of linear regressions of the resulting stress-strain curves.
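The finite-strain procedure can be sketched as below. `stress_fn` is a hypothetical callback (not part of any specific package) that applies a six-component Voigt strain (with engineering shear strains) to the relaxed cell, relaxes the ions, and returns the six-component stress; each column of \(\mathbf{C}\) is obtained from the slopes of the stress components against one strain component. Some codes report stress with the opposite sign (pressure positive), in which case the slopes must be negated.

```python
import numpy as np


def elastic_constants_from_stress_strain(stress_fn, max_strain=0.005, n_points=5):
    """Estimate the 6x6 stiffness matrix C (Voigt notation) column by column.

    stress_fn(strain_voigt) -> stress_voigt: hypothetical callback returning the
    relaxed-ion stress for a given Voigt strain vector.
    Strains span [-max_strain, +max_strain], i.e. -0.5% to 0.5% by default.
    """
    strains = np.linspace(-max_strain, max_strain, n_points)
    C = np.zeros((6, 6))
    for j in range(6):
        stresses = []
        for eps in strains:
            strain = np.zeros(6)
            strain[j] = eps  # perturb only the j-th Voigt component
            stresses.append(stress_fn(strain))
        stresses = np.array(stresses)  # shape (n_points, 6)
        # Fit all six stress components against strain at once; row 0 holds slopes.
        C[:, j] = np.polyfit(strains, stresses, 1)[0]
    return C
```

Using several strain values per component, rather than a single finite difference, reduces the effect of numerical noise in the computed stresses.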

Table 3 presents our results. DFT data were retrieved from Ref.47 and experimental data from Ref.36. An analysis of the calculated elastic tensor constants reveals varied performance across the computational methods when compared with experimental data. In general, DFT yields the best results, with an average relative deviation [see Eq. (1)] of 20.2%, and the Fe-MLIP (23.4%) shows similar accuracy. The classical force fields perform significantly worse, indicating the importance of a sufficiently flexible model, fitted to ab initio data and capable of capturing a wide range of features.

Table 3 Elastic constants \(C_{ij}\) calculated using empirical potentials and our trained ML potential, in GPa. DFT data are from Ref.47 and experimental data from Ref.36. Colors are based on the relative deviation from experiment. The color bar for each row is scaled by the maximum absolute deviation from the experimental result, with red indicating a positive deviation and blue a negative one. The average deviation, \(\langle \sigma \rangle\), was calculated using Eq. (1) and is expressed as a percentage.

From the components of the elastic stiffness tensor, we calculate the elastic properties. The bulk (K) and shear (G) moduli were obtained using the Voigt-Reuss-Hill averaging scheme48. The isotropic Young's modulus (Y), Poisson's ratio (\(\nu\)), and compressibility (\(\beta\)) can then be derived from K and G (see the Supplemental Material). The ratio K/G is commonly referred to as the Pugh ratio49 and is often used to distinguish between ductile and brittle materials. Finally, another important quantity that can be derived from the bulk and shear moduli is the Vickers hardness. The experimental protocol involves pressing a diamond pyramid indenter with a known force into the surface of the material and measuring the size of the resulting indentation. The Vickers hardness number (H) is then calculated from the applied load and the surface area of the indentation, providing a quantitative measure of the material's resistance to plastic deformation50. The Vickers hardness can be estimated from elastic parameters using the approximation proposed by Tian et al.51,

$$\begin{aligned} H = 0.92 \left( \frac{G}{K}\right) ^{{1.137}}G^{0.708}. \end{aligned}$$
(3)
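The chain from \(C_{ij}\) to the derived quantities, including Eq. (3), can be sketched as follows. The Voigt and Reuss expressions are the standard general forms for a full \(6\times 6\) stiffness matrix, so off-diagonal terms such as the trigonal \(C_{14}\) of hematite enter through the compliance matrix; Y, \(\nu\), and \(\beta\) use the usual isotropic relations. The relations actually used are given in the Supplemental Material, and the function name is hypothetical.

```python
import numpy as np


def vrh_properties(C):
    """Voigt-Reuss-Hill moduli and derived quantities from a 6x6 C in GPa."""
    C = np.asarray(C, dtype=float)
    S = np.linalg.inv(C)  # compliance matrix
    # Voigt (uniform strain) bounds
    K_V = (C[0, 0] + C[1, 1] + C[2, 2] + 2 * (C[0, 1] + C[1, 2] + C[0, 2])) / 9
    G_V = (C[0, 0] + C[1, 1] + C[2, 2] - (C[0, 1] + C[1, 2] + C[0, 2])
           + 3 * (C[3, 3] + C[4, 4] + C[5, 5])) / 15
    # Reuss (uniform stress) bounds
    K_R = 1 / (S[0, 0] + S[1, 1] + S[2, 2] + 2 * (S[0, 1] + S[1, 2] + S[0, 2]))
    G_R = 15 / (4 * (S[0, 0] + S[1, 1] + S[2, 2]) - 4 * (S[0, 1] + S[1, 2] + S[0, 2])
                + 3 * (S[3, 3] + S[4, 4] + S[5, 5]))
    # Hill averages
    K = (K_V + K_R) / 2
    G = (G_V + G_R) / 2
    Y = 9 * K * G / (3 * K + G)  # Young's modulus
    nu = (3 * K - 2 * G) / (2 * (3 * K + G))  # Poisson's ratio
    beta = 1 / K  # compressibility, 1/GPa
    H = 0.92 * (G / K) ** 1.137 * G ** 0.708  # Tian et al., Eq. (3)
    return {"K": K, "G": G, "Y": Y, "nu": nu, "beta": beta, "K/G": K / G, "H": H}
```

For an isotropic stiffness matrix the Voigt and Reuss bounds coincide, which provides a simple sanity check of the implementation.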
Fig. 3

Star plots of the mechanical properties of hematite (bulk, K, Young, Y, and shear, G, moduli, compressibility, \(\beta\), Poisson’s ratio, \(\nu\), Pugh ratio, K/G, and Vickers hardness, H) relative to the estimated consensus value of experimental data reported in the literature (see the supplementary information for more details) for different interatomic potentials, and DFT. In parentheses, the average relative deviations, \(\langle \sigma \rangle\) (see main text).

Our results for the calculated elastic properties are summarized in Fig. 3, which shows the relative deviations from the experimental literature consensus (see also Table S10); again, DFT (and consequently our Fe-MLIP) shows reasonable deviations from experiment. Notably, Reax performs poorly for the derived elastic quantities, heavily overestimating Y, K, G, and H, and underestimating \(\beta\), \(\nu\), and K/G. Both the CS potential and IFF show trends similar to Reax, but with smaller deviations. Interestingly, the simple Clay model and Tersoff give the best results among the empirical potentials studied. The Young's modulus calculated with Tersoff is particularly accurate, and Clay gives the best value of G. The Fe-MLIP is particularly good at predicting the Vickers hardness.

Overall, based on the average relative deviations, Eq. (1), the results fall into three categories: good, comprising Tersoff (11.00%), Clay (9.93%), and Fe-MLIP (11.00%), all comparable to the DFT accuracy (10.38%); fair, with CS (38.80%) and IFF (25.90%); and poor, with Reax (64.52%). The overall performance of each potential is summarized in Table 4, which presents the average relative deviations (\(\langle \sigma \rangle\)) for the lattice parameters, elastic tensor elements (\(C_{ij}\)), and elastic moduli. To aid visualization, values are color-coded into three tiers: good (green), fair (yellow), and poor (red). These intervals use the deviation of the DFT results from experimental data as a benchmark: in general, results with a deviation similar to DFT are considered good, those up to twice the DFT deviation fair, and anything beyond that poor.

Fig. 4

Calculated frequencies of Raman (black squares) and IR (red circles) active modes for the investigated potentials versus experimental ones. The black lines indicate an ideal match, and the blue lines are linear fits to the calculated frequencies. Raman experimental data are taken from Refs.52,53,54, and IR data from Refs.52,55,56,57.

To assess the performance of the potentials for vibrational properties, we compared the calculated frequencies with Raman and IR experimental data for hematite52,53,54,55,56,57 (note that no experimental measurements of the phonon spectrum of hematite have been reported in the literature). Fig. 4 shows parity plots comparing the calculated frequencies at the \(\Gamma\) point for the Raman- and IR-active modes of hematite with the average of the experimental results (see also the Supplemental Material). The Fe-MLIP shows the best performance, with the lowest RMSE (47.7 cm\(^{-1}\)), followed by the Clay potential (53.1 cm\(^{-1}\)) and the Tersoff potential (83.6 cm\(^{-1}\)), while Reax and IFF have RMSEs above 100 cm\(^{-1}\). Our Fe-MLIP tends to produce slightly softer vibrational modes, underestimating frequencies, especially for higher-frequency modes, whereas the Clay potential shows the opposite behavior, producing harder modes and overestimating frequencies. The Tersoff potential softens (stiffens) modes below (above) \(\sim\)290 cm\(^{-1}\), and the Reax and IFF potentials systematically overestimate the frequencies.
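The parity statistics (RMSE and the blue linear fits) can be sketched as follows. We assume the \(\Gamma\)-point modes are available as (irrep label, frequency) pairs, for instance from Phonopy's irreducible-representation analysis (parsing its output is not shown), with one entry per distinct, possibly degenerate, mode, and that each selected mode has already been matched to one averaged experimental frequency. The acoustic cutoff, the constant names, and the function name are our assumptions.

```python
import numpy as np

RAMAN_IRREPS = {"A1g", "Eg"}
IR_IRREPS = {"A2u", "Eu"}


def parity_statistics(modes, exp_freqs, active_irreps, acoustic_cutoff=1.0):
    """RMSE and linear fit of calculated vs. experimental frequencies (cm^-1).

    modes: (irrep_label, frequency) pairs at Gamma, ordered consistently
        with exp_freqs.
    exp_freqs: experimental frequencies matched one-to-one to the selected modes.
    active_irreps: e.g. RAMAN_IRREPS or IR_IRREPS.
    acoustic_cutoff: modes at or below this frequency are treated as acoustic.
    """
    calc = np.array(
        [f for label, f in modes if label in active_irreps and f > acoustic_cutoff],
        dtype=float,
    )
    exp = np.asarray(exp_freqs, dtype=float)
    if calc.shape != exp.shape:
        raise ValueError("each active mode needs exactly one experimental frequency")
    rmse = float(np.sqrt(np.mean((calc - exp) ** 2)))
    slope, intercept = np.polyfit(exp, calc, 1)  # calc = slope * exp + intercept
    return rmse, float(slope), float(intercept)
```

A fitted slope below 1 corresponds to the systematic softening of high-frequency modes described above, and a slope above 1 to stiffening.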

As seen, our ML model gives the best results among the interatomic potentials considered. This reflects the good agreement of our DFT data with experiment. It also indicates the good quality of the training (see Figs. S7-S11), as the model captures quantities not included in the training data. The remaining force fields perform worse, especially for derived quantities such as elastic and vibrational properties.

Clay shows good performance for most properties except for elastic constants.

Table 4 Average relative deviations (%) calculated using Eq. (1) for lattice parameters, elastic tensor elements \(C_{ij}\), and elastic moduli. The criteria for the color code are defined as follows: (i) for lattice parameters, values are green if \(\langle \sigma \rangle < 5.0\%\) and yellow if \(5.0\% \le \langle \sigma \rangle < 10.0\%\). (ii) For the elastic tensor elements (\(C_{ij}\)), the tiers are defined by \(\langle \sigma \rangle < 25.0\%\) (green), \(25.0\% \le \langle \sigma \rangle < 50.0\%\) (yellow), and \(\langle \sigma \rangle \ge 50.0\%\) (red). (iii) For the elastic moduli, the intervals are \(\langle \sigma \rangle < 15.0\%\) (green), \(15.0\% \le \langle \sigma \rangle < 30.0\%\) (yellow), and \(\langle \sigma \rangle \ge 30.0\%\) (red).

Surface energy

Table 5 Comparison of relaxed surface energies (J/m\(^2\)) for hematite slabs. Literature values are from Refs.58,59,60,61,62,63\(^a\),64\(^b\),61,62,65\(^c\),66\(^d\), and 67\(^e\).

We also calculated the relaxed surface energies of the Fe-terminated {0001} and {01\(\bar{1}\)2} hematite slabs using our ML potential and the other classical force fields, as shown in Table 5 and illustrated in Fig. 1(d)-(e). Details of the methodology are given in the Supplementary Material.
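The exact procedure is given in the Supplementary Material; as a hedged illustration, the standard expression for a symmetric slab with two equivalent faces, \(\gamma = (E_\text{slab} - N E_\text{bulk})/(2A)\), can be evaluated as below. We assume a stoichiometric slab containing N bulk formula units; non-stoichiometric terminations would additionally require chemical-potential terms, which are not shown. The function name is hypothetical.

```python
# 1 eV/Å^2 expressed in J/m^2 (1 eV = 1.602176634e-19 J, 1 Å^2 = 1e-20 m^2).
EV_PER_A2_TO_J_PER_M2 = 16.02176634


def surface_energy(e_slab, n_units, e_bulk_per_unit, area):
    """Surface energy (J/m^2) of a symmetric, stoichiometric slab.

    e_slab: relaxed total energy of the slab (eV).
    n_units: number of bulk formula units contained in the slab.
    e_bulk_per_unit: bulk energy per formula unit (eV).
    area: in-plane area of one slab face (Å^2); the factor 2 counts both faces.
    """
    gamma = (e_slab - n_units * e_bulk_per_unit) / (2.0 * area)  # eV/Å^2
    return gamma * EV_PER_A2_TO_J_PER_M2
```

The slab should be thick enough that \(\gamma\) no longer changes with the number of layers, since finite-thickness effects otherwise bias the result.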

Our MLIP yields surface energies of 1.315 J/m\(^2\) for the {0001} surface and 1.159 J/m\(^2\) for the {01\(\bar{1}\)2} surface. Notably, both our potential and the literature values consistently give a lower surface energy for the {01\(\bar{1}\)2} surface, indicating that it is the more stable of the two. These results agree well with the range of values reported from quantum mechanical calculations. For instance, previous studies using the generalized gradient approximation (GGA) and GGA+U report {0001} surface energies between 0.99 and 1.70 J/m\(^2\)58,59,60,61,62,63,65, and {01\(\bar{1}\)2} surface energies between 0.99 and 1.056 J/m\(^2\)64,66. Our {0001} value falls within the reported range, and our {01\(\bar{1}\)2} value lies about 10% above the upper end of its range, indicating that the ML potential describes these surfaces with an accuracy close to that of DFT. The Clay force field also performs well, with values of 1.343 J/m\(^2\) for {0001} and 1.072 J/m\(^2\) for {01\(\bar{1}\)2}, in line with the DFT data. In contrast, the other tested potentials show significant deviations, highlighting the difficulty for more general or reactive potentials to capture the specific surface chemistry of hematite accurately. Finally, for completeness, it is also informative to compare these results with other ab initio methods. Hybrid DFT functionals such as B3LYP produce notably higher surface energies; the lower values of the Fe-MLIP are expected, since it was trained on GGA+U data.

Transferability and limits of applicability

To investigate the limits of applicability of our ML model, which was trained specifically on hematite DFT data, we also applied it to other iron oxides, namely maghemite and magnetite, as well as to Fe and O vacancies.

Magnetite (Fe\(_3\)O\(_4\)) is a ferrimagnetic material that crystallizes in the inverse spinel structure, with a cubic unit cell belonging to space group \(Fd\overline{3}m\) [see Fig. 1(b)]. In this arrangement, the O\(^{2-}\) anions form a face-centered cubic (fcc) lattice, while the iron ions occupy interstitial sites. The trivalent Fe\(^{3+}\) ions are equally distributed between tetrahedral and octahedral sites, and the divalent Fe\(^{2+}\) ions occupy the remaining octahedral sites. The experimental lattice constant of this cubic phase is reported to be 8.3967 Å68. In this work, we used a \(2\times 2\times 2\) cubic supercell for magnetite.

Maghemite (\(\gamma\)-Fe\(_2\)O\(_3\)) is a weathering product of magnetite and shares a similar structure, with all iron ions in the trivalent state (Fe\(^{3+}\)). Charge neutrality is maintained by cation vacancies on the octahedral sites of its defect-spinel structure [see Fig. 1(c)]. Depending on the ordering of these vacancies, maghemite can adopt different symmetries: a disordered cubic structure (space group \(Fd\overline{3}m\)), an ordered cubic structure (\(P4_332\)), or an ordered tetragonal structure (\(P4_32_12\))69. Based on powder neutron diffraction experiments, Greaves proposed that the lowest-energy symmetry of maghemite is the tetragonal \(P4_32_12\) supercell, with lattice parameters \(a = 8.3396\) Å and \(c = 24.966\) Å70. This finding is further supported by theoretical calculations by Grau-Crespo et al., which found the tetragonal configuration to be energetically favorable over the non-tetragonal ones71. In this work, we used the structure proposed by Greaves, relaxing a \(1\times 1\times 1\) supercell.

Table 6 Lattice and elastic parameters for maghemite (\(\gamma\)-Fe\(_2\)O\(_3\)). \(^\dagger\)Lattice parameters a and c are in Å and were obtained from experiments70. \(^\ddagger\)Elastic tensor elements \(C_{ij}\) and bulk modulus K (in GPa) are from DFT (LCAO, GGA+U) calculations66,72. The color bar for each row is scaled by the maximum absolute deviation from the reference result, with red indicating a positive deviation and blue a negative one. The average deviation, \(\langle \sigma \rangle\), was calculated using Eq. (1) and is expressed as a percentage.
Table 7 Lattice and elastic parameters for magnetite (Fe\(_3\)O\(_4\)). The lattice parameter a is in Å, whereas the elastic tensor elements \(C_{ij}\), elastic moduli (Y, K, G), and Vickers hardness H are in GPa. Compressibility \(\beta\) is in GPa\(^{-1}\), and Poisson's ratio \(\nu\) is dimensionless. Experimental values in this table are summarized in the Supplemental Material. The color bar for each row is scaled by the maximum absolute deviation from the experimental result, with red indicating a positive deviation and blue a negative one. The average deviation, \(\langle \sigma \rangle\), was calculated using Eq. (1) and is expressed as a percentage.

The transferability to maghemite and magnetite was evaluated using the same rating system, based on the average relative deviation \(\langle \sigma \rangle\), Eq. (1), from reference values. Our results are summarized in Tables 6 and 7. The Reax potential is not listed in Table 6 because we were unable to relax the structure within the force tolerance of 1 meV/Å. The Clay, IFF, and Core-Shell force fields are not listed in Table 7 because their charge parametrization is not suitable for the Fe\(_3\)O\(_4\) compound, i.e., the system is not charge neutral. Unlike hematite and magnetite, maghemite has scarce experimental data. Since we did not find reliable experimental data for pure maghemite, we adopt experimental results for the lattice constants70 and DFT calculations (LCAO, GGA+U) for the elastic properties66,72 as reference.
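The neutrality problem can be seen with a short calculation. Assume, for illustration only, a fixed-charge model with a single Fe charge \(q_{\text{Fe}}\) and a single O charge \(q_{\text{O}}\), chosen so that hematite is neutral (the actual charge schemes of Clay, IFF, and the Core-Shell model are given in the Supplemental Material and may differ in detail):

$$2q_{\text{Fe}} + 3q_{\text{O}} = 0 \;\Rightarrow\; q_{\text{Fe}} = -\tfrac{3}{2}q_{\text{O}}, \qquad Q_{\text{Fe}_3\text{O}_4} = 3q_{\text{Fe}} + 4q_{\text{O}} = -\tfrac{9}{2}q_{\text{O}} + 4q_{\text{O}} = -\tfrac{1}{2}q_{\text{O}} \neq 0.$$

Since \(q_{\text{O}} < 0\), each Fe\(_3\)O\(_4\) formula unit would carry a net positive charge of \(|q_{\text{O}}|/2\); with formal charges (\(q_{\text{O}} = -2\)), this amounts to \(+1\) per formula unit.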

When tested on maghemite, our Fe-MLIP gave among the best results, with an average deviation of 15.8%. The other potentials showed varied success: CS (15%), Clay (23.7%), and IFF (25.0%) gave reasonable results, whereas Tersoff showed the highest deviations, mostly due to its poor performance for elastic properties. We note that the overestimation of the lattice parameters by our Fe-MLIP is consistent with the DFT (LCAO, GGA+U) results of Guo and Barnard66, who reported \(a=8.598\) Å and \(c=25.718\) Å.

The analysis of magnetite also revealed different performance trends, highlighting how a potential's accuracy can depend on the specific crystal structure and chemical composition. The Fe-MLIP again showed good transferability to this mixed-valence system, with a lattice error of 2.82% and an overall error of 13.3%. The performance of the other potentials degraded significantly: both Tersoff and Reax struggled to predict the mechanical behavior of magnetite, with average errors exceeding 50% and of 37.7%, respectively. A complete table of deviations for maghemite and magnetite can be found in the Supplemental Material.

The formation energies of neutral Fe and O vacancies (\(E_{form}^{vac}=E_{bulk}-E_{vac}-\mu _{atom}\)) calculated with DFT and the Fe-MLIP are shown in Table 8. The Fe-MLIP reproduces the stability trend of the DFT calculations, \(E_{form}^{\text {Fe }vac}> E_{form}^{\text {O }vac}\). However, it underestimates the formation energies by \(\sim 20\%\), as a result of an overestimation of \(E_{vac}\) for both vacancy defects. Considering the number of missing bonds for Fe vacancies (6) and O vacancies (4), our Fe-MLIP overestimates the energy per missing covalent bond by 0.36 eV for Fe vacancies and 0.40 eV for O vacancies. Nonetheless, it is worth pointing out that this information is not available from the classical force fields, and including some defect configurations in the training set could significantly improve these values.
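For reference, the per-bond errors quoted above translate back into total formation-energy errors by simple arithmetic (using the rounded values from the text; Table 8 gives the exact figures):

$$\Delta E^{\text{Fe}} = N_{bond}^{\text{Fe}}\left(\frac{\Delta E}{N_{bond}}\right)_{\text{Fe}} \approx 6 \times 0.36\ \text{eV} = 2.16\ \text{eV}, \qquad \Delta E^{\text{O}} = N_{bond}^{\text{O}}\left(\frac{\Delta E}{N_{bond}}\right)_{\text{O}} \approx 4 \times 0.40\ \text{eV} = 1.60\ \text{eV}.$$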

Table 8 Formation energies of Fe and O vacancies calculated with DFT+U and the Fe-MLIP, and the error per covalent missing bond, \(\Delta E/N_{bond}\), of the Fe-MLIP relative to DFT.

Conclusion

In this work, we have developed and validated a machine learning interatomic potential based on density functional theory data for hematite (\(\alpha\)-Fe\(_2\)O\(_3\)) using a message-passing graph neural network. We showed that this potential reproduces the experimentally measured structural properties with good overall accuracy, with average relative deviations of 1.8% for lattice parameters and 22.9% for elastic constants, the latter being the lowest among the potentials considered. We further demonstrated its transferability by computing the structural properties of additional iron oxides, namely maghemite and magnetite, as well as those of distinct hematite surfaces, such as the \(\{01\overline{1}2\}\) facet.

The Fe-MLIP consistently shows good accuracy, approaching DFT-level results at a substantially reduced computational cost. It reproduces a wide range of structural and mechanical properties, including anisotropic elastic behavior, indicating that the graph-neural-network-based model provides a reliable and transferable description of iron oxides.

While the presented MLIP is currently restricted to iron oxides within the chemical space Fe-O, it can serve as a foundation for the development of more sophisticated iron-based models, including those that encompass alternative bonding topologies, point defects, and extrinsic impurities. Moreover, existing NN-based MLIPs generally do not provide disentangled or explicit information about the individual physical mechanisms underlying covalent bonding and long-range interactions. Instead, they primarily offer a flexible framework for fitting the potential energy surface, implicitly capturing all interaction mechanisms in a single effective representation. This entanglement can impede a detailed, mechanistic analysis of complex phenomena such as electron localization, charge states, and defect-related physics. Nonetheless, the present work constitutes a significant step towards enabling large-scale molecular dynamics simulations of hematite with DFT-level accuracy, thereby facilitating more realistic investigations of its behavior across a broad range of contexts, from geophysical and geological processes to industrial technologies and nanostructured applications.