Introduction

The redox state of Earth’s upper mantle, which is quantified by oxygen fugacity (fO2), governs the behavior of redox-sensitive elements in magmas and the formation of ore deposits, and has profoundly impacted Earth’s long-term climate1,2,3,4,5,6,7. The elevated fO2 of the upper mantle and mantle-derived melts may have facilitated the Archean–Proterozoic Great Oxidation Event (GOE)8,9,10. Conversely, increased oxygen concentrations in the atmosphere and oceans during the Neoproterozoic Oxygenation Event (NOE) ~0.8–0.4 billion years ago (Ga)11,12,13,14,15,16,17 are suspected to have controlled the oxidation of the sub-arc mantle through the increased input of marine sulfate via subducted slabs18. Today, the sub-arc mantle is more oxidized than the mid-ocean ridge basalt (MORB) mantle, primarily due to the influx of oxidized materials from subducted slabs19,20,21,22,23,24. Accordingly, arc magmas are more oxidized than MORBs1,25,26. However, it remains unclear whether the fO2 of the sub-arc mantle has changed over time, particularly across the Proterozoic–Phanerozoic boundary, despite the pivotal importance of this information for unraveling the relationships between mantle fO2, the evolution of Earth’s surface environments, and animal diversification11,12,14,27.

Two critical steps are needed to address these gaps in our understanding: (1) establishing a reliable redox proxy and (2) identifying suitable samples that faithfully record sub-arc mantle fO2 over time. Fe3+/ΣFe measurements in mineral inclusions and glasses are ideal, but are limited by sample availability and high analytical costs28. As an alternative, the multivalent element vanadium (2+, 3+, 4+, 5+) demonstrates increasing incompatibility under increasingly oxidizing conditions, and the trivalent element scandium behaves incompatibly during mantle partial melting29,30,31. Accordingly, petrological modeling of non-modal partial melting indicates that the V/Sc ratio in basalts is a reliable proxy for mantle fO2 because V and Sc exhibit analogous partitioning behaviors during mantle melting processes and are relatively insensitive to mantle lithologies and chemical compositions32.

To overcome the inaccuracies of traditional methods used to distinguish arc basalts from non-arc basalts, we used a high-dimensional machine learning (ML) approach to classify basalts accurately and identify ancient arc-type basalts (hereafter, simply “arc basalts”)33. As a widely used ML technique, eXtreme Gradient Boosting (XGBoost) is a supervised ML model; it is trained by applying decision trees to a known and labeled dataset, then makes predictions when applied to an unknown application dataset. Therefore, we trained an XGBoost ML model on geochemical data from modern arc and non-arc basalts (Supplementary Figs. S1–S4; see “Methods”). To constrain the redox state of the sub-arc mantle, we excluded back-arc and fore-arc basalts from the training dataset. We applied the trained model to identify global arc basalts erupted since ~0.8 Ga (Supplementary Figs. S5–S8; see “Methods”). Then, we calculated the global mean V/Sc values of primitive arc basalts since ~0.8 Ga and translated them using petrological modeling to estimate changes in the sub-arc mantle fO2 through time (see “Methods”).

Results and discussion

ML modeling results

Based on the input features (Nb/La, Ti, U/Nb, La, Nb, Rb/Nb, Pb/Nd, Ba, Nb/Yb, U, Sr/Nd, Ba/La, Ba/Nb, Th, P, Pb/Nb, Nb/Th, La/Yb, Zr/Y, Sr, Y, Ce, Zr, Pb, Rb, Ti/Y, Ta)32, our arc vs. non-arc classification model achieved an average accuracy and F1 score of 0.95 ± 0.01 (1 standard deviation; s.d.) and 0.94 ± 0.01 (1 s.d.), respectively, when applied to the training subset (70% of the training dataset). The F1 score is widely used to evaluate the performance of a classification model (see “Methods”), and the confusion matrix visualizes model performance by comparing known and predicted data labels. As shown in the confusion matrix (Fig. 1b), the model similarly achieved an accuracy and F1 score of 0.95 and 0.94, respectively, when applied to the testing subset (30% of the parental training dataset). These high accuracies and F1 scores illustrate that our ML model can reliably distinguish between arc and non-arc basalts.

Fig. 1: Results after applying our XGBoost model to classify arc vs. non-arc basalts.

a The relative importance of each input feature in training the model (see “Data Availability”). All feature importances are <0.09, indicating that no single feature dominates the classification model. The color scale shows the relative feature importance. b The confusion matrix comparing known and predicted data labels when the model was applied to the testing subset (30% of the training dataset). The confusion matrix shows a high accuracy of 0.95 and an F1 score of 0.94, suggesting the strong performance of our trained high-dimensional model.

XGBoost incorporates algorithms to calculate the relative importance of each feature to the overall ML model. Feature importance scores are widely used to improve model interpretability and better understand the high-dimensional geological processes modeled33,34,35,36,37,38,39,40. Here, our results show that Nb/La, Ti, U/Nb, La, and Nb are the five most important features in our classification model (Fig. 1a), consistent with the observations of Liu et al.33. Furthermore, all feature importances are <0.09, implying that no single feature dominates and that the classification draws on the full set of input features. Therefore, unlike traditional binary discrimination diagrams, the high-dimensional model with multiple parameters is less affected by individual elements33,40.

Temporal variations of the mean V/Sc ratios of primitive arc basalts

Here, we applied our classification model to predict the probability that each sample in our basaltic application dataset is an arc basalt (Supplementary Fig. S8; see “Methods”). We then determined the mean V/Sc ratios of primitive arc basalts (8–17 wt% MgO, 60–72 Mg#, where molar Mg# = 100 × MgO/(MgO + FeOtot); Supplementary Fig. S7)20 at 100-million-year (Myr) intervals using a weighted bootstrap resampling technique41,42 (Supplementary Fig. S9). The secular evolution of mean V/Sc values in primitive arc basalts is consistent regardless of the time interval chosen (100 Myr in Fig. 2a; 50 and 25 Myr in Supplementary Fig. S10), indicating that the observed trend is not biased by the choice of interval. Furthermore, the weighted bootstrap resampling method incorporates a weight that is inverse to a sample’s age and location to minimize spatiotemporal bias (see “Methods”). Our results reveal that the mean V/Sc ratios of primitive arc basalts have remained consistent around a value of 7.16 ± 0.91 (2 s.d.) since ~0.8 Ga (Fig. 2a). Within uncertainties, this mean V/Sc ratio aligns with the contemporary global average in primitive arc basalts (7.50 ± 1.67, 1 s.d.) (Fig. 2a).
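
The weighted bootstrap step described above can be illustrated with a minimal sketch. It assumes that each sample already carries a weight that is larger where samples are sparse in age and location; the V/Sc values, the weights, and the function name `weighted_bootstrap_mean` are hypothetical and are not taken from our dataset or code.

```python
import random


def weighted_bootstrap_mean(values, weights, n_resamples=1000, seed=0):
    """Return (mean, standard error) estimated from weighted bootstrap resamples."""
    rng = random.Random(seed)
    n = len(values)
    means = []
    for _ in range(n_resamples):
        # Draw n samples with replacement; higher-weight samples are drawn more often.
        draw = rng.choices(values, weights=weights, k=n)
        means.append(sum(draw) / n)
    grand_mean = sum(means) / n_resamples
    variance = sum((m - grand_mean) ** 2 for m in means) / (n_resamples - 1)
    # The spread of the resampled means approximates the standard error of the mean.
    return grand_mean, variance ** 0.5


# Hypothetical V/Sc values for one time bin, with illustrative weights.
v_sc = [6.8, 7.4, 7.1, 9.0, 6.5]
weights = [1.0, 1.0, 1.0, 0.2, 1.0]  # down-weight a sample from a densely sampled locality
mean_v_sc, s_e = weighted_bootstrap_mean(v_sc, weights)
```

In this sketch, the down-weighted sample contributes less to the bin mean than it would in a simple average, which is the intended effect of the weighting.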

Fig. 2: Secular variations in global arc basalts since the late Neoproterozoic.

a Variations of the mean V/Sc ratio in global primitive arc basalts in 100-Myr bins (see “Data Availability”). Stars indicate the average V/Sc ratios of modern primitive arc basalts (~7.50, red) and MORBs (~7.10, gray), compiled from the GEOROC and PetDB databases, respectively. The histogram shows the number of primitive arc basalts known to have erupted through time (n ≈ 840). The kernel density estimate (KDE) is a nonparametric estimation of the probability density function of the basalt sampling dataset102. The relatively large uncertainties on V/Sc at ~0.65 Ga are due to the scarcity of basalts erupted during 0.7–0.6 Ga. b, c Estimated mean melting P and T of primary arc basalts through time, respectively, assuming a water content of 4 wt%57,58. Error bars in a–c indicate 2 s.e.m. uncertainties. Horizontal lines and shaded bands in a–c illustrate average values since ~0.8 Ga and 95% confidence intervals, respectively. Cen. Cenozoic.

Validation of the ML classification model

Some non-arc basalts are influenced by slab-derived fluids and sediments, resulting in arc-like geochemical signatures. These signatures include enrichment in large-ion lithophile elements and depletion in high-field-strength elements43,44. Therefore, it is essential to compile independent datasets to test whether our ML model can accurately discriminate between arc basalts and non-arc basalts with arc-like signatures33. One such case is Cenozoic intraplate basalts in Northeast China, which exhibit high H2O/Ce and Ba/Th ratios and low Ce/Pb ratios, suggesting the influence of subducted fluids and sediments from the flat slab in the mantle transition zone beneath the region45,46,47. Previous ML models, including random forest and deep neural networks, predicted that most of these Cenozoic intraplate basalts were affected by stagnant-slab-derived fluids48. Our ML model classified these Cenozoic intraplate basalts (n = 331) from Northeast China48 as non-arc basalts with ~99% accuracy.

Furthermore, some island arc basalts, such as those from the Mota arc (Vanuatu), exhibit atypical arc geochemical signatures, such as Nb enrichment44,49. Importantly, we incorporated typical Nb-enriched arc basalts, such as those from Panama, Costa Rica, and Kamchatka, into our training model50,51. Our ML model accurately recognized all of the atypical Mota arc basalts (n = 6)44. Consequently, the results of our independent model validation indicate that our ML model robustly distinguishes between arc and non-arc basalts. We stress, though, that our ML model is primarily designed to classify arc vs. non-arc tectonic settings only for basalts and basaltic andesites (45–54 wt% SiO2), i.e., the SiO2 range included in our training dataset.

Secular changes of the sub-arc mantle fO2 since the late Neoproterozoic

A previous study suggested that the sub-arc mantle fO2 increased during ~0.8–0.4 Ga, based on the higher mean Fe3+/ΣFe and V/Sc ratios of Phanerozoic island arc igneous rocks compared to those of their Precambrian counterparts18. However, directly calculating mean values for individual island arc segments may be susceptible to preservation and sampling biases due to the uneven spatiotemporal sample distribution41,42, which could lead to highly variable observed mean Fe3+/ΣFe and V/Sc ratios in Precambrian rocks18. To mitigate this, we employed the weighted bootstrap resampling technique to minimize spatiotemporal preservation bias when estimating mean V/Sc ratios through time. Our analysis of V/Sc ratios in primitive arc basalts suggests a relatively stable value since the late Neoproterozoic (Fig. 2a).

Furthermore, the identified island arc igneous rocks may have been compromised by traditional tectonic-magmatic discrimination methods, which are less accurate32. Additionally, their compiled dataset of island arc igneous rocks exhibits significant compositional variability, including mafic, intermediate, and felsic compositions (~40–80 wt% SiO2). However, it is only plausible to constrain the sub-arc mantle fO2 using medium- to high-silica magmatic rocks if the degrees of partial melting, fractional crystallization, and assimilation are well understood. Otherwise, these processes may obscure the fO2 information inherited from the mantle source. In comparison, primitive basalts effectively reflect the composition and redox state of the mantle source.

Moreover, some compiled whole-rock Fe3+/ΣFe data were obtained using wet chemical methods18, including colorimetry and titration. Colorimetric measurements are known to overestimate Fe3+/ΣFe ratios in basaltic samples, whereas titration methods are more reliable52. Furthermore, Mariana back-arc basalts are systematically more reduced than the sub-arc mantle there19. Considering that the back-arc rocks were included in the analysis18, the temporal evolution of Fe3+/ΣFe and V/Sc ratios they reported may be biased by the preservation and sampling of both arc and back-arc rocks. In summary, the secular evolution of Fe3+/ΣFe and V/Sc ratios previously reported in island arc igneous rocks is undermined by issues of sample representativeness and data reliability (Supplementary Figs. S11–S14). Notably, variations in these ratios may not fully reflect changes in sub-arc mantle fO2 because they are also influenced by melting pressure (P), temperature (T), mantle source composition, and the degree of partial melting. Therefore, it is necessary to more quantitatively constrain the secular evolution of the sub-arc mantle fO2.

We meticulously curated a high-quality dataset by manually reviewing the geochronological data and geochemical compositions of Precambrian samples (Supplementary Fig. S6; see “Methods”), significantly improving the quality of data obtained from current geochemical databases such as EarthChem33,40,53. Additionally, our ML model incorporates some fluid-mobile elements (e.g., U and Pb) as input features, which may be modified during metamorphism and post-alteration processes. That said, if our model were highly biased by fluid-mobile elements, it would have incorrectly classified highly altered samples. However, the relative importances of U and Pb were <0.04 in our classification model (Fig. 1a). To further confirm that our model is not biased by fluid-mobile elements, we applied it to weathered Cretaceous arc basalts from the Benbow Inlier in eastern Jamaica (n = 8)54 and Neogene non-arc basalts from Hainan Island, South China (n = 21)55; most Cretaceous arc (n = 7) and Neogene non-arc basalts (n = 20) were correctly identified. These validation results further indicate that our ML model can accurately discriminate arc from non-arc basalts and minimize misclassifications. Consequently, our refined dataset and ML classification model provide a more reliable record of the temporal evolution of V/Sc ratios in global primitive arc basalts.

Elevated mean V/Sc ratios of primitive arc basalts could be attributed to subaerial oxidation (weathering). If this were the case, though, we would expect a decrease in the mean V/Sc ratio since the NOE, because V5+ is increasingly fluid-mobile (as H2VO4−) under increasingly oxidizing conditions56. The consistent V/Sc ratio over time instead suggests that weathering processes had a limited impact (Fig. 2a). Additionally, the temporal evolution of the mean V/Sc ratio might be influenced by differences in the preservation of continental vs. oceanic arc basalts because our classification model does not differentiate between them. However, the mean V/Sc ratio of modern global primitive continental arc basalts (8.02 ± 1.70, 1 s.d.) largely overlaps with that of modern oceanic arc basalts (7.23 ± 1.59, 1 s.d.), indicating similar oxidation states over a given period (continental and oceanic arc basalts extracted from the GEOROC database). This suggests that the preservation bias between continental and oceanic arc basalts also has a minimal effect on the temporal evolution of the mean V/Sc ratio. Our estimated mean V/Sc ratio of primitive arc basalts thus provides a first-order constraint on the evolution of the sub-arc mantle fO2.

Furthermore, V/Sc ratios in primary melts are not solely governed by mantle fO2, but are also significantly affected by the P-T conditions during melting and by the degree of partial melting30. By considering the evolution of melting P-T conditions during arc magma genesis, we calculated the average P-T conditions during mantle melting that produced the primitive arc basalts since ~0.8 Ga57,58 (Supplementary Fig. S15; see “Methods”). The average melting P and T conditions have remained relatively stable through time, fluctuating within respective narrow ranges of 1.3 ± 0.2 GPa (2 s.d.), largely within the spinel stability field30, and 1255 ± 35 °C (2 s.d.) (Fig. 2b, c). Under relatively oxidized conditions (e.g., ΔFMQ + 1; ΔFMQ representing the deviation of a system’s fO2 from the fayalite-magnetite-quartz buffer, FMQ), V/Sc in primary melts slightly increases with decreasing degree of partial melting30. Because La is more incompatible than Nd during mantle melting, the constant mean La/Nd ratio through time suggests that the degree of partial melting has had a minimal effect on the mean V/Sc ratio of primitive arc basalts (Supplementary Fig. S16).

Accordingly, we converted the mean V/Sc values of primitive arc basalts into estimates of sub-arc mantle fO2 by assuming a mantle lithology of spinel lherzolite with a depleted MORB mantle (DMM) chemical composition59 (see “Methods”). Our results indicate that the average sub-arc mantle fO2 has remained relatively oxidized at ΔFMQ + 1.36 ± 0.89 (2 standard errors of the mean, s.e.m.) since the late Neoproterozoic (Fig. 3a). Such mean fO2 conditions are consistent with the modern sub-arc mantle value (ΔFMQ + 1.0–ΔFMQ + 1.6) inferred from Fe3+/ΣFe ratios in olivine-hosted primary melt inclusions from Mariana arc magmas19. Intriguingly, our estimate suggests that the sub-arc mantle has been consistently more oxidized than the ambient upper mantle (ΔFMQ − 1– ΔFMQ + 1)10 since ~0.8 Ga (Fig. 3a).

Fig. 3: The evolution of sub-arc mantle fO2 and geological records over time.

a The secular mean sub-arc mantle fO2 evolution since ~0.8 Ga. We first simulated arc mantle melting using a non-modal partial melting model to establish the relationship between V/Sc and fO2 (see “Data Availability”). Subsequently, we calculated the melting pressure-temperature (P-T) conditions of primitive arc basalts using the Fractionated-PT software57. By combining the V/Sc-fO2 relationship, melting P-T conditions, and mean V/Sc values, we estimated the temporal evolution of the mean sub-arc mantle fO2 relative to fayalite-magnetite-quartz (FMQ) buffer since the late Neoproterozoic (see “Methods”). The modern arc mantle fO2 was estimated from primary melt inclusions in Mariana arc basalts19. Vertical error bars in a are 1 s.d. uncertainties. The orange horizontal line and shaded orange band represent the average V/Sc since ~0.8 Ga and the 95% confidence interval, respectively. The evolution of the ambient upper mantle fO2 was estimated by Zhang et al.10. b The occurrence of blueschists and low-T, high-P metamorphic rocks since the late Neoproterozoic76. c, d Secular changes of atmospheric O2 contents13,17,89 and ocean dissolved O2 contents16,89, respectively. In a–d, the light blue shaded band indicates the time interval of the Neoproterozoic Oxygenation Event (NOE)11,12,13,14,15,16,17. e–g The secular evolution of mean Rb/La, Sr/Nd, and Th/La ratios in primitive arc basalts, respectively. Dashed lines indicate the average values of modern primitive MORBs (data extracted from the PetDB repository, n ≈ 530). The mean Rb/La, Sr/Nd, and Th/La ratios in primitive arc basalts are consistently higher than those in MORBs through time. Vertical error bars in e–g are 2 s.e.m. uncertainties. Cen. Cenozoic, PAL Present Atmospheric Level.

The long-term oxidation of the sub-arc mantle

Prior research has indicated that latitude-dependent variations of Cenozoic sub-arc mantle fO2 have been influenced by the presence of less oxidized sediments in subducted slabs at lower latitudes60. Nonetheless, latitude-dependent redox effects cannot be the sole cause of the consistently oxidized sub-arc mantle since the late Neoproterozoic for two reasons. First, despite similar V/Sc ratios in low-latitude arc samples and MORBs, low-latitude sub-arc mantle fO2 values remain more oxidized than those of the MORB mantle due to the lower melting temperatures in arcs20. Second, available geological records do reflect this latitude effect as minor variations, but do not show a significant effect on the global mean sub-arc mantle fO2. For instance, the fragmentation of the supercontinent Rodinia into low-latitude microcontinents could have led to increased low-latitude subduction61,62, potentially resulting in the relatively less oxidized sub-arc mantle fO2 values at ~0.7–0.6 Ga. Conversely, the migration of the continents towards higher latitudes might have led to the positive shift in sub-arc mantle fO2 values during ~0.6–0.5 Ga61,62,63. Additionally, the aggregation of the continents at low latitudes could have contributed to the relatively less oxidized sub-arc mantle from ~0.5–0.1 Ga63,64. However, these fluctuations are within the 2 s.e.m. errors of the trend (Fig. 3a), and the relatively stable mean V/Sc ratios and sub-arc mantle fO2 since ~0.8 Ga indicate that the latitude effect has had only a limited impact on the long-term evolution of global sub-arc mantle fO2.

It has been suggested that the increased atmospheric and oceanic oxygen levels during the late Neoproterozoic and early Paleozoic may have directly affected the oxidation state of the sub-arc mantle18. However, the relatively stable oxidized state of the sub-arc mantle throughout ~0.8–0.4 Ga suggests that increased atmospheric and marine oxygen concentrations did not directly lead to the oxidation of the mantle wedge via elevated marine sulfate contents (Fig. 3c, d).

Alternatively, serpentinization of mafic and ultramafic rocks can oxidize oceanic rocks and the upper lithospheric mantle, whose eventual subduction could oxidize the sub-arc mantle18,19,24,65,66,67,68,69,70,71,72. Although it is difficult to quantify the degree of serpentinization of the oceanic lithosphere, worldwide serpentinites in subducted oceanic slabs contain abundant carbon, hydrogen, and sulfur, which may contribute to the oxidation and induced melting of the mantle wedge65,71,73,74,75. If this mechanism were the primary driver of mantle oxidation, we would expect the sub-arc mantle fO2 to have been oxidized since the onset of subduction. Indeed, we observe that the sub-arc mantle has maintained an oxidized state since ~0.8 Ga, supporting this hypothesis. The increased occurrence of blueschists and low-T, high-P metamorphic rocks since the late Neoproterozoic76 suggests that the evolution of modern subduction may have facilitated the deserpentinization of subducted slabs, releasing oxidized materials into the sub-arc mantle (Fig. 3b) and thus maintaining its oxidized state.

This observation suggests that the long-term oxidation of the sub-arc mantle may be driven by geological processes involving serpentinization and subduction, rather than by the direct input of marine dissolved oxygen and sulfate. Consequently, the oxidation of the sub-arc mantle appears to be influenced by deep Earth processes rather than being a direct consequence of changes in surface oxygen levels.

Given the greater mobility of Sr and Rb in fluids compared to Nd and La, Sr/Nd and Rb/La ratios have been used to trace the transport of aqueous fluids into the mantle wedge24,77,78. The enrichment of Th in terrigenous sediments79 implies that elevated Th/La ratios in arc basalts relative to MORBs typically reflect the movement of sediment melts from the subducted slab into the sub-arc mantle19. Our results show significantly higher mean Sr/Nd, Rb/La, and Th/La ratios in primitive arc basalts compared to those in MORBs through time, indicating that oxidized slab-derived fluids and melts have likely enhanced the oxidation of the sub-arc mantle since ~0.8 Ga (Fig. 3e–g).

The oxidized nature of modern arc magmas compared to MORBs has been attributed to either oxidized materials from subducted slabs19,24,66,71 or differentiation processes80,81. Our results indicate that the sub-arc mantle has been oxidized since the late Neoproterozoic, influencing the long-term oxidation state of arc magmas. Moreover, it has long been suggested that mantle fO2 changes have elevated Earth’s surface oxygen levels (e.g., the GOE) through volcanic degassing over geological time4,8,82,83,84. However, the sub-arc mantle fO2 did not increase during ~0.8–0.4 Ga, which is asynchronous with the oxidation of the atmosphere and the deep ocean during that period. The oxidation of the sub-arc mantle was therefore not a direct trigger of the NOE (Fig. 3). Alternative mechanisms must ultimately have driven the NOE, such as the proliferation of primary producers (e.g., algae)13, the burial of organic carbon and pyrite17, and/or increased marine phosphorus concentrations85,86,87,88. Nonetheless, the relatively stable oxidation of the sub-arc mantle since the late Neoproterozoic may have contributed to the secular oxidation of Earth’s surficial environments by increasing the flux of oxidized volcanic gases to the Earth’s surface and potentially promoting biological evolution (e.g., animal diversification)89,90. Further studies on potential links between the fO2 of the broader mantle and surficial oxidation and biological evolution should integrate records of igneous rocks, sedimentary rocks, and geobiodiversity through time.

Methods

Training dataset

The global arc and non-arc basaltic geochemical datasets (each basalt representing a single sample) were previously compiled and checked by Liu et al.33 from the GEOROC (https://georoc.eu/georoc/new-start.asp; accessed 15 January 2023)91 and PetDB databases (https://www.earthchem.org/petdb/; accessed 18 January 2023). Here, our training dataset includes arc (n = 5069) and non-arc (n = 8500) basalts (45–54 wt% SiO2, ≤17 wt% MgO). This geochemical training dataset comprises typical modern (most younger than 65 Ma) continental and oceanic arc basalts and non-arc basalts (i.e., oceanic island basalts, MORBs, oceanic plateau basalts, and continental basalts) (Supplementary Fig. S1) (see “Data Availability”).

Feature selection

Before operating the XGBoost training model, the input features in the training dataset were first tested for collinearity to both reduce model dimensions (some elements have similar geochemical behaviors) and save computing resources33,40. Here, we used the Pearson correlation coefficient (−1 ≤ ρ ≤ 1) to test the collinearity of every feature, with |ρ| values closer to 1 representing increased correlation between two features (Supplementary Fig. S2). Following Liu et al.33, we only selected features with |ρ| <0.88. Accordingly, we used 27 input features for the XGBoost classification model.
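
The collinearity screening can be sketched as follows. This is an illustration only: it assumes a greedy rule in which a feature is kept if its |ρ| with every previously kept feature is below the threshold, and the feature names and values (`feat_a`, `feat_b`, `feat_c`) are hypothetical. The exact rule used to choose which member of a correlated pair to drop is not specified here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def drop_collinear(columns, threshold=0.88):
    """Keep a feature only if |rho| with every already-kept feature is below threshold."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature columns (not real geochemical data).
columns = {
    "feat_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "feat_b": [2.1, 4.0, 6.2, 7.9, 10.1],  # nearly proportional to feat_a
    "feat_c": [5.0, 1.0, 4.0, 2.0, 3.0],   # weakly correlated with feat_a
}
selected = drop_collinear(columns)
```

Here `feat_b` is discarded because it is almost perfectly correlated with `feat_a`, whereas `feat_c` (ρ = −0.3 with `feat_a`) is retained.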

Scatter and box plots for each geochemical feature included in the arc and non-arc basaltic training datasets are shown in Supplementary Fig. S3. The distribution of each input feature between arc and non-arc basalts is nearly identical, further indicating that no single element can effectively classify these two types of basalts. Furthermore, some features contained missing values because some elements were not measured or reported in the original publications. However, the proportion of missing values was <30% for most features (Supplementary Fig. S4). Nonetheless, the XGBoost ML method is designed to deal with missing values92.

Application dataset

To apply our model, we compiled a global application dataset (~0.85 Ga to present) mainly comprising two data populations: (a) basalts ranging from ~0.85–0.45 Ga (n ≈ 1680) from the EarthChem repository (http://www.earthchem.org/; accessed 11 June 2021) and (b) basalts from ~0.8–0 Ga (n ≈ 12,730) (each basalt representing a single sample), with nearly all samples being Phanerozoic (n ≈ 12,520), from the igneous geochemical dataset of Keller and Schoene41 (Supplementary Fig. S5) (see “Data Availability”). It has been shown that uncorrected geochronological data archived in the EarthChem database cannot be directly used, especially for Precambrian samples33,40,53. Therefore, to control the data quality, we manually checked the basalt data against the original publications via multiple steps, including removing duplicate samples, correcting geochemical and geochronological data, and expert data review33,40. We observed that the geochronological data for numerous Proterozoic samples were inconsistent with the original publications, whereas the Phanerozoic samples matched their sources accurately (Supplementary Fig. S6). Accordingly, the newly refined basaltic dataset was used to calculate secular changes of the mean V/Sc ratio in primitive arc basalts, and thereby estimate the sub-arc mantle fO2 since ~0.8 Ga in 100-Myr bins. The primitive arc basalts were filtered to include only those with 8–17 wt% MgO and Mg# = 60–72 to reduce the influence of mineral crystallization/accumulation, magma mixing, and crustal contamination (Supplementary Fig. S7)20.
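
The primitive-basalt filter can be written as a short sketch. It assumes that oxide contents are given in wt% and converts them to moles with standard molar masses before computing Mg#; the function names and the example compositions are hypothetical.

```python
MGO_MOLAR_MASS = 40.304  # g/mol
FEO_MOLAR_MASS = 71.844  # g/mol


def mg_number(mgo_wt, feo_tot_wt):
    """Molar Mg# = 100 * MgO / (MgO + FeOtot), from oxide contents in wt%."""
    mg = mgo_wt / MGO_MOLAR_MASS
    fe = feo_tot_wt / FEO_MOLAR_MASS
    return 100.0 * mg / (mg + fe)


def is_primitive(mgo_wt, feo_tot_wt):
    """Apply the 8-17 wt% MgO and Mg# = 60-72 criteria."""
    return 8.0 <= mgo_wt <= 17.0 and 60.0 <= mg_number(mgo_wt, feo_tot_wt) <= 72.0


# Hypothetical compositions as (MgO wt%, FeOtot wt%).
samples = [(10.0, 9.0), (6.0, 10.0), (9.0, 12.0)]
primitive_flags = [is_primitive(mgo, feo) for mgo, feo in samples]
```

Only the first hypothetical composition passes both criteria; the second fails the MgO limit and the third has Mg# below 60.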

XGBoost modeling

XGBoost is a commonly used supervised ML technique that applies gradient-boosted decision trees to solve classification and regression problems92,93. It trains a model on a labeled dataset using decision trees and then predicts outcomes on unseen data. Specifically, it adds new trees iteratively to run gradient boosting, which predicts the residuals of prior trees. The gradient descent algorithm guides this process to minimize the loss function.
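
As a generic illustration of this iterative procedure (not the exact objective implemented in XGBoost, which additionally uses second-order gradient information and regularization terms), the additive update can be written as follows. The symbols are introduced here for illustration only: F_m is the ensemble after m trees, h_m is the newly added tree, η is the learning rate, and ℓ is the loss function.

$$F_m(x)=F_{m-1}(x)+\eta\,h_m(x),\qquad h_m\approx\arg\min_{h}\sum_{i}\left[r_i^{(m)}-h(x_i)\right]^2,\qquad r_i^{(m)}=-\left.\frac{\partial \ell\left(y_i,F(x_i)\right)}{\partial F(x_i)}\right|_{F=F_{m-1}}$$

For the squared-error loss ℓ = ½(y − F)², the negative gradient reduces to r_i = y_i − F_{m−1}(x_i), i.e., the residual of the prior trees.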

In our modeling, we implemented a binary classification for the training dataset. Accordingly, our classification model comprised the following steps. (1) We executed 10-fold cross-validation (i.e., dividing the parental training dataset into training and testing subsets 10 times to evaluate model performance) to test whether our training dataset was influenced by a class imbalance problem. Class imbalance is a prevalent challenge in ML, particularly in classification tasks, where the distribution of classes in the dataset is significantly uneven (see details in the next subsection). This process gives a mean value with a standard deviation (s.d.) based on ten performance scores. (2) We randomly divided the parental training dataset into a training subset (70%) and a testing subset (30%). (3) We optimized hyperparameters (e.g., gamma, learning rate, max depth, and alpha) via grid-search cross-validation and 10-fold cross-validation to obtain the best training classification model. Gamma (≥0) is the minimum loss reduction required to split a node further; it controls the complexity of decision trees and prevents overfitting of the model. Overfitting occurs when an ML model achieves high accuracy on the training dataset but performs poorly on new data. The learning rate (0–1) shrinks the contribution of each new tree to reduce overfitting. Max depth (≥0) represents the maximum depth of a tree, determining the complexity of the model. Alpha (≥0) is an L1 regularization term on the weights that helps prevent overfitting. After hyperparameter tuning, the optimal training model was achieved with the following hyperparameters: gamma = 0, learning rate = 0.6, max depth = 7, and alpha = 0.1. (4) We utilized the testing subset to evaluate model performance via a confusion matrix (Fig. 1). (5) We applied the best training model to the application dataset and predicted the probability of a given sample being an arc basalt (Supplementary Fig. S8).

Here, we took 0.5 as the threshold probability between arc (yi ≥ 0.5) and non-arc basalts (yi < 0.5) in our model training, validation, and prediction (see “Code Availability”). We performed the XGBoost ML modeling using the scikit-learn package in Python94 and confirmed the results within the Geochemistry π Python Framework (v.0.7.0)95. The open-source documentation for XGBoost is available at https://github.com/dmlc/xgboost and https://xgboost.readthedocs.io/en/stable/.
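
Two of the steps above, the 10-fold splitting and the 0.5 probability threshold, can be sketched in plain Python. This is a simplified illustration that assumes a standard k-fold scheme in which every sample is held out exactly once; the function names and the dictionary `best_params` are hypothetical and do not reproduce the released code.

```python
import random


def kfold_splits(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is held out exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal-sized folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def label_from_probability(p_arc, threshold=0.5):
    """Assign the arc label when the predicted arc probability reaches the threshold."""
    return "arc" if p_arc >= threshold else "non-arc"


# Hyperparameter values reported in the text, collected in a plain dictionary.
best_params = {"gamma": 0, "learning_rate": 0.6, "max_depth": 7, "alpha": 0.1}

splits = list(kfold_splits(25, k=10))
example_labels = [label_from_probability(p) for p in (0.93, 0.50, 0.12)]
```

Each of the ten splits yields one performance score, so the mean and standard deviation quoted for cross-validation are computed over ten values.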

Accuracy and the F1 score are commonly used to estimate ML model performance. Accuracy is defined as:

$$\mathrm{Accuracy}=\frac{P(\mathrm{arc})_{\mathrm{correct}}+P(\text{non-arc})_{\mathrm{correct}}}{L(\mathrm{total})} \tag{1}$$

where P(arc)correct and P(non-arc)correct represent the number of correct model predictions for arc and non-arc basalts, respectively, and L(total) is the total number of labeled arc and non-arc samples. Here, we focused on the model predictions of arc basalts. The F1 score is defined as the harmonic mean of precision and recall:

$$\mathrm{F1\ score}=\frac{2\times \mathrm{Precision}\times \mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}} \tag{2}$$

$$\mathrm{Precision}=\frac{P(\mathrm{arc})_{\mathrm{correct}}}{P(\mathrm{arc})_{\mathrm{total}}} \tag{3}$$

$$\mathrm{Recall}=\frac{P(\mathrm{arc})_{\mathrm{correct}}}{L(\mathrm{arc})} \tag{4}$$

where P(arc)total and L(arc) refer to the total number of predicted and labeled arc basalts, respectively. Compared to traditional accuracy, the F1 score better estimates model reliability in cases of imbalanced classifications36.
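
As a worked illustration of Eqs. (1) to (4), the sketch below computes the four scores from confusion-matrix counts, treating arc basalts as the positive class. The function name and the counts are hypothetical and do not come from our dataset; in the notation above, P(arc)correct corresponds to `tp`, P(non-arc)correct to `tn`, P(arc)total to `tp + fp`, and L(arc) to `tp + fn`.

```python
def classification_scores(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and F1 with 'arc' as the positive class.

    tp: arc samples predicted as arc        fp: non-arc samples predicted as arc
    fn: arc samples predicted as non-arc    tn: non-arc samples predicted as non-arc
    """
    total = tp + fp + fn + tn                   # L(total)
    accuracy = (tp + tn) / total                # Eq. (1)
    precision = tp / (tp + fp)                  # Eq. (3)
    recall = tp / (tp + fn)                     # Eq. (4)
    f1 = 2 * precision * recall / (precision + recall)  # Eq. (2)
    return accuracy, precision, recall, f1


# Hypothetical counts, for illustration only.
accuracy, precision, recall, f1 = classification_scores(tp=80, fp=10, fn=20, tn=90)
```

With these illustrative counts, the accuracy (0.85) exceeds the F1 score (about 0.84) because the correctly predicted non-arc samples contribute to accuracy but not to precision or recall.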

Class imbalance in the ML model

Class imbalance problems arise when the amount of data differs significantly among the classes of the training dataset, which can result in underestimated probabilities for scarce classes35,96,97. For instance, an imbalance ratio (IR = M/m, with M and m denoting the number of samples in the majority and minority classes, respectively) exceeding ~25 can significantly diminish the accuracy of the training model96. Here, the difference between the number of arc (n = 5069) and non-arc basalts (n = 8500) in our training dataset (IR ≈ 1.7) is not large enough to result in a class imbalance problem. Moreover, if there were a severe class imbalance issue in our training dataset, the 10-fold cross-validation procedure would generate very low accuracies and F1 scores. Instead, our 10-fold cross-validation resulted in a relatively high accuracy of 0.86 ± 0.07 (1 s.d.) and a high F1 score of 0.81 ± 0.10 (1 s.d.) before any model training processes. Accordingly, our training dataset is not strongly affected by class imbalance.
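
The imbalance ratio quoted above follows directly from the class counts. A minimal sketch (the function name is an assumption; the counts are those stated in the text):

```python
def imbalance_ratio(class_counts):
    """IR = M / m: majority-class count divided by minority-class count."""
    majority = max(class_counts.values())
    minority = min(class_counts.values())
    return majority / minority


ir = imbalance_ratio({"arc": 5069, "non-arc": 8500})
```

Here `ir` evaluates to roughly 1.68, well below the ~25 level cited above.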

Overfitting by the ML model

Overfitting arises when an ML model performs well on the training dataset but fails to predict new, unseen data. A prevalent method used to assess overfitting is 10-fold cross-validation, which randomly splits the data into training and testing subsets ten times to evaluate model performance. After 10-fold cross-validation, our training and testing subsets both exhibited accuracies of 0.95 and F1 scores of 0.94. Because the model performs as well on the testing subset as on the training subset, and both sets of scores are high, these results indicate that our ML model is not affected by overfitting. Furthermore, XGBoost includes regularized objectives and algorithms that contribute to smoothing the final learned weights to avoid overfitting92.
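
For readers unfamiliar with the procedure, the following standard-library sketch shows one common way to build the ten train/test partitions and to summarize ten scores as a mean and standard deviation. It assumes the conventional k-fold scheme in which each sample is held out exactly once; the function names, the seed, and the scores are hypothetical and are not our results.

```python
import random
import statistics


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is held out exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def summarize(scores):
    """Return the mean and sample standard deviation of the fold scores."""
    return statistics.mean(scores), statistics.stdev(scores)


splits = list(k_fold_indices(103, k=10))
# Hypothetical per-fold accuracies, for illustration only.
mean_score, sd_score = summarize(
    [0.90, 0.80, 0.85, 0.95, 0.75, 0.88, 0.92, 0.81, 0.86, 0.88]
)
```

Comparing the score obtained on each `train_idx` subset with that obtained on the matching `test_idx` subset is what reveals overfitting: a large gap between the two would be the warning sign.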

Weighted bootstrap resampling method

We implemented a weighted bootstrap resampling technique to calculate the mean V/Sc (Fig. 2a), Rb/La, Sr/Nd, and Th/La ratios (Fig. 3e–g) of primitive arc basalts through time, which can minimize spatiotemporal preservation bias41,42. To reduce this bias, the technique assigns each sample a weight that is inversely related to the density of samples around its age and location, so that samples from sparsely represented ages and locations are resampled more frequently and those from densely sampled ones less frequently. The resampling process was implemented as follows. (1) Each sample was assigned a weight, calculated as the inverse of its spatiotemporal density. (2) A Monte Carlo simulation, a statistical method that uses repeated random sampling to obtain numerical results, was applied to resample the dataset with probabilities proportional to the samples’ weights and obtain a resampled dataset. (3) Based on the samples’ age data and their associated uncertainties, we separated the resampled dataset into 100-Myr bins in which we calculated elemental ratios (e.g., V/Sc, Rb/La, Sr/Nd, Th/La). (4) Steps (1) to (3) were repeated 100 times. (5) Finally, based on the 100 simulations, we calculated the mean value for each bin and estimated the 2 s.e.m. (standard error of the mean) uncertainties to quantify the variability of our estimates.
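
The five steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used here: the weights are supplied by the caller rather than derived from a density estimate, ages are perturbed with a Gaussian draw to account for their uncertainties, each simulation draws as many samples as the original dataset, and the s.e.m. is taken across the per-simulation bin means. The function name, the seed, and the four samples are hypothetical.

```python
import math
import random
import statistics


def weighted_bootstrap(samples, weights, bin_width=100.0, n_sims=100, seed=0):
    """samples: (age_Ma, age_sigma_Ma, ratio) tuples; weights: one per sample.

    Returns {bin_start_Ma: (mean of simulation means, 2 s.e.m.)}.
    """
    rng = random.Random(seed)
    means_per_bin = {}
    for _ in range(n_sims):  # step (4): repeat the simulation
        # Step (2): draw samples with probability proportional to their weights.
        resampled = rng.choices(samples, weights=weights, k=len(samples))
        binned = {}
        for age, sigma, ratio in resampled:
            drawn_age = rng.gauss(age, sigma)  # assumption: Gaussian age uncertainty
            bin_start = math.floor(drawn_age / bin_width) * bin_width  # step (3)
            binned.setdefault(bin_start, []).append(ratio)
        for bin_start, ratios in binned.items():
            means_per_bin.setdefault(bin_start, []).append(statistics.mean(ratios))
    summary = {}
    for bin_start, means in sorted(means_per_bin.items()):  # step (5)
        sem = statistics.stdev(means) / math.sqrt(len(means)) if len(means) > 1 else 0.0
        summary[bin_start] = (statistics.mean(means), 2 * sem)
    return summary


# Hypothetical samples (age, 1-sigma age uncertainty, V/Sc) and weights (step 1).
demo_samples = [(50, 1, 5.0), (60, 1, 7.0), (150, 1, 8.0), (160, 1, 9.0)]
demo_weights = [1.0, 1.0, 2.0, 2.0]
summary = weighted_bootstrap(demo_samples, demo_weights)
```

In this toy case the two older samples carry twice the weight of the younger ones, so they are drawn about twice as often in each simulation.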

Petrological modeling

We employed a near-fractional melting model to quantitatively simulate V/Sc ratios in primary melts during sub-arc mantle partial melting20. In the melting model, the concentration of element i in the instantaneously produced melt is given as:

$${{C}}_{{{i}},{melt}}^{{{n}}+1}=\frac{{{{C}}}_{i,{residue}}^{n}}{{{{D}}}_{i,{bulk}}^{n+1}+{{F}}\left(1-{{{D}}}_{i,{bulk}}^{n+1}\right)}$$
(5)

where \({C}_{i,{melt}}^{n+1}\) is the concentration of element i in the melt at melting step n + 1, \({C}_{i,{residue}}^{n}\) is that of element i in the residue after melt extraction at step n, \({D}_{i,{bulk}}^{n+1}\) is the bulk partition coefficient at step n + 1, and F is the extracted melt fraction at each step. In addition, the concentration of element i in the aggregate melt at step n is given as:

$${C}_{i,{aggr}}^{n}=\frac{{\sum }_{k=1}^{n}{C}_{i,{melt}}^{k}{M}_{{melt}}^{k}}{{\sum }_{k=1}^{n}{M}_{{melt}}^{k}}$$
(6)

where \({C}_{i,{melt}}^{k}\) and \({M}_{{melt}}^{k}\) are the concentration of element i in the instantaneous melt and the mass of melt produced at step k, respectively, and \({C}_{i,{aggr}}^{n}\) and \({\sum }_{k=1}^{n}{M}_{{melt}}^{k}\) are the concentration of element i in the aggregate melt and the total mass of the aggregate melt after n steps, respectively.
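
A minimal numerical sketch of Eqs. (5) and (6) is given below. It makes several assumptions that are not specified above: F is taken as a fraction of the residue remaining at each step, the residue concentration is updated by mass balance after each extraction, the source has unit mass, and the bulk partition coefficients are supplied as a precomputed list (in the full model they vary from step to step with mineral modes and P-T conditions). The function name and the input values are hypothetical.

```python
def near_fractional_melting(c_source, bulk_d_per_step, step_fraction):
    """Step through near-fractional melting for one element.

    c_source: concentration in the unmelted source
    bulk_d_per_step: bulk partition coefficient at each melting step
    step_fraction: melt fraction F extracted per step (relative to current residue)
    """
    c_residue = c_source
    residue_mass = 1.0            # assumption: unit mass of source
    melt_mass_total = 0.0
    element_mass_in_melt = 0.0
    steps = []
    for d_bulk in bulk_d_per_step:
        # Eq. (5): instantaneous melt in equilibrium with the current residue.
        c_melt = c_residue / (d_bulk + step_fraction * (1.0 - d_bulk))
        melt_mass = step_fraction * residue_mass
        element_mass_in_melt += c_melt * melt_mass
        melt_mass_total += melt_mass
        # Eq. (6): mass-weighted mean of all instantaneous melts so far.
        c_aggregate = element_mass_in_melt / melt_mass_total
        # Mass balance: remove the extracted melt from the residue.
        c_residue = (c_residue - step_fraction * c_melt) / (1.0 - step_fraction)
        residue_mass -= melt_mass
        steps.append({
            "c_melt": c_melt,
            "c_aggregate": c_aggregate,
            "c_residue": c_residue,
            "residue_mass": residue_mass,
            "melt_mass_total": melt_mass_total,
        })
    return steps


# Hypothetical incompatible element: D = 0.1, ten steps of 1% melt extraction.
steps = near_fractional_melting(1.0, [0.1] * 10, 0.01)
```

Running the function separately for V and Sc, each with its own bulk partition coefficients, and dividing the two `c_aggregate` values would give the V/Sc ratio of the aggregate melt.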

We calculated the melting P and T conditions of global primitive arc basalts at a water content of 4 wt% using the Fractionated-PT software44,45. Our results show that the average melting P was nearly constant through time at 1.3 ± 0.2 GPa (2 s.d.), which is within the spinel stability field30 (Fig. 2b). Therefore, we modeled the initial mineral assemblage in the sub-arc mantle as a spinel lherzolite (57% olivine (Ol), 28% orthopyroxene (Opx), 13% clinopyroxene (Cpx), 2% spinel (Spl))98 with a depleted MORB mantle (DMM) chemical composition59. The generation of arc magmas was assumed to initiate at ~1150 °C (deduced from Supplementary Fig. S15). The water-bearing melting reactions were taken from Gaetani and Grove99 and Parman and Grove100.

The partition coefficients for element i between mantle minerals and the melt were calculated via the empirical equations of Wang et al.20:

$${\log }_{10}\left({D}_{{V}}^{{Ol}/{{\mbox{melt}}}}\right)=-2.30-0.258{{\rm{\times }}}\Delta {{\mbox{FMQ}}}+\frac{1871}{T}-0.24{{\rm{\times }}}\frac{{\mbox{NBO}}}{{{\rm{Tetra}}}}$$
(7)
$${\log }_{10}\left({D}_{{\mbox{V}}}^{{\mbox{Opx}}/{\mbox{melt}}}\right)=-4.61{{\mbox{-}}}0.198{{\rm{\times }}}\Delta {{\mbox{FMQ}}}+\frac{5981}{T}{\mbox{+}}4.48{{\rm{\times }}}{{\mbox{Al}}}^{{\mbox{Tetra}}}$$
(8)
$${\log }_{10}\left({D}_{{\mbox{V}}}^{{\mbox{Cpx}}/{\mbox{melt}}}\right)=- 4.24{{\mbox{-}}}0.267{{\rm{\times }}}\Delta {{\mbox{FMQ}}}+\frac{5717}{T}{\mbox{+}}3.64{{\rm{\times }}}{{\mbox{Al}}}^{{\mbox{Tetra}}}$$
(9)
$${\log }_{10}\left({D}_{{\mbox{V}}}^{{\mbox{Spl}}/{\mbox{melt}}}\right)=- 1.09{{\mbox{-}}}0.186{{\rm{\times }}}\Delta {{\mbox{FMQ}}}+\frac{2447}{T}{\mbox{+}}0.004{{\rm{\times }}}{{\mbox{Cr}}}^{{\mbox{\#}}}$$
(10)
$${\log }_{10}\left({D}_{{\mbox{Sc}}}^{{\mbox{Opx}}/{\mbox{melt}}}\right)=3.31 {{\mbox{-}}} 73{{\rm{\times }}}\frac{P}{T}{\mbox{-}}0.038{{\rm{\times }}}{{\mbox{Mg}}}^{{\mbox{\#}}}$$
(11)
$${\log }_{10}\left({D}_{{\mbox{Sc}}}^{{\mbox{Cpx}}/{\mbox{melt}}}\right)=-1.51+\frac{2342}{{\mbox{T}}}-160{{\rm{\times }}}\frac{P}{T}+2.44{{\rm{\times }}}{{\mbox{Al}}}^{{\mbox{Tetra}}}$$
(12)
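
Eqs. (7) to (12) translate directly into code, as in the sketch below. The units are assumptions, not stated above: T is taken in kelvin and P in GPa. The function names and all input values (temperature, NBO/Tetra, AlTetra, Cr#, Mg#) are hypothetical placeholders chosen for illustration. The bulk coefficient is computed as a mode-weighted sum using the spinel lherzolite proportions given earlier, and only for V, because Eqs. (11) and (12) provide Sc coefficients for Opx and Cpx only.

```python
def d_vanadium(delta_fmq, temp_k, nbo_tetra, al_tetra_opx, al_tetra_cpx, cr_number):
    """Mineral/melt partition coefficients for V, Eqs. (7)-(10)."""
    return {
        "Ol": 10 ** (-2.30 - 0.258 * delta_fmq + 1871 / temp_k - 0.24 * nbo_tetra),
        "Opx": 10 ** (-4.61 - 0.198 * delta_fmq + 5981 / temp_k + 4.48 * al_tetra_opx),
        "Cpx": 10 ** (-4.24 - 0.267 * delta_fmq + 5717 / temp_k + 3.64 * al_tetra_cpx),
        "Spl": 10 ** (-1.09 - 0.186 * delta_fmq + 2447 / temp_k + 0.004 * cr_number),
    }


def d_scandium(pressure_gpa, temp_k, mg_number, al_tetra_cpx):
    """Mineral/melt partition coefficients for Sc, Eqs. (11)-(12)."""
    p_over_t = pressure_gpa / temp_k
    return {
        "Opx": 10 ** (3.31 - 73 * p_over_t - 0.038 * mg_number),
        "Cpx": 10 ** (-1.51 + 2342 / temp_k - 160 * p_over_t + 2.44 * al_tetra_cpx),
    }


def bulk_partition_coefficient(mineral_d, modes):
    """Mode-weighted sum of mineral/melt partition coefficients."""
    return sum(modes[m] * mineral_d[m] for m in modes)


# Spinel lherzolite modes from the text; all other inputs are hypothetical.
MODES = {"Ol": 0.57, "Opx": 0.28, "Cpx": 0.13, "Spl": 0.02}
d_v_reduced = d_vanadium(0.0, 1573.0, 1.0, 0.1, 0.1, 20.0)
d_v_oxidized = d_vanadium(2.0, 1573.0, 1.0, 0.1, 0.1, 20.0)
d_sc = d_scandium(1.3, 1573.0, 90.0, 0.1)
bulk_reduced = bulk_partition_coefficient(d_v_reduced, MODES)
bulk_oxidized = bulk_partition_coefficient(d_v_oxidized, MODES)
```

Because every V equation carries a negative ΔFMQ coefficient, the bulk coefficient is lower in the oxidized case, consistent with V becoming more incompatible under more oxidizing conditions, whereas the Sc equations contain no ΔFMQ term.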

The ratio of non-bridging oxygens to tetrahedrally coordinated cations (NBO/Tetra) is a measure of melt polymerization101, calculated as:

$$\frac{{\mbox{NBO}}}{{\mbox{Tetra}}}=\frac{2{{\rm{\times }}}{{\mbox{X}}}_{{\mbox{O}}}-4{{\rm{\times }}}{{\mbox{X}}}_{{\mbox{Tetra}}}}{{{\mbox{X}}}_{{\mbox{Tetra}}}}$$
(13)

where XTetra and XO represent the atomic proportions of tetrahedrally coordinated cations and oxygen, respectively. To estimate NBO/Tetra, we compiled global primitive arc basalts from the GEOROC repository (n = 785) and calculated the major element compositions of primary arc melts assuming a water content of 4 wt%57,58, from which we calculated the average NBO/Tetra value of global primary arc melts.

AlTetra is the amount of tetrahedrally coordinated Al (atoms per formula unit) in Opx and Cpx20. We estimated mean AlTetra values for Opx and Cpx in global spinel lherzolites based on the compilations and verifications of Qin et al.37. The molar Cr# (100× Cr2O3/(Cr2O3 + Al2O3)) and Mg# were employed in empirical Eqs. (10) and (11), respectively20. We determined the mean Cr# and Mg# values for global spinel lherzolites from the same compilations37. Finally, using a non-modal near-fractional melting model and the average melting P-T conditions (Fig. 2b, c), we converted the mean V/Sc values of primitive arc basaltic samples to mean sub-arc mantle fO220.