Introduction

Corrosion is the degradation of a material resulting from its interaction with the environment, and it is a significant barrier to the reliability and service life of structural materials in applications ranging from infrastructure to aerospace1,2,3,4,5. Studies suggest that 25–30% of annual corrosion costs could be avoided through corrosion management6,7,8. Traditional alloys, designed around a single principal element, often struggle to balance corrosion resistance with other critical properties such as strength and ductility9. In recent years, multi-principal element alloys (MPEAs), which include high-entropy alloys (HEAs), have been developed as a novel category of metallic materials10,11. Unlike conventional alloys, MPEAs incorporate multiple principal elements in near-equiatomic or complex proportions, yielding an expansive compositional space with adjustable properties12,13,14.

The corrosion behaviour of MPEAs, however, is highly variable and far from straightforward15. Studies have shown that their corrosion rate depends on a complex interplay of composition, phase stability, and environmental conditions such as electrolyte type and pH16,17,18. For instance, certain MPEAs form protective passive films that enhance pitting resistance, whereas others exhibit accelerated degradation arising from phase heterogeneity or micro-galvanic effects between constituent elements16. The number of possible MPEA combinations is vast and can be estimated using the formula:

$$N={(n+1)}^{(k-1)}$$
(1)

where n is the number of alloying elements and k is the number of principal elements (assuming alloying in equal atomic ratios)19. When non-equiatomic compositions with large n and k are considered, the number of possible alloys expands to \(10^{20}\), making traditional experimental and computational approaches inadequate for systematically mapping composition-property relationships20,21,22. Experimental trial-and-error approaches are often time-consuming and resource-intensive, while atomistic simulations struggle to capture the full scope of corrosion dynamics under varied conditions. For example, CALPHAD (i.e. thermodynamic calculation) approaches are typically applied to alloys containing a restricted number of elements, because of constraints in the existing thermodynamic databases20,23.
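As a rough numerical illustration of Eq. (1), the short sketch below simply evaluates the formula as stated in the text. The function name and the chosen values of n and k are illustrative assumptions, not values taken from the cited work.

```python
def equiatomic_alloy_count(n: int, k: int) -> int:
    """Evaluate Eq. (1), N = (n + 1)^(k - 1), exactly as written in the text.

    n: number of alloying elements; k: number of principal elements.
    """
    if n < 0 or k < 1:
        raise ValueError("n must be non-negative and k must be at least 1")
    return (n + 1) ** (k - 1)


# Illustrative (hypothetical) choices of n and k, to show how quickly N grows.
for n, k in [(10, 5), (24, 5), (24, 8)]:
    print(f"n = {n}, k = {k}: N = {equiatomic_alloy_count(n, k):,}")
```

Even for equiatomic alloys only, moving from five to eight principal elements raises the estimate by several orders of magnitude, which is the scaling argument made above.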

To address these challenges, machine learning (ML) has been increasingly adopted as a tool for predicting material properties, including corrosion performance. Machine learning models such as random forests (RF), neural networks, and XGBoost have demonstrated effectiveness in predicting corrosion metrics for more conventional alloys such as stainless steels and aluminium alloys24,25,26,27. In such studies, accuracies for modelling corrosion behaviour were reported to be around 82–93% with a decision tree model24, 80% with an RF model25 and 98% with an XGBoost model26. Datasets used in ML-based corrosion studies to date typically contain 50–500 entries28,29. It is known that ML models trained on so-called small datasets (i.e. those with fewer than 100 entries) can have reduced or limited accuracy, and are significantly limited when extrapolating beyond known data bounds.

The application of ML models to rationalise and interpret the corrosion of MPEAs is constrained by limited experimental datasets and the complexity of the MPEA feature space. A recent study16, however, compiled a list of 619 unique entries for corrosion properties of MPEAs. That study is the largest holistic compilation of corrosion data for MPEAs to date and highlighted the complexities of visual and human-level interpretation of MPEA corrosion. This is because the number of variables in the alloys studied (from the alloy composition to the microstructures developed), along with the test electrolyte and electrolyte concentration utilised, complicated the extraction of simple trends via data analysis. On this basis, there is a clear need to explore supervised machine learning for the problem of MPEA corrosion.

Where ML modelling is frustrated by datasets of limited information or size, generative adversarial networks (GANs) have shown promise in materials science by generating synthetic data to augment training sets30,31. For instance, Lee et al.32 employed a conditional GAN for phase prediction of HEAs that improved the accuracy of a deep neural network model from ~85 to 93%. Similarly, Zeni et al.33 developed a generative model (MatterGen) capable of generating synthetic inorganic materials with over twice the likelihood of novelty and stability, tailored to specific chemistry, symmetry, and mechanical, electronic, and magnetic properties. The augmented data improved prediction accuracy and reduced generalisation error of the ML models.

Generative adversarial networks (GANs), introduced in 2014 by Goodfellow et al.34, are deep learning models designed to generate synthetic data replicating a given training distribution. A GAN comprises two neural networks: a generator, which generates candidate data samples, and a discriminator, which assesses the produced samples. These components are trained simultaneously in a competitive framework (i.e. a so-called ‘minimax game’), where the generator aims to fool the discriminator by improving its ability to produce realistic data, whilst the discriminator improves its ability to differentiate true data from generated samples. This adversarial interplay refines both networks until the generated data tends towards being statistically indistinguishable from the real dataset, which is posited to enable effective data augmentation for applications including alloy design35,36. To date, the use of GANs in corrosion studies, particularly for MPEAs, remains largely unexplored. While GANs have been used in corrosion image generation for automobiles37 and uncoated steel plates38, research in quantitative corrosion evaluation has mostly focused on supervised modelling39. However, Woldesellasse et al. used a conditional GAN to improve the prediction accuracy of a neural network (from 86% to 96%) for corrosion pit depth of onshore oil and gas buried pipelines through data augmentation29. Additionally, Li et al.22 proposed a GAN model (cardiGAN) for classifying the alloy phases in the design and discovery of MPEAs. The proposed network effectively generated novel MPEA compositions and highlighted the potential to accelerate the exploration of new alloys with desirable phase-related properties, including corrosion properties.
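For reference, the competitive training of generator and discriminator described above is commonly written, following the original GAN formulation34, as the minimax objective below. Here G is the generator, D the discriminator, \(p_{\rm{data}}\) the distribution of real samples x, and \(p_{z}\) the prior over the latent noise z. This is the standard textbook form and is given only as orientation; GAN variants such as the Wasserstein GAN with gradient penalty replace it with a different loss.

$$\min_{G}\max_{D}V(D,G)={\mathbb{E}}_{x\sim {p}_{{\rm{data}}}}\left[\log D(x)\right]+{\mathbb{E}}_{z\sim {p}_{z}}\left[\log \left(1-D\left(G(z)\right)\right)\right]$$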

The present study seeks to advance the field by developing a supervised ML model for MPEA corrosion. The supervised ML models investigated were selected on the basis that they are reported to be capable of handling data of high dimensionality. Additionally, the model is developed with GAN-based data augmentation. This combination is, to the best of our knowledge, not yet reported in the literature10,12,20. To overcome data scarcity, a GAN was employed to generate a synthetic dataset, in order to enable accurate predictions upon ML model training. Specifically, a non-dominant sorting optimisation-based generative adversarial network (NSGAN) framework was employed. The novelty of this work lies in synergistically combining these techniques to accelerate the exploration of the MPEA design space. The study presents a computationally efficient data-driven approach for more accurately predicting and optimising the corrosion performance of MPEAs. Leveraging augmented datasets of compositional and environmental features, ML models were expected to predict the key corrosion metrics, namely corrosion potential, corrosion current density, and pitting potential. The workflow of our method is illustrated in Fig. 1, providing an intuitive overview of the approach.

Fig. 1: Schematic of the workflow and methodology in the present study; note that the empirical data originate from the archival literature.

Results

Modelling

The predictive performance of four ML models for corrosion properties of MPEAs was evaluated across three key metrics: corrosion current density (icorr), corrosion potential (Ecorr), and pitting potential (Epit), as presented in Fig. 2. The left-hand panels of Fig. 2 are discussed first, since they correspond to ML modelling with the experimental dataset alone.

Fig. 2: The predictive performance of machine learning models for first row: corrosion current density, second row: corrosion potential, and third row: pitting potential of MPEAs.

In each row, model performance with (left) real data and (right) synthetic data is presented for comparison. The x-axes represent four regression models, namely LassoIC (Lasso model regularised with an Akaike information criterion), KRR (kernel ridge regression), RF (random forest), and NN (neural network). The performance of each model variant is calculated as the average of the train (red bars) and test (blue bars) scores over 10 repetitions. The y-axes represent prediction scores based on the coefficient of determination (R²), with error bars indicating the 95% confidence interval.

The supervised regression models included a linear model (LassoIC) and three non-linear models (KRR, RF and NN). For icorr, the kernel ridge regression model achieved the highest training score (R² ≈ 0.87), but its test score (R² ≈ 0.35) indicated overfitting, a trend also observed in RF. In contrast, the neural network model (R² ≈ 0.40 and 0.34 for train and test scores) and LassoIC (R² ≈ 0.34 and 0.25 for train and test scores) demonstrated more consistent performance between training and testing phases, suggesting better generalisability despite lower overall scores. Among the models, RF exhibited the highest test score for icorr (R² ≈ 0.38). Nevertheless, with the experimental dataset of 619 entries, the overall train and test scores for all models fell short of what would be desired. This again highlights a challenge in materials science: working with datasets of limited size can frustrate the ability of ML models to perform at high levels of accuracy.

In predicting Ecorr, RF again outperformed the other models with the highest test score (R² ≈ 0.66) and a training score of R² ≈ 0.88, while LassoIC and NN exhibited closer train-test alignment, showing less overfitting. The kernel ridge regression model exhibited the second highest prediction accuracy for Ecorr, with a test score of R² ≈ 0.60. For Epit, all models showed high performance, with RF and KRR achieving the highest test scores (R² ≈ 0.78). However, as with Ecorr, NN and LassoIC demonstrated better alignment between train and test scores, suggesting better generalisability for this property. Across all metrics, error bars representing 95% confidence intervals highlighted variability, particularly in non-linear models, emphasising the importance of proper model selection to balance predictive accuracy and model stability.
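The averaged scores and 95% confidence intervals discussed above can be illustrated with a minimal sketch. How the intervals in Fig. 2 were computed is not stated in the text, so the Student-t interval used here is an assumption, and the ten scores are hypothetical values rather than results from this study.

```python
import math
import statistics

# Two-sided 95% Student-t critical value for 9 degrees of freedom,
# i.e. for the mean of 10 repetitions (assumed interval type).
T_CRIT_95_DOF9 = 2.262


def mean_with_ci95(scores):
    """Return (mean, half-width of the 95% confidence interval) for 10 scores."""
    if len(scores) != 10:
        raise ValueError("this sketch hard-codes the t-value for 10 repetitions")
    mean = statistics.fmean(scores)
    standard_error = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, T_CRIT_95_DOF9 * standard_error


# Hypothetical test R² scores from 10 repetitions (not data from the study).
example_scores = [0.36, 0.40, 0.35, 0.41, 0.38, 0.37, 0.39, 0.36, 0.40, 0.38]
mean_score, half_width = mean_with_ci95(example_scores)
print(f"R2 = {mean_score:.2f} +/- {half_width:.3f} (95% CI)")
```

A wide half-width relative to the mean signals that a model's score depends strongly on the particular train-test split, which is the variability referred to above.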

To address the data limitations identified in the initial modelling, particularly the constrained predictive accuracy for icorr and the deviations at extreme values for Ecorr and Epit, synthetic data derived from the NSGAN was used to augment the MPEA corrosion dataset. These synthetic datasets were designed to capture the underlying distribution of MPEA compositional, microstructural, and environmental features, providing a more diverse and representative training set. This approach has the potential to substantially alleviate the scarcity of existing data for the high-dimensional space of MPEAs, as shown in Figs. 3 and 4. These scatter plots visualise the real (solid dots) and synthetic (hollow dots) datasets, with colour coding that represents different electrolyte types. As is evident, the original and GAN-generated data points for each electrolyte type overlap significantly across the range of all three corrosion properties, indicating that the NSGAN captured the distribution of corrosion behaviour for MPEAs under diverse environmental conditions. These notable overlaps indicate the utility of data augmentation via the employed approach. However, slight discrepancies at the extremes (below −1100 mV for Ecorr and above 2000 mV for Epit) suggest the synthetic data may underrepresent rare conditions.

Fig. 3: Scatter plot of corrosion current density (icorr, μA/cm²) vs. corrosion potential (Ecorr, mV vs. SCE) for MPEAs, representing data augmentation using a non-dominant sorting optimisation-based generative adversarial network.

Solid dots (real data) and hollow dots (GAN-generated synthetic data) are colour-coded by electrolyte type.

Fig. 4: Scatter plot of corrosion current density (icorr, μA/cm²) vs. pitting potential (Epit, mV vs. SCE) for MPEAs, representing data augmentation using a non-dominant sorting optimisation-based generative adversarial network.

Solid dots (real data) and hollow dots (GAN-generated synthetic data) are colour-coded by electrolyte type.

Following the data augmentation, the models were retrained to re-predict the corrosion behaviour of MPEAs. As shown in Fig. 2, retraining on the combination of original and GAN-generated datasets improved predictive performance for all corrosion metrics. For icorr, the predictive accuracy of the retrained models increased substantially compared to the initial models. Random forest achieved the highest test score, which increased from R² ≈ 0.38 to R² ≈ 0.83, with a reduced train-test score gap (train R² ≈ 0.65). Kernel ridge regression and NN showed test scores very close to that of RF, with R² ≈ 0.82 for KRR and R² ≈ 0.81 for NN. The linear LassoIC and NN models demonstrated no train-test score gap, while for KRR the gap narrowed significantly from 0.52 to 0.12. The results indicate better generalisation, reduced overfitting, and a stronger capability to capture the complex relationships of corrosion current density with the compositional and environmental features when augmented data is used.

Similarly, for Ecorr and Epit, all test scores improved and the gap between train and test scores narrowed, likely because the increased data diversity captures a broader range of corrosion behaviours. For Ecorr, the test score of RF increased from R² ≈ 0.66 to R² ≈ 0.90 (train R² ≈ 0.98), while the test score of KRR improved from R² ≈ 0.60 to R² ≈ 0.88. Neural network and LassoIC achieved test scores of R² ≈ 0.83 and 0.67, respectively. For Epit, where the initial performance was already comparatively high (R² ≈ 0.58–0.78), RF and KRR both achieved a test score of R² ≈ 0.95 (train R² ≈ 0.98), with improved prediction accuracy at the boundaries. Neural network and LassoIC also benefited, with test scores rising to R² ≈ 0.88 for NN and R² ≈ 0.74 for LassoIC, alongside tighter train-test alignment, further confirming the improved performance of the models. The predictive performance of the models with experimental and augmented data is summarised in Table 1.

Table 1 Comparison of model performance with the experimental and augmented data

The augmented dataset provided a more diverse training set, which not only mitigated overfitting in RF and KRR but also enabled all models to better handle the complex high-dimensional compositional space of MPEAs, as evidenced by reduced variability in the 95% confidence intervals across all metrics (Fig. 2). The results highlight the effectiveness of combining supervised ML models with data augmentation via GANs (an integrated GAN-ML approach) in overcoming data scarcity, addressing the limitations observed in the initial modelling, improving predictive accuracy, and accelerating the understanding and design of corrosion-resistant MPEAs.

To further validate the improved predictive performance with data augmentation, parity plots of predicted versus actual values for corrosion current density, corrosion potential, and pitting potential were generated, as shown in Figs. 5, 6, and 7, respectively. The retrained models exhibited tight clustering of points along the 45-degree line for all metrics, confirming the high predictive performance reported in Fig. 2 and Table 1. For icorr (Fig. 5), RF, KRR, and NN showed only slight scatter, reflecting their substantially improved test scores compared to the initial models. Similarly, for Ecorr (Fig. 6), no systematic biases at lower and higher values were noticed, particularly for RF and KRR, aligning with their improved performance. For Epit (Fig. 7), all models displayed near-perfect alignment along the diagonal, consistent with their high test scores. These parity plots reinforce the effectiveness of the NSGAN-augmented dataset in improving model accuracy and generalisability.

Fig. 5: Parity plots of four machine learning models trained with augmented data for prediction of corrosion current density (icorr).

Top left: LassoIC (Lasso model regularised with an Akaike information criterion), top right: KRR (kernel ridge regression), bottom left: RF (random forest), and bottom right: NN (neural network).

Fig. 6: Parity plots of four machine learning models trained with augmented data for prediction of corrosion potential (Ecorr).

Top left: LassoIC (Lasso model regularised with an Akaike information criterion), top right: KRR (kernel ridge regression), bottom left: RF (random forest), and bottom right: NN (neural network).

Fig. 7: Parity plots of four machine learning models trained with augmented data for prediction of pitting potential (Epit).

Top left: LassoIC (Lasso model regularised with an Akaike information criterion), top right: KRR (kernel ridge regression), bottom left: RF (random forest), and bottom right: NN (neural network).

Graphical user interface

To facilitate practical application of the framework developed in the present study, a graphical user interface (GUI) webtool was designed using Google Colab (provided by Google®) that enables users to access the RF models for predicting corrosion properties of any arbitrary MPEA entry (Fig. 8). This tool allows researchers to input compositional and environmental features and obtain predicted corrosion metrics, streamlining the design and screening of corrosion-resistant MPEAs. The GitHub code associated with this Google Colaboratory notebook is publicly accessible through the following hyperlink:

https://colab.research.google.com/drive/1Q504PnMCWydQAQGnZplrNur8h3aMtFHk#scrollTo=LbTtrrB_9dg2

Fig. 8: Graphical user interface (GUI) webtool on Google Colab for predicting corrosion properties of arbitrary MPEA entries using the RF model.

Discussion

This study reveals the impact of integrating supervised machine learning with generative adversarial networks (GAN-ML) for predicting the corrosion behaviour of multi-principal element alloys (MPEAs):

  • Initial ML models trained on the experimental dataset of 619 entries struggled to achieve the desired accuracy, particularly for icorr prediction. The highest performance was obtained with a random forest model, which achieved a test score of 0.38.

  • The NSGAN-augmented dataset enabled non-linear models to capture complex composition-property relationships. After retraining with augmented data, random forest, as the most accurate model, achieved test scores of 0.83, 0.90, and 0.95 for icorr, Ecorr, and Epit, respectively.

  • The GAN-ML approach mitigated data scarcity, reduced overfitting, and improved predictive accuracy and generalisability of the models. A workflow for achieving this was presented herein.

  • This methodology facilitated the design of corrosion-resistant MPEAs, streamlining alloy development. This was demonstrated by the development of a user tool (GUI) to permit the prediction – for the first time – of corrosion properties of MPEAs for various environments.

Methods

Experimental data collection and format

In ML scenarios where input data is scarce, the efficacy of ML predictions relies more on the volume and quality of training data than on the specific ML algorithm selected40,41. Hence, corrosion data for 619 MPEAs (i.e. 619 unique entries) available from published literature42–138 were collected in our preliminary work139 for training the ML models used in this research. The dataset comprised inputs, namely electrochemical testing (i.e. environmental) conditions including electrolyte type and concentration, the microstructural phases present, and the precise alloy compositions, together with outputs in the form of corrosion properties. This data was derived from potentiodynamic polarisation experiments conducted in a range of electrolytes, including chloride-containing solutions, acidic media, and alkaline environments. The key outputs were corrosion potential (Ecorr), corrosion current density (icorr), and pitting potential (Epit). It is noted that the dataset included only 306 entries for pitting potential, as this parameter could not be measured in all empirical tests (not all electrolyte-alloy combinations promoted alloy passivity). The compiled dataset provides a foundation for training ML models to predict MPEA corrosion performance, offering a practical screening tool despite the inherent variability of literature-sourced data.

Synthetic data generation via a generative adversarial network

This study utilised GANs to capture the underlying data distribution through an iterative process of validating synthesised data. The approach herein involved concurrently generating and optimising 20 sets of novel MPEAs and their output corrosion properties (each set containing 10,000 data points). This was achieved using a non-dominant sorting optimisation-based generative adversarial network (NSGAN) framework. This approach integrates the multi-objective optimisation capabilities of the NSGA-II algorithm with the generative power of a Wasserstein GAN with Gradient Penalty (WGAN-GP)36, enabling the model to implicitly learn and refine data distributions from the real dataset. The NSGAN operates across latent and design spaces, mapping high-dimensional alloy features into a lower-dimensional latent space for efficient multi-objective optimisation. This implementation utilised the Pymoo library for NSGA-II and PyTorch for WGAN-GP for data augmentation. In total, 200,000 synthetic alloys (and their associated characteristics and outputs) were generated and made available to augment the experimental data. The details of how to generate such synthetic data have been previously reported by the authors in refs. 140,141, including expanded descriptions and implementation tools.
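The sorting step at the core of NSGA-II can be sketched in a few lines. The sketch below extracts only the first non-dominated (Pareto) front for objectives that are all minimised; it is a dependency-free illustration, not the NSGAN or Pymoo implementation used in this study, and the two objectives and the candidate values are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in
    at least one (all objectives are minimised)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated_front(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates: (icorr in uA/cm^2, negative Epit in mV), so that
# minimising both means low corrosion current and high pitting potential.
candidates = [
    (1.0, -800.0),
    (0.5, -600.0),
    (2.0, -900.0),
    (1.5, -700.0),  # dominated by (1.0, -800.0)
    (0.5, -500.0),  # dominated by (0.5, -600.0)
]
front = non_dominated_front(candidates)
print(front)
```

NSGA-II repeats this ranking over successive fronts and adds a crowding-distance measure to keep the selected solutions spread out; those steps are omitted here for brevity.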

Data visualisation

To facilitate the analysis of the MPEA corrosion data and evaluate the effectiveness of the NSGAN model in data augmentation, two visualisation techniques were employed using the scikit-learn Python library: distribution plots and the t-distributed stochastic neighbour embedding (t-SNE) dimension reduction algorithm. Kernel density estimation (KDE) was applied to plot and compare the probability distributions of corrosion potential and corrosion current density for both real and GAN-generated MPEA data (Fig. 9).
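As a minimal sketch of what a kernel density estimate computes, the function below places a Gaussian kernel on each sample and averages them. It is written in plain Python for transparency rather than with the library used in the study; the bandwidth and the Ecorr-like sample values are hypothetical.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function x -> estimated probability density at x."""
    if not samples or bandwidth <= 0:
        raise ValueError("need at least one sample and a positive bandwidth")
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum of Gaussian kernels centred on each sample.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical corrosion potentials in mV vs. SCE (not data from the study).
ecorr_samples = [-450.0, -380.0, -300.0, -290.0, -210.0, -150.0]
ecorr_density = gaussian_kde(ecorr_samples, bandwidth=50.0)

# Numerical check that the estimate integrates to approximately 1.
total_area = sum(ecorr_density(float(x)) for x in range(-1000, 400))
print(f"density at -300 mV: {ecorr_density(-300.0):.5f}, area: {total_area:.3f}")
```

Overlaying two such curves, one for real and one for synthetic data, gives the kind of comparison shown in Fig. 9.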

Fig. 9: Probability distribution plots of (left) corrosion potential, and (right) corrosion current density for real (619 entries) and GAN-generated (200,000 entries) MPEA data.

To explore the high-dimensional feature space of MPEAs comprising 36 features (chemical composition, phase type, electrolyte type and concentration), the t-SNE algorithm was implemented for dimension reduction. This non-linear technique projects the high-dimensional data into a two-dimensional space while preserving local structures and revealing patterns in the dataset. In the t-SNE scatter plot (Fig. 10), each point represents an MPEA entry, with colour-coded markers for real and GAN-generated synthetic data. This approach enables a visual evaluation of the NSGAN model's ability to generate synthetic data that aligns with the feature distributions observed in the real dataset.

Fig. 10: t-SNE scatter plot for real (pink) and GAN-generated (blue) MPEA data.

Each point represents an entry from the 36-dimensional feature space.

Machine learning models

To predict the corrosion performance of MPEAs, this study employed a set of supervised ML models: Linear Lasso with Information Criterion (LassoIC), Kernel Ridge Regression (KRR), Random Forest (RF), and Neural Networks (NN). These models were selected for their ability to handle regression tasks across diverse feature spaces and their established efficacy in materials property prediction. The input dataset comprised 36 features, shown in Table 2, including the chemical composition of the alloys (24 constituent elements), the microstructural phase of the alloys, electrolyte type, and electrolyte concentration. The data utilised are from testing reported to have been conducted at ambient/room temperature (~23 ± 2 °C). The categorical variables, phase and electrolyte type, were transformed into numerical representations using one-hot encoding, expanding the feature set to accommodate their discrete nature while preserving model interpretability.
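The one-hot encoding step mentioned above can be illustrated with a short sketch. The phase labels are hypothetical examples and do not reproduce the categories listed in Table 2.

```python
def one_hot_encode(values, categories=None):
    """Map each categorical value to a 0/1 indicator vector.

    Returns (categories, rows), where rows[i][j] is 1 if values[i] equals
    categories[j] and 0 otherwise.
    """
    if categories is None:
        categories = sorted(set(values))
    column_of = {category: j for j, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[column_of[value]] = 1
        rows.append(row)
    return categories, rows


# Hypothetical microstructural phase labels for four alloy entries.
phases = ["FCC", "BCC", "FCC+BCC", "FCC"]
phase_categories, phase_rows = one_hot_encode(phases)
print(phase_categories)
for label, row in zip(phases, phase_rows):
    print(label, row)
```

Each category becomes its own binary column, so no artificial ordering is imposed on the phases or electrolytes.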

Table 2 The input dataset comprised 36 features

Hyperparameter tuning

Model training and hyperparameter optimisation for both real and synthetic data were conducted using a 10-fold cross-validation framework implemented via grid search cross-validation (GSCV) from the scikit-learn library. This approach systematically partitioned the training set into 10 subsets, iteratively training on nine folds and validating on the remaining fold, to ensure unbiased performance assessment and minimise overfitting. For each model, GSCV explored a pre-defined hyperparameter space: LassoIC was tuned for the regularisation parameter (α); KRR for the kernel type (e.g., radial basis function) and regularisation strength (α); RF for the number of trees, maximum depth, and minimum samples per split; and NN for the number of hidden layers, neurons per layer, and learning rate.
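The logic of a 10-fold grid search can be sketched without any library, as below. This is an illustration of the procedure only: the study used scikit-learn's grid search, whereas the one-dimensional ridge model, the synthetic data, and the candidate values of α here are assumptions chosen to keep the example self-contained.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_ridge_1d(xs, ys, alpha):
    """Closed-form slope of a no-intercept ridge regression y = w * x."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


def r2(y_true, y_pred):
    """Coefficient of determination."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def grid_search_cv(xs, ys, alphas, k=10):
    """Return (best alpha, {alpha: mean validation R2 over k folds})."""
    folds = k_fold_indices(len(xs), k)
    mean_scores = {}
    for alpha in alphas:
        fold_scores = []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(len(xs)) if i not in held]
            w = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], alpha)
            fold_scores.append(
                r2([ys[i] for i in held_out], [w * xs[i] for i in held_out])
            )
        mean_scores[alpha] = sum(fold_scores) / k
    best = max(mean_scores, key=mean_scores.get)
    return best, mean_scores


# Synthetic toy data (y = 2x + noise); not data from the study.
rng = random.Random(1)
xs = [rng.uniform(-3.0, 3.0) for _ in range(100)]
ys = [2.0 * x + rng.gauss(0.0, 0.5) for x in xs]
alphas = [0.01, 1.0, 100.0, 1000.0]
best_alpha, mean_scores = grid_search_cv(xs, ys, alphas)
print(best_alpha, {a: round(s, 3) for a, s in mean_scores.items()})
```

Each candidate is scored only on folds it was not trained on, so an over-regularised setting is penalised by its validation score rather than by its fit to the training data.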

Evaluation of model performance

Model performance was evaluated using the coefficient of determination (R²) on both training and validation sets. The coefficient of determination is defined as:

$${{\rm{R}}}^{2}=1-\frac{\sum {\left(y-\hat{y}\right)}^{2}}{\sum {\left(y-\bar{y}\right)}^{2}}$$
(2)

where y and \(\hat{y}\) are the actual and predicted values of the target corrosion property, and \(\bar{y}\) is the mean of the actual values.
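Eq. (2) can be checked on a small worked example. The sketch below implements the definition directly; the four actual and predicted values are hypothetical Ecorr-like numbers chosen so that the arithmetic is easy to follow (residual sum of squares 700, total sum of squares 12,500).

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination as defined in Eq. (2)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical actual and predicted values in mV (not data from the study).
actual = [-300.0, -250.0, -400.0, -350.0]
predicted = [-310.0, -240.0, -390.0, -330.0]
score = r_squared(actual, predicted)
print(f"R2 = {score:.3f}")  # 1 - 700 / 12500 = 0.944
```

A score of 1 corresponds to perfect prediction, 0 to a model no better than always predicting the mean, and negative values to a model that is worse than that baseline.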

The 10-fold cross-validation process yielded average performance metrics, guiding the selection of the optimal model configuration for each algorithm. This rigorous evaluation ensured that the models effectively captured the relationships between compositional, structural, and environmental features and the corrosion properties of MPEAs.