Predicting stress corrosion cracking in downhole environments: a Bayesian network approach for duplex stainless steels

Rojas Zuniga, Abraham; Bakhtiari, Sam; Aldrich, Chris; Calo, Victor M.; Iannuzzi, Mariano

doi:10.1038/s41529-025-00646-y

Article
Open access
Published: 08 October 2025

Abraham Rojas Zuniga¹,
Sam Bakhtiari¹,
Chris Aldrich²,
Victor M. Calo³ &
Mariano Iannuzzi¹

npj Materials Degradation volume 9, Article number: 122 (2025)

Abstract

This study presents a Bayesian network (BN) for the holistic assessment of stress corrosion cracking (SCC) risk. The model is designed to interrogate the optimal operating conditions of duplex stainless steels (DSSs) in downhole environments, addressing the perceived overly conservative limits set by current industry standards, particularly those from ISO 15156—Part 3. A knowledge-based dataset on DSS performance was compiled from diverse sources. Machine learning and deep learning techniques facilitated data pre-processing and identification of feature interactions, supporting the BN structure’s development. Extensive cross-validation demonstrated that the BN model predicted the occurrence of both pitting corrosion and SCC with over 90% accuracy. Using the BN model, inference analyses were undertaken to examine SCC risks for DSSs under diverse sour conditions. The results indicate that DSSs could withstand more aggressive conditions than those currently permitted by ISO 15156—Part 3, suggesting potential for broader and more effective use in oilfield applications.

Introduction

In the petroleum industry, engineered tools and structures integral to production systems operate under some of the most severe industrial environments¹. This distinction is further exemplified through the working conditions in technically challenging fields, such as high-pressure and high-temperature (HPHT) reservoirs, deep and ultra-deep fields, and remote Arctic locations. In these settings, pressure and temperature conditions reach upwards of 160 MPa and 300 °C, respectively, at depths exceeding 10,000 m^1,2.

The presence of corrosive agents, predominantly chloride (Cl⁻), carbon dioxide (CO₂) and hydrogen sulphide (H₂S), further escalates the technical difficulties of recovering crude oil and natural gas^3,4. Moreover, the severity of downhole conditions can be aggravated by the presence of elemental sulphur (S⁰), which is likely to occur if H₂S concentration exceeds a 5–10% threshold within the gas phase^5,6,7. In the same vein, deep reservoirs may contain traces of organic acids (mainly acetic acid) and a variety of other contaminants, including liquid metals^3,8,9. Furthermore, the use of completion fluids (typically rich in Cl⁻) and highly acidic stimulation chemicals (e.g., hydrochloric and hydrofluoric acids), in conjunction with enhanced recovery methods, can considerably affect the corrosivity of the environment throughout the life cycle of production wells^10,11.

Given the aggressive characteristics of downhole environments, it becomes imperative to address degradation mechanisms associated with environmentally assisted cracking (EAC), chiefly stress corrosion cracking (SCC)^3,5. Fundamentally, SCC is an anodic form of EAC, derived from the synergy between mechanical stresses (residual or applied) and a reactive environment^12,13. SCC constitutes a prevalent threat to the integrity of hydrocarbon production equipment, as it can markedly accelerate the mechanical failure of exposed components¹⁴. Therefore, exhaustive protocols of material selection, extending from well testing to the completion stage, are critical to mitigating the risks posed by SCC^15,16.

Selection criteria for metallic alloys focus primarily on robust structural integrity, superior corrosion resistance, affordability, and mechanical properties that satisfy operating conditions^17,18. Thus, commonly employed materials such as carbon and high-strength low-alloy steels, while adequate as casing materials, generally do not meet the requirements for other downhole applications due to their limited corrosion resistance^10,16. Instead, corrosion-resistant alloys (CRAs) are preferred for tubing, liners, and critically exposed components (e.g., tubing hangers, wellhead flow crosses and control valves), given their superior resistance to aggressive forms of corrosion such as pitting^4,19. For the most severe environments, CRAs employed include super austenitic stainless steels, nickel-based and nickel-cobalt alloys, as well as titanium alloys²⁰. However, the most commonly used CRAs are 13–22 wt% chromium (Cr) alloys, such as martensitic stainless steels and, in more demanding environments, duplex stainless steels (DSSs)^20,21.

Due to their high resistance to localised corrosion and SCC, DSS alloys are used in sectors other than oil and gas, such as chemical processing, power production, and desalination²². In terms of chemical composition, duplex grades contain from 19 to 30 wt% Cr, and additions of nickel (Ni), molybdenum (Mo), nitrogen (N) and tungsten (W), with an optimally balanced austenite/ferrite phase ratio, approaching 50:50^23,24. Thereby, DSS integrates the ferrite phase’s high strength with the austenite’s ductility and toughness²⁵. Moreover, the high strength and hardness also endow DSSs with remarkable resistance to erosion, cavitation, and corrosion fatigue^26,27,28. The chemical composition of DSS promotes the formation of a passive Cr-rich oxy-hydroxide layer. Consequently, DSS alloys maintain relatively low corrosion rates when exposed to both CO₂ and H₂S within oilfield settings at temperatures not exceeding 100 °C²⁹. Regarding the mechanical resistance, the yield strength (${\sigma }_{{YS}}$) of DSS alloys ranges from 450 MPa when solution-annealed to approximately 1100 MPa through cold working, rendering them suitable for both shallow and deep well applications^5,30. Notably, the DSS family is regarded as a cost-effective alternative compared to coated carbon steels, high-resistance stainless steels and, in certain instances, Ni-based alloys²⁵.

Notwithstanding these attributes, industry standards currently provide an incomplete representation of DSS alloys’ performance in oil and gas environments. Particularly, the standard governing the material selection of CRAs in H₂S-containing services, ISO 15156—Part 3 (ISO 15156−3)³¹, imposes strict limits concerning critical corrosion factors. These include partial pressures of H₂S (pH₂S) and CO₂ (pCO₂), solution pH, temperature, and Cl⁻ concentration³¹. For DSSs, the standard prescribes an operational threshold for downhole tubular components based primarily on pH₂S, ranging from 0.3 to 3.0 psi (~ 0.02–0.2 bar). However, such operational boundaries are often perceived as overly conservative, and diverge from numerous studies indicating that DSS alloys can withstand higher H₂S levels^32,33,34.

A prominent example is the seminal review by Cassagne et al.³⁵, which emphasised that the SCC susceptibility of DSSs extends beyond pH₂S levels, involving temperature, Cl⁻ concentration and pH as critical determinants. While this study is observational in nature, comparing data from the field and experimental work, it suggests that DSS alloys may resist approximately 2.0 bar pH₂S. This tolerance depends on maintaining Cl⁻ concentrations below 10,000 ppm, a minimum pH of 4.5, at temperatures ranging from 80 to 100 °C.

While extensive literature has documented the SCC resistance of DSSs^{25,36,37,38,39,40}, a comprehensive assessment of failure risks under service conditions remains elusive. Moreover, the disparity between empirical evidence and standardised limits highlights, in principle, the necessity of more integrative frameworks to inform material selection and application in production systems^36,37. Addressing this need, the present study introduces a novel data-driven model, based on Bayesian networks (BNs), to determine the SCC risk for DSS alloys in downhole environments. Fundamentally, BNs enable the representation of conditional dependencies among variables via a directed graphical structure. This study therefore employs BN modelling to encode the direct dependencies of critical determinants driving SCC, such as corrosive agents, temperature, and stress state.
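To make the idea of conditional dependencies concrete, the following is a minimal sketch of a three-node chain (severe environment → pitting → SCC) with inference by enumeration. The node `severe` and every probability value are hypothetical and chosen only for illustration; they are not taken from the study or its dataset. The only structural assumption borrowed from the text is that SCC does not occur without prior pitting.

```python
from itertools import product

# Hypothetical, illustrative probabilities only; NOT values from the study.
P_SEVERE = {True: 0.3, False: 0.7}            # P(severe environment)
P_PIT_GIVEN_SEVERE = {True: 0.8, False: 0.2}  # P(pitting = YES | severe)
P_SCC_GIVEN_PIT = {True: 0.4, False: 0.0}     # P(SCC = YES | pitting); no pitting -> no SCC


def joint(severe: bool, pit: bool, scc: bool) -> float:
    """Joint probability factorised along the chain severe -> pitting -> SCC."""
    p_pit = P_PIT_GIVEN_SEVERE[severe] if pit else 1.0 - P_PIT_GIVEN_SEVERE[severe]
    p_scc = P_SCC_GIVEN_PIT[pit] if scc else 1.0 - P_SCC_GIVEN_PIT[pit]
    return P_SEVERE[severe] * p_pit * p_scc


def prob_scc_given_severe(severe: bool) -> float:
    """P(SCC = YES | severe), summing out the intermediate pitting node."""
    numerator = sum(joint(severe, pit, True) for pit in (True, False))
    evidence = sum(
        joint(severe, pit, scc) for pit, scc in product((True, False), repeat=2)
    )
    return numerator / evidence


risk_if_severe = prob_scc_given_severe(True)
risk_if_mild = prob_scc_given_severe(False)
```

A real BN would carry many more nodes (corrosive agents, temperature, stress state) and learn its conditional probability tables from data, but the factorisation and the marginalisation step follow the same pattern.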

Despite consolidating best practices and lessons learned, industry guidelines and international standards such as ISO 15156-3 do not address the multifactorial nature of SCC and, more broadly, EAC^38,39. The inherent complexity of EAC phenomena contributes to this shortfall. EAC processes are characterised by high variability and dimensionality, compounded by uncertainties related to boundary conditions and material-environment interactions⁴⁰. As a result, SCC modelling remains challenging, as it requires coupling chemical, electrochemical, and mechanical effects within a single model⁴¹.

Multiphysics approaches have been used in the hierarchical modelling of material degradation phenomena, including SCC⁴². These first-principle methodologies, while often applied in isolation, have been instrumental in modelling electronic structures, atomic-scale interactions, and stress-induced damage. For example, atomistic modelling, grounded in density functional theory (DFT) and solid-state physics, is extensively utilised to calculate electronic structures and predict atomic-level reaction pathways⁴³. The DFT application offers profound insights into the micro-mechanisms driving SCC, including localised corrosion and hydrogen embrittlement (HE)⁴⁴. Molecular dynamics simulations complement these efforts by modelling the dynamic behaviour of atoms, including bond breaking and formation, as well as the effect of stress fields around the crack tip^45,46. Thermodynamic models have contributed to predicting the structure and composition of passive films, while assessing the aggressiveness of corrosive species in HPHT conditions^47,48,49. These models focus on estimating both corrosion and repassivation potentials, which are crucial for evaluating the onset of localised corrosion that precedes SCC initiation.

The finite element method (FEM) has been widely employed to determine localised corrosion rates and structural damage such as cracks and fractures. FEM achieves this by coupling electrochemical and mass transport models with continuum-scale structural mechanics analyses^50,51. In addition, phase-field models bridge the gap between atomistic and continuum models⁵². These mesoscale modelling techniques enable the simulation of phase transformations with evolving geometries. Such capability has facilitated a detailed examination of cracking initiated from corrosion pits^53,54.

Despite the advances offered by current physics-based models, a unified framework coupling the relevant temporal and spatial scales involved in SCC has yet to be realised^39,55. Furthermore, first-principles models frequently rely on classical theories, empirical correlations, or idealised assumptions grounded in measurable parameters and controlled boundary conditions⁵⁶. These characteristics can ultimately restrict their predictive capacity in real-world environments.

Alternatively, data-centric modelling offers significant support in analysing SCC failures. A key advantage lies in utilising real data, which inherently captures the stochastic nature of corrosion-induced failures⁴⁰. Recent studies in corrosion engineering prominently feature the integration of techniques in machine learning (ML) and artificial intelligence (AI)^{57,58,59,60,61,62}. These data-driven approaches have led to significant progress in visual detection and classification of different corrosion patterns^57,58, as well as the detailed analysis of chemical and electrochemical reactions⁵⁹, solid-state processes^60,61, and microbiologically induced degradation⁶².

In the oil and gas sector, BN modelling has allowed for holistic analyses of relevant degradation phenomena, including uniform material loss, localised corrosion, erosion, and under-deposit microbial corrosion^63,64,65,66. The proposed BN models yielded insightful information regarding the interactions between qualitative and quantitative factors affecting corrosion. These encompass hydrodynamic conditions (e.g., liquid hold up, partial pressures, velocity), temperature, medium pH, soil conditions, pipe characteristics (e.g., surface condition, types of coating, cathodic protection), and the presence of organic decay products.

BN modelling has been purposefully tailored to investigate EAC mechanisms. For instance, Sridhar et al.⁶⁷ proposed a BN model addressing localised corrosion risk (i.e., pitting and crevice) for Ni-Fe-Cr-Mo-N alloys in seawater. It estimates localised corrosion probabilities by integrating key input variables, including Cl⁻ and sulphate (SO₄²⁻) concentrations, temperature conditions, crevice tightness, and alloy chemistry. This BN application also incorporates established repassivation models⁴⁷, and a unifying parameter conceptually similar to pitting resistance equivalent (PRE)⁶⁸, which quantifies alloying effects on corrosion resistance.

Focusing on a different failure mode, Taylor et al.⁶⁹ employed a BN model to assess corrosion fatigue initiation in high-strength Al alloys (e.g., AA 7075, AA 2070). Inputs for this model included cathodic current density and intermetallic particle size, sourced from microstructural analysis and corrosion literature. Target nodes within the network are associated with pitting kinetics rates and pit-to-crack transition, which were parametrised using theoretical models from Harlow and Wei⁷⁰ and the Kondo criterion⁷¹, respectively.

A BN model has recently been developed to predict hydrogen stress cracking (HSC)⁷². This model facilitates probabilistic assessment of HSC-related damage, which includes uniform corrosion, pitting, grooving, sulphide stress cracking, and HE. Discrete and continuous nodes are incorporated into the BN model, representing a wide range of variables, such as metallurgical characteristics (e.g., ${\sigma }_{{YS}}$, microstructure), environmental chemistry (e.g., H₂S, pH, Cl⁻), mechanical loadings (e.g., strain rate, stress intensity) and operational conditions (e.g., cathodic protection, galvanic effects). Critical factors in the model include electrochemical potentials and the concentration of mobile hydrogen. However, conditional probabilities in the BN model are derived primarily from expert knowledge.

BNs present a compelling pathway for assessing SCC risks, especially in data-scarce environments prevalent in multiple real-world industrial applications, including oil and gas systems. In this regard, the implementation of data-centric techniques in SCC studies is still underdeveloped, as existing efforts are hindered by insufficient data and the lack of well-defined implementation strategies^{73,74,75,76,77}. In fact, a significant proportion of BN applications in corrosion engineering largely depend on expert knowledge for the specification of model parameters and network architectures⁷⁶. This reliance can nonetheless introduce subjectivity and cognitive biases, such as anchoring, availability bias, and overconfidence^78,79. Nevertheless, Sridhar et al.^16,80 emphasise the ability of BNs to combine diverse information sources, both empirical and expert-derived, thereby enabling probabilistic reasoning of complex systems and associated uncertainties.

Therefore, this work leverages the flexibility of BNs to synthesise a wide array of data within a computationally tractable framework. To construct our BN model, we compiled several data sources (i.e., industry standards, technical guidelines, and scientific papers), all pertinent to SCC testing of DSS alloys under sour conditions. Advanced preprocessing techniques, such as multiple data imputation and synthetic minority oversampling, were applied to prepare the dataset for analysis. The BN structural design benefited from other ML techniques, namely extreme gradient boosting (XGBoost) and Shapley additive explanations (SHAP), which optimised its predictive accuracy. The BN model from this research provides insights regarding the interactions of the most critical factors leading to cracking, and ultimately aims to interrogate the boundaries within which DSS alloys can effectively resist SCC in downhole settings.

To ensure clarity for readers, it is essential to establish a fundamental understanding of the factors influencing DSS vulnerability to SCC. Duplex alloys are distinguished by their exceptional corrosion resistance; however, their behaviour under SCC-inducing conditions remains a subject of extensive research. Of particular interest is the role of localised corrosion processes, either pitting or intergranular corrosion, in promoting the initiation of SCC⁸¹.

Although the underlying mechanisms of SCC are not thoroughly understood, it is hypothesised that incipient cracks originate from pitting events¹². These nucleate after the disruption of the passive film and tend to grow preferentially along slip planes or grain boundaries³. Therefore, the formation of pits is considered a precursor to SCC, as these localised surface attacks act as stress concentrators, potentially leading to cracking^82,83. However, it is critical to recognise that not all corrosion pits transition into cracks. Instead, the pit-to-crack transition is a multi-component process, governed by more than mechanical stresses, metallurgical characteristics, and environmental dynamics; the morphological aspects of the pit itself (i.e., size, shape, and aspect ratio) also play a significant role⁸². In particular, pits with sharper geometries lead to higher strain concentrations, which in turn increase the likelihood of evolving into cracks^84,85.

In examining the localised corrosion behaviour of DSS, various investigations indicate that pitting susceptibility is significantly influenced by Cl⁻ content and temperature^86,87. For example, experimental work in 1.0 M sodium chloride (NaCl) solutions has shown that standard DSS samples, containing 22 wt% Cr, typically undergo pitting at temperatures above 60 °C⁸⁷. Nonetheless, critical pitting temperatures (CPTs) can extend to 150 °C, at Cl⁻ concentrations as low as 100 ppm⁸⁸. Interestingly, standard DSSs can exhibit pitting corrosion even at moderate temperatures around 30 °C. This occurs when they are exposed to high Cl⁻ levels (>159,00 ppm) in solutions with different ionic compounds, such as NaCl, calcium chloride (CaCl₂) or magnesium chloride (MgCl₂)⁸⁹. Notably, the increase in Cr content enhances pitting resistance; for instance, DSS alloys with 25–28 wt% Cr (also referred to as super DSSs) exhibit an average CPT of around 90 °C^86,87.

A standard parameter for quantifying localised corrosion resistance and SCC susceptibility of CRAs is the PRE, given by⁶⁸

$$\mathrm{PRE}=\%\mathrm{Cr}+3.3\left(\%\mathrm{Mo}+0.5\,\%\mathrm{W}\right)+16\,\%\mathrm{N} \tag{1}$$

This metric empirically correlates higher levels of Cr, Mo, W and N with enhanced resistance to localised corrosion in chloride-rich environments^90,91. Accordingly, lower PRE values indicate a greater susceptibility to pitting corrosion⁹². Since localised attack is a precondition for SCC, PRE is used to establish the operational limits of DSSs for oil and gas applications. For downhole tubulars, the ISO 15156-3 standard³¹ dictates that conventional DSSs, with PRE values between 30 and 40, are eligible for environments where pH₂S does not exceed 0.3 psi (~0.02 bar). This applies irrespective of temperature, Cl⁻ concentration, and pH. Comparatively, super duplex grades, having PRE values from 40 to 45, can operate up to 3.0 psi (~0.2 bar) pH₂S and a maximum Cl⁻ concentration of 120,000 ppm.
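As a worked illustration of Eq. (1) and of the pH₂S limits summarised above, the following minimal sketch computes PRE from a composition in wt% and looks up the corresponding limit. The function names are hypothetical, and assigning the shared boundary PRE = 40 to the super duplex band is an assumption of this sketch, since the text quotes both ranges as including 40. The limits are only those restated in this section and are no substitute for the standard itself.

```python
PSI_TO_BAR = 0.0689476  # 1 psi expressed in bar


def pre(cr: float, mo: float, w: float, n: float) -> float:
    """Pitting resistance equivalent from Eq. (1); all inputs in wt%."""
    return cr + 3.3 * (mo + 0.5 * w) + 16.0 * n


def ph2s_limit_psi(pre_value: float):
    """pH2S limit (psi) for downhole tubulars, as summarised in the text.

    Returns None outside the PRE range 30-45 quoted in this section.
    Assumption of this sketch: PRE = 40 falls in the super duplex band.
    """
    if 40.0 <= pre_value <= 45.0:
        return 3.0   # super duplex grades (Cl- up to 120,000 ppm)
    if 30.0 <= pre_value < 40.0:
        return 0.3   # conventional duplex grades
    return None


# Hypothetical nominal compositions, for illustration only.
pre_standard = pre(cr=22.0, mo=3.0, w=0.0, n=0.17)
pre_super = pre(cr=25.0, mo=3.5, w=0.0, n=0.27)
limit_standard_bar = ph2s_limit_psi(pre_standard) * PSI_TO_BAR
limit_super_bar = ph2s_limit_psi(pre_super) * PSI_TO_BAR
```

The converted limits come out at roughly 0.02 bar and 0.2 bar, matching the rounded values quoted in the text.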

However, the publications by Craig⁹³ and Garfias-Mesias⁹⁴ highlight that PRE is insufficient for evaluating the corrosion resistance of DSS in oil and gas settings. Both authors argue that, whereas PRE accurately correlates with the CPT for single-phase austenitic stainless steels in oxidising conditions, its predictive value fails for DSSs due to their dual-phase structure, or when considering anoxic environments. An example of this is the study conducted by Kane and Abayarathna⁹⁵. This investigation demonstrated that CPT for standard DSSs (PRE = 34) is limited to about 115 °C under sour conditions. This observation contrasts sharply with the 200 °C achieved by Ni-based alloys N08825 (PRE = 31) and N06255 (PRE = 45), which have comparable or lower PRE values. In addition, Craig⁹³ and Garfias-Mesias⁹⁴ pointed out that PRE’s formulation neglects the combined effects that contribute to SCC, which involve mechanical stresses, corrosive species, and particularly, the influence of specific chemical constituents. Among these are impurities such as phosphorus (P) and sulphur (S), as well as additions of manganese (Mn) and Ni, all of which can drastically alter the performance of DSS.

At downhole conditions, extensive research has demonstrated that DSS is prone to pitting, and subsequent cracking, in the presence of H₂S at temperatures between 60 and 180 °C^96,97,98. These investigations have explored a wide range of experimental conditions, including pH₂S levels spanning from 0.1 to 1.0 bar, Cl⁻ concentrations up to 120,000 ppm, and pH values between 3.0 and 4.5, while DSS specimens have been subjected to constant loads equivalent to 90% ${\sigma }_{{YS}}$. Notably, the evidence from these investigations elucidates the synergistic influence of H₂S and Cl⁻ in accelerating SCC. However, uncertainties persist regarding the conditions under which SCC is most likely to occur. This is further illustrated in Fig. 1a, b, which correlate pH₂S and Cl⁻ concentration across a variety of tests on standard and super DSSs, respectively. Table 1 complements these figures by detailing the properties of the tested DSS specimens.

**Fig. 1: SCC test outcomes for DSS alloys as a function of H₂S partial pressure and Cl⁻ concentrations.**

Table 1 Summary of DSS alloys employed in SCC studies presented in Fig. 1

As seen in Fig. 1a, b, the data exhibit trends that associate increased cracking susceptibility with rising levels of H₂S and Cl⁻, although they do not provide clear-cut criteria for determining SCC boundaries. More importantly, Fig. 1 shows that both standard and super DSSs may resist SCC in conditions far exceeding those outlined in ISO 15156-3³¹. The data in Fig. 1 cover a broad range of experimental conditions, extending beyond H₂S and Cl⁻ concentrations. These include a temperature range from 25 to 200 °C, pH values between 2.7 and 5.4, pCO₂ up to 92 bar, and tensile stresses from 30 to 160% of DSS’s nominal ${\sigma }_{{YS}}$. Based on these data, it has been observed that DSSs exhibit less propensity for cracking under high H₂S levels and low Cl⁻ concentrations, or when pH values are not strongly acidic, although the impact of mechanical stresses has yet to be quantified.

For example, the study published by Francis and Byrne⁹⁹ involved C-ring tests with a brine containing 46,000 ppm Cl⁻ at 80 °C (pH ≈ 4.3). The tests varied pH₂S from 0.125 to 0.375 bar within a CO₂/H₂S mixture, while stress levels were equivalent to the nominal ${\sigma }_{{YS}}$ of the DSS samples. Here, pitting and fine cracks were detected in standard DSS specimens starting from 0.25 bar pH₂S, whereas super DSS samples showed no signs of SCC, even at 0.375 bar pH₂S. Comparatively, Holmes et al.¹⁰⁰ observed principally pitting attacks in various SCC tests at constant load, without consistent development of SCC. These tests employed only DSSs with 22 wt% Cr, which were subjected to tensile stresses at 90% ${\sigma }_{{YS}}$, while exposed to 0.35 bar pH₂S and immersed in a highly concentrated solution with 100,000 ppm Cl⁻ (pH ≈ 4.5).

For super DSS, Woolin and Malingas¹⁰¹ found that more severe conditions are required to observe SCC failures. Here, the authors conducted C-ring tests at a loading state equal to the nominal ${\sigma }_{{YS}}$ of the samples, in an environment comprising 20,200 ppm Cl⁻ and 2.0 bar pH₂S (pH ≈ 3.5) at 85 °C. Interestingly, Seigmund et al.³² noted that standard and super DSS alloys can resist SCC while exposed to a range of 0.5–1.0 bar pH₂S, and high Cl⁻ levels of around 45,000 ppm within a pH interval of 4.2–5.0. These experiments were undertaken under varied temperature conditions (i.e., from 28.5 to 180 °C) while maintaining a constant uniaxial loading for all DSS specimens (i.e., 90% ${\sigma }_{{YS}}$). Despite the aggressive experimental settings, DSS tensile probes underwent only moderate pitting corrosion, with penetration depths in the order of 20–50 μm, and no cracking was observed.

The preceding discussion has emphasised the prevailing uncertainty in identifying the conditions that lead to SCC of DSS alloys in downhole settings. Addressing this problem necessitates a sophisticated approach due to the complex interplay among numerous contributing factors. To this end, we present a BN model that visualises the relationships among variables affecting SCC in DSS alloys. Drawing on literature data, this model leverages BN inference to manage SCC uncertainty and interrogate the viability of DSSs for oil and gas applications in a probabilistic manner.

Results and discussion

Data preparation

This research involved compiling a dataset of 2535 instances from localised corrosion and SCC experiments on DSS. The dataset incorporates diverse information sources, including standards, technical guidelines, and scientific and conference papers. The supplementary materials accompanying this study provide a detailed list of these sources.

The generated dataset includes the specifications of DSS specimens, detailing their mechanical properties, such as ${\sigma }_{{YS}}$, ultimate tensile strength (${\sigma }_{{UTS}}$), and elongation (${\varepsilon }_{f}$), as well as their chemical composition and PRE. The DSS dataset also contains experimental parameters relevant to SCC occurrence, including temperature, pH₂S and pCO₂, Cl⁻ concentration, medium pH, as well as applied stresses (${\sigma }_{{app}}$). Test outcomes in the dataset are categorical variables: Pitting Corrosion and SCC. Each outcome is binary, indicating either the presence (YES) or absence (NO) of the event. The stress ratio (${\sigma }_{R}$), defined as the ratio of ${\sigma }_{{app}}$ to ${\sigma }_{{YS}}$, was also determined to assess whether the macroscopic stress state of DSS samples was elastic (${\sigma }_{R}$ < 1) or plastic (${\sigma }_{R}$ > 1). Thus, ${\sigma }_{R}$ was calculated as follows:

$${\sigma }_{R}=\frac{{\sigma }_{{app}}}{{\sigma }_{{YS}}} \tag{2}$$
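As a small worked example of Eq. (2), the sketch below computes the stress ratio and labels the macroscopic stress state. The function names are hypothetical, and the label used for the boundary case ${\sigma }_{R}$ = 1 is an assumption, since the text only defines the elastic (< 1) and plastic (> 1) regimes.

```python
def stress_ratio(sigma_app: float, sigma_ys: float) -> float:
    """Stress ratio from Eq. (2); both stresses in the same unit (e.g., MPa)."""
    if sigma_ys <= 0.0:
        raise ValueError("yield strength must be positive")
    return sigma_app / sigma_ys


def stress_state(sigma_r: float) -> str:
    """Classify the macroscopic stress state from the stress ratio."""
    if sigma_r < 1.0:
        return "elastic"
    if sigma_r > 1.0:
        return "plastic"
    return "at yield"  # boundary case; this label is an assumption of the sketch


# Illustrative case: a constant load of 90% of a 450 MPa yield strength.
ratio = stress_ratio(sigma_app=405.0, sigma_ys=450.0)
state = stress_state(ratio)
```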

However, the heterogeneity of source materials compromises data uniformity by introducing missing data points, particularly related to experimental features and results. These inconsistencies are largely attributed to variations in experimental protocols, including differences in sample handling and testing methods; for example, experiments at constant deflection (e.g., C-ring, U-bend, and 4-point bend tests) and constant load (e.g., proof rings), as well as slow strain rate tests. Consequently, many studies reported only the incidence of SCC, with control variables limited to environmental settings (e.g., pH₂S, pCO₂, temperature, and Cl⁻ content). Conversely, measurements of pH, equivalent stresses, or the incidence of pitting corrosion were either inconsistently documented or not feasible. For a concise overview, Table 2 summarises the continuous attributes of the DSS dataset, while Table 3 describes the categorical variables regarding pitting corrosion and SCC.

Table 2 Descriptive statistics and missing data percentages for continuous variables in the DSS dataset

Table 3 Descriptive statistics and missing data percentages for categorical variables in the DSS dataset

As reported in Table 3, the pitting corrosion column in the dataset exhibits approximately 47.1% missing data points. Since pitting is a prerequisite for SCC, any instances where pitting corrosion was not observed are exclusively associated with cases where SCC did not occur. Therefore, positive cases of SCC invariably imply positive instances of pitting corrosion. Moreover, Table 3 reveals a significant class imbalance in the pitting corrosion column, where positive instances of pitting corrosion exceed the negative ones by a ratio of approximately 8.73 to 1. This is of particular concern, as data-driven models can be heavily biased towards the majority class, resulting in poor performance on the minority class¹⁰².
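The logical relationship between the two outcomes can be expressed as a simple consistency rule. The sketch below is illustrative only: the record layout (dictionaries with `pitting` and `scc` keys, `None` for missing) and the function names are hypothetical, and the text does not state whether such a rule-based fill was part of the authors' preprocessing. It also shows how a positive-to-negative class ratio like the one quoted above can be computed.

```python
def apply_pitting_scc_constraint(records):
    """Fill labels that follow logically from 'no pitting implies no SCC'.

    Each record is a dict with keys 'pitting' and 'scc' holding 'YES', 'NO'
    or None (missing). Returns new dicts; the input is left unchanged.
    """
    filled = []
    for record in records:
        rec = dict(record)
        if rec["pitting"] is None and rec["scc"] == "YES":
            rec["pitting"] = "YES"  # SCC observed, so pitting must have occurred
        if rec["scc"] is None and rec["pitting"] == "NO":
            rec["scc"] = "NO"       # no pitting, so SCC cannot have occurred
        filled.append(rec)
    return filled


def class_ratio(labels, positive="YES", negative="NO"):
    """Ratio of positive to negative labels, ignoring missing entries."""
    n_pos = sum(1 for value in labels if value == positive)
    n_neg = sum(1 for value in labels if value == negative)
    if n_neg == 0:
        raise ValueError("no negative instances to compare against")
    return n_pos / n_neg
```

Records where both labels are missing, or where SCC is absent and pitting is unknown, cannot be resolved by this rule and would still need a statistical imputation method.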

Missing data and class imbalance can compromise the performance and generalisability of predictive models derived from the DSS dataset. To mitigate these data quality issues, data preprocessing methodologies were employed. Specifically, missing values were imputed using generative adversarial networks (GANs). Subsequently, the observed class imbalance was managed through the application of the synthetic minority over-sampling technique (SMOTE). The following sections will elaborate upon the strategies adopted for preparing the DSS dataset.

Multiple data imputation

In this work, generative adversarial imputation nets (GAIN)¹⁰³ were employed to resolve the missing values in our dataset. The GAIN method implemented builds upon the source code provided by Yoon et al.¹⁰⁴, which has been adapted to be compatible with the Python library PyTorch¹⁰⁵. This adaptation also integrates Bayesian hyperparameter optimisation (BHO) via the Python library Optuna¹⁰⁶, enabling fine-tuning of the GAIN settings to achieve optimal performance. The corresponding techniques, GAIN and BHO, are further described in the “Methods” section.

The primary justification for employing data imputation in this work is, in principle, to ensure the completeness and robustness of the dataset. In data-centric models, missing data points can significantly impact their accuracy and reliability, leading to biased estimates and incorrect inferences about the relationships between variables¹⁰⁷. The supplementary material of this investigation offers detailed insights on the hyper-parameterisation process, as well as a comprehensive analysis of the data imputations.

Before the imputation process, the dataset underwent feature normalisation. This rescaling method was specifically utilised for the GAIN imputation process. Here, we employed Min-Max scaling¹⁰⁸, which transforms the minimum value (${x}_{\min }$) of each feature to 0 and the maximum value (${x}_{\max }$) to 1. Hence, scaled features (${x}_{{scaled}}$) are determined as follows:

$${x}_{{scaled}}=\frac{x-{x}_{\min }}{{x}_{\max }-{x}_{\min }} \tag{3}$$

Min-Max normalisation preserves the original distribution of the data and the inter-relationships between feature values¹⁰⁹. In the GAIN algorithm, feature normalisation is crucial for stabilising the adversarial training process. Here, a generator network imputes missing values, while a discriminator component evaluates the imputation quality relative to the authentic observations¹¹⁰. Fundamentally, feature normalisation ensures that each feature exerts a proportional influence on the adversarial loss function, which guides the adversarial dynamics between the discriminator and the generator¹⁰³.
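The scaling in Eq. (3) can be sketched for a single feature column that still contains missing entries, which is the situation before imputation. This is a minimal standard-library illustration with hypothetical function names, not the authors' PyTorch implementation; missing values are represented by `None` and are passed through untouched, and the minimum and maximum are returned so that imputed values can later be mapped back to physical units.

```python
def min_max_scale(column):
    """Scale one feature with Eq. (3), ignoring missing entries (None).

    Returns (scaled_column, x_min, x_max); missing entries stay None.
    """
    observed = [x for x in column if x is not None]
    if not observed:
        raise ValueError("column has no observed values")
    x_min, x_max = min(observed), max(observed)
    if x_max == x_min:
        raise ValueError("constant feature cannot be Min-Max scaled")
    span = x_max - x_min
    scaled = [None if x is None else (x - x_min) / span for x in column]
    return scaled, x_min, x_max


def inverse_min_max(scaled, x_min, x_max):
    """Map scaled values back to the original units."""
    span = x_max - x_min
    return [None if s is None else s * span + x_min for s in scaled]


# Illustrative temperatures in degrees Celsius with one missing entry.
temperatures = [25.0, None, 80.0, 200.0]
scaled_t, t_min, t_max = min_max_scale(temperatures)
restored_t = inverse_min_max(scaled_t, t_min, t_max)
```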

Figure 2 illustrates the data imputation results using GAIN. Here, the histograms compare the frequency distributions between imputed (red bars) and observed (blue bars) values for features with missing data. Through an adversarial training process, the GAIN algorithm effectively approximates the conditional distribution of the missing instances given the observed data¹¹¹,¹¹². This results in the generation of statistically plausible data points that maintain the dataset’s structural and distributional integrity. Figure 2 exemplifies this outcome, where the close alignment between the imputed and real data distributions showcases GAIN’s accuracy in mimicking the actual underlying data distribution of the DSS dataset.

**Fig. 2: Comparison of frequency distributions between observed data and imputed data.**

The GAIN imputation accuracy was also quantitatively assessed. Figure 3 shows the behaviour of the root mean squared error (RMSE) throughout the training iterations. This metric quantifies the average error between imputed values and actual data. The GAIN algorithm was adapted to compute RMSE for both training and testing subsets, allowing the model’s performance to be monitored throughout the training process until stabilisation. In this study, 20% of the data was withheld as a test set, and during each iteration, 10% of this subset was deliberately obscured to simulate the presence of missing data. The resulting RMSE values for the training and test sets were 0.11 ± 0.05 and 0.19 ± 0.02, respectively, indicating relatively low error rates.
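The masking strategy described above can be sketched as follows: known values are hidden on purpose, an imputer fills them in, and the RMSE is computed only over the hidden entries. The mean-value imputer used here is a deliberately simple stand-in and is not GAIN; the data, the seed and the function name `masked_rmse` are hypothetical.

```python
import math
import random

def masked_rmse(true_values, imputed_values, mask):
    """RMSE over the entries that were deliberately hidden (mask is True)."""
    sq_errors = [(t - p) ** 2 for t, p, m in zip(true_values, imputed_values, mask) if m]
    return math.sqrt(sum(sq_errors) / len(sq_errors))

random.seed(0)
# Hypothetical feature, already Min-Max scaled, with fully known values.
true_values = [random.random() for _ in range(200)]

# Hide 10% of the entries to mimic missing data.
hidden = set(random.sample(range(len(true_values)), k=len(true_values) // 10))
mask = [i in hidden for i in range(len(true_values))]

# Stand-in imputer (NOT GAIN): fill hidden entries with the mean of the visible ones.
visible = [v for v, m in zip(true_values, mask) if not m]
fill = sum(visible) / len(visible)
imputed_values = [fill if m else v for v, m in zip(true_values, mask)]

rmse = masked_rmse(true_values, imputed_values, mask)
print(f"RMSE on hidden entries: {rmse:.3f}")
```

Because the features are scaled to [0, 1] beforehand, RMSE values computed this way are directly comparable across features.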

**Fig. 3: RMSE behaviour across GAIN training iterations.**

Dataset balancing

Following the multiple imputation procedure, a significant adjustment in the class distribution of the pitting corrosion variable was observed. The initial imbalance, where positive instances outnumbered negative cases by approximately 8.74:1 (see Table 3), was reduced to 1.98:1 post-imputation. However, this remaining disparity necessitated further corrective measures. Thus, data augmentation was applied using SMOTE for nominal and continuous variables (SMOTE-NC)¹¹³. This method oversamples the minority class (i.e., pitting negative cases), achieving a more equitable distribution of classes that ensures unbiased predictive modelling.

Figure 4 illustrates the results obtained using SMOTE-NC, comparing the distribution of cases for pitting corrosion and SCC before and after applying synthetic oversampling. Specifically, the number of negative pitting corrosion cases increased to 80% of the positive cases. This was the optimal threshold detected that preserves the balance of SCC cases. Figure 4a, b demonstrate a substantial increase in the number of negative instances of pitting corrosion from 850 to 1348, while the count of positive instances remained steady at 1685.

**Fig. 4: Class distribution balancing for pitting corrosion and SCC.**

Similarly, Fig. 4c, d show the impact of the dataset balancing on SCC instances. Here, the number of negative SCC cases rose from 1384 to 1653, and positive cases adjusted from 1151 to 1377. This demonstrates that, while effectively oversampling the pitting corrosion minority class, SMOTE-NC did not negatively skew the balance of SCC cases. By implementing these adjustments, the final imbalance ratios for pitting corrosion and SCC were around 1.25:1 and 1.20:1, respectively.
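The class ratios quoted in this subsection follow directly from the reported counts. The short check below recomputes them; the helper name `imbalance_ratio` is an illustrative choice.

```python
def imbalance_ratio(majority: int, minority: int) -> float:
    """Majority-to-minority class ratio."""
    return majority / minority

# Pitting corrosion counts reported for Fig. 4a, b.
pit_pos, pit_neg_before, pit_neg_after = 1685, 850, 1348
# SCC counts reported for Fig. 4c, d.
scc_neg_before, scc_pos_before = 1384, 1151
scc_neg_after, scc_pos_after = 1653, 1377

print(round(imbalance_ratio(pit_pos, pit_neg_before), 2))         # 1.98
print(round(imbalance_ratio(pit_pos, pit_neg_after), 2))          # 1.25
print(round(pit_neg_after / pit_pos, 2))                          # 0.8
print(round(imbalance_ratio(scc_neg_before, scc_pos_before), 2))  # 1.2
print(round(imbalance_ratio(scc_neg_after, scc_pos_after), 2))    # 1.2
```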

XGBoost modelling and SHAP analysis

After preparing the dataset through imputation and balancing, an XGBoost classification model was trained to predict SCC in DSS. However, our primary objective extended beyond simple prediction, aiming to elucidate feature interactions contributing to SCC susceptibility. Decision-tree ensembles, such as XGBoost, are well suited to this end, as their hierarchical structure models how combinations of variables influence the final prediction¹¹⁴,¹¹⁵. Thus, feature contributions within the XGBoost classifier were subsequently investigated using SHAP values¹¹⁶. These quantify feature importance and pairwise synergies, providing data-driven insights that guide variable selection and potential connections for the BN model.

Firstly, the XGBoost model performance was optimised utilising BHO¹¹⁷, which required the coupled framework provided by the Python libraries XGBoost¹¹⁸ and Optuna¹⁰⁶. This optimisation process targeted maximising the area under the receiver operating characteristic curve (AUC-ROC), which increases the model’s ability to discern between the classes¹⁰⁸. To extensively explore the parameter space and ascertain the optimal model settings, a total of 3000 trials were executed during the BHO procedure. Full particulars concerning the hyperparameter ranges and final settings are presented within the supplementary material accompanying this manuscript.

During the BHO process, the predictive performance of the XGBoost classifier was assessed using a stratified cross-validation (CV) scheme¹¹⁹, structured into five distinct folds. This validation strategy partitions the dataset while preserving the original class proportionality (i.e., the ratio of SCC-positive to SCC-negative instances), preventing potential biases introduced by class imbalance¹²⁰. For each fold, 80% of the data formed the training set for the BHO-derived model configuration, with the remaining 20% used as the unseen test set. Figure 5 presents the CV results across the five folds, corresponding to the best hyperparameter configuration found through BHO.
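A stratified split of this kind can be sketched in a few lines of plain Python. The example below deals the shuffled indices of each class across five folds, so that every test fold holds 20% of the data with the original class proportions. The function name, the seed and the toy label counts are assumptions; in practice a library routine would normally be used.

```python
import random
from collections import defaultdict

def stratified_folds(labels, n_folds=5, seed=0):
    """Return n_folds lists of test indices that preserve the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return folds

# Hypothetical labels: 60 SCC-negative (0) and 50 SCC-positive (1) cases.
labels = [0] * 60 + [1] * 50
for test_idx in stratified_folds(labels):
    test_set = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in test_set]
    n_pos = sum(labels[i] for i in test_idx)
    print(len(train_idx), len(test_idx), n_pos)  # 88 22 10 for every fold
```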

**Fig. 5: Performance evaluation of the XGBoost classifier after CV.**

Figure 5a displays the aggregated confusion matrix, summarising the performance across the CV test folds. The optimised XGBoost model correctly classified 1682 SCC-positive instances (i.e., true positives, TP) and 1248 SCC-negative instances (i.e., true negatives, TN). Misclassifications were significantly lower, comprising 144 false positives (FP) and 112 false negatives (FN). All these counts formed the basis for deriving key performance indicators, as outlined in the “Methods” section.

Overall, the XGBoost model achieved an accuracy of 91.97%, indicating high agreement between predicted and actual classes. The XGBoost classifier demonstrated strong performance with a true positive rate (TPR), or recall, of 93.76% and a precision of 92.11%. The F1-score was 92.93%, indicating the model’s effectiveness in predicting the SCC-positive class. The true negative rate (TNR), also termed specificity, was 89.66% and indicates the model’s capacity to correctly classify SCC-negative instances. The XGBoost model exhibited a false positive rate (FPR) of 10.34% and a false negative rate (FNR) of 6.24%. Further assessment is provided by the ROC curves in Fig. 5b, which illustrate the trade-off between TPR and FPR for each CV fold. Here, the AUC scores ranged from 0.899 to 0.991, yielding a high mean AUC of 0.967 ± 0.036. This metric indicates that the XGBoost model consistently distinguishes between SCC-positive and SCC-negative cases. The performance metrics of the XGBoost classification model are outlined in Table 4.
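For reference, the indicators above can be recomputed from the aggregated confusion matrix in Fig. 5a. The sketch below assumes the standard textbook definitions of each metric; the function name is illustrative. With the reported counts, the results agree with the quoted percentages to within about 0.01 percentage points.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Standard binary-classification indicators from confusion-matrix counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "recall (TPR)": recall,
        "precision": precision,
        "F1-score": 2 * precision * recall / (precision + recall),
        "specificity (TNR)": specificity,
        "FPR": 1 - specificity,
        "FNR": 1 - recall,
    }

# Counts reported for the aggregated CV test folds.
metrics = classification_metrics(tp=1682, tn=1248, fp=144, fn=112)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```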

Table 4 Performance metrics of the optimised XGBoost classification model


Figure 6 details the results of SHAP-based feature importance and interaction analyses. Model interpretation utilised the TreeSHAP explainer, as implemented in the Python package SHAP¹²¹. Figure 6a illustrates the SHAP summary plot, displaying the distribution of SHAP values for each feature across all data points. This visual representation ranks features by the magnitude of their mean absolute SHAP values, indicating overall importance. The horizontal axis reflects the additive contribution of features towards shifting the model’s output. Thus, rightward shifts suggest an increased probability of SCC occurrence, while leftward shifts indicate a decreased probability. The colour gradient shifts from blue to red, denoting the magnitude of each feature value, with blue representing lower values and red higher ones. In the case of categorical variables, such as pitting corrosion, SHAP values are colour-coded in blue when pitting does not occur and red when it does.

**Fig. 6: SHAP analysis of feature importance and interactions for SCC predictions.**

As seen in Fig. 6a, key predictors with high SHAP values, such as ${\sigma }_{R}$, pH₂S, and pitting corrosion, demonstrate significant influence on XGBoost predictions towards increased SCC probability. In contrast, features such as pCO₂, temperature, Cl⁻, and pH exhibit varied effects on the classifier’s predictions, suggesting a dual role in the model’s predictive dynamics, or dependency on other features. Figure 6a highlights the importance of DSS alloy characteristics in the XGBoost model’s response, such as PRE, ${\sigma }_{{YS}}$, and ${\varepsilon }_{f}$. The relevance of PRE can be attributed to the alloying elements (i.e., Cr, Mo, N, W) determining its value, which collectively prevent pitting corrosion. Regarding the mechanical properties, ${\sigma }_{{YS}}$ and ${\varepsilon }_{f}$, these are inherently associated with toughness, a measure of the energy a material can absorb before fracturing. In fact, toughness serves as a key metric in SCC studies to measure how corrosive environments affect an alloy’s strength and ductility¹²²,¹²³.

Figure 6b provides insights into pairwise feature contributions to model output through SHAP interaction values, quantifying their joint effect beyond individual contributions. Notable interactions include combinations of stress levels (i.e., ${\sigma }_{R}$) and environmental variables (e.g., pH₂S, Cl⁻ and pH). This observation aligns with the understanding that susceptibility to SCC is heavily affected by the interaction of mechanical loadings and environmental chemistry. Figure 6b indicates recurring interactions among ${\sigma }_{R}$, pH₂S, temperature, PRE, as well as strength and ductility variables (i.e., ${\sigma }_{{YS}}$ and ${\varepsilon }_{f}$). In contrast, interactions with individual alloying additions (e.g., Ni, Cr, Mo and Cu) are observed to be the least frequent. This pattern is consistent with the feature importance rank in Fig. 6a, where chemical constituents appeared less significant to the XGBoost model’s response.
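To make the notion of additive contributions and pairwise synergies concrete, the sketch below computes exact Shapley values by brute-force enumeration of feature coalitions for a toy three-feature score. This is not TreeSHAP and not the authors' model: the toy function, its coefficients, the zero baseline and all names are hypothetical, chosen so that a stress and H₂S synergy term is visible in the result.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating all feature coalitions.

    Features outside a coalition are set to their baseline value.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        inputs = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return model(**inputs)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                total += weight * (value(set(subset) | {name}) - value(set(subset)))
        phi[name] = total
    return phi

# Hypothetical risk score with a stress x H2S synergy (illustrative only).
def toy_model(stress, h2s, pre):
    return 0.2 * stress + 0.1 * h2s + 0.5 * stress * h2s - 0.3 * pre

x = {"stress": 1.0, "h2s": 1.0, "pre": 1.0}
baseline = {"stress": 0.0, "h2s": 0.0, "pre": 0.0}
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

In this toy case the attributions sum to the difference between the score at `x` and at the baseline, and the 0.5 synergy term is shared equally between the two interacting features. SHAP interaction values separate out exactly this shared part.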

Table 5 ranks the feature importance and interaction effects in the XGBoost classification model. Based on SHAP analyses, the top 10 variables exhibiting the most predominant interactions were selected for the BN model. It is important to emphasise that our SHAP value analysis, derived from training an XGBoost classifier, does not fully explain causality in SCC. Nonetheless, the insights obtained inform the potential connections for designing the BN structure. Thus, the XGBoost-SHAP framework in this investigation allowed for a more detailed understanding of the critical interdependencies among variables, which cannot be readily established through expert knowledge or theoretical comprehension regarding SCC.

Table 5 SHAP-based ranking of feature importance and interaction effects for the XGBoost classifier


BN model

Figure 7 presents the BN model designed to predict SCC of DSSs, which has been developed using the software BayesiaLab 11.3.1 (Bayesia S.A.S. Ltd., France). The network incorporates nodes that represent the most critical attributes influencing SCC occurrence, as identified by our XGBoost–SHAP framework. The selected nodes encompass environmental variables (e.g., pH₂S, pCO₂, Cl⁻, temperature, pH), stress conditions (${\sigma }_{R}$), as well as specific material characteristics (i.e., ${\sigma }_{{YS}}$, ${\varepsilon }_{f}$ and PRE) associated with both mechanical resistance and chemical composition of DSSs. The BN design allows for the systematic interrogation of pitting corrosion and SCC, given the combined effect of sour conditions and tensile loading on DSSs.

What stands out in Fig. 7 is the directionality of the arcs pointing from the SCC node to the predictor nodes. In this respect, our BN model adopts an augmented naïve Bayes (ANB) structure, where the reverse directionality of connections emphasises a discriminative modelling approach¹²⁴,¹²⁵. Unlike typical BN designs that frequently represent causal pathways, our model is structured to assess scenarios where SCC is assumed, shifting the focus to how various environmental and material factors influence this condition probabilistically. More importantly, ANB-based networks enable the explicit modelling of inter-variable dependencies. These BN structures often demonstrate enhanced classification performance by relaxing the conditional independence assumption in traditional naïve Bayes classifiers, which is often untenable in complex real-world applications¹²⁶.
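In generic notation, the two structures can be contrasted through their joint factorisation. Writing $C$ for the target node (SCC), ${X}_{1},\ldots ,{X}_{n}$ for the predictors and ${\Pi }_{i}$ for the set of predictor parents of ${X}_{i}$ (symbols introduced here for illustration only), an ANB structure factorises as

$$P(C,{X}_{1},\ldots ,{X}_{n})=P(C)\prod _{i=1}^{n}P({X}_{i}\,|\,C,{\Pi }_{i})$$

Setting every ${\Pi }_{i}$ to the empty set recovers the naïve Bayes classifier, in which the predictors are conditionally independent given $C$. Classification then follows from Bayes' rule, $P(C\,|\,{x}_{1},\ldots ,{x}_{n})\propto P(C)\prod _{i}P({x}_{i}\,|\,C,{\pi }_{i})$, where ${\pi }_{i}$ denotes the observed states of ${\Pi }_{i}$.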

In Fig. 7, the colour-coded arcs differentiate dependency types within the BN model. Blue arcs denote direct dependencies from the SCC node (i.e., target node) to predictors, establishing primary pathways that quantify the direct influence of each predictor on SCC risk. In contrast, pink arcs indicate additional inter-variable dependencies, highlighting the interactions that indirectly impact SCC risk. In this regard, feature interactions from SHAP analyses primarily guided the inclusion of inter-variable dependencies. Methodologically, network construction commenced with a basic naïve Bayes structure (generated via BayesiaLab) comprising only direct connections from the SCC target node to predictors. Afterwards, the BN structure was systematically augmented based on SHAP interactions, as reported in Table 5. To assess their impact on the accuracy of the BN model, the inter-variable dependencies were evaluated iteratively using a stratified five-fold CV.

Figure 8 shows the CV results obtained for the final BN model, evaluated with an 80:20 training-test data split. This analysis quantifies the model’s predictive performance for the pitting corrosion and SCC nodes, which are inherently correlated events. As seen in Fig. 8a, the confusion matrix demonstrates significant accuracy during the CV process for the pitting corrosion node, which registered an overall accuracy of 90.21%. Across all test sets, the model successfully classified 90.34% of positive instances (i.e., recall) and 91.43% of negative instances (i.e., specificity). Figure 8b illustrates the ROC curves for the pitting corrosion node, where AUC scores across CV folds ranged from 0.936 to 0.960, with a mean AUC value of 0.950 ± 0.008.

**Fig. 8: Performance evaluation of the BN Model using stratified CV.**

An accuracy of 91.39% was obtained for SCC classification. As illustrated in Fig. 8c, the BN model performed strongly during the CV process. Across all test sets, the BN model yielded a recall of 90.21% and a specificity of 92.74%. The ROC analysis for the SCC node yielded AUC scores ranging from 0.931 to 0.958, with an average of 0.945 ± 0.008, as observed in Fig. 8d. Collectively, these results demonstrate that the BN model consistently maintained a high level of accuracy across the CV test folds. Additional performance metrics for both target nodes, pitting corrosion and SCC, are presented in Table 6.

Table 6 Performance metrics for pitting corrosion and SCC nodes in the BN model


Sensitivity analysis

A sensitivity analysis was conducted to elucidate the contributions and probabilistic dependencies in the BN model. The results are visually summarised in Fig. 9. The variables with the most significant impact on the target node SCC are depicted in Fig. 9a. Here, the size of each node indicates the direct contributions to the SCC node, as determined by BayesiaLab. For clarity, the direct contributions represent the causal effect of a given variable on the target node while holding other variables constant¹²⁷. Thus, larger nodes represent variables with greater direct influence on SCC.

In Fig. 9a, the red arcs highlight the most critical interdependencies among variables, which are determined by Kullback-Leibler (KL) divergence values¹²⁸. Specifically, KL divergence quantifies the information gain by examining the mutual relationship between two variables, as opposed to assuming they are independent. Therefore, higher KL divergence values indicate a stronger relationship between variables. Figure 9b depicts the KL matrix from the BN model, showcasing the strength of interdependencies between variables. It is important to note that the KL matrix is asymmetric, as KL divergence reflects the directional comparison between parent nodes (listed along the vertical axis) and their child nodes (listed along the horizontal axis)¹²⁹.
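As a hedged illustration of this idea, the snippet below computes the KL divergence between a joint probability table of two discrete nodes and the product of their marginals, which is the information lost by assuming independence. The joint tables are hypothetical, and the exact quantity that BayesiaLab reports for each arc may be computed differently.

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) in bits for two discrete distributions given as aligned lists."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_information(joint):
    """KL divergence between a joint table P(X, Y) and the product of its marginals."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    p = [joint[i][j] for i in range(len(px)) for j in range(len(py))]
    q = [px[i] * py[j] for i in range(len(px)) for j in range(len(py))]
    return kl_divergence(p, q)

# Hypothetical joint probabilities for two binary nodes (rows: X, columns: Y).
dependent = [[0.4, 0.1],
             [0.1, 0.4]]
independent = [[0.25, 0.25],
               [0.25, 0.25]]

print(round(mutual_information(dependent), 3))    # 0.278
print(round(mutual_information(independent), 3))  # 0.0

# KL divergence itself is directional: swapping the arguments changes the value.
p, q = [0.9, 0.1], [0.5, 0.5]
print(kl_divergence(p, q) != kl_divergence(q, p))  # True
```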

As seen in Fig. 9a, the outcomes of the BN model are predominantly influenced by three main nodes, namely ${\sigma }_{R}$, pitting corrosion, and pH₂S. These findings are consistent with results from the feature importance analysis using the XGBoost classifier (see Fig. 6a). Figure 9b indicates that the most critical relationships in the BN model are associated with pitting corrosion, where high KL values correspond to environmental factors, such as pH₂S, pH, Cl⁻ and temperature. Regarding DSS properties, the PRE node demonstrates a significant effect on the pitting corrosion node. In fact, the PRE node holds a key position in the BN model, demonstrating strong interdependencies with the mechanical properties (i.e., ${\sigma }_{{YS}}$ and ${\varepsilon }_{f}$), which in turn interact with stress levels at the ${\sigma }_{R}$ node.

It is noteworthy that the pH₂S and Cl⁻ nodes are strongly interconnected, indicating a synergy affecting the pitting corrosion node. Comparatively, the direct interdependence between the temperature and Cl⁻ nodes appears less significant in the BN model, despite their well-known influence on pitting corrosion. This observation can be attributed to SCC-promoting conditions where H₂S is present, which alter the corrosion behaviour of stainless steels, including DSSs. In this regard, several published reviews argue that H₂S disrupts passive films synergistically with Cl⁻, rendering pH₂S the dominant variable triggering localised corrosion compared to variations in temperature and Cl⁻ concentration alone⁴,⁵,⁸¹.

Table 7 offers an overview of the BN model configuration, detailing node discretisation, associated probabilities, mean values, and calculated direct effects on the SCC node. Importantly, the probabilistic parameters within the BN model incorporate uncertainty. For clarity, BayesiaLab performs Monte Carlo simulations (typically employing 1000 samples) to determine parameter uncertainty, which yields 95% confidence intervals for model probabilities¹³⁰. That is, there is a 95% probability that the true parameter value lies within the given range, conditional upon the model and data.
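The general principle of a sampling-based interval can be sketched as follows. The example draws 1000 samples of a single state probability and reports the central 95% range. The internal procedure used by BayesiaLab is not described here, so the Beta posterior with a uniform prior, the counts and the function name are all assumptions made for illustration.

```python
import random

def monte_carlo_interval(successes, failures, n_samples=1000, level=0.95, seed=0):
    """Percentile interval for a probability, sampled from a Beta posterior.

    Assumes a uniform Beta(1, 1) prior; this is an illustrative choice.
    """
    rng = random.Random(seed)
    draws = sorted(rng.betavariate(successes + 1, failures + 1) for _ in range(n_samples))
    lower = draws[int((1 - level) / 2 * n_samples)]
    upper = draws[int((1 + level) / 2 * n_samples) - 1]
    return lower, upper

# Hypothetical node state observed 120 times out of 400 cases.
low, high = monte_carlo_interval(successes=120, failures=280)
print(f"95% interval: {low:.3f} to {high:.3f}")
```

Fewer observations widen the interval, which is why sparsely populated node states carry larger parameter uncertainty.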

Table 7 Overview of the BN model configuration


Evaluation of pitting corrosion and SCC

The BN model demonstrated significant efficacy in predicting both pitting corrosion and SCC. This dual capability enables a thorough evaluation of SCC by discerning whether operating conditions are conducive to SCC or merely localised corrosion. Understanding this differentiation is crucial, as relying solely on pitting corrosion as a direct proxy for SCC initiation overestimates the perceived failure risk. This, in turn, may result in unnecessarily conservative reliability assessments for materials such as DSSs¹³¹,¹³²,¹³³. A key parameter in the BN model is PRE, which can be used to examine the risks of both corrosion phenomena. Specifically, the PRE node can be adjusted to demonstrate that increased PRE values are associated with decreased probabilities of pitting events, consequently reducing SCC risks.

Figure 10 exemplifies the BN model’s response using the PRE node, showcasing its utility in the probabilistic evaluation of both localised corrosion and SCC. For this analysis, specific conditions were arbitrarily defined, encompassing a pH₂S interval of 0.02–0.5 bar, Cl⁻ concentrations between 30,000 and 120,000 ppm, temperatures spanning 60–115 °C, and constant loads equivalent to 0.8–1.2 ${\sigma }_{R}$. As shown in Fig. 10, these constrained variables are highlighted in green, indicating their limited probabilistic range within the BN model. To avoid limiting the dataset’s range of observations, other variables such as pH, pCO₂, ${\sigma }_{{YS}}$ and ${\varepsilon }_{f}$, were not subjected to specific constraints. Thereby, the BN model modulates their states and projects the most probable outcomes. In Fig. 10, the PRE node is marked with a red square, indicating 100% probability within the defined range. This visual cue emphasises the PRE values under evaluation, changing from the 35–40 interval for standard DSSs (Fig. 10a) to PRE values greater than 40 (Fig. 10b) for super DSSs.

**Fig. 10: Comparative analysis of BN model predictions for DSS and super DSS grades.**

Figure 10a indicates a 63.75% probability of pitting corrosion for DSSs with PRE between 35 and 40. However, the SCC risk remains significantly low, with a 68.39% probability of non-occurrence. This probabilistic assessment is consistent with the observations reported by Tynell¹³⁴, who conducted SCC tests on DSS samples of alloy S32205 (PRE ≈ 35) under sour conditions. In that study, DSS specimens were immersed in a solution with approximately 30,300 ppm Cl⁻ and subjected to a constant load equivalent to the nominal ${\sigma }_{{YS}}$ (i.e., ${\sigma }_{R}$ = 1.0). While various DSS samples did not undergo SCC, pitting corrosion was observed once pH₂S exceeded 0.3 bar and temperatures surpassed 70 °C. Similarly, Craig¹³⁵ documented pitting corrosion during SCC tests with alloy S32205 (PRE ≈ 35), although no signs of SCC were detected. In contrast, experiments with alloy S32550 (PRE ≈ 39) exhibited no damage. In that work, the U-bend specimens were tested using a solution containing 70,000 ppm Cl⁻ at 240 °F (≈ 115.6 °C), while exposed to a CO₂/H₂S mixture with a pH₂S of 5 psig (≈ 0.34 bar).
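The unit conversions quoted for the Craig test conditions can be checked with a short sketch. It treats the quoted gauge reading as the H₂S partial pressure, as the text does; the function names and the rounded conversion constant are illustrative.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0

PSI_TO_BAR = 0.0689476  # 1 psi expressed in bar

def psi_to_bar(pressure_psi: float) -> float:
    """Convert a pressure from psi to bar."""
    return pressure_psi * PSI_TO_BAR

print(round(fahrenheit_to_celsius(240.0), 1))  # 115.6
print(round(psi_to_bar(5.0), 2))               # 0.34
```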

Figure 10b shows the probability of pitting corrosion and SCC for super DSS grades whose PRE values exceed 40. Here, the risks associated with pitting corrosion and SCC are notably low, with probabilities of 24.36% and 16.62%, respectively. These outcomes are consistent with empirical findings documented in the literature, demonstrating the exceptional resistance of super DSSs to both localised corrosion and SCC⁹⁹,¹³⁶,¹³⁷,¹³⁸. The experimental conditions in these investigations involve pH₂S levels above 0.38 bar, Cl⁻ concentrations exceeding 100,000 ppm and temperatures over 100 °C, while stress conditions are equal to or greater than 90% ${\sigma }_{{YS}}$.

SCC risks for DSSs

To further investigate SCC risks for DSSs using the BN model, inference analyses were conducted based on the operating limits for DSSs established by Cassagne et al.³⁵. The authors suggested operational thresholds for pH₂S based on field-relevant ranges of pH and Cl⁻ concentration, which are summarised as follows:

Environments with pH ≈ 3.5:

At 1000 ppm Cl⁻, conventional 22 wt% Cr DSSs resist up to 0.5 bar pH₂S, whereas 25 wt% Cr DSSs withstand up to 0.7 bar.
At 10,000 ppm Cl⁻, pH₂S tolerance reduces to 0.3 bar for 22 wt% Cr DSSs, and 0.5 bar for 25 wt% Cr DSSs.
At 100,000 ppm Cl⁻, the pH₂S limits further decrease to 0.2 bar for 22 wt% Cr DSSs and 0.3 bar for 25 wt% Cr DSSs.

Environments with pH ≈ 4.5:

At 1000 ppm Cl⁻, both 22 wt% Cr and 25 wt% Cr DSSs resist pH₂S levels greater than 2.0 bar.
At 10,000 ppm Cl⁻, 22 wt% Cr DSSs accommodate pH₂S up to 2.0 bar, while 25 wt% Cr DSSs withstand levels above 2.0 bar.
At 100,000 ppm Cl⁻, the pH₂S range is 0.3–0.4 bar for 22 wt% Cr DSSs, and 0.8–1.0 bar for 25 wt% Cr DSSs.
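For use in scripted inference runs, the thresholds listed above can be held in a small lookup table. The sketch below is one possible encoding: the dictionary layout, the grade labels `22Cr` and `25Cr`, and the function name are assumptions. Where the source gives a range or a "greater than" value, only the lower bound is stored, which is a conservative reading adopted for this sketch.

```python
# pH2S limits in bar, transcribed from the list above.
# Each entry maps (pH, Cl- in ppm) -> {grade: (limit, qualifier)}.
# For ranges and "above" entries the lower bound is stored; this conservative
# reading is an assumption of the sketch, not a statement from the source.
CASSAGNE_LIMITS = {
    (3.5, 1_000): {"22Cr": (0.5, "up to"), "25Cr": (0.7, "up to")},
    (3.5, 10_000): {"22Cr": (0.3, "up to"), "25Cr": (0.5, "up to")},
    (3.5, 100_000): {"22Cr": (0.2, "up to"), "25Cr": (0.3, "up to")},
    (4.5, 1_000): {"22Cr": (2.0, "above"), "25Cr": (2.0, "above")},
    (4.5, 10_000): {"22Cr": (2.0, "up to"), "25Cr": (2.0, "above")},
    (4.5, 100_000): {"22Cr": (0.3, "range 0.3-0.4"), "25Cr": (0.8, "range 0.8-1.0")},
}

def within_limit(ph, chloride_ppm, grade, ph2s_bar):
    """True if ph2s_bar does not exceed the stored (conservative) limit."""
    limit, _qualifier = CASSAGNE_LIMITS[(ph, chloride_ppm)][grade]
    return ph2s_bar <= limit

print(within_limit(3.5, 100_000, "22Cr", 0.25))  # False
print(within_limit(3.5, 100_000, "25Cr", 0.25))  # True
```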

For our inference analyses, we established the combinations of the abovementioned environmental conditions, while maintaining an average temperature of 80 °C and tensile loadings around 0.9 ${\sigma }_{R}$. Subsequently, the probabilities of pitting corrosion and SCC were assessed based on the PRE node states within the BN model. Thus, the PRE intervals were defined as follows: PRE ≤ 34, 35 < PRE ≤ 40, PRE > 40. The results of the inference analyses for environments with pH values of 3.5 and 4.5 are detailed in Tables 8 and 9, respectively.

Table 8 Inference analysis of pitting corrosion and SCC at pH = 3.5


Table 9 Inference analysis of pitting corrosion and SCC at pH = 4.5


As observed in Table 8, DSS alloys with PRE ≤ 34 exhibit a high risk of pitting corrosion in environments with a pH of 3.5. The BN model consistently predicted pitting corrosion probabilities exceeding 80% for this DSS category. In addition, significant SCC probabilities, exceeding 54%, were associated with specific conditions, such as 0.3 bar pH₂S and 100,000 ppm Cl⁻, 0.5 bar pH₂S and 10,000 ppm Cl⁻, and 0.7 bar pH₂S and 1000 ppm Cl⁻. At lower pH₂S values (i.e., 0.2–0.3 bar), SCC risk was relatively low even when the Cl⁻ concentration was 100,000 ppm, where the associated probabilities of SCC non-occurrence slightly surpassed 52%.

Comparatively, DSSs with PRE values between 35 and 40 demonstrated improved resistance to SCC across all conditions detailed in Table 8. In this respect, SCC is not expected within a probability range from 63.05% to 76.29%. Nonetheless, pitting corrosion remained probable for these DSS alloys, with probabilities oscillating between 58.18% and 71.16%. Notably, super DSS with PRE values over 40 exhibited the lowest risk of localised corrosion, with the absence of pitting reaching a probability as high as 72.05%. This superior performance consequently resulted in lower SCC risks, with the probability of SCC not occurring ranging from 69.18% to 84.21%.

At pH 4.5, the risk of pitting corrosion remains high for DSSs with PRE ≤ 34 across most sour conditions, as reported in Table 9. For these DSS alloys, the BN model predicted pitting probabilities spanning from 74.26% to 79.17%, although SCC risk is generally low, with non-occurrence probabilities around 56%. Nevertheless, SCC occurrence risk may climb towards 60% when pH₂S levels vary from 0.8 to greater than 2.0 bar, and Cl⁻ concentrations exceed 10,000 ppm.

Table 9 shows that for DSSs possessing PRE values between 35 and 40, pitting may manifest with probabilities between 52.16% and 86.22% when pH₂S exceeds 0.4 bar. Below this pH₂S threshold, the BN model indicates a 58.63% probability that pitting corrosion will not occur even upon exposure to 100,000 ppm Cl⁻. Hence, the SCC risk for these DSS alloys is relatively low across all conditions, with probabilities of SCC non-occurrence ranging from 55.28% to 69.98%. Consistent with the results observed in Table 8, the resistance to pitting corrosion and SCC markedly improves for DSSs with PRE > 40 across all evaluated conditions. As shown in Table 9, pitting corrosion is not expected, with probabilities ranging from 70.23% to 83.71%, while probabilities of SCC not occurring are notably high, falling between 81.31% and 90.54%.

Tables 8 and 9 demonstrate that pH variations significantly affect H₂S influence on SCC. This was observed by Leyer et al.¹³⁹, who quantified the pH-H₂S interaction, noting that a one-unit pH decrease can equate to a tenfold increase in pH₂S regarding environmental severity. As shown in Table 9, a pH of 4.5 significantly diminishes the impact of H₂S on SCC risk, potentially preventing cracking even at pH₂S levels beyond 2.0 bar.

However, DSSs with PRE ≤ 40 remain prone to localised corrosion across most conditions analysed herein, while super DSS grades (PRE > 40) effectively prevent both localised corrosion and SCC. In this regard, empirical evidence from the literature supports the outcomes presented in Tables 8 and 9. For example, Oredsson¹⁴⁰ conducted various experiments with DSS samples of alloy S32205 (PRE ≈ 34), which underwent pitting corrosion, although cracking did not occur. These observations were made under severe conditions, including a range of 0.1–5.0 bar pH₂S, a pH interval of 2.5–3.9, temperatures varying from 20 °C to over 200 °C, as well as Cl⁻ concentrations exceeding 30,000 ppm, and tensile stress levels approaching the nominal ${\sigma }_{{YS}}$.

In some SCC investigations, DSSs with PRE between 35 and 40 (e.g., S32205, S31803, S32550 and S32750) have exhibited localised damage while not manifesting cracking⁹⁷,⁹⁸,¹⁴¹. The test parameters involved pH values between 3 and 4.5, a range of 0.1–1.0 bar pH₂S, temperatures from 60 to over 150 °C, and Cl⁻ concentrations ranging from 1000 to over 120,000 ppm, as well as stress levels spanning from 60 to 90% ${\sigma }_{{YS}}$. Comparatively, super DSS alloys (PRE > 40) demonstrate consistent resistance to both pitting and SCC. Investigations featuring alloys such as S32760, S32750 and S32520 reported no corrosion damage in environments comprising pH₂S values from 0.1 to 2.0 bar, Cl⁻ concentrations between 1000 and over 120,000 ppm, and temperatures up to 110 °C. These conditions also included pH levels from 2.8 to 6, while tensile loadings were equal to, or greater than, 90% ${\sigma }_{{YS}}$⁹⁹,¹³⁷,¹⁴².

SCC boundaries for DSSs

Thus far, the pH₂S values in our inference analyses have exceeded the H₂S limits in ISO 15156-3³¹ for DSSs. Despite this, our findings indicate that the incidence of SCC is often unlikely. This exposes a marked disparity between the conservative limits in industry standards and the actual performance of DSSs. While pH₂S is critical in restricting DSS applications, relying solely on this parameter is insufficient to determine SCC risks comprehensively.

To elucidate the conditions under which SCC of DSS is prevented, we conducted a backward analysis. This method infers the posterior probabilities of relevant variables based on observed outcomes¹²⁵. By assuming a 100% probability that SCC does not occur in the BN model, we can identify the range of environmental conditions that lead to this outcome. Similarly, a 100% probability of no pitting corrosion allows for examining possible settings that prevent both corrosion phenomena, while considering tensile loadings no greater than 90% ${\sigma }_{{YS}}$ (i.e., ${\sigma }_{R}$ ≤ 0.9).
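The mechanics of such an analysis can be illustrated on a toy three-node fragment in which the SCC node points to a pH₂S node and a pitting node, with an additional arc from pH₂S to pitting. Fixing SCC and pitting to their "no" states and applying Bayes' rule gives the posterior over the pH₂S states. All probabilities below are hypothetical and are not taken from Table 7; the state names and the function name are likewise illustrative.

```python
def posterior_given_evidence(prior, likelihood):
    """Bayes' rule for one discrete node: normalise prior x likelihood."""
    unnormalised = {state: prior[state] * likelihood[state] for state in prior}
    total = sum(unnormalised.values())
    return {state: value / total for state, value in unnormalised.items()}

# Hypothetical conditional probability tables (not taken from Table 7).
# P(pH2S state | SCC = no)
p_h2s_given_no_scc = {"low": 0.5, "medium": 0.3, "high": 0.2}
# P(pitting = no | SCC = no, pH2S state)
p_no_pitting_given_no_scc_and_h2s = {"low": 0.8, "medium": 0.5, "high": 0.2}

posterior = posterior_given_evidence(p_h2s_given_no_scc, p_no_pitting_given_no_scc_and_h2s)
for state, prob in posterior.items():
    print(f"P(pH2S = {state} | no SCC, no pitting) = {prob:.3f}")
```

With these numbers the evidence shifts probability mass towards the low pH₂S state, which is the kind of posterior shift that a backward analysis reads off to delimit benign operating ranges.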

Figure 11 illustrates the backward analysis using the BN model, demonstrating that DSS grades can withstand up to 0.5 bar pH₂S. This concentration considerably exceeds the ISO 15156-3 limits of 0.02 bar for conventional DSSs and 0.2 bar for super DSS grades, representing 25- and 2.5-fold increases, respectively. According to the BN model, other important environmental parameters must be within specific ranges, such as Cl⁻ ≤ 31,767 ppm, pCO₂ ≤ 0.546 bar, temperature ≤ 89 °C, and pH values around 4.2. Regarding the mechanical properties, the BN model points towards a probable range of 510 MPa ≤ ${\sigma }_{{YS}}$ ≤ 995 MPa, which covers commonly employed DSSs, such as S32250 and S32750⁵,²².
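As a rough illustration only, the upper bounds reported from the backward analysis can be collected into a simple screening check, alongside the fold-increase arithmetic relative to the ISO 15156-3 limits. The dictionary keys, the function name and the example operating point are hypothetical; the pH and ${\sigma }_{{YS}}$ ranges are omitted because they are not simple upper bounds. This sketch compares numbers and is not a substitute for the probabilistic inference itself.

```python
# Upper bounds reported from the backward analysis (Fig. 11).
BN_ENVELOPE = {
    "pH2S_bar": 0.5,
    "chloride_ppm": 31_767,
    "pCO2_bar": 0.546,
    "temperature_C": 89,
    "stress_ratio": 0.9,
}

def exceeded_bounds(conditions: dict) -> list:
    """Names of the parameters whose value exceeds the reported upper bound."""
    return [name for name, bound in BN_ENVELOPE.items() if conditions[name] > bound]

# Hypothetical operating point used only to exercise the function.
well = {"pH2S_bar": 0.3, "chloride_ppm": 50_000, "pCO2_bar": 0.4,
        "temperature_C": 85, "stress_ratio": 0.9}
print(exceeded_bounds(well))  # ['chloride_ppm']

# Ratio of the 0.5 bar boundary to the ISO 15156-3 limits quoted in the text.
for label, iso_limit in {"conventional DSS": 0.02, "super DSS": 0.2}.items():
    print(label, round(0.5 / iso_limit, 1))  # 25.0 and 2.5
```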

**Fig. 11: Backward analysis to identify safe operating conditions for DSSs.**

Extensive experimental findings support the pH₂S boundary derived in this work. By way of illustration, Fig. 12 shows the results from a range of selected SCC investigations, using the relationship between pH₂S and Cl⁻ content. Analogous to Fig. 1, these parameters are key synergistic factors driving SCC, which facilitates comparison with the probabilistic pH₂S limit of 0.5 bar estimated herein. Specifically, Fig. 12a highlights that few pitting corrosion events are observed as pH₂S approaches 0.5 bar. However, Fig. 12b shows that pitting corrosion and SCC occur predominantly at pH₂S levels exceeding 0.5 bar.

**Fig. 12: Comparison of the pH₂S limit derived from BN backward analyses against experimental evidence from the literature.**

Table 10 reports the DSS samples and experimental conditions of the literature data in Fig. 12. As summarised in Table 10, the data encompass a wide range of DSS alloys, such as S32205, S31803, S32750 and S32760. Test conditions replicated demanding downhole environments. These included pH₂S values up to 1.0 bar, Cl⁻ concentrations around 30,000 ppm, temperatures from 21 to 100 °C, a pH interval of 2.5–5.4, and pCO₂ up to 70 bar. Applied stresses were also significant, between 448 MPa and 1025 MPa (corresponding to ${\sigma }_{R}$ ≈ 0.9). These parameter ranges agree relatively well with the posterior intervals predicted by our BN model, underscoring both the repeatability of the evidence base and the conservatism embedded within current ISO 15156-3³¹ limits for DSSs.

Table 10 Summary of DSS samples and experimental settings in SCC studies presented in Fig. 12


Final remarks

This investigation introduces a BN model that assesses the risk of SCC in DSSs, particularly in the challenging conditions of downhole environments. Extensive CV demonstrated that the BN model achieved an accuracy of over 90% in predicting both pitting corrosion and SCC probabilities.

Our BN model infers that DSSs can reliably withstand higher partial pressures of hydrogen sulphide (pH₂S) than those currently stipulated by standard ISO 15156—Part 3 (ISO 15156-3)³¹. Specifically, our BN model estimates low SCC risk for DSS alloys even when exposed to 0.5 bar pH₂S in the gas phase. This threshold is significantly higher than the ISO 15156-3 limits (i.e., 0.02 and 0.2 bar pH₂S), representing a 25-fold and 2.5-fold increase for conventional DSS and super DSS, respectively. This finding underscores the insufficient characterisation of DSS grades in current standards and suggests the potential for their more cost-effective utilisation in sour service applications.

However, limitations persist in the context of causal inference. Our BN model falls short of enhancing the understanding of the physicochemical mechanisms driving SCC in the presence of H₂S. Considerably more information is required to incorporate the effect of H₂S on promoting anodic dissolution through increased acidification and accelerated active corrosion leading to pit growth. Similarly, comprehensive data are needed to elucidate how H₂S facilitates increased hydrogen absorption in metals, resulting in embrittlement that ultimately promotes crack initiation and propagation.

Future work should therefore explore several avenues to enhance mechanistic understanding. As exemplified by network designs proposed by Sridhar⁸⁰, a better mechanistic understanding of SCC can be achieved by incorporating into the BN model microstructural details (e.g., phase balance, precipitates, cold work), as well as more specific environmental and mechanical factors (e.g., halide types, potentials, strain rates). In addition, transitioning our BN model to a dynamic BN approach would enable modelling time-dependent degradation processes, which is crucial for lifecycle assessments.

Data scarcity is a common limitation given the complex and costly experimental methods needed in EAC research. Thus, developing holistic EAC frameworks requires sophisticated data handling methods. In this study, the application of GANs for data imputation proved beneficial, and their potential for generating synthetic data is being increasingly explored to overcome dataset limitations in corrosion studies^143,144.

Methodologically, a key contribution of this work was the integration of ML with explainable AI. Specifically, XGBoost and SHAP analyses facilitated the data-driven development of our BN model, overcoming limitations of reliance solely on expert judgment. This data-centric approach holds considerable promise for future BN modelling of complex corrosion processes, applicable to diverse alloy systems and EAC phenomena, where adequate data exist. Nevertheless, enhancing causal interpretability and inferential validity may necessitate hybrid models. Thus, combining empirical data analysis with expert-informed variables should also be applied through relevant strategies, such as protocols in expert elicitation, structure learning with expert constraints, or network merging algorithms^78,145,146.

Lastly, it is pertinent to note that the BN model effectively synthesises a substantial body of literature data to manage uncertainties associated with DSS. However, the SCC boundaries estimated in this work should not be regarded as definitive limits. Rather, they indicate a range of conditions under which SCC susceptibility may be further investigated. The primary objective of this study lies in developing an integrative framework for SCC risk assessment aimed at advancing risk-based corrosion management and informing cost-effective material selection.

Methods

Workflow for BN model development

The development of the BN model in this study was structured into a three-stage workflow, as illustrated in Fig. 13. The initial stage focused on compiling data from 28 selected publications related to SCC studies of DSSs in sour environments, and thus consolidating a knowledge-based dataset, as detailed previously in Table 2.

In the second stage, the generated dataset underwent preprocessing, which included missing data imputation via GAIN and minority class oversampling using SMOTE-NC. The analysis proceeded with the training of an XGBoost classification model. Here, BHO was employed to fine-tune model settings so as to maximise predictive accuracy. Subsequently, SHAP analyses were undertaken to identify the main features, and their interactions, that largely contribute to SCC predictions. The insights gained from these analyses were instrumental in guiding the BN model design, specifically in selecting nodes and configuring directed arcs.

The third stage centred on constructing the BN model, leveraging the results from the XGBoost modelling. This phase encompassed establishing node connections, determining appropriate discretisation, and assessing the model’s performance through stratified CV. Fundamentally, stratified k-fold CV is a variant of k-fold CV. While k-fold CV randomly partitions the dataset into k equal-sized folds for model training and validation, stratified k-fold CV ensures that each fold maintains an equivalent proportion of class samples as the original dataset¹¹⁹. Both the XGBoost and BN models underwent CV using a stratified k-fold scheme with five folds, meaning that each CV cycle employs 80% of the data for training the models and 20% for testing them.
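The stratified five-fold partition described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code used in the study: the function name `stratified_k_fold` and the toy labels (80 negative and 20 positive records) are hypothetical, and a library routine would normally be preferred in practice.

```python
import random
from collections import defaultdict

def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose class proportions mirror `labels`."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class out round-robin so every fold receives its share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for m, fold in enumerate(folds) if m != i for j in fold)
        yield train, test

# Hypothetical labels: 80 "no SCC" (0) and 20 "SCC" (1) records.
labels = [0] * 80 + [1] * 20
splits = list(stratified_k_fold(labels, k=5))
for train, test in splits:
    # Each cycle trains on 80% of the records and tests on the remaining 20%.
    print(len(train), len(test), sum(labels[i] for i in test))
```

Each test fold in this toy case holds 20 records, four of which are positive, so the 20% positive rate of the full set is preserved in every fold.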

During the CV process, we evaluated a range of performance metrics¹⁰⁸, which are listed as follows:

Confusion matrix. It constitutes a 2 × 2 matrix that summarises the performance of a classification model of the form:
$$\left[\begin{array}{cc}{TP} & {FP}\\ {FN} & {TN}\end{array}\right]$$
(4)
here, TP and TN denote correctly classified positive and negative instances, respectively; FP and FN denote negative instances misclassified as positive and positive instances misclassified as negative, respectively.

From the confusion matrix, the classification models’ accuracy is then estimated by:
$${Accuracy}=\frac{{TP}+{TN}}{{TP}+{TN}+{FP}+{FN}}$$
(5)
Recall, or TPR, evaluates the model’s capacity to correctly identify all relevant positive instances. It is computed as:
$${Recall}=\frac{{TP}}{{TP}+{FN}}$$
(6)
Precision assesses how many of the instances labelled as positive by the model are actually positive, which is calculated by
$${Precision}=\frac{{TP}}{{TP}+{FP}}$$
(7)
F1-score. It is the harmonic mean of precision and recall, designed to provide a single measure that balances both false positives and false negatives. Unlike the arithmetic mean, the harmonic mean penalises extreme values. F1-score is especially relevant in scenarios with class imbalance.
$${F1}=2\times \frac{{Precision}\times {Recall}}{{Precision}+{Recall}}$$
(8)
Specificity, or TNR, indicates the model’s ability to correctly identify negative cases, serving as a counterpart to recall for the negative class.
$${Specificity}=\frac{{TN}}{{TN}+{FP}}$$
(9)
Receiver operating characteristic curve (ROC) is a graphical representation of the trade-off between the TPR and the FPR of a binary classifier, where TPR and FPR are determined by

$${TPR}=\frac{{TP}}{{TP}+{FN}}$$

(10)

$${FPR}=\frac{{FP}}{{FP}+{TN}}$$

(11)

The area under the curve (AUC) quantifies the overall performance of the classifier by measuring the entire area beneath the ROC curve. It provides a scalar value ranging from 0 to 1 that indicates the model’s aggregate ability to distinguish between positive and negative classes: a perfect classifier achieves an AUC of 1, whereas random guessing yields 0.5¹⁴⁷.
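As a worked illustration of the metrics listed above, the following sketch computes them from the four confusion-matrix counts and evaluates the AUC through its rank interpretation (the probability that a randomly chosen positive receives a higher score than a randomly chosen negative). The counts, scores, and function names are hypothetical and serve only to show the arithmetic.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, recall (TPR), precision, F1-score, specificity (TNR) and FPR."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "recall": recall,
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
        "specificity": tn / (tn + fp),
        "fpr": fp / (fp + tn),
    }

def roc_auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical counts and scores, for illustration only.
m = classification_metrics(tp=40, fp=5, fn=10, tn=45)
auc = roc_auc([1, 1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.6, 0.3, 0.1])
print(m, auc)
```

With these counts the accuracy is 0.85, the recall 0.80, and the specificity 0.90; the F1-score (about 0.84) sits between precision and recall, as expected of a harmonic mean.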

Additionally, a sensitivity analysis of the BN model was conducted to examine the primary attributes influencing SCC. Inference analyses were performed to explore SCC risks for DSSs within a range of sour conditions, comparing the results against existing literature. Ultimately, diagnostic reasoning (also termed backward analyses) was performed to infer the likely safe operating conditions for DSSs. This involved setting the desired outcome state (e.g., absence of SCC) and calculating the posterior probability distributions of the input variables consistent with that state.

Generative adversarial imputation nets

In real-life applications, datasets frequently exhibit inconsistencies, notably in the form of missing values across their attributes¹⁴⁸. To handle this issue, one prevalent strategy is imputation, through which missing instances are estimated based on observed values within the dataset¹⁴⁹. In this respect, deep learning (DL) methods have increasingly been employed to estimate missing values, such as denoising autoencoders, and GANs^103,150. Distinctively, these techniques tend to outperform statistical techniques (e.g., logistic regression, decision trees, and predictive mean matching), as they operate without assumptions about underlying data distribution¹⁵¹. Moreover, DL-based methods use a robust model to estimate missing data across multiple features, thereby effectively capturing the latent structure of complex high-dimensional data^152,153.

In this work, the GAIN algorithm¹⁰³ is employed to resolve the missing values in our dataset. This data imputation approach has been effectively applied in various domains, such as materials science, civil engineering and medical research^112,154,155. Fundamentally, the GAIN method employs two main components: the generator and the discriminator¹¹⁰. The generator imputes the missing values based on the observed data, producing a complete data vector that resembles the real data distribution. Subsequently, the discriminator, equipped with additional hints about the missingness pattern, examines the complete data vector and attempts to distinguish observed values from imputed ones. This adversarial process iteratively trains the generator to deceive the discriminator, so that the imputed values come to replicate the data’s actual distribution¹⁵⁶.

Synthetic minority over-sampling

SMOTE is a widely accepted method for addressing class imbalance in classification datasets¹⁵⁷. This oversampling method generates synthetic examples within the minority class to achieve a more balanced class distribution, thereby enhancing the predictive performance of ML models. Specifically, SMOTE selects a minority class instance and then identifies its k-nearest neighbours within the feature space. Subsequently, a synthetic instance is created by interpolating between the selected instance and one of its nearest neighbours¹¹³. Thus, the synthetic sample, $s$, is generated by

$$s={x}_{i}+u\times ({x}_{{ki}}-{x}_{i})$$

(12)

where ${x}_{i}$ is a randomly chosen minority class instance, ${x}_{{ki}}$ is one of the k-nearest neighbours of ${x}_{i}$, and $u$ is a random number between 0 and 1.
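The interpolation in Eq. (12) can be sketched as follows. The function name `smote_sample` and the two-feature toy records are hypothetical; the sketch generates a single synthetic point, whereas a full SMOTE run repeats the step until the classes are balanced.

```python
import math
import random

def smote_sample(minority, k=3, rng=None):
    """Create one synthetic point s = x_i + u * (x_ki - x_i), following Eq. (12)."""
    rng = rng or random.Random(0)
    x_i = rng.choice(minority)
    # k nearest minority neighbours of x_i by Euclidean distance (x_i itself excluded).
    others = [p for p in minority if p is not x_i]
    neighbours = sorted(others, key=lambda p: math.dist(p, x_i))[:k]
    x_ki = rng.choice(neighbours)
    u = rng.random()  # uniform random number in [0, 1)
    return tuple(a + u * (b - a) for a, b in zip(x_i, x_ki))

# Hypothetical minority-class records with two continuous features.
minority = [(0.10, 80.0), (0.15, 90.0), (0.30, 85.0), (0.25, 95.0)]
synthetic = smote_sample(minority, k=2, rng=random.Random(42))
print(synthetic)
```

Because the synthetic point lies on the line segment joining two existing minority records, it always falls within the range already spanned by the minority class.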

In this work, we employ a variant of the original SMOTE algorithm, which is designed to handle datasets that contain both nominal (i.e., categorical) and continuous features. Referred to as SMOTE-NC¹¹³, this technique modifies the Euclidean distance calculation, required to estimate the k-nearest neighbours, by incorporating the median of the standard deviations of all continuous features from the minority class. This inclusion acts as a penalisation factor when categorical features differ between a sample and its potential k-nearest neighbours, effectively accounting for the disparity in categorical feature values. For the synthesis of new samples, SMOTE-NC employs the standard SMOTE interpolation for continuous features. Meanwhile, for categorical features, SMOTE-NC assigns the value that appears most frequently among the k-nearest neighbours, ensuring that the synthetic samples respect the distribution of the categorical data within the data frame.
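The two SMOTE-NC modifications described above, the penalised distance and the majority vote for categorical features, can be sketched as below. All function names and toy values are hypothetical, and the use of the population standard deviation is an assumption made here for simplicity.

```python
import math
import statistics
from collections import Counter

def median_std(minority_cont):
    """Median of the per-feature standard deviations of the continuous columns.
    Assumption: population standard deviation (pstdev) is used."""
    return statistics.median(statistics.pstdev(col) for col in zip(*minority_cont))

def smote_nc_distance(a_cont, a_cat, b_cont, b_cat, med):
    """Euclidean distance with a penalty of med**2 per differing categorical feature."""
    squared = sum((x - y) ** 2 for x, y in zip(a_cont, b_cont))
    squared += med ** 2 * sum(p != q for p, q in zip(a_cat, b_cat))
    return math.sqrt(squared)

def majority_category(neighbour_cats):
    """Most frequent value of each categorical feature among the nearest neighbours."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*neighbour_cats))

# Hypothetical values: two continuous features and one categorical feature (alloy grade).
d_same = smote_nc_distance((1.0, 2.0), ("S32205",), (4.0, 6.0), ("S32205",), med=2.0)
d_diff = smote_nc_distance((1.0, 2.0), ("S32205",), (4.0, 6.0), ("S32750",), med=2.0)
grade = majority_category([("S32205",), ("S32750",), ("S32205",)])
print(d_same, d_diff, grade)
```

In this toy case the mismatch in alloy grade raises the distance from 5.0 to the square root of 29, so neighbours sharing the same category are favoured during the nearest-neighbour search.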

Extreme gradient boosting

The XGBoost algorithm has gained wide recognition as a scalable and highly efficient ML method for classification and regression tasks in a variety of scientific fields¹⁵⁸. Developed by Chen and Guestrin¹¹⁸, XGBoost is an optimised version of the Gradient Boosting (GB) framework proposed by Friedman¹⁵⁹. This approach sequentially produces and updates base classifiers (weak learners) to build a robust ensemble classifier (strong learner), thereby systematically reducing errors and enhancing prediction accuracy. In other words, the XGBoost algorithm aims to minimise prediction errors by gradually adding more learners (decision trees), while basing each update on the previous model’s prediction results¹⁶⁰. Thus, the resulting model is intended to keep both bias and variance low during the training process¹¹⁴. Fundamentally, the predicted output of the XGBoost model, denoted as ${\hat{y}}_{i}$, is the sum of all scores predicted by K trees, expressed as

$${\hat{y}}_{i}=\mathop{\sum }\limits_{k=1}^{K}{f}_{k}\left({x}_{i}\right),{f}_{k}\in F$$

(13)

where K denotes the total number of trees, $k$ represents the k-th tree, ${x}_{i}$ is the feature vector corresponding to sample $i$, while $F$ is the space of regression trees. To learn the set of functions, the algorithm minimises the following regularised objective function (${obj}$)

$${obj}=\mathop{\sum }\limits_{i=1}^{n}L\left({y}_{i},{\hat{y}}_{i}\right)+\mathop{\sum }\limits_{k=1}^{K}\varOmega \left({f}_{k}\right)$$

(14)

Here, $n$ is the number of samples, ${y}_{i}$ is the actual value of the i-th target, while ${\hat{y}}_{i}$ is the predicted value of the i-th target. The term $L\left({y}_{i},{\hat{y}}_{i}\right)$ represents the training loss function, which quantifies the discrepancies between predictions and data points. Lastly, $\varOmega \left({f}_{k}\right)$ is the regularization term that controls the complexity of the model to prevent overfitting¹¹⁸, and is defined as

$$\varOmega \left({f}_{k}\right)=\gamma T+\frac{1}{2}\lambda \mathop{\sum }\limits_{j=1}^{T}{\omega }_{j}^{2}$$

(15)

where $T$ represents the number of leaves in the tree, and ${\omega }_{j}$ is the score of the j-th leaf. The coefficient $\gamma$ denotes the minimum loss reduction required to split a new leaf, and $\lambda$ is a regularization coefficient.
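To make Eqs. (13) to (15) concrete, the sketch below evaluates the additive prediction and the regularised objective for two hand-written decision stumps. The stumps, the data, the squared-error loss, and the values of $\gamma$ and $\lambda$ are all hypothetical choices for illustration; they are not the settings used in this study, and the tree-growing step itself is not shown.

```python
def predict(trees, x):
    """Eq. (13): the ensemble output is the sum of the individual tree scores."""
    return sum(tree(x) for tree in trees)

def omega(leaf_weights, gamma, lam):
    """Eq. (15): gamma * T + 0.5 * lambda * sum of squared leaf scores."""
    return gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)

def objective(y_true, y_pred, trees_leaf_weights, gamma, lam, loss):
    """Eq. (14): training loss plus the complexity penalty of every tree."""
    training_loss = sum(loss(y, p) for y, p in zip(y_true, y_pred))
    penalty = sum(omega(w, gamma, lam) for w in trees_leaf_weights)
    return training_loss + penalty

# Two hypothetical stumps, each with T = 2 leaves.
trees = [
    lambda x: 0.4 if x[0] > 0.5 else -0.4,
    lambda x: 0.2 if x[1] > 80 else -0.1,
]
leaf_weights = [[-0.4, 0.4], [-0.1, 0.2]]

X = [(0.8, 90), (0.1, 60)]
y = [1.0, 0.0]
y_hat = [predict(trees, x) for x in X]
squared_error = lambda yi, pi: (yi - pi) ** 2  # assumed loss, for illustration
obj = objective(y, y_hat, leaf_weights, gamma=1.0, lam=1.0, loss=squared_error)
print(y_hat, obj)
```

Here the training loss is 0.41 and the penalty 4.185, so the complexity term dominates; raising $\gamma$ or $\lambda$ therefore pushes the optimiser towards fewer leaves and smaller leaf scores.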

Bayesian hyperparameter optimisation

The effective implementation of ML models is a multifaceted and demanding process that extends beyond selecting an adequate algorithm; it also involves hyperparameter optimisation to fine-tune the model’s configuration. This process is essential for ensemble models, such as XGBoost and DL algorithms, where hyperparameters set the training conditions and significantly impact model performance and adaptability^161,162.

However, the diverse types of hyperparameters, which include continuous, discrete, and conditional values, make traditional optimisation strategies (e.g., grid and random search) less effective due to their inability to fully navigate the configuration space^163,164. To overcome these problems, BHO offers a sophisticated solution. This approach constructs a probabilistic surrogate model of an objective function, leveraging information from previous evaluations to make informed decisions about which hyperparameters to explore next^165,166. Thus, the surrogate model facilitates understanding the relationship between hyperparameters and model performance¹⁶⁷.

Moreover, BHO has proven effective in determining optimal hyperparameters, especially for high-dimensional problems^165,168. In this study, we employ the tree-structured Parzen estimator (TPE) approach for implementing BHO, thus eliminating the need for predefined initial values or training datasets¹⁶⁹. The TPE algorithm begins by randomly exploring the parameter space. Subsequently, it classifies the sampled parameters based on their performance according to a predetermined cost function. The TPE algorithm assigns the hyperparameters that yield the most effective outcomes to one group, and allocates the rest to a second group. This classification allows for modelling the likelihood of parameter effectiveness¹⁷⁰. The main objective is then to identify a set of hyperparameters that probabilistically belong to the first category. Thus, the expected improvement (EI) per iteration is given by¹⁷¹

$${EI}=\frac{l(x)}{g(x)}$$

(16)

where $l(x)$ and $g(x)$ are the probability densities of the hyperparameters $x$ under the first and second groups, respectively.
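A one-dimensional sketch of this selection rule is given below. It is a simplified illustration rather than a full TPE implementation: the Gaussian kernel density estimate, the fixed bandwidth, the 25% quantile used to split the trials, and the toy trial history are all assumptions introduced here.

```python
import math

def kde(points, bandwidth):
    """Gaussian kernel density estimate built from `points`."""
    norm = len(points) * bandwidth * math.sqrt(2 * math.pi)
    def density(x):
        return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm
    return density

def tpe_suggest(trials, candidates, quantile=0.25, bandwidth=0.1):
    """Return the candidate with the largest l(x) / g(x) ratio, as in Eq. (16)."""
    ranked = sorted(trials, key=lambda t: t[1])  # lower cost is better
    n_good = max(1, math.ceil(quantile * len(ranked)))
    good = [x for x, _ in ranked[:n_good]]       # first group
    bad = [x for x, _ in ranked[n_good:]]        # second group
    l, g = kde(good, bandwidth), kde(bad, bandwidth)
    # A small constant guards against division by zero far from all "bad" points.
    return max(candidates, key=lambda x: l(x) / (g(x) + 1e-12))

# Hypothetical history of (hyperparameter value, cost) pairs.
trials = [(0.1, 0.9), (0.3, 0.5), (0.5, 0.2), (0.55, 0.15),
          (0.7, 0.4), (0.9, 0.8), (0.45, 0.25), (0.2, 0.7)]
candidates = [0.05, 0.25, 0.5, 0.75, 0.95]
suggestion = tpe_suggest(trials, candidates)
print(suggestion)
```

The suggested value lies where well-performing trials are dense and poorly performing trials are sparse, which is the behaviour the ratio in Eq. (16) is meant to reward.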

Shapley additive explanation

The SHAP technique utilises a game-theoretic framework to interpret predictions from any ML model¹¹⁶. Based on the Shapley value concept from cooperative game theory, SHAP assigns a fair contribution value to each feature of a data instance, analogous to players in a game¹⁷². These values, referred to as Shapley values, distribute the prediction outcome among the features. In the context of SHAP, the model prediction is then expressed as¹⁷³

$$f({\rm{x}})=g\left({{\rm{z}}}^{{\prime} }\right)={\phi }_{0}+\mathop{\sum }\limits_{i=1}^{M}{\phi }_{i}{z}_{i}^{{\prime} }$$

(17)

wherein $f({\rm{x}})$ is the original model output and $g\left({{\rm{z}}}^{{\prime} }\right)$ represents the SHAP explanation model. The base value ${\phi }_{0}$ represents the prediction with no features present, while ${\phi }_{i}$ corresponds to the SHAP value for the i-th feature, measuring the feature contribution to the difference between the actual prediction and the base value. The term ${z}_{i}^{{\prime} }\in \left\{\mathrm{0,1}\right\}$ is a binary variable that indicates whether the i-th feature is present (1) or absent (0), while M is the total number of features. Thereby, SHAP represents the prediction as a sum of binary components. In this work, we employ the SHAP scheme for tree-based ensemble models, commonly referred to as TreeSHAP¹⁷⁴.
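The additive decomposition in Eq. (17) can be checked on a toy model by computing exact Shapley values through brute-force enumeration of feature coalitions. This is not TreeSHAP, which reaches the same values far more efficiently for tree ensembles; the model, the instance, and the convention of replacing absent features with a zero baseline are assumptions made here for illustration.

```python
from itertools import combinations
from math import factorial

def shapley_values(value, n_features):
    """Exact Shapley values for a coalition value function `value(frozenset)`."""
    players = range(n_features)
    phi = []
    for i in players:
        others = [j for j in players if j != i]
        total = 0.0
        for size in range(len(others) + 1):
            weight = factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s.
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Hypothetical model with one additive term and one interaction term.
def model(x):
    return 2 * x[0] + x[1] * x[2]

instance = (1.0, 2.0, 3.0)
baseline = (0.0, 0.0, 0.0)  # assumed stand-in for "absent" features

def value(coalition):
    masked = [instance[j] if j in coalition else baseline[j] for j in range(3)]
    return model(masked)

phi_0 = value(frozenset())          # base value: prediction with no features present
phi = shapley_values(value, 3)
print(phi_0, phi, phi_0 + sum(phi), model(instance))
```

The additive feature receives its full effect (2), while the interaction effect (6) is shared equally between the two interacting features (3 each); the base value plus the three contributions recovers the model output of 8.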

Bayesian networks

BNs provide a graphical framework that facilitates the representation and comprehensive analysis of uncertainty, as well as the interdependencies among multiple variables¹²⁵. Fundamentally, BNs are a compact representation of a multivariate statistical distribution function, which efficiently encode the joint probability distribution (JPD) of a set of random variables, $X=\left\{{x}_{1},{x}_{2},...,{x}_{n}\right\}$, through conditional independence statements, conditional functions and, principally, conditional probability matrices (CPMs)^175,176. From this foundation, BNs permit the integration of various sources of information, coupling both probabilistic and deterministic models seamlessly⁴⁰. Additionally, BNs can effectively be applied to obtain reliable inferences even when the data is ambiguous or incomplete¹⁷⁷.

A BN consists of two primary components: a directed acyclic graph (DAG) where nodes correspond to the random variables $X$, and a set of directed arcs (arrows) depicting the probabilistic dependencies between these variables. Accompanying the DAG are the associated CPMs for each node, specifying quantitatively the effects of dependencies among variables¹⁷⁸. Via the concept of conditional independence (as characterised by d-separation), the JPD of all variables in $X$ is given by the product of conditional probabilities, as follows¹⁷⁹

$$P\left({x}_{1},{x}_{2},...,{x}_{n}\right)=\mathop{\prod }\limits_{i=1}^{n}P\left({x}_{i}{|pa}\left({x}_{i}\right)\right)$$

(18)

where ${pa}\left({x}_{i}\right)$ denotes the set of parent variables of ${x}_{i}$, and $P\left({x}_{i}{|pa}\left({x}_{i}\right)\right)$ is the CPM for ${x}_{i}$. Moreover, BNs can perform diagnostic analyses (also referred to as backward analyses) through a variety of inference techniques based on Bayes’ theorem in the form

$$P\left(X|E\right)=\frac{P\left({E|X}\right)P\left(X\right)}{P\left(E\right)}=\frac{P\left(E,X\right)}{{\sum }_{{X}^{{\prime} }}P\left(E,{X}^{{\prime} }\right)}$$

(19)

here, $P\left(X|E\right)$ is the posterior probability based on the obtained evidence ($E$), $P\left(X\right)$ denotes the prior probability, $P\left({E|X}\right)$ represents the conditional probability of the evidence (assuming that X is true), and $P\left(E\right)$ is the marginal likelihood of the evidence (also known as expectedness). Specifically, $P\left(E\right)$ is calculated via the sum ${\sum }_{{X}^{{\prime} }}P\left(E,{X}^{{\prime} }\right)$, where $P\left(E,{X}^{{\prime} }\right)$ represents the joint probability of the evidence E and a possible hypothesis state ${X}^{{\prime} }$, summed over all possible states ${X}^{{\prime} }$.
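Equations (18) and (19) can be illustrated with a three-node toy network in which two parent nodes (H₂S level and temperature) point to an SCC node. The structure, the state names, and every probability below are hypothetical and far simpler than the BN developed in this work; the sketch only shows how a backward analysis follows from the factorised joint distribution.

```python
from itertools import product

# Hypothetical priors and CPM for the network H2S -> SCC <- Temp.
P_H2S = {"low": 0.7, "high": 0.3}
P_TEMP = {"low": 0.6, "high": 0.4}
P_SCC_YES = {("low", "low"): 0.02, ("low", "high"): 0.10,
             ("high", "low"): 0.20, ("high", "high"): 0.60}

def joint(h2s, temp, scc):
    """Eq. (18): product of each node's probability given its parents."""
    p_yes = P_SCC_YES[(h2s, temp)]
    return P_H2S[h2s] * P_TEMP[temp] * (p_yes if scc == "yes" else 1.0 - p_yes)

def posterior_h2s(scc):
    """Eq. (19): P(H2S | SCC = scc), summing the joint over Temp and normalising."""
    unnorm = {h: sum(joint(h, t, scc) for t in P_TEMP) for h in P_H2S}
    evidence = sum(unnorm.values())  # P(E), the sum over all hypothesis states
    return {h: p / evidence for h, p in unnorm.items()}

# Sanity check: the factorised joint distribution sums to one.
total = sum(joint(h, t, s) for h, t, s in product(P_H2S, P_TEMP, ("yes", "no")))

# Backward analysis: fix the outcome "no SCC" and update the belief about H2S.
post = posterior_h2s("no")
print(total, post)
```

Setting the evidence to "no SCC" shifts probability towards the low-H₂S state (from 0.70 to roughly 0.78 in this toy case), which mirrors how the backward analyses in this study infer likely safe operating ranges from a desired outcome.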

Data availability

Data will be made available on reasonable request.

Code availability

The underlying codes for this study are not publicly available, but may be made available to qualified researchers on reasonable request from the corresponding author.

References

Iannuzzi, M., Barnoush, A. & Johnsen, R. Materials and corrosion trends in offshore and subsea oil and gas production. NPJ Mater. Degrad. 1, 2 (2017).
Perez, T. E. Corrosion in the oil and gas industry: an increasing challenge for materials. JOM 65, 1033–1042 (2013).
Raja, V. S. & Shoji, T. Stress Corrosion Cracking: Theory and Practice (Elsevier Science, 2011).
Rhodes, P. R., Skogsberg, L. A. & Tuttle, R. N. Pushing the limits of metals in corrosive oil and gas well environments. Corrosion 63, 63–100 (2007).
Rhodes, P. R. Environment-assisted cracking of corrosion-resistant alloys in oil and gas production environments: a review. Corrosion 57, 923–966 (2001).
Smith, L. & Craig, B. D. Practical corrosion control measures for elemental sulfur containing environments. In Corrosion 2005. NACE-05646 (NACE, Houston, Texas, 2005).
Andres, B., Mizukami, A. & Pimenta, G. Corrosion resistant alloys test protocol development and results for an ultra sour reservoir containing elemental sulphur. In Abu Dhabi International Petroleum Exhibition & Conference. SPE-203337-MS (NACE, Abu Dhabi, UAE, 2020).
NACE International. Use of Corrosion-Resistant Alloys in Oilfield Environments. Technical Committee Report No. 1F192, rev. 2000 (2000).
Cao, L., Anderko, A., Gui, F. & Sridhar, N. Localized corrosion of corrosion resistant alloys in H₂S-containing environments. Corrosion 72, 636–654 (2016).
Sridhar, N. Advanced materials in oil and gas production. DNV GL Strateg. Res. Innov. https://www.dnv.com/publications/advanced-materials-in-oil-and-gas-production-14410/ (2014).
Migahed, M. A. & Nassar, I. F. Corrosion inhibition of tubing steel during acidization of oil and gas wells. Electrochim. Acta 53, 2877–2882 (2008).
Jones, R. H. Stress Corrosion Cracking: Materials Performance and Evaluation (ASM International, 2017).
Rebak, R. B. Industrial Experience on the Caustic Cracking of Stainless Steels and Nickel Alloys - A Review. In Corrosion 2006. (AMPP, San Diego, CA, 2006)
Cheng, Y. F. Stress Corrosion Cracking of Pipelines (Wiley, 2013).
Wilhelm, S. M. & Kane, R. D. Selection of materials for sour service in petroleum production. J. Pet. Technol. 38, 1051–1061 (1986).
Sridhar, N., Thodla, R., Gui, F., Cao, L. & Anderko, A. Corrosion-resistant alloy testing and selection for oil and gas production. Corros. Eng. Sci. Technol. 53, 75–89 (2017).
Smith, L. Control of corrosion in oil and gas production tubing. Br. Corros. J. 34, 247–253 (1999).
Shah, M., Ayob, M. T. M., Yaakob, N., Embong, Z. & Othman, N. K. Comparative corrosion behaviour of austenitic 316L and duplex 2205 stainless steels: microstructure and property evolution at highly partial pressure of H₂S. Corros. Eng. Sci. Technol. 57, 15–31 (2022).
El-Sherik, A. Trends in Oil and Gas Corrosion Research and Technologies: Production and Transmission (Woodhead Publishing, 2017).
Klenam, D. E. P. et al. Corrosion resistant materials in high-pressure high-temperature oil wells: an overview and potential application of complex concentrated alloys. Eng. Fail. Anal. 157, 107920 (2024).
Miyasaka, A. & Ogawa, H. Corrosion performance and application limits of corrosion-resistant alloys in oilfield service. Corrosion 51, 239–247 (1995).
Francis, R. Duplex stainless steels: the versatile alloys. Corrosion 76, 500–510 (2019).
Sedriks, A. J. Stress-corrosion cracking of stainless steels. In Stress-Corrosion Cracking: Materials Performance and Evaluation (ed. Jones, R. H.) 2nd ed, 95–134 (ASM International, 2017).
Gunn, R. Duplex Stainless Steels: Microstructure, Properties and Applications (Elsevier Science, 1997).
Francis, R. & Byrne, G. Duplex stainless steels-alloys for the 21st century. Metals 11, 836 (2021).
Zhang, W. et al. Fatigue failure mechanism of 2205 duplex stainless steel using the neutron diffraction and EBSD technologies. Int. J. Fatigue 159, 106828 (2022).
Francis, R. & Byrne, G. The erosion-corrosion limits of duplex stainless steels. Mater. Perform. 57, 44–47 (2018).
Kwok, C. T., Man, H. C. & Cheng, F. T. Cavitation erosion of duplex and super duplex stainless steels. Scr. Mater. 39, 1229–1236 (1998).
Kangas, P. & Chai, G. C. Use of advanced austenitic and duplex stainless steels for applications in oil & gas and process industry. Adv. Mater. Res. 794, 645–669 (2013).
Papavinasam, S. Corrosion Control in the Oil and Gas Industry (Elsevier, 2013).
Milliams, D. E., Cottage, D. & Tuttle, R. N. ISO 15156/NACE MR0175 - A New International Standard for Metallic Materials for Use in Oil and Gas Production in Sour Environments. In Corrosion 2003. (San Diego, California, 2003).
Siegmund, G., Schmitt, G. & Kuhl, L. Unexpected Sour Cracking Resistance of Duplex and Superduplex Steels. In Corrosion 2016. 1−14 (AMPP, Vancouver, BC, 2016).
Coudreuse, L., Ligier, V., Audouard, J. P. & Soulignac, P. Lean Duplex Stainless Steel for Oil and Gas Applications. In Proceedings of the Corrosion 2003. (NACE, San Diego, California, 2003).
Barteri, M., Mancia, F., Tamba, A. & Montagna, G. Engineering diagrams and sulphide stress corrosion cracking of duplex stainless steels in deep sour well environment. Corros. Sci. 27, 1239–1250 (1987).
Cassagne, T., Peultier, J., Le Manchet, S. & Duret, C. An Update on the Use of Duplex Stainless Steels in Sour Environments. In Proceedings of the Corrosion 2012. 1−15 (AMPP, Salt Lake City, UT, 2012).
Scully, J. R. & Balachandran, P. V. Future frontiers in corrosion science and engineering, part III: the next “leap ahead” in corrosion control may be enabled by data analytics and artificial intelligence. Corrosion 75, 1395–1397 (2019).
Coelho, L. B. et al. Reviewing machine learning of corrosion prediction in a data-oriented perspective. NPJ Mater. Degrad. 6, 8 (2022).
Dong, C. et al. Integrated computation of corrosion: modelling, simulation and applications. Corros. Commun. 2, 8–23 (2021).
Liu, X. et al. Toward the multiscale nature of stress corrosion cracking. Nucl. Eng. Technol. 50, 1–17 (2018).
Taylor, C. D. Corrosion informatics: an integrated approach to modelling corrosion. Corros. Eng. Sci. Technol. 50, 490–508 (2015).
Taylor, C. D. & Rossi, M. L. Multiphysics modeling of the role of iodine in environmentally assisted cracking of zirconium via pellet-clad interaction. Corrosion 72, 978–988 (2016).
Li, Z., Lu, Y. & Wang, X. Modeling of stress corrosion cracking growth rates for key structural materials of nuclear power plant. J. Mater. Sci. 55, 439–463 (2020).
Taylor, C. D. Atomistic modeling of corrosion events at the interface between a metal and its environment. Int. J. Corros. 2012, 204640 (2012).
Ke, H. & Taylor, C. D. Density functional theory: an essential partner in the integrated computational materials engineering approach to corrosion. Corrosion 75, 708–726 (2019).
Chandra, S., Kumar, N. N., Samal, M. K., Chavan, V. M. & Patel, R. J. Molecular dynamics simulations of crack growth behavior in Al in the presence of vacancies. Comput. Mater. Sci. 117, 518–526 (2016).
Das, N. K., Tirtom, I. & Shoji, T. A multiscale modelling study of ni–cr crack tip initial stage oxidation at different stress intensities. Mater. Chem. Phys. 122, 336–342 (2010).
Anderko, A., Sridhar, N. & Dunn, D. S. A general model for the repassivation potential as a function of multiple aqueous solution species. Corros. Sci. 46, 1583–1612 (2004).
Anderko, A., Gui, F., Cao, L., Sridhar, N. & Engelhardt, G. R. Modeling localized corrosion of corrosion-resistant alloys in oil and gas production environments: Part I. Repassivation potential. Corrosion 71, 1197–1212 (2015).
Anderko, A., Cao, L., Gui, F., Sridhar, N. & Engelhardt, G. R. Modeling localized corrosion of corrosion-resistant alloys in oil and gas production environments: Part II. Corrosion potential. Corrosion 73, 634–647 (2016).
Katona, R. M., Burns, J. T., Schaller, R. F. & Kelly, R. G. Insights from electrochemical crack tip modeling of atmospheric stress corrosion cracking. Corros. Sci. 209, 110756 (2022).
Gong, K., Wu, M., Liu, X. & Liu, G. Nucleation and propagation of stress corrosion cracks: modeling by cellular automata and finite element analysis. Mater. Today Commun. 33, 104886 (2022).
Martínez-Pañeda, E. Phase-field simulations opening new horizons in corrosion research. MRS Bull. 49, 603–612 (2024).
Mai, W. & Soghrati, S. A phase field model for simulating the stress corrosion cracking initiated from pits. Corros. Sci. 125, 87–98 (2017).
Cui, C., Ma, R. & Paneda, E. M. A phase field formulation for dissolution-driven stress corrosion cracking. J. Mech. Phys. Solids 147, 104254 (2021).
Gerard, A. Y., Lutton, K., Lucente, A., Frankel, G. S. & Scully, J. R. Progress in understanding the origins of excellent corrosion resistance in metallic alloys: from binary polycrystalline alloys to metallic glasses and high entropy alloys. Corrosion 76, 485–499 (2020).
Keyes, D. E. et al. Multiphysics simulations: challenges and opportunities. Int. J. High. Perform. Comput. Appl. 27, 4–83 (2013).
Nash, W., Zheng, L. & Birbilis, N. Deep learning corrosion detection with confidence. NPJ Mater. Degrad. 6, 26 (2022).
Munawar, H. S. et al. Civil infrastructure damage and corrosion detection: an application of machine learning. Buildings 12, 156 (2022).
Coelho, L. B. et al. Estimating pitting descriptors of 316 L stainless steel by machine learning and statistical analysis. NPJ Mater. Degrad. 7, 82 (2023).
Taylor, C. D. & Tossey, B. M. High temperature oxidation of corrosion resistant alloys from machine learning. NPJ Mater. Degrad. 5, 38 (2021).
Rovinelli, A., Sangid, M. D., Proudhon, H. & Ludwig, W. Using machine learning and a data-driven approach to identify the small fatigue crack driving force in polycrystalline materials. NPJ Comput. Mater. 4, 35 (2018).
Allen, C. et al. Deep learning strategies for addressing issues with small datasets in 2D materials research: microbial corrosion. Front. Microbiol. 13 https://doi.org/10.3389/fmicb.2022.1059123 (2022).
Shabarchin, O. & Tesfamariam, S. Internal corrosion hazard assessment of oil & gas pipelines using Bayesian belief network model. J. Loss Prev. Process Ind. 40, 479–495 (2016).
Article Google Scholar
Koch, G., Ayello, F., Khare, V., Sridhar, N. & Moosavi, A. Corrosion threat assessment of crude oil flow lines using Bayesian network model. Corros. Eng. Sci. Technol. 50, 236–247 (2015).
Article CAS Google Scholar
Ayello, F., Alfano, T., Hill, D. & Sridhar, N. A Bayesian network based pipeline risk management. In Corrosion 2012, Vol. 14, 1−14 https://doi.org/10.5006/C2012-01123 (NACE International, Salt Lake City, Utah, 2012).
Dao, U. et al. A Bayesian approach to assess under-deposit corrosion in oil and gas pipelines. Process Saf. Environ. Prot. 176, 489–505 (2023).
Article CAS Google Scholar
Sridhar, N. Localized corrosion in seawater: a Bayesian network-based review. Corrosion 79, 268–283 (2022).
Article Google Scholar
Lorenz, K. & Medawar, G. Über das korrosionsverhalten austenitischer chrom-nickel-(molybdän-) stähle mit und ohne stickstoffzusatz unter besonderer berücksichtigung ihrer beanspruchbarkeit in chloridhaltigen lösungen. Thyssenforschung 1, 97–108 (1969).
Google Scholar
Taylor, C. et al. in Corrosion 2018 Vol. 13 (NACE International, Phoenix, Arizona, USA, 2018).
Harlow, D. G. & Wei, R. P. Probability modeling and material microstructure applied to corrosion and fatigue of aluminum and steel alloys. Eng. Fract. Mech. 76, 695–708 (2009).
Article Google Scholar
Kondo, Y. Prediction of fatigue crack initiation life based on pit growth. Corrosion 45, 7–11 (1989).
Article CAS Google Scholar
Sridhar, N. & Kappes, M. A. Probabilistic Assessment of Hydrogen Stress Cracking of Steels and CRA in Sour Environments. In CONFERENCE 2024. 1–15 (AMPP, New Orleans, LA, United States, 2024).
Alamri, A. H. Application of machine learning to stress corrosion cracking risk assessment. Egypt. J. Pet. 31, 11–21 (2022).
Article Google Scholar
Sarwar, U. et al. Enhancing pipeline integrity: a comprehensive review of deep learning-enabled finite element analysis for stress corrosion cracking prediction. Eng. Appl. Comput. Fluid Mech. 18, 2302906 (2024).
Google Scholar
Soomro, A. A. et al. Integrity assessment of corroded oil and gas pipelines using machine learning: a systematic review. Eng. Fail. Anal. 131, 105810 (2022).
Article CAS Google Scholar
Soomro, A. A. et al. A review on Bayesian modeling approach to quantify failure risk assessment of oil and gas pipelines due to corrosion. Int. J. Press. Vessels Pip. 200, 104841 (2022).
Article Google Scholar
Mathew, C. & Adu-Gyamfi, E. A review on AI-driven environmental-assisted stress corrosion cracking properties of conventional and advanced manufactured alloys. Corros. Eng., Sci. Technol. 60, 145–158 (2025).
Article CAS Google Scholar
O’Hagan, A. Expert knowledge elicitation: subjective but scientific. Am. Stat. 73, 69–81 (2019).
Article Google Scholar
Constantinou, A. C., Fenton, N. & Neil, M. Integrating expert knowledge with data in Bayesian networks: preserving data-driven expectations when the expert variables remain unobserved. Expert. Syst. Appl. 56, 197–208 (2016).
Article PubMed PubMed Central Google Scholar
Sridhar, N. Oil and gas production systems. In Bayesian Network Modeling of Corrosion (ed. Sridhar, N.) 185–223, https://doi.org/10.1007/978-3-031-56128-3_6 (Springer International Publishing, 2024).
Liu, M. et al. A review on pitting corrosion and environmentally assisted cracking on duplex stainless steel. Microstructures 3, 2023020 (2023).
CAS Google Scholar
Katona, R. M., Karasz, E. K. & Schaller, R. F. A review of the governing factors in pit-to-crack transitions of metallic structures. Corrosion 79, 72–96 (2022).
Article Google Scholar
Turnbull, A. Characterising the early stages of crack development in environment-assisted cracking. Corros. Eng. Sci. Technol. 52, 533–540 (2017).
Article CAS Google Scholar
Evans, C., Leiva-Garcia, R. & Akid, R. Strain evolution around corrosion pits under fatigue loading. Theor. Appl. Fract. Mech. 95, 253–260 (2018).
Article CAS Google Scholar
Xiang, L., Pan, J. & Chen, S. Analysis on the stress corrosion crack inception based on pit shape and size of the FV520B tensile specimen. Results Phys. 9, 463–470 (2018).
Article Google Scholar
Deng, B. et al. Critical pitting and repassivation temperatures for duplex stainless steel in chloride solutions. Electrochim. Acta 53, 5220–5225 (2008).
Article CAS Google Scholar
Francis, R. The Corrosion of Duplex Stainless Steels: A Practical Guide for Engineers https://doi.org/10.5006/37636 (NACE International, The Worldwide Corrosion, 2018).
Oberndorfer, M., Thayer, K. & Kästenbauer, M. Application limits of stainless steels in the petroleum industry. Mater. Corros. 55, 174–180 (2004).
Article Google Scholar
Prosek, T. et al. Low-temperature stress corrosion cracking of austenitic and duplex stainless steels under chloride deposits. Corrosion 70, 1052–1063 (2014).
Article Google Scholar
Haugan, E. B., Næss, M., Rodriguez, C. T., Johnsen, R. & Iannuzzi, M. Effect of tungsten on the pitting and crevice corrosion resistance of type 25Cr super duplex stainless steels. Corrosion 73, 53–67 (2016).
Article Google Scholar
Reyad, A. et al. Impact of minor changes in molybdenum content on the localized corrosion resistance of austenitic stainless steels. Mater. Corros. 76, 1169–1182 (2025).
Article CAS Google Scholar
Jargelius-Pettersson, R. F. A. Application of the pitting resistance equivalent concept to some highly alloyed austenitic stainless steels. Corrosion 54, 162–168 (1998).
Article CAS Google Scholar
Craig, B. Clarifying the applicability of PREN equations: a short focused review. Corrosion 77, 382–385 (2021).
Article Google Scholar
Garfias-Mesias, L. F. Understanding why PREN alone cannot be used to select duplex stainless steels. In Proceedings of the Corrosion 2015. (AMPP, 2015).
Kane, R. & Abayarathna, D. Use of Temperature Scan Testing for Evaluation of CRAs for Oilfield Environments (NACE International, Houston, TX (United States), 1995).
Ding, J., Lu, M., Zhang, L., Li, D. & Yu, Y. Stress corrosion cracking mechanism of UNS S31803 duplex stainless steel under high H2S-CO2 pressure with high Cl⁻ content. In Proceedings of the Corrosion 2013. 1−10 (AMPP, Orlando, FL, 2013).
Le Manchet, S., Fanica, A., Lojewski, C. & Cassagne, T. Corrosion Resistance of UNS S31803 Duplex Stainless Steel in Sour Environments. In Proceedings of the Corrosion 2014. (AMPP, San Antonio, Texas, USA, 2014).
Maldonado, J. G. & Skogsberg, J. W. Cracking susceptibility of duplex stainless steel at an intermediate temperature in the presence of H₂S containing environments. In Corrosion 2004. (AMPP, New Orleans, Louisiana, 2004).
Francis, R. & Byrne, G. The corrosion of duplex stainless steels in sour service. In Corrosion 1994. 1–1 (NACE, 1994).
Holmes, B., Mishael, S. & Sotoudeh, K. Stress Corrosion Cracking of a Duplex Stainless Steel. In Corrosion 2017, Vol. 10, 1−10 https://doi.org/10.5006/C2017-09113 (NACE International, New Orleans, Louisiana, USA, 2017).
Woolin & Maligas, M. N. Testing of Superduplex Stainless Steel for Sour Service. In Proceedings of the Corrosion 2003. 1−13 (AMPP, San Diego, CA, 2003).
Johnson, J. M. & Khoshgoftaar, T. M. Survey on deep learning with class imbalance. J. Big Data 6, 27 (2019).
Article Google Scholar
Yoon, J., Jordon, J. & Schaar, M. Gain: Missing data imputation using generative adversarial nets. In International conference on machine learning. 5689−5698 (PMLR, 2018).
Yoon, J., Jordon, J. & Schaar, M. v. d. Codebase for Generative Adversarial Imputation Networks (Gain) https://github.com/jsyoon0823/GAIN (2018).
Paszke, A. et al. Pytorch: an imperative style, high-performance deep learning library. Adv. Neural. Inf. Process. Syst. 32 https://proceedings.neurips.cc/paper_files/paper/2019/file/bdbca288fee7f92f2bfa9f7012727740-Paper.pdf (2019).
Akiba, T., Sano, S., Yanase, T., Ohta, T. & Koyama, M. Optuna: A Next-generation Hyperparameter Optimization Framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2623–2631 (Association for Computing Machinery, Anchorage, AK, USA).
Kang, H. The prevention and handling of the missing data. Korean J. Anesthesiol. 64, 402–406 (2013).
Article PubMed PubMed Central Google Scholar
Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition (Springer New York, 2009).
Han, J., Kamber, M. & Pei, J. Data Mining Concepts and Techniques. 3rd edn. (Elsevier, 2012).
Zhang, Y., Zhang, R. & Zhao, B. A systematic review of generative adversarial imputation network in missing data imputation. Neural Comput. Appl. 35, 19685–19705 (2023).
Article Google Scholar
Zhang, L., Yeasin, M., Lin, J., Havugimana, F. & Hu, X. Generative adversarial networks for imputing sparse learning performance. In Pattern Recognition (eds. Antonacopoulos, A. et al.) 381–396 https://doi.org/10.1007/978-3-031-78172-8_25 (Springer Nature Switzerland, 2025).
Dong, W. et al. Generative adversarial networks for imputing missing data for big data clinical research. BMC Med. Res. Methodol. 21, 78 (2021).
Article PubMed PubMed Central Google Scholar
Chawla, N. V., Bowyer, K. W., Hall, L. O. & Kegelmeyer, W. P. Smote: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002).
Article Google Scholar
Kunapuli, G. Ensemble Methods for Machine Learning (Simon and Schuster, 2023).
Mienye, I. D. & Sun, Y. A survey of ensemble learning: concepts, algorithms, applications, and prospects. IEEE Access 10, 99129–99149 (2022).
Article Google Scholar
Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. Adv. Neural. Inf. Process. Syst. 30 https://proceedings.neurips.cc/paper_files/paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf (2017).
Wu, J. et al. Hyperparameter optimization for machine learning models based on Bayesian optimization. J. Electron. Sci. Technol. 17, 26–40 (2019).
Google Scholar
Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 https://doi.org/10.1145/2939672.2939785 (Association for Computing Machinery, San Francisco, California, USA, 2016).
López, V., Fernández, A. & Herrera, F. On the importance of the validation technique for classification with imbalanced datasets: addressing covariate shift when data is skewed. Inf. Sci.257, 1–13 (2014).
Article Google Scholar
Thölke, P. et al. Class imbalance should not throw you off balance: choosing the right classifiers and performance metrics for brain decoding with imbalanced data. NeuroImage 277, 120253 (2023).
Article PubMed Google Scholar
Lundberg, S. M. et al. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2, 56–67 (2020).
Article PubMed PubMed Central Google Scholar
Henthorne, M. The slow strain rate stress corrosion cracking test—a 50 year retrospective. Corrosion 72, 1488–1518 (2016).
Article Google Scholar
Z, A. R., Bakhtiari, S., Aldrich, C., Calo, V. M. & Iannuzzi, M. XGBoost model for the quantitative assessment of stress corrosion cracking. NPJ Mater. Degrad. 8, 126 (2024).
Article Google Scholar
Friedman, N., Geiger, D. & Goldszmidt, M. Bayesian network classifiers. Mach. Learn. 29, 131–163 (1997).
Article Google Scholar
Koller, D. & Friedman, N. Probabilistic Graphical Models: Principles and Techniques (MIT Press, 2009).
Jiang, L., Cai, Z., Wang, D. & Zhang, H. Improving tree augmented naive bayes for class probability estimation. Knowl. -Based Syst. 26, 239–245 (2012).
Article CAS Google Scholar
Pearl, J. Direct and Indirect Effects. In Probabilistic and Causal Inference: The works of Judea Pearl, 373–392 https://doi.org/10.1145/3501714.3501736 (Association for Computing Machinery, 2022).
Kullback, S. & Leibler, R. A. On information and sufficiency. Ann. Math. Stat. 22, 79–86 (1951). 78.
Article Google Scholar
Murphy, K. P. Machine learning: A probabilistic perspective. 1096 (The MIT Press, London, UK, 2012).
Conrady, S. & Jouffe, L. Bayesian Networks and Bayesialab: A Practical Introduction for Researchers. Vol. 9 (Bayesia USA Franklin, TN, 2015).
Al Kharusi, A. et al. New Material Application Limits of Duplex Stainless Steel in Sour Service. In Proceedings of the Corrosion 2019. 1−9 (AMPP, Nashville, TN, 2019).
Francis, R. R., Byrne, G. G. & Warburton, G. The role of environmental and metallurgical variables on the resistance of duplex stainless steels to sulphide SCC. In Corrosion 1997 (AMPP, New Orleans, Louisiana, 1997).
Ayello, F., Jain, S., Sridhar, N. & Koch, G. H. Quantitive assessment of corrosion probability—a Bayesian network approach. Corrosion 70, 1128–1147 (2014).
Article Google Scholar
Tynell, M. Applicability range for a high-strength duplex stainless steel in deep sour oil and gas wells. J. Mater. Energy Syst. 5, 84–87 (1983).
Article CAS Google Scholar
Craig, B. D. Evaluation and application of highly alloyed materials for corrosive oil production. J. Mater. Energy Syst. 5, 53–58 (1983).
Article CAS Google Scholar
Tsuge, H., Tarutani, Y. & Kudo, T. Corrosion Resistance of Duplex Stainless Steel Weldments in Wet CO₂ and H₂S Environment. In The International Corrosion Forum Devoted Exclusively to The Protection and Performance of Materials, 1−14, https://doi.org/10.5006/C1986-86156 (NACE International, lbert Thomas Convention Center, Houston, Texas, 1986).
Audouard, J. & Verneau, M. Evaluation of the Corrosion Resistance of High Nitrogen Containing Stainless Steels in Chloride and H₂S/CO₂ Environments (NACE International, Houston, TX (United States), 1997).
Ueda, M. et al. Performance of High Corrosion Resistant Duplex Stainless Steel in Chloride and Sour Environments. In NACE Annual Conference and Corrosion Show USA 1993, 125 https://doi.org/10.5006/C1993-93125 (AMPP, USA, 1993).
Leyer, J., Sutter, P., Linne, C. & Gunaltun, Y. M. Influence of the test method on the SSC threshold stress of OCTG and line pipe steel grades. In Corrosion 2002 (AMPP, Denver, Colorado, 2002).
Oredsson, J. & Bernhardsson, S. The performance of high alloyed austenitic and duplex stainless steels in sour gas and oil environments. In Proceedings of the Corrosion 1982, 1−18 https://doi.org/10.5006/C1982-82126 (AMPP, Houston, TX, 1982).
Kobayashi, Y. et al. Full scale testing of duplex stainless steel pipes in simulated sour environments. In Corrosion 88, 1−11 https://doi.org/10.5006/C1988-88052 (NACE, Houston, TX, 1988).
Rhodes, P. R., Welch, G. A. & Abrego, L. Stress corrosion cracking susceptibility of duplex stainless steels in sour gas environments. J. Mater. Energy Syst. 5, 3–18 (1983).
Article CAS Google Scholar
Jiang, F. & Hirohata, M. A GAN-augmented corrosion prediction model for uncoated steel plates. Appl. Sci. 12, 4706 (2022).
Article CAS Google Scholar
Woldesellasse, H. & Tesfamariam, S. Data augmentation using conditional generative adversarial network (CGAN): application for prediction of corrosion pit depth and testing using neural network. J. Pipeline Sci. Eng. 3, 100091 (2023).
Article Google Scholar
Ramaswamy, V. P. & Szeider, S. Learning large {Bayesian} networks with expert constraints. In Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence (eds Cussens, J. & Zhang, K.) Vol. 180. 1592–1601 (PMLR, 2022).
Vaniš, M., Lokaj, Z. & Šrotýř, M. A novel algorithm for merging Bayesian networks. Symmetry 15, 1461 (2023).
Article Google Scholar
Japkowicz, N. Assessment Metrics for Imbalanced Learning. In Imbalanced learning, 187–206 https://doi.org/10.1002/9781118646106.ch8 (2013).
García, S., Luengo, J. & Herrera, F. Dealing with missing values. In Data Preprocessing in data mining (eds García, S., Luengo, J. & Herrera, F.) 59–105 https://doi.org/10.1007/978-3-319-10247-4_4 (Springer International Publishing, 2015).
van Buuren, S. Flexible Imputation of Missing Data, 2nd edn. (CRC Press, 2018).
Nazabal, A., Olmos, P. M., Ghahramani, Z. & Valera, I. Handling incomplete heterogeneous data using VAEs. Pattern Recognit. 107, 107501 (2020).
Article Google Scholar
Boursalie, O., Samavi, R. & Doyle, T. E. Evaluation methodology for deep learning imputation models. EBM 247, 1972–1987 (2022).
CAS Google Scholar
Liu, M. et al. Handling missing values in healthcare data: a systematic review of deep learning-based imputation techniques. Artif. Intell. Med. 142, 102587 (2023).
Article PubMed Google Scholar
Sengupta, S. et al. A review of deep learning with special emphasis on architectures, applications and recent trends. Knowl. -Based Syst. 194, 105596 (2020).
Article Google Scholar
Chai, P., Hou, L., Zhang, G., Tushar, Q. & Zou, Y. Generative adversarial networks in construction applications. Autom. Constr. 159, 105265 (2024).
Article Google Scholar
Lee, J. A. et al. Influence of tensile properties on hole expansion ratio investigated using a generative adversarial imputation network with explainable artificial intelligence. J. Mater. Sci. 58, 4780–4794 (2023).
Article CAS Google Scholar
Kim, J., Tae, D. & Seok, J. A Survey of Missing Data Imputation Using Generative Adversarial Networks. In 2020 International Conference on Artificial Intelligence in Information and Communication (ICAIIC). 454–456 https://doi.org/10.1109/ICAIIC48513.2020.9065044 (2020)
Mukherjee, M. & Khushi, M. SMOTE-ENC: a novel SMOTE-based method to generate synthetic data for nominal and continuous features. Appl. Syst. Innov. 4, 18 (2021).
Article Google Scholar
Bentéjac, C., Csörgő, A. & Martínez-Muñoz, G. A comparative analysis of gradient boosting algorithms. Artif. Intell. Rev. 54, 1937–1967 (2021).
Article Google Scholar
Friedman, J. H. Greedy function approximation: a gradient boosting machine. Ann. Stat. 29, 1189–1232 https://doi.org/10.1214/aos/1013203451(2001).
Wade, C. & Glynn, K. Hands-On Gradient Boosting with XGBoost and Scikit-Learn: Perform Accessible Machine Learning and Extreme Gradient Boosting with Python (Packt Publishing Ltd, 2020).
Yu, T. & Zhu, H. Hyper-parameter optimization: a review of algorithms and applications. Preprint at https://arxiv.org/abs/2003.05689 (2020).
Jin, H. Hyperparameter importance for machine learning algorithms. Preprint at https://arxiv.org/abs/2201.05132 (2022).
Agrawal, T. Hyperparameter Optimization in Machine Learning: Make Your Machine Learning and Deep Learning Models More Efficient (Springer, 2021).
Bergstra, J., Yamins, D. & Cox, D. D. No editor names are available; full details are as provided below. In 2021 12th International Conference on Information, Intelligence, Systems & Applications (IISA), 1−8 https://doi.org/10.1109/IISA52424.2021.9555522 (Citeseer, 2021).
Malu, M., Dasarathy, G. & Spanias, A. Bayesian Optimization in High-Dimensional Spaces: A Brief Survey. In 2021 12th International Conference on Information, Intelligence, Systems & Applications (IISA). 1–8 https://doi.org/10.1109/IISA52424.2021.9555522 (2021)
Shekhar, S., Bansode, A. & Salim, A. in 2021 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE). 1–6 (IEEE).
Shahriari, B., Swersky, K., Wang, Z., Adams, R. P. & de Freitas, N. Taking the human out of the loop: a review of Bayesian optimization. Proc. IEEE 104, 148–175 (2016).
Article Google Scholar
Bergstra, J., Yamins, D. & Cox, D. Hyperopt: A python library for optimizing the hyperparameters of machine learning algorithms. In Proceedings of the 12th Python in science conference (eds van der Walt, S., Jarrod, M. & Huff, K.) 13−19 https://doi.org/10.25080/Majora-8b375195-003 (PMLR, 2013).
Watanabe, S. Tree-structured parzen estimator: understanding its algorithm components and their roles for better empirical performance. Preprint at https://arxiv.org/abs/2304.11127 (2023).
Bergstra, J., Bardenet, R., Bengio, Y. & Kégl, B. Algorithms for hyper-parameter optimization. Adv. Neural. Inf. Process. Syst. 24 https://proceedings.neurips.cc/paper_files/paper/2011/file/86e8f7ab32cfd12577bc2619bc635690-Paper.pdf (2011).
Falkner, S., Klein, A. & Hutter, F. in Proceedings of the 35th International Conference on Machine Learning Vol. 80 (eds. Dy J. & Krause A.) 1437–1446 (PMLR, Proceedings of Machine Learning Research, 2018).
Algaba, E., Fragnelli, V. & Sánchez-Soriano, J. Handbook of the Shapley Value (CRC Press, 2019).
Aas, K., Jullum, M. & Løland, A. Explaining individual predictions when features are dependent: more accurate approximations to Shapley values. Artif. Intell. 298, 103502 (2021).
Article Google Scholar
Lundberg, S. M., Erion, G. G. & Lee, S.-I. Consistent individualized feature attribution for tree ensembles. Preprint at https://arxiv.org/abs/1802.03888 (2018).
Bielza, C. & Larrañaga, P. Bayesian networks in neuroscience: a survey. Front. Comput. Neurosci. 8 https://www.frontiersin.org/journals/computational-neuroscience/articles/10.3389/fncom.2014.00131 (2014).
Jensen, F. V. & Nielsen, T. D. Bayesian Networks and Decision Graphs. Vol. 2 (Springer, 2007).
Heckerman, D. A tutorial on learning with Bayesian networks. In Innovations in Bayesian Networks: Theory and Applications (ed. Jain, L. C.) Vol. 156, 33–82 https://doi.org/10.1007/978-3-540-85066-3_3 (Springer Berlin Heidelberg, 2008).
Fenton, N. & Neil, M. Risk Assessment and Decision Analysis with Bayesian Networks (CRCPress, 2018).
Pearl, J. Causality: Models, Reasoning, and Inference. 1st edn, Vol. 47 (Cambridge University Press, 2000).
Scoppio, L., Barteri, M. & Leali, C. Sulphide stress cracking resistance of superduplex stainless steels in oil & gas field simulated environments. In Proceedings of the Corrosion 1998. 1−11 (AMPP, San Diego, CA, 1998).
Mukai, S., Okamoto, H., Kudo, T. & Ikeda, A. Corrosion behavior of 25 pct cr duplex stainless steel in CO₂-H₂S-Cl⁻ environments. J. Mater. Energy Syst. 5, 59–66 (1983).
Article CAS Google Scholar
Desestret, A. Special stainless steels for application in natural sour gas exploitation systems resistance to stress cracking in environmentscontaining chlorides and H2S of austenitic and austeno-ferritic grades. In Corrosion 1995, 1−24 https://doi.org/10.5006/C1985-85229 (NACE, Houston, TX, 1986).
Eriksson, H., Norberg, P. & Bernhardsson, S. Properties of and experience with welded joints in stainless alloys. In Corrosion 87, 1−10 https://doi.org/10.5006/C1987-87308 (NACE, San Francisco, California, 1987).
Barteri, M., Mancia, F., Tamba, A. & Bruno, R. Microstructural study and corrosion performance of duplex and superaustenitic steels in sour well environment. Corrosion 43, 518–525 (1987).
Article Google Scholar
Crolet, J. L. & Bonis, M. R. Evaluation of the resistance of some highly alloyed stainless steels to stress corrosion cracking in hot chloride solutions underhigh pressures of CO₂ and H₂S. In Proceedings of the Corrosion 1985, 1−14 https://doi.org/10.5006/C1985-85232 (AMPP, Boston, MA, 1986).
Tamaki, K., Yasuda, K., Kimura, M., Kawasaki, H. & Uegaki, T. Optimizing Welding Condition for Excellent Corrosion Resistance in Duplex Stainless Steel Linepipe. Special Issue on Steel Pipe. 10−16 (1988).
Eriksson, H., Norberg, P. & Bernhardsson, S. No editors were reported. It seems that this reference was repeated with Reference 184 In Corrosion 1987. 1-10 https://doi.org/10.5006/C1987-87308 (NACE, Moscone Center/San Francisco, California, 1987).

Acknowledgements

This project was financially supported by Shell (Technology Centre, Bangalore) and Curtin University.

Author information

Authors and Affiliations

Curtin Corrosion Centre, Faculty of Science and Engineering, Curtin University, Bentley, WA, Australia
Abraham Rojas Zuniga, Sam Bakhtiari & Mariano Iannuzzi
Western Australian School of Mines: Minerals, Energy and Chemical Engineering, Faculty of Science and Engineering, Curtin University, Bentley, WA, Australia
Chris Aldrich
Computing and Mathematical Sciences, Faculty of Science and Engineering, Curtin University, Bentley, WA, Australia
Victor M. Calo

Contributions

A.R.Z. conducted the investigation, performed the formal data analysis, developed the methodology, carried out the model training, design, and validation, performed the inference analyses, and wrote the original draft of the manuscript. S.B. contributed to visualisation, assisted with model training and design, and provided critical revisions during the review and editing of the manuscript. V.M.C. supervised the methodology, the application of machine learning algorithms, the probabilistic modelling design, and the inference analyses, and contributed to the manuscript’s review and editing. C.A. supervised the project, managed its administration, and contributed to the review and editing of the manuscript. M.I. conceptualised the study, secured funding, supervised the project, and contributed to project administration and the review and editing of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Abraham Rojas Zuniga.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Rojas Zuniga, A., Bakhtiari, S., Aldrich, C. et al. Predicting stress corrosion cracking in downhole environments: a Bayesian network approach for duplex stainless steels. npj Mater Degrad 9, 122 (2025). https://doi.org/10.1038/s41529-025-00646-y

Received: 18 January 2025
Accepted: 15 July 2025
Published: 08 October 2025
DOI: https://doi.org/10.1038/s41529-025-00646-y
