Introduction

In the petroleum industry, engineered tools and structures integral to production systems operate under some of the most severe industrial environments1. This distinction is further exemplified through the working conditions in technically challenging fields, such as high-pressure and high-temperature (HPHT) reservoirs, deep and ultra-deep fields, and remote Arctic locations. In these settings, pressure and temperature conditions reach upwards of 160 MPa and 300 °C, respectively, at depths exceeding 10,000 m1,2.

The presence of corrosive agents, predominantly chloride (Cl), carbon dioxide (CO2) and hydrogen sulphide (H2S), further escalates the technical difficulties of recovering crude oil and natural gas3,4. Moreover, the severity of downhole conditions can be aggravated by the presence of elemental sulphur (S0), which is likely to occur if H2S concentration exceeds a 5–10% threshold within the gas phase5,6,7. In the same vein, deep reservoirs may contain traces of organic acids (mainly acetic acid) and a variety of other contaminants, including liquid metals3,8,9. Furthermore, the use of completion fluids (typically rich in Cl) and highly acidic stimulation chemicals (e.g., hydrochloric and hydrofluoric acids), in conjunction with enhanced recovery methods, can considerably affect the corrosivity of the environment throughout the life cycle of production wells10,11.

Given the aggressive characteristics of downhole environments, addressing degradation mechanisms associated with environmentally assisted cracking (EAC) becomes imperative, chiefly stress corrosion cracking (SCC)3,5. Fundamentally, SCC is an anodic form of EAC, derived from the synergy between mechanical stresses (residual or applied) and a reactive environment12,13. SCC constitutes a prevalent threat to the integrity of hydrocarbon production equipment, as it can markedly accelerate the mechanical failure of exposed components14. Therefore, exhaustive protocols of material selection, extending from well testing to the completion stage, are critical to mitigating the risks posed by SCC15,16.

Selection criteria for metallic alloys focus primarily on robust structural integrity, superior corrosion resistance, affordability, and mechanical properties that satisfy operating conditions17,18. Thus, commonly employed materials such as carbon and high-strength low-alloy steels, while adequate as casing materials, generally do not meet the requirements for other downhole applications due to their limited corrosion resistance10,16. Instead, corrosion-resistant alloys (CRAs) are preferred for tubing, liners, and critically exposed components (e.g., tubing hangers, wellhead flow crosses and control valves), given their superior resistance to aggressive forms of corrosion such as pitting4,19. For the most severe environments, CRAs employed include super austenitic stainless steels, nickel-based and nickel-cobalt alloys, as well as titanium alloys20. However, the most commonly used CRAs are 13–22 wt% chromium (Cr) alloys, such as martensitic stainless steels and, in more demanding environments, duplex stainless steels (DSSs)20,21.

Due to their high resistance to localised corrosion and SCC, DSS alloys are used in sectors other than oil and gas, such as chemical processing, power production, and desalination22. In terms of chemical composition, duplex grades contain from 19 to 30 wt% Cr, and additions of nickel (Ni), molybdenum (Mo), nitrogen (N) and tungsten (W), with an optimally balanced austenite/ferrite phase ratio, approaching 50:5023,24. Thereby, DSS integrates the ferrite phase’s high strength with the austenite’s ductility and toughness25. Moreover, the high strength and hardness also endow DSSs with remarkable resistance to erosion, cavitation, and corrosion fatigue26,27,28. The chemical composition of DSS promotes the formation of a passive Cr-rich oxy-hydroxide layer. Consequently, DSS alloys maintain relatively low corrosion rates when exposed to both CO2 and H2S within oilfield settings at temperatures not exceeding 100 °C29. Regarding the mechanical resistance, the yield strength (\({\sigma }_{{YS}}\)) of DSS alloys ranges from 450 MPa when solution-annealed to approximately 1100 MPa through cold working, rendering them suitable for both shallow and deep well applications5,30. Notably, the DSS family is regarded as a cost-effective alternative compared to coated carbon steels, high-resistance stainless steels and, in certain instances, Ni-based alloys25.

Notwithstanding these attributes, industry standards currently provide an incomplete representation of DSS alloys’ performance in oil and gas environments. Particularly, the standard governing the material selection of CRAs in H₂S-containing services, ISO 15156—Part 3 (ISO 15156−3)31, imposes strict limits concerning critical corrosion factors. These include partial pressures of H2S (pH2S) and CO2 (pCO2), solution pH, temperature, and Cl concentration31. For DSSs, the standard prescribes an operational threshold for downhole tubular components based primarily on pH2S, ranging from 0.3 to 3.0 psi (~ 0.02–0.2 bar). However, such operational boundaries are often perceived as overly conservative, and diverge from numerous studies indicating that DSS alloys can withstand higher H2S levels32,33,34.

A prominent example is the seminal review by Cassagne et al.35, which emphasised that the SCC susceptibility of DSSs extends beyond pH2S levels, involving temperature, Cl concentration and pH as critical determinants. While this study is observational in nature, comparing data from the field and experimental work, it suggests that DSS alloys may resist approximately 2.0 bar pH2S. This tolerance depends on maintaining Cl concentrations below 10,000 ppm, a minimum pH of 4.5, at temperatures ranging from 80 to 100 °C.

While extensive literature has documented the SCC resistance of DSSs25,36,37,38,39,40, a comprehensive assessment of failure risks under service conditions remains elusive. Moreover, the disparity between empirical evidence and standardised limits highlights, in principle, the necessity of more integrative frameworks to inform material selection and application in production systems36,37. Addressing this need, the present study introduces a novel data-driven model, based on Bayesian networks (BNs), to determine the SCC risk for DSS alloys in downhole environments. Fundamentally, BNs enable the representation of conditional dependencies among variables via a directed graphical structure. This study therefore employs BN modelling to encode the direct dependencies of critical determinants driving SCC, such as corrosive agents, temperature, and stress state.

Despite consolidating base practices and lessons learned, industry guidelines and international standards such as ISO 15156-3 do not address the multifactorial nature of SCC and EAC more broadly38,39. The inherent complexity of EAC phenomena contributes to this shortfall. EAC processes are characterised by high variability and dimensionality, compounded by uncertainties related to boundary conditions and material-environment interactions40. As a result, SCC modelling remains challenging, as it requires coupling chemical, electrochemical, and mechanical effects within a singular model41.

Multiphysics approaches have been used in the hierarchical modelling of material degradation phenomena, including SCC42. These first-principle methodologies, while often applied in isolation, have been instrumental in modelling electronic structures, atomic-scale interactions, and stress-induced damage. For example, atomistic modelling, grounded in density functional theory (DFT) and solid-state physics, is extensively utilised to calculate electronic structures and predict atomic-level reaction pathways43. The DFT application offers profound insights into the micro-mechanisms driving SCC, including localised corrosion and hydrogen embrittlement (HE)44. Molecular dynamics simulations complement these efforts by modelling the dynamic behaviour of atoms, including bond breaking and formation, as well as the effect of stress fields around the crack tip45,46. Thermodynamic models have contributed to predicting the structure and composition of passive films, while assessing the aggressiveness of corrosive species in HPHT conditions47,48,49. These models focus on estimating both corrosion and repassivation potentials, which are crucial for evaluating the onset of localised corrosion that precedes SCC initiation.

The finite element method (FEM) has been widely employed to determine localised corrosion rates and structural damage such as cracks and fractures. FEM achieves this by coupling electrochemical and mass transport models with continuum-scale structural mechanics analyses50,51. In addition, phase-field models bridge the gap between atomistic and continuum models52. These mesoscale modelling techniques enable the simulation of phase transformations with evolving geometries. Such capability has facilitated a detailed examination of cracking initiated from corrosion pits53,54.

Despite the advances offered by current physic-based models, a unified framework coupling the relevant temporal and spatial scales involved in SCC has yet to be realised39,55. Furthermore, first-principles models frequently rely on classical theories, empirical correlations, or idealised assumptions grounded in measurable parameters and controlled boundary conditions56. These characteristics can ultimately restrict their predictive capacity in real-world environments.

Alternatively, data-centric modelling offers significant support in analysing SCC failures. A key advantage lies in utilising real data, which inherently captures the stochastic nature of corrosion-induced failures40. Recent studies in corrosion engineering prominently feature the integration of techniques in machine learning (ML) and artificial intelligence (AI)57,58,59,60,61,62. These data-driven approaches have led to significant progress in visual detection and classification of different corrosion patterns57,58, as well as the detailed analysis of chemical and electrochemical reactions59, solid-state processes60,61, and microbiologically induced degradation62.

In the oil and gas sector, BN modelling has allowed for holistic analyses of relevant degradation phenomena, including uniform material loss, localised corrosion, erosion, and under-deposit microbial corrosion63,64,65,66. The proposed BN models yielded insightful information regarding the interactions between qualitative and quantitative factors affecting corrosion. These encompass hydrodynamic conditions (e.g., liquid hold up, partial pressures, velocity), temperature, medium pH, soil conditions, pipe characteristics (e.g., surface condition, types of coating, cathodic protection), and the presence of organic decay products.

BN modelling has been purposefully tailored to investigate EAC mechanisms. For instance, Sridhar et al.67 proposed a BN model addressing localised corrosion risk (i.e., pitting and crevice) for Ni-Fe-Cr-Mo-N alloys in seawater. It estimates localised corrosion probabilities by integrating key input variables, including Cl and sulphate (SO42) concentrations, temperature conditions, crevice tightness, and alloy chemistry. This BN application also incorporates established repassivation models47, and a unifying parameter conceptually similar to pitting resistance equivalent (PRE)68, which quantifies alloying effects on corrosion resistance.

Focusing on a different failure mode, Taylor et al.69 employed a BN model to assess corrosion fatigue initiation in high-strength Al alloys (e.g., AA 7075, AA 2070). Inputs for this model included cathodic current density and intermetallic particle size, sourced from microstructural analysis and corrosion literature. Target nodes within the network are associated with pitting kinetics rates and pit-to-crack transition, which were parametrised using theoretical models from Harlow and Wei70 and the Kondo criterion71, respectively.

A BN model has recently been developed to predict hydrogen stress cracking (HSC)72. This model facilitates probabilistic assessment of HSC-related damage, which includes uniform corrosion, pitting, grooving, sulphide stress cracking, and HE. Discrete and continuous nodes are incorporated into the BN model, representing a wide range of variables, such as metallurgical characteristics (e.g., \({\sigma }_{{YS}}\), microstructure), environmental chemistry (e.g., H2S, pH, Cl), mechanical loadings (e.g., strain rate, stress intensity) and operational conditions (e.g., cathodic protection, galvanic effects). Critical factors in the model include electrochemical potentials and the concentration of mobile hydrogen. However, conditional probabilities in the BN model are derived primarily from expert knowledge.

BNs present a compelling pathway for assessing SCC risks, especially in data-scarce environments prevalent in multiple real-world industrial applications, including oil and gas systems. In this regard, the implementation of data-centric techniques in SCC studies is still underdeveloped, as existing efforts are hindered by insufficient data and the lack of well-defined implementation strategies73,74,75,76,77. In fact, a significant proportion of BN applications in corrosion engineering largely depend on expert knowledge for the specification of model parameters and network architectures76. This reliance can nonetheless introduce subjectivity and cognitive biases, such as anchoring, availability bias, and overconfidence78,79. Nevertheless, Sridhar et al.16,80 emphasise the ability of BNs to combine diverse information sources, both empirical and expert-derived, thereby enabling probabilistic reasoning of complex systems and associated uncertainties.

Therefore, this work leverages the flexibility of BNs to synthesise a wide array of data within a computationally tractable framework. To construct our BN model, we compiled several data sources (i.e., industry standards, technical guidelines, and scientific papers), all pertinent to SCC testing of DSS alloys under sour conditions. Advanced preprocessing techniques, such as multiple data imputation and synthetic minority oversampling, were applied to prepare the dataset for analysis. The BN structural design benefited from other ML techniques, namely extreme gradient boosting (XGBoost) and Shapley additive explanations (SHAP), which optimised its predictive accuracy. The BN model from this research provides insights regarding the interactions of the most critical factors leading to cracking, and ultimately aims to interrogate the boundaries within which DSS alloys can effectively resist SCC in downhole settings.

To ensure clarity for readers, it is essential to establish a fundamental understanding of the factors influencing DSS vulnerability to SCC. Duplex alloys are distinguished by their exceptional corrosion resistance; however, their behaviour under SCC-inducing conditions remains a subject of extensive research. Of particular interest is the role of localised corrosion processes, either pitting or intergranular corrosion, in promoting the initiation of SCC81.

Although the underlying mechanisms of SCC are not thoroughly understood, it is hypothesised that initial crack incubates originate from pitting events12. These nucleate after the disruption of the passive film and tend to grow preferentially along slip planes or grain boundaries3. Therefore, the formation of pits is considered a precursor to SCC, as these localised surface attacks act as stress concentrators, potentially leading to cracking82,83. However, it is critical to recognise that not all corrosion pits transition into cracks. Instead, the pit-to-crack transition is a multi-component process, governed by more than mechanical stresses, metallurgical characteristics, and environmental dynamics; the morphological aspects of the pit itself (i.e., size, shape, and aspect ratio) also play a significant role82. Particularly, pits with shaper geometries lead to higher strain concentrations, which in turn increase the likelihood of evolving into cracks84,85.

In examining the localised corrosion behaviour of DSS, various investigations indicate that pitting susceptibility is significantly influenced by Cl content and temperature86,87. For example, experimental work in 1.0 M sodium chloride (NaCl) solutions has shown that standard DSS samples, containing 22 wt% Cr, typically undergo pitting above critical temperatures exceeding 60 °C87. Nonetheless, critical pitting temperatures (CPTs) can extend to 150 °C, at Cl concentrations as low as 100 ppm88. Interestingly, standard DSSs can exhibit pitting corrosion even at moderate temperatures around 30 °C. This occurs when they are exposed to high Cl levels (>159,00 ppm) in solutions with different ionic compounds, such as NaCl, calcium chloride (CaCl2) or magnesium chloride (MgCl2)89. Notably, the increase in Cr content enhances pitting resistance; for instance, DSS alloys with 25–28 wt% Cr (also referred to as super DSSs) exhibit an average CPT of around 90 °C86,87.

A standard parameter for quantifying localised corrosion resistance and SCC susceptibility of CRAs is the PRE, given by68

$${PRE}= \% {Cr}+3.3\left( \% {Mo}+0.5 \% W\right)+16 \% N$$
(1)

This metric empirically correlates higher levels of Cr, Mo, W and N with enhanced resistance to localised corrosion in chloride-rich environments90,91. Accordingly, lower PRE values indicate a greater susceptibility to pitting corrosion92. Since localised attacks are a precondition in SCC, PRE is then used to establish the operational limits of DSSs for oil and gas applications. As downhole tubular, ISO 15156-3 standard31 dictates that conventional DSSs, with PRE values between 30 and 40, are eligible for environments where pH2S does not exceed 0.3 psi (~0.02 bar). This applies irrespective of temperature, Cl concentration, and pH. Comparatively, super duplex grades, having PRE values from 40 to 45, can operate up to 3.0 psi (~0.2 bar) pH2S and a maximum Cl concentration of 120,000 ppm.

However, the publications by Craig93 and Garfias-Mesias94 highlight that PRE is insufficient for evaluating the corrosion resistance of DSS in oil and gas settings. Both authors argue that, whereas PRE accurately correlates with the CPT for single-phase austenitic stainless steels in oxidising conditions, its predictive value fails for DSSs due to their dual-phase structure, or when considering anoxic environments. An example of this is the study conducted by Kane and Abayarathna95. This investigation demonstrated that CPT for standard DSSs (PRE = 34) is limited to about 115 °C under sour conditions. Such observation contrasts sharply with the 200 °C achieved by Ni-based alloys N08825 (PRE = 31) and N06255 (PRE = 45), which have comparable or lower PRE values. In addition, Craig93 and Garfias-Mesias94 pointed out that PRE’s formulation neglects the combined effects that contribute to SCC, which involve mechanical stresses, corrosive species, and particularly, the influence of specific chemical constituents. Among these are impurities such as phosphorus (P), sulphur (S), as well as additions of manganese (Mn) and Ni; all of which can drastically alter the performance of DSS.

At downhole conditions, extensive research has demonstrated that DSS is prone to pitting, and subsequent cracking, in the presence of H2S at temperatures between 60 and 180 °C96,97,98. These investigations have explored a wide range of experimental conditions, including pH2S levels spanning from 0.1 to 1.0 bar, Cl concentrations up to 120,000 ppm, and pH values between 3.0 and 4.5, while DSS specimens have been subjected to constant loads equivalent to 90% \({\sigma }_{{YS}}\). Notably, the evidence from these investigations elucidates the synergistic influence of H2S and Cl in accelerating SCC. However, uncertainties persist regarding the conditions under which SCC is most likely to occur. This is further illustrated in Fig. 1a, b, which correlate pH2S and Cl concentration across a variety of tests on standard and super DSSs, respectively. Table 1 complements these figures by detailing the properties of the tested DSS specimens.

Fig. 1: SCC test outcomes for DSS alloys as a function of H2S partial pressure and Cl concentrations.
figure 1

a Standard DSS: Positive SCC instances are marked by red circles, while negative instances are blue circles. b Super DSS: Red diamonds indicate SCC occurrence, and blue diamonds signify no SCC. The dashed lines in both plots represent the safe operating limits for pH2S (i.e., 0.02 and 0.2 bar) for DSS alloys according to ISO 15156-3 standard31. The data points encompass a wide range of experimental parameters, including temperatures (25–200 °C), pH (2.7–5.4), CO2 partial pressure (pCO₂ ≤ 92 bar), and tensile stresses (30–160% \({\sigma }_{{YS}}\)).

Table 1 Summary of DSS alloys employed in SCC studies presented in Fig. 1

As seen in Fig. 1a, b, the data exhibit trends that associate increased cracking susceptibility with rising levels of H2S and Cl, although they do not provide clear-cut criteria for determining SCC boundaries. More importantly, Fig. 1 shows that both, standard and super DSSs, may resist SCC in conditions far exceeding those outlined in ISO 15156-331. The data in Fig. 1 cover a broad range of experimental conditions, extending beyond H2S and Cl concentrations. These include a temperature range from 25 to 200 °C, pH values between 2.7 and 5.4, pCO2 up to 92 bar, and tensile stresses from 30 to 160% of DSS’s nominal \({\sigma }_{{YS}}\). Based on these data, it has been observed that DSSs exhibit less propensity for cracking under high H2S levels, and low Cl concentrations, or when pH values are not strongly acidic; albeit the impact of mechanical stresses has yet to be quantified.

For example, the study published by Francis and Byrne99 involved C-ring tests with a brine containing 46,000 ppm Cl at 80 °C (pH ≈ 4.3). The tests varied pH2S from 0.125 to 0.375 bar within a CO2/H2S mixture, while stress levels were equivalent to the nominal \({\sigma }_{{YS}}\) of the DSS samples. Here, pitting and fine cracks were detected in standard DSS specimens starting from 0.25 bar pH2S, whereas super DSS samples showed no signs of SCC, even at 0.375 pH2S. Comparatively, Holmes et al.100 observed principally pitting attacks in various SCC tests at constant load, without consistent development of SCC. These tests employed only DSSs with 22 wt% Cr, which were subjected to tensile stresses at 90% \({\sigma }_{{YS}}\), while exposed to 0.35 bar pH2S and immersed in a highly concentrated solution with 100,000 ppm Cl (pH ≈ 4.5).

For super DSS, Woolin and Malingas101 found that more severe conditions are required to observe SCC failures. Here, authors conducted C-ring tests at a loading state equal to the nominal \({\sigma }_{{YS}}\) of the samples, in an environment comprising 20,200 ppm Cl and 2.0 bar pH2S (pH ≈ 3.5) at 85 °C. Interestingly, Seigmund et al.32 noted that standard and super DSS alloys can resist SCC while exposed to a range of 0.5–1.0 bar pH2S, and high Cl levels of around 45,000 ppm within a pH interval of 4.2–5.0. These experiments were undertaken under varied temperature conditions (i.e., from 28.5 to 180 °C) and maintaining a constant uniaxial loading for all DSS specimens (i.e., 90% \({\sigma }_{{YS}}\)). Despite the aggressive experimental settings, DSS tensile probes underwent only moderate pitting corrosion, with penetration depths in the order of 20–50 μm, although no cracking was observed.

The preceding discussion has emphasised the prevailing uncertainty in identifying the conditions that lead to SCC of DSS alloys in downhole settings. Addressing this problem necessitates a sophisticated approach due to the complex interplay among numerous contributing factors. To this end, we present a BN model that visualises the relationships among variables affecting SCC in DSS alloys. Drawing on literature data, this model leverages BN inference to manage SCC uncertainty and interrogate the viability of DSSs for oil and gas applications in a probabilistic manner.

Results and discussion

Data preparation

This research involved compiling a dataset of 2535 instances from localised corrosion and SCC experiments on DSS. The dataset incorporates diverse information sources, including standards, technical guidelines, and scientific and conference papers. The supplementary materials accompanying this study provide a detailed list of these sources.

The generated dataset includes the specifications of DSS specimens, detailing their mechanical properties, such as \({\sigma }_{{YS}}\), ultimate tensile strength (\({\sigma }_{{UTS}}\)), and elongation (\({\varepsilon }_{f}\)), as well as their chemical composition and PRE. The DSS dataset also contains experimental parameters relevant to SCC occurrence, including temperature, pH2S and pCO2, Cl concentration, medium pH, as well as applied stresses (\({\sigma }_{{app}}\)). Test outcomes in the dataset are categorical variables: Pitting Corrosion and SCC. Each outcome is binary, indicating either the presence (YES) or absence (NO) of the event. The stress ratio (\({\sigma }_{R}\)), defined as the ratio of \({\sigma }_{{app}}\) to \({\sigma }_{{YS}}\), was also determined to assess whether the macroscopic stress state of DSS samples was elastic (\({\sigma }_{R}\) < 1) or plastic (\({\sigma }_{R}\) > 1). Thus, \({\sigma }_{R}\) was calculated as follows:

$${\sigma }_{R}=\frac{{\sigma }_{{app}}}{{\sigma }_{{YS}}}$$
(2)

However, the heterogeneity of source materials compromises data uniformity by introducing missing data points, particularly related to experimental features and results. These inconsistencies are largely attributed to variations in experimental protocols, including differences in sample handling and testing methods; for example, experiments at constant deflection (e.g., C-ring, U-bend, and 4-point bend tests) and constant load (e.g., proof rings), as well as slow strain rate tests. Consequently, many studies reported only the incidence of SCC, with control variables limited to environmental settings (e.g., pH2S, pCO2, temperature, and Cl content). Conversely, measurements of pH, equivalent stresses, or the incidence of pitting corrosion were either inconsistently documented or not feasible. For a concise overview, Table 2 summarises the continuous attributes of the DSS dataset, while Table 3 describes the categorical variables regarding pitting corrosion and SCC.

Table 2 Descriptive statistics and missing data percentages for continuous variables in the DSS dataset
Table 3 Descriptive statistics and missing data percentages for categorical variables in the DSS dataset

As reported in Table 3, the pitting corrosion column in the dataset exhibits approximately 47.1% missing data points. Since pitting is a prerequisite for SCC, any instances where pitting corrosion was not observed are exclusively associated with cases where SCC did not occur. Therefore, positive cases of SCC invariably imply positive instances of pitting corrosion. Moreover, Table 3 reveals a significant class imbalance in the pitting corrosion column, where positive instances of pitting corrosion exceed the negative ones by a ratio of approximately 8.73 to 1. This is of particular concern, as data-driven models can be heavily biased towards the majority class, resulting then in poor performance on the minority class102.

Missing data and class imbalance can compromise the performance and generalisability of predictive models derived from the DSS dataset. To mitigate these data quality issues, data preprocessing methodologies were employed. Specifically, missing values were imputed using generative adversarial networks (GANs). Subsequently, the observed class imbalance was managed through the application of the synthetic minority over-sampling technique (SMOTE). The following sections will elaborate upon the strategies adopted for preparing the DSS dataset.

Multiple data imputation

In this work, generative adversarial imputation nets (GAIN)103 were employed to resolve the missing values in our dataset. The GAIN method implemented builds upon the source code provided by Yoon et al.104, which has been adapted to be compatible with the Python library PyTorch105. This adaptation also integrates Bayesian hyperparameter optimisation (BHO) via the Python library Optuna106, enabling fine-tuning of the GAIN settings to achieve optimal performance. The corresponding techniques, GAIN and BHO, are further described in the Methods” section.

The primary justification for employing data imputation in this work is, in principle, to ensure the completeness and robustness of the dataset. In data-centric models, missing data points can significantly impact their accuracy and reliability, leading to biased estimates and incorrect inferences about the relationships between variables107. The supplementary material of this investigation offers detailed insights on the hyper-parameterisation process, as well as a comprehensive analysis of the data imputations.

Before the imputation process, the dataset underwent feature normalisation. This rescaling method was specifically utilised for the GAIN imputation process. Here, we employed Min-Max scaling108, which transforms the minimum value (\({x}_{\min }\)) of each feature to 0 and the maximum value (\({x}_{\max })\) to 1. Hence, scaled features (\({x}_{{scaled}}\)) are determined as follows

$${x}_{{scaled}}=\frac{x-{x}_{\min }}{{x}_{\max }-{x}_{\min }}$$
(3)

Min-Max normalisation preserves the original distribution of the data and the inter-relationships between feature values109. In the GAIN algorithm, feature normalisation is crucial for stabilising the adversarial training process. Here, a generator network imputes missing values, while a discriminator component evaluates the imputation quality relative to the authentic observations110. Fundamentally, feature normalisation ensures that each feature exerts a proportional influence on the adversarial loss function, which guides the adversarial dynamics between the discriminator and the generator103.

Figure 2 illustrates the data imputation results using GAIN. Here, the histograms compare the frequency distributions between imputed (red bars) and observed (blue bars) values for features with missing data. Through an adversarial training process, the GAIN algorithm effectively approximates the conditional distribution of the missing instances given the observed data111,112. This results in the generation of statistically plausible data points that maintain the dataset’s structural and distributional integrity. Figure 2 exemplifies this outcome, where the close alignment between the imputed and real data distributions showcases GAIN’s accuracy in mimicking the actual underlying data distribution of the DSS dataset.

Fig. 2: Comparison of frequency distributions between observed data and imputed data.
figure 2

The histograms for each variable presenting missing data are illustrated before and after applying the GAIN algorithm. The close alignment between these distributions demonstrates GAIN’s effectiveness in replicating the underlying data characteristics in the dataset.

The GAIN imputation accuracy was also quantitatively assessed. Figure 3 shows the behaviour of the root mean squared error (RMSE) throughout the training iterations. For clarity, this metric quantifies the average error between imputed values and actual data. In this respect, the GAIN algorithm was adapted to compute RMSE for both training and testing subsets, allowing the model’s performance to be monitored throughout the training process until stabilisation. In this study, 20% of the data was withheld as a test set, and during each iteration, 10% of this subset was deliberately obscured to simulate the presence of missing data. Thus, the RMSE values obtained for the training and test sets were 0.11 ± 0.05 and 0.19 ± 0.02, respectively, indicating relatively low error rates.

Fig. 3: RMSE behaviour across GAIN training iterations.
figure 3

Throughout the training iterative process, the imputation accuracy of GAIN is monitored using RMSE for both training (blue line) and test (red line) subsets, where a proportion of 10% of their data is intermittently obscured to simulate missing data. The trends illustrate the GAIN stabilisation, achieving final RMSE values of 0.11 ± 0.05 for the training set and 0.19 ± 0.02 for the test set.

Dataset balancing

Following the multiple imputation procedure, a significant adjustment in the class distribution of the pitting corrosion variable was observed. The initial imbalance, where positive instances outnumbered negative cases by approximately 8.74:1 (see Table 3), was reduced to 1.98:1 post-imputation. However, this remaining disparity necessitated further corrective measures. Thus, data augmentation was applied using SMOTE for nominal and continuous variables (SMOTE-NC)113. This method oversamples the minority class (i.e., pitting negative cases), achieving a more equitable distribution of classes that ensures unbiased predictive modelling.

Figure 4 illustrates the results obtained using SMOTE-NC, comparing the distribution of cases for pitting corrosion and SCC before and after applying synthetic oversampling. Specifically, the number of negative pitting corrosion cases increased to 80% of the positive cases. This was the optimal threshold detected that preserves the balance of SCC cases. Figure 4a, b demonstrate a substantial increase in the number of negative instances of pitting corrosion from 850 to 1348, while the count of positive instances remained steady at 1685.

Fig. 4: Class distribution balancing for pitting corrosion and SCC.
figure 4

a Initial imbalance in pitting corrosion cases, with an imbalance ratio of 1.98:1. b Adjusted class distribution for pitting corrosion after applying SMOTE-NC, achieving an imbalance ratio of 1.25:1. c The original class distribution in SCC shows an imbalance ratio of 1.21:1. d Class distribution in SCC after SMOTE-NC, with an imbalance ratio of 1:20.

Similarly, Fig. 4c, d show the impact of the dataset balancing on SCC instances. Here, the number of negative SCC cases rose from 1384 to 1653, and positive cases adjusted from 1151 to 1377. This demonstrates that, while effectively oversampling the pitting corrosion minority class, SMOTE-NC did not negatively skew the balance of SCC cases. By implementing these adjustments, the final imbalance ratios for pitting corrosion and SCC were around 1.25:1 and 1.20:1, respectively.

XGBoost modelling and SHAP analysis

After preparing the dataset through imputation and balancing, an XGBoost classification model was trained to predict SCC in DSS. However, our primary objective extended beyond simple prediction, aiming to elucidate feature interactions contributing to SCC susceptibility. Decision-tree ensembles, such as XGBoost, are inherently adept for this end, as their hierarchical structure models how combinations of variables influence the final prediction114,115. Thus, feature contributions within the XGBoost classifier were subsequently investigated using SHAP values116. These quantify feature importance and pairwise synergies, providing data-driven insights that guide variable selection and potential connections for the BN model.

Firstly, the XGBoost model performance was optimised utilising BHO117, which required the coupled framework provided by the Python libraries XGBoost118 and Optuna106. This optimisation process targeted maximising the area under the receiver operating characteristic curve (AUC-ROC), which increases the model’s ability to discern between the classes108. To extensively explore the parameter space and ascertain the optimal model settings, a total of 3000 trials were executed during the BHO procedure. Full particulars concerning the hyperparameter ranges and final settings are presented within the supplementary material accompanying this manuscript.

During the BHO process, the predictive performance of the XGBoost classifier was assessed using a stratified cross-validation (CV) scheme119, structured into five distinct folds. This validation strategy partitions the dataset while preserving the original class proportionality (i.e., the ratio of SCC-positive to SCC-negative instances), preventing potential biases introduced by class imbalance120. For each fold, 80% of the data formed the training set for the BHO-derived model configuration, with the remaining 20% used as the unseen test set. Figure 5 presents the CV results across the five folds, corresponding to the best hyperparameter configuration found through BHO.

Fig. 5: Performance evaluation of the XGBoost classifier after CV.
figure 5

a Confusion matrix displaying the aggregated results from the CV process, where the colour bar indicates the number of instances evaluated across test sets. The XGBoost classifier correctly classified 1682 TP and 1248 TN, incurring 144 FP and 112 FN. b ROC curves for each CV fold. The AUC scores range from 0.899 to 0.991. The mean AUC score was 0.967, with a standard deviation (shaded in grey) of ± 0.036, indicating a high model’s discriminative ability across diverse dataset segments. The overall accuracy of the XGBoost model was 91.97%.

Figure 5a displays the aggregated confusion matrix, summarising the performance across the CV test folds. The optimised XGBoost model correctly classified 1682 SCC-positive instances (i.e., true positives, TP) and 1248 SCC-negative instances (i.e., true negatives, TN). Misclassifications were significantly lower, comprising 144 false positives (FP) and 112 false negatives (FN). All these counts formed the basis for deriving key performance indicators, as outlined in the “Methods” section.

Overall, the XGBoost model achieved an accuracy of 91.97%, indicating high agreement between predicted and actual classes. The XGBoost classifier demonstrated strong performance with a true positive rate (TPR), or recall, of 93.76% and a precision of 92.11%. The F1-score was 92.93%, indicating an optimal model’s effectiveness in predicting the SCC-positive class. The true negative rate (TNR), also termed specificity, was 89.66% and indicates the model’s capacity to correctly classify SCC-negative instances. The XGBoost model exhibited a false positive rate (FPR) of 10.34% and a false negative rate (FNR) of 6.24%. Further assessment is provided by the ROC curves in Fig. 5b, which illustrate the trade-off between TPR and FPR for each CV fold. Here, the AUC scores ranged from 0.899 to 0.991, yielding a high mean AUC of 0.967 ± 0.036. This metric indicates that the XGBoost model consistently distinguishes between SCC-positive and SCC-negative cases. The performance metrics of the XGBoost classification model are outlined in Table 4.

Table 4 Performance metrics of the optimised XGBoost classification model

Figure 6 details the results of SHAP-based feature importance and interaction analyses. Model interpretation utilised the TreeSHAP explainer, as implemented in the Python package SHAP121. Fig. 6a illustrates the SHAP summary plot, displaying the distribution of SHAP values for each feature across all data points. This visual representation ranks features by the magnitude of their mean absolute SHAP values, indicating overall importance. The horizontal axis reflects the additive contribution of features towards shifting the model’s output. Thus, rightward shifts suggest an increased probability of SCC occurrence, while leftward shifts indicate a decreased probability. The colour gradient shifts from blue to red, denoting the impact of feature values, with blue representing a lower impact and red a higher one. In the case of categorical variables, such as pitting corrosion, SHAP values are coloured-coded in blue when pitting does not occur and red when it does.

Fig. 6: SHAP analysis of feature importance and interactions for SCC predictions.
figure 6

a SHAP summary plot demonstrating the distribution of SHAP values for each feature within the XGBoost classifier. Features are ranked by their impact magnitude. Rightward shifts in the plot increase the probability of SCC occurrence, while leftward shifts decrease it. The colour gradient from blue to red indicates impact levels, with blue representing low impact and red indicating high impact. b Analysis of SHAP interaction values, showing the foremost pairwise feature contributions to the XGBoost model output. Critical interactions comprise stress levels (i.e., \({\sigma }_{R}\)), environmental conditions (e.g., pH2S, Cl and pH), and DSS material properties (e.g., \({\sigma }_{{YS}}\) and \({\varepsilon }_{f}\)).

As seen in Fig. 6a, key predictors with high SHAP values, such as \({\sigma }_{R}\), pH2S, and pitting corrosion, demonstrate significant influence on XGBoost predictions towards increased SCC probability. In contrast, features such as pCO2, temperature, Cl, and pH exhibit varied effects on the classifier’s predictions, suggesting a dual role in the model’s predictive dynamics, or dependency on other features. Figure 6a highlights the importance of DSS alloy characteristics in the XGBoost model’s response, such as PRE, \({\sigma }_{{YS}}\), and \({\varepsilon }_{f}\). The relevance of PRE can be attributed to the alloying elements (i.e., Cr, Mo, N, W) determining its value, which collectively prevent pitting corrosion. Regarding the mechanical properties, \({\sigma }_{{YS}}\) and \({\varepsilon }_{f}\), these are inherently associated with toughness; a measure of the energy a material can absorb before fracturing. In fact, toughness serves as a key metric in SCC studies to measure how corrosive environments affect an alloy’s strength and ductility122,123.

Figure 6b provides insights into pairwise feature contributions to model output through SHAP interaction values, quantifying their joint effect beyond individual contributions. Notable interactions include combinations of stress levels (i.e., \({\sigma }_{R}\)) and environmental variables (e.g., pH2S, Cl and pH). This observation aligns with the understanding that susceptibility to SCC is heavily affected by the interaction of mechanical loadings and environmental chemistry. Figure 6b indicates recurring interactions among \({\sigma }_{R}\), pH2S, temperature, PRE, as well as strength and ductility variables (i.e., \({\sigma }_{{YS}}\) and \({\varepsilon }_{f}\)). As opposed to this, interactions with individual alloying additions (e.g., Ni, Cr, Mo and Cu) are observed to be the least frequent. This pattern is consistent with the feature importance rank in Fig. 6a, where chemical constituents appeared less significant to the XGBoost model’s response.

Table 5 ranks the feature importance and interaction effects in the XGBoost classification model. Based on SHAP analyses, the top 10 variables exhibiting the most predominant interactions were selected for the BN model. It is important to emphasise that our SHAP value analysis, derived from training an XGBoost classifier, does not fully explain causality in SCC. Nonetheless, the insights obtained inform the potential connections for designing the BN structure. Thus, the XGBoost-SHAP framework in this investigation allowed for a more detailed understanding of the critical interdependencies among variables, which cannot be readily established through expert knowledge or theoretical comprehension regarding SCC.

Table 5 SHAP-based ranking of feature importance and interaction effects for the XGBoost classifier

BN model

Figure 7 presents the BN model designed to predict SCC of DSSs, which has been developed using the software BayesiaLab 11.3.1 (Bayesia S.A.S. Ltd., France). The network incorporates nodes that represent the most critical attributes influencing SCC occurrence, as identified by our XGBoost–SHAP framework. The selected nodes encompass environmental variables (e.g., pH2S, pCO2, Cl, temperature, pH), stress conditions (\({\sigma }_{R}\)), as well as specific material characteristics (i.e., \({\sigma }_{{YS}}\), \({\varepsilon }_{f}\) and PRE) associated with both mechanical resistance and chemical composition of DSSs. The BN design allows for the systematic interrogation of pitting corrosion and SCC, given the combined effect of sour conditions and tensile loading on DSSs.

Fig. 7
figure 7

BN model for predicting SCC of DSSs.

What stands out in Fig. 7 is the directionality of the arcs pointing from the SCC node to the predictor nodes. In this respect, our BN model adopts an augmented naïve Bayes (ANB) structure, where the reverse directionality of connections emphasises a discriminative modelling approach124,125. Unlike typical BN designs that frequently represent causal pathways, our model is structured to assess scenarios where SCC is assumed, shifting the focus to how various environmental and material factors influence this condition probabilistically. More importantly, ANB-based networks enable the explicit modelling of inter-variable dependencies. These BN structures often demonstrate enhanced classification performance by relaxing the conditional independence assumption in traditional naïve Bayes classifiers, which is often untenable in complex real-world applications126.

In Fig. 7, the colour-coded arcs differentiate dependency types within the BN model. Blue arcs denote direct dependencies from the SCC node (i.e., target node) to predictors, establishing primary pathways that quantify the direct influence of each predictor on SCC risk. In contrast, pink arcs indicate additional inter-variable dependencies, highlighting the interactions that indirectly impact SCC risk. In this regard, feature interactions from SHAP analyses primarily guided the inclusion of inter-variable dependencies. Methodologically, network construction commenced with a basic naïve Bayes structure (generated via BayesiaLab) comprising only direct connections from the SCC target node to predictors. Afterwards, the BN structure was systematically augmented based on SHAP interactions, as reported in Table 5. To assess their impact on the accuracy of the BN model, the inter-variable dependencies were evaluated iteratively using a stratified five-fold CV.

Figure 8 shows the CV results obtained for the final BN model, evaluated with an 80:20 training-test data split. This analysis quantifies the model’s predictive performance for the pitting corrosion and SCC nodes, which are inherently correlated events. As seen in Fig. 8a, the confusion matrix demonstrates significant accuracy during the CV process for the pitting corrosion node, which registered an overall accuracy of 90.21%. Across all test sets, the model successfully classified 90.34% of positive instances (i.e., recall) and 91.43% of negative instances (i.e., specificity). Figure 8b illustrates the ROC curves for the pitting corrosion node, where AUC scores across CV folds ranged from 0.936 to 0.960, with a mean AUC value of 0.950 ± 0.008.

Fig. 8: Performance evaluation of the BN Model using stratified CV.
figure 8

a Confusion matrix displaying the aggregated results from test sets regarding the pitting corrosion node. Here, the BN model predicted 1468 TP and 1173 cases TN, yielding 90.34% recall and 91.43% specificity. b ROC curves for each CV fold for the pitting corrosion node, where AUC scores range from 0.936 to 0.960, and an average AUC of 0.950 ± 0.008. c Confusion matrix displaying the aggregated results from test sets regarding the SCC node. The BN model predicted 1069 TP and 1598 TN, resulting in 90.21% recall and 92.74% specificity. d ROC curves for each CV fold for the SCC node, where AUC scores range from 0.931 to 0.958, and an average AUC of 0.945 ± 0.008. The overall accuracy for the pitting corrosion and SCC nodes was 90.21% and 91.39%, respectively.

An accuracy of 91.39% was obtained for SCC classification. As illustrated in Fig. 8c, the BN model exhibited optimal performance during the CV process. Through all test sets, the BN model yielded 90.21% and 92.74% of recall and specificity, respectively. The ROC analysis for the SCC node yielded AUC scores ranging from 0.931 to 0.958, with an average of 0.945 ± 0.008, as observed in Fig. 8d. Collectively, these results demonstrate that the BN model consistently maintained an optimal level of accuracy across diverse testing conditions. Additional performance metrics for both target nodes, pitting corrosion and SCC, are presented in Table 6.

Table 6 Performance metrics for pitting corrosion and SCC nodes in the BN model

Sensitivity analysis

A sensitivity analysis was conducted to elucidate the contributions and probabilistic dependencies in the BN model. The results are visually summarised in Fig. 9. The variables with the most significant impact on the target node SCC are depicted in Fig. 9a. Here, the size of each node indicates the direct contributions to the SCC node, as determined by BayesiaLab. For clarity, the direct contributions represent the causal effect of a given variable on the target node while holding other variables constant127. Thus, larger nodes represent variables with greater direct influence on SCC.

Fig. 9: Sensitivity analysis of the BN model.
figure 9

a Direct effects on the SCC node. Here, node sizes correspond to direct effect magnitude on SCC, where larger nodes highlight the key influencing factors, such as \({\sigma }_{R}\), pitting corrosion, and pH2S. Red arcs indicate the strongest interdependencies, determined from KL divergence b KL divergence matrix showing the strength of interdependencies between variables, with higher values indicating stronger relationships. This asymmetric matrix denotes the directional dependencies from parent nodes (vertical axis) to child nodes (horizontal axis).

In Fig. 9a, the red arcs highlight the most critical interdependencies among variables, which are determined by Kullback-Leibler (KL) divergence values128. Specifically, KL divergence quantifies the information gain by examining the mutual relationship between two variables, as opposed to assuming they are independent. Therefore, higher KL divergence values indicate a stronger relationship between variables. Figure 9b depicts the KL matrix from the BN model, showcasing the strength of interdependencies between variables. It is important to note that the KL matrix is asymmetric, as KL divergence reflects the directional comparison between parent nodes (listed along the vertical axis) and their child nodes (listed along the horizontal axis)129.

As seen in Fig. 9a, the outcomes of the BN model are predominantly influenced by three main nodes, namely \({\sigma }_{R}\), pitting corrosion, and pH2S. These findings are consistent with results from the feature importance analysis using the XGBoost classifier (see Fig. 6a). Figure 9b indicates that the most critical relationships in the BN model are associated with pitting corrosion, where high KL values correspond to environmental factors, such as pH2S, pH, Cl and temperature. Regarding DSS properties, the PRE node demonstrates a significant effect on the pitting corrosion node. In fact, the PRE node holds a key position in the BN model, demonstrating strong interdependencies with the mechanical properties (i.e., \({\sigma }_{{YS}}\) and \({\varepsilon }_{f}\)), which in turn interact with stress levels at the \({\sigma }_{R}\) node.

It is noteworthy that directed arcs between pH2S and Cl nodes are strongly interconnected, indicating a synergy affecting the pitting corrosion node. Comparatively, the direct interdependence between the temperature and Cl nodes appears less significant in the BN model, despite their well-known influence on pitting corrosion. This observation can be attributed to SCC-promoting conditions where H₂S is present, which alter the corrosion behaviour of stainless steels, including DSSs. In this regard, several published reviews argue that H2S disrupts passive films synergistically with Cl, rendering pH2S the dominant variable triggering localised corrosion compared to variations in temperature and Cl concentration alone4,5,81.

Table 7 offers an overview of the BN model configuration, detailing node discretisation, associated probabilities, mean values, and calculated direct effects on the SCC node. Importantly, the probabilistic parameters within the BN model incorporate uncertainty. For clarity, BayesiaLab performs Monte Carlo simulations (typically employing 1000 samples) to determine parameter uncertainty, which yields 95% confidence intervals for model probabilities130. Meaning that, there is a 95% probability that the true parameter value lies within the given range, conditional upon the model and data.

Table 7 Overview of the BN model configuration

Evaluation of pitting corrosion and SCC

The BN model demonstrated significant efficacy in predicting both pitting corrosion and SCC. This dual capability enables a thorough evaluation of SCC by discerning whether operating conditions are conducive to SCC or merely localised corrosion. Understanding this differentiation is crucial, as relying solely on pitting corrosion as a direct proxy for SCC initiation overestimates the perceived failure risk. This, in turn, may result in unnecessarily conservative reliability assessments for materials such as DSSs131,132,133. A key parameter in the BN model is PRE, which can be used to examine the risks of both corrosion phenomena. Specifically, the PRE node can be adjusted to demonstrate that increased PRE values are associated with decreased probabilities of pitting events, consequently reducing SCC risks.

Figure 10 exemplifies the BN model’s response using the PRE node, showcasing its utility in the probabilistic evaluation of both localised corrosion and SCC. For this analysis, specific conditions were arbitrarily defined, encompassing a pH2S interval of 0.02–0.5 bar, Cl concentrations between 30,000 and 120,000 ppm, temperatures spanning 60–115 °C, and constant loads equivalent to 0.8–1.2 \({\sigma }_{R}\). As shown in Fig. 10, these constrained variables are highlighted in green, indicating their limited probabilistic range within the BN model. To avoid limiting the dataset’s range of observations, other variables such as pH, pCO2, \({\sigma }_{{YS}}\) and \({\varepsilon }_{f}\), were not subjected to specific constraints. Thereby, the BN model modulates their states and projects the most probable outcomes. In Fig. 10, the PRE node is marked with a red square, indicating 100% probability within the defined range. This visual cue emphasises the PRE values under evaluation, changing from the 35–40 interval for standard DSSs (Fig. 10a) to PRE values greater than 40 (Fig. 10b) for super DSSs.

Fig. 10: Comparative analysis of BN model predictions for DSS and super DSS grades.
figure 10

a Predicted probabilities for standard DSS with PRE values between 35 and 40. The BN model predicts a probability of pitting corrosion around 63.75%, while the probability of SCC is 31.61%. b Predicted probabilities for super DSS with PRE values above 40. The BN model shows significantly low probabilities of both pitting corrosion and SCC, around 24.36% and 16.62%, respectively. Variables highlighted in green cover specified conditions: 0.02–0.5 bar pH2S, 30,000–120,000 ppm Cl, temperatures 60–115 °C, and 0.8–1.2 \({\sigma }_{R}\). Variables highlighted in blue are unrestricted. Red square on PRE node indicates a 100% probability according to the specific range of DSS grades.

Figure 10a indicates a 63.75% probability of pitting corrosion for DSSs with PRE between 35 and 40. However, the SCC risk remains significantly low, with a 68.39% probability of non-occurrence. This probabilistic assessment is consistent with the observations reported by Tynell134, who conducted SCC tests on DSS samples of alloy S32205 (PRE ≈ 35) under sour conditions. In this study, DSS specimens were immersed in a solution with approximately 30,300 ppm Cl and subjected to a constant load equivalent to the nominal \({\sigma }_{{YS}}\) (i.e., \({\sigma }_{R}\) = 1.0). While various DSS samples did not undergo SCC, pitting corrosion was observed once pH2S exceeded 0.3 bar and temperatures surpassed 70 °C. Similarly, Craig135 documented pitting corrosion during SCC tests with alloy S32205 (PRE ≈ 35), although no signs of SCC were detected. In contrast, experiments with alloy S32550 (PRE ≈ 39) exhibited no damage. In this work, the U-bend specimens were tested using a solution containing 70,000 ppm Cl at 240 °F (≈ 115.5 °C), while exposed to a CO2/H2S mixture with a pH2S of 5 psig (≈ 0.34 bar).

Figure 10b shows the probability of pitting corrosion and SCC for super DSS grades whose PRE values exceed 40. Here, the risks associated with pitting corrosion and SCC are notably low, with probabilities of 24.36% and 16.62%, respectively. These outcomes are consistent with empirical findings documented in the literature, demonstrating the exceptional resistance of super DSSs to both localised corrosion and SCC99,136,137,138. The experimental conditions in these investigations involve pH2S levels above 0.38 bar, Cl concentrations exceeding 100,000 ppm and temperatures over 100 °C, while stress conditions are equal or greater than 90% \({\sigma }_{{YS}}\).

SCC risks for DSSs

To further investigate SCC risks for DSSs using the BN model, inference analyses were conducted based on the operating limits for DSSs established by Cassagne et al.35. The authors suggested operational thresholds for pH2S based on field-relevant ranges of pH and Cl concentration, which are summarised as follows:

Environments with pH ≈ 3.5:

  • At 1000 ppm Cl, conventional 22 wt% Cr DSSs resist up to 0.5 bar pH2S, whereas 25 wt% Cr DSSs withstand up to 0.7 bar.

  • At 10,000 ppm Cl, pH2S tolerance reduces to 0.3 bar for 22 wt% Cr DSSs, and 0.5 bar for 25 wt% Cr DSSs.

  • At 100,000 ppm Cl, the pH2S limits further decrease to 0.2 bar for 22 wt% Cr DSSs and 0.3 bar for 25 wt% Cr DSSs.

    Environments with pH ≈ 4.5:

  • At 1000 ppm Cl, both 22 wt% Cr and 25 wt% Cr DSSs resist pH2S levels greater than 2.0 bar.

  • At 10,000 ppm Cl, 22 wt% Cr DSSs accommodate pH2S up to 2.0 bar, while 25 wt% Cr DSSs withstand levels above 2.0 bar.

  • At 100,000 ppm Cl, pH2S range is 0.3 − 0.4 bar for 22 wt% Cr DSSs, and 0.8–1.0 bar for 25 wt% Cr DSSs.

For our inference analyses, we established the combinations of the abovementioned environmental conditions, while maintaining an average temperature of 80 °C and tensile loadings around 0.9 \({\sigma }_{R}\). Subsequently, the probabilities of pitting corrosion and SCC were assessed based on the PRE node states within the BN model. Thus, the PRE intervals were defined as follows: PRE ≤ 34, 35 < PRE ≤ 40, PRE > 40. The results of the inference analyses for environments with pH values of 3.5 and 4.5 are detailed in Tables 8 and 9, respectively.

Table 8 Inference analysis of pitting corrosion and SCC at pH = 3.5
Table 9 Inference analysis of pitting corrosion and SCC at pH = 4.5

As observed in Table 8, DSS alloys with PRE ≤ 34 exhibit a high risk of pitting corrosion in environments with a pH of 3.5. The BN model consistently predicted pitting corrosion probabilities exceeding 80% for this DSS category. In addition, significant SCC probabilities, exceeding 54%, were associated with specific conditions, such as 0.3 bar pH2S and 100,000 ppm Cl, 0.5 bar pH2S and 10,000 ppm Cl, 0.7 bar pH2S and 1000 ppm Cl. At lower pH2S values (i.e., 0.2–0.3 bar), SCC risk was relatively low even when the Cl concentration was 100,000 ppm, where the associated probabilities of SCC no occurrence slightly surpassed 52%.

Comparatively, DSSs with PRE values between 35 and 40 demonstrated improved resistance to SCC across all conditions detailed in Table 8. In this respect, SCC is not expected within a probability range from 63.05% to 76.29%. Nonetheless, pitting corrosion remained probable for these DSS alloys, with probabilities oscillating between 58.18% and 71.16%. Notably, super DSS with PRE values over 40 exhibited the lowest risk of localised corrosion, with the absence of pitting reaching a probability as high as 72.05%. This superior performance consequently resulted in lower SCC risks, with the probability of SCC not occurring ranging from 69.18% to 84.21%.

At pH 4.5, the risk of pitting corrosion remains high for DSSs with PRE ≤ 34 across most sour conditions, as reported in Table 9. For these DSS alloys, the BN model predicted pitting probabilities spanning from 74.26% to 79.17%, although SCC risk is generally low, with no occurrence probabilities around 56%. Nevertheless, SCC occurrence risk may climb towards 60% when pH2S levels vary from 0.8 to greater than 2.0 bar, and Cl concentrations exceed 10,000 ppm.

Table 9 shows that for DSSs possessing PRE values between 35 and 40, pitting may manifest with probabilities between 52.16% and 86.22% when pH2S exceeds 0.4 bar. Below this pH2S threshold, the BN model indicates a 58.63% probability that pitting corrosion will not occur even upon exposure to 100,000 ppm Cl. Hence, the SCC risk for these DSS alloys is relatively low across all conditions, where probabilities indicate that there is no occurrence of SCC in the range from 55.28% to 69.98%. Consistent with the results observed in Table 8, the resistance to pitting corrosion and SCC markedly improves for DSSs with PRE > 40 across all evaluated conditions. As shown in Table 9, pitting corrosion is not expected, with probabilities ranging from 70.23% to 83.71%, while probabilities of SCC not occurring are notably high, falling between 81.31% and 90.54%.

Tables 8 and 9 demonstrate that pH variations significantly affect H2S influence on SCC. This was observed by Leyer et al.139, who quantified the pH-H2S interaction, noting that a one-unit pH decrease can equate to a tenfold increase in pH2S regarding environmental severity. As shown in Table 9, a pH of 4.5 significantly diminishes the impact of H₂S on SCC risk, potentially preventing cracking even at pH₂S levels beyond 2.0 bar.

However, DSSs with PRE ≤ 40 remain prone to localised corrosion across most conditions analysed herein, while super DSS grades (PRE > 40) effectively prevent both localised corrosion and SCC. In this regard, empirical evidence from the literature supports the outcomes presented in Tables 8 and 9. For example, Oredsson140 conducted various experiments with DSS samples of alloy S32205 (PRE ≈ 34), which underwent pitting corrosion, albeit cracking did not occur. These observations were made under severe conditions, including a range of 0.1–5.0 bar pH2S, a pH interval of 2.5–3.9, temperatures varying from 20 °C to over 200 °C, as well as Cl concentrations exceeding 30,000 ppm, and tensile stress levels approaching the nominal \({\sigma }_{{YS}}\).

In some SCC investigations, DSSs with PRE between 35 and 40 (e.g., S32205, S31803, S32550 and S32750) have exhibited localised damage while not manifesting cracking97,98,141. The test parameters involved pH values between 3 and 4.5, a range of 0.1–1.0 bar pH2S, temperatures from 60 to over 150 °C, and Cl concentrations ranging from 1000 to over 120,000 ppm, as well as stress levels spanning from 60 to 90% \({\sigma }_{{YS}}\). Comparatively, super DSS alloys (PRE > 40) demonstrate consistent resistance to both pitting and SCC. Investigations featuring alloys such as S32760, S32750 and S32520 reported no corrosion damage in environments comprising pH2S values from 0.1 to 2.0 bar, Cl concentrations between 1000 and over 120,000 ppm, and temperatures up to 110 °C. These conditions also included pH levels from 2.8 to 6, while tensile loadings were equal to, or greater than 90% \({\sigma }_{{YS}}\)99,137,142.

SCC boundaries for DSSs

Thus far, the pH2S values in our inference analyses have exceeded the H2S limits in ISO 15156-331 for DSSs. Despite this, our findings indicate that the incidence of SCC is often unlikely. This exposes a marked disparity between the conservative limits in industry standards and the actual performance of DSSs. While pH2S is critical in restricting DSS applications, relying solely on this parameter is insufficient to determine SCC risks comprehensively.

In an attempt to elucidate the conditions under which SCC of DSS is prevented, we conducted a backwards analysis. This method infers the posterior probabilities of relevant variables based on observed outcomes125. By assuming a 100% probability that SCC does not occur in the BN model, we can identify the range of environmental conditions that lead to this outcome. Similarly, a 100% probability of no pitting corrosion allows for examining possible settings that prevent both corrosion phenomena, while considering tensile loadings no greater than 90% \({\sigma }_{{YS}}\) (i.e., \({\sigma }_{R}\,\)≤ 0.9).

Figure 11 illustrates the backward analysis using the BN model, demonstrating that DSS grades can withstand up to 0.5 bar pH2S. This concentration considerably exceeds the ISO 15156-3 limits of 0.02 bar for conventional DSSs and 0.2 bar for super DSS grades, representing 25- and 2.5-fold increases, respectively. According to the BN model, other important environmental parameters must be within specific ranges, such as Cl ≤ 31,767 ppm, pCO2 ≤ 0.546 bar, temperature ≤ 89 °C, and pH values around 4.2. Regarding the mechanical properties, the BN model points towards a probable range of 510 MPa ≤ \({\sigma }_{{YS}}\) ≤ 995 MPa, which covers commonly employed DSSs, such as S32250 and S327505,22.

Fig. 11: Backward analysis to identify safe operating conditions for DSSs.
figure 11

The BN model infers that DSSs may resist up to 0.5 bar pH2S, in environments comprising Cl ≤ 31,767 ppm, pCO2 ≤ 0.546 bar, temperatures ≤ 89 °C, and pH values around 4.2. The nodes SCC and pitting corrosion indicate a 100% probability that both corrosion phenomena do not occur, while the \({\sigma }_{R}\) node indicates tensile loadings are greater than 90% \({\sigma }_{{YS}}\).

Extensive experimental findings support the pH2S boundary derived in this work. By way of illustration, Fig. 12 shows the results from a range of selected SCC investigations, using the relationship between pH2S and Cl content. Analogous to Fig. 1, these parameters are key synergistic factors driving SCC, which facilitates comparison with the probabilistic pH2S limit of 0.5 bar estimated herein. Specifically, Fig. 12a highlights that few pitting corrosion events are observed as pH2S approximates to 0.5 bar. However, Fig. 12b shows that pitting corrosion and SCC occur predominantly at pH2S levels exceeding 0.5 bar.

Fig. 12: Comparison of the pH2S limit derived from BN backward analyses against experimental evidence from the literature.
figure 12

a Pitting corrosion instances occur as pH2S approximates to 0.5 bar. b SCC instances are primarily seen above 0.5 bar pH2S. Symbol shape indicates material type: circles for conventional DSS and diamonds for super DSS. Data points are coloured red if pitting, or SCC was reported and blue if no corrosion damage occurred. The black dashed lines in both plots represent the safe operating limits for pH2S (i.e., 0.02 and 0.2 bar) for DSS alloys according to ISO 15156-3 standard31. In contrast, the limit derived from the BN model at 0.5 bar pH2S is highlighted in pink. The data points cover a wide range of experimental conditions: pH2S up to 1.0 bar, Cl concentrations near 30,000 ppm, temperatures from 21 to 100 °C; a pH interval of 2.5–5.4, pCO2 up to 70 bar, and applied stresses closely to \({\sigma }_{R}\) ≈ 0.9.

Table 10 reports the DSS samples and experimental conditions of the literature data in Fig. 12. As summarised in Table 10, the data encompass a wide range of DSS alloys, such as S32205, S31803, S32750 and S32760. Test conditions replicated demanding downhole environments. These included pH2S values up to 1.0 bar, Cl concentrations around 30,000 ppm, temperatures from 21 to 100 °C, a pH interval of 2.5–5.4, and pCO2 up to 70 bar. Applied stresses were also significant, between 448 MPa and 1025 MPa (corresponding to \({\sigma }_{R}\) ≈ 0.9). These parameter ranges meet relatively well with the posterior intervals predicted by our BN model, underscoring both the repeatability of the evidence base and the conservatism embedded within current ISO 15156-331 limits for DSSs.

Table 10 Summary of DSS samples and experimental settings in SCC studies presented Fig. 12

Final remarks

This investigation introduces a BN model that assesses the risk of SCC in DSSs, particularly in the challenging conditions of downhole environments. Extensive CV demonstrated that the BN model achieved an accuracy of over 90% in predicting both pitting corrosion and SCC probabilities.

Our BN model infers that DSSs can reliably withstand higher partial pressures of hydrogen sulphide (pH2S) than those currently stipulated by standard ISO 15156—Part 3 (ISO 15156-3)31. Specifically, our BN model estimates low SCC risk for DSS alloys even when exposed to 0.5 bar pH2S in the gas phase. Such threshold is significantly higher than ISO 15156-3 limits (i.e., 0.02 and 0.2 bar pH2S), representing a 25-fold and 2.5-fold increase for conventional DSS and super DSS, respectively. This finding underscores the insufficient characterisation of DSS grades in current standards and suggests the potential for their more cost-effective utilisation in sour service applications.

However, limitations persist in the context of causal inference. Our BN model falls short of enhancing the understanding of the physicochemical mechanisms driving SCC in the presence of H2S. Much information is required to incorporate the effect of H2S on promoting anodic dissolution through increased acidization and active corrosion acceleration leading to pit growth. Similarly, comprehensive data is needed to elucidate how H2S facilitates increased hydrogen absorption in metals, resulting in embrittlement that ultimately promotes crack initiation and propagation.

Future work should therefore explore several avenues to enhance mechanistic understanding. As exemplified by network designs proposed by Sridhar80, a better mechanistic understanding of SCC can be achieved by incorporating into the BN model microstructural details (e.g., phase balance, precipitates, cold work), as well as more specific environmental and mechanical factors (e.g., halide types, potentials, strain rates). In addition, transitioning our BN model to a dynamic BN approach would enable modelling time-dependent degradation processes, which is crucial for lifecycle assessments.

Data scarcity is a common limitation given the complex and costly experimental methods needed in EAC research. Thus, developing holistic EAC frameworks requires sophisticated data handling methods. In this study, the application of GANs for data imputation proved beneficial, and their potential for generating synthetic data is being increasingly explored to overcome dataset limitations in corrosion studies143,144.

Methodologically, a key contribution of this work was the integration of ML with explainable AI. Specifically, XGBoost and SHAP analyses facilitated the data-driven development of our BN model, overcoming limitations of reliance solely on expert judgment. This data-centric approach holds considerable promise for future BN modelling of complex corrosion processes, applicable to diverse alloy systems and EAC phenomena, where adequate data exist. Nevertheless, enhancing causal interpretability and inferential validity may necessitate hybrid models. Thus, combining empirical data analysis with expert-informed variables should also be applied through relevant strategies, such as protocols in expert elicitation, structure learning with expert constraints, or network merging algorithms78,145,146.

Lastly, it is pertinent to note that the BN model effectively synthesises a substantial body of literature data to manage uncertainties associated with DSS. However, the SCC boundaries estimated in this work should not be regarded as definitive limits. Rather, they indicate a range of conditions under which SCC susceptibility may be further investigated. The primary objective of this study lies in developing an integrative framework for SCC risk assessment aimed at advancing risk-based corrosion management and informing cost-effective material selection.

Methods

Workflow for BN model development

The development of the BN model in this study was structured into a three-stage workflow, as illustrated in Fig. 13. The initial stage focused on compiling data from 28 selected publications related to SCC studies of DSSs in sour environments, and thus consolidating a knowledge-based dataset, as detailed previously in Table 2.

Fig. 13
figure 13

Schematic overview of the three-stage workflow in this study.

In the second stage, the generated dataset underwent preprocessing, which included missing data imputation via GAIN and minority class oversampling using SMOTE-NC. The analysis proceeded with the training of an XGBoost classification model. Here, BHO was employed to fine-tune model settings so as to maximise predictive accuracy. Subsequently, SHAP analyses were undertaken to identify the main features, and their interactions, that largely contribute to SCC predictions. The insights gained from these analyses were instrumental in guiding the BN model design, specifically in selecting nodes and configuring directed arcs.

The third stage centred on constructing the BN model, leveraging the results from the XGBoost modelling. This phase encompassed establishing node connections, determining appropriate discretisation, and assessing the model’s performance through stratified CV. Fundamentally, stratified K-fold CV is a variant of k-fold CV. While k-fold CV randomly partitions the dataset into k equal-sized folds for model training and validation, stratified k-fold CV ensures that each fold maintains an equivalent proportion of class samples as the original dataset119. It is pertinent here to mention that both the XGBoost and BN models underwent CV using a stratified k-fold scheme of five folds. Meaning that, each CV cycle employs 80% of the data for training the models, and 20% for testing them.

During the CV process, we evaluated a range of performance metrics108, which are listed as follows:

  • Confusion matrix. It constitutes a 2 × 2 matrix that summarises the performance of a classification model of the form:

    $$\left[\begin{array}{cc}{TP} & {FP}\\ {FN} & {TN}\end{array}\right]$$
    (4)

    here, TP and TN denote correctly classified positive and negative instances, respectively; FP and FN denote negative instances misclassified as positive and positive instances misclassified as negative, respectively.

    From the confusion matrix, the classification models’ accuracy is then estimated by.

    $${Accuracy}=\frac{{TP}+{TN}}{{TP}+{TN}+{FP}+{FN}}$$
    (5)
  • Recall, or TPR, evaluates the model’s capacity to correctly identify all relevant positive instances. It is computed as:

    $${Recall}=\frac{{TP}}{{TP}+{FN}}$$
    (6)
  • Precision assesses how many of the instances labelled as positive by the model are actually positive, which is calculated by

    $${Precision}=\frac{{TP}}{{TP}+{FP}}$$
    (7)
  • F1-score. It is the harmonic mean of precision and recall, designed to provide a single measure that balances both false positives and false negatives. Unlike the arithmetic mean, the harmonic mean penalises extreme values. F1-score is especially relevant in scenarios with class imbalance.

    $${Precision}=2\times \frac{{Precision}\times {Recall}}{{Precision}+{Recall}}$$
    (8)
  • Specificity, or TNR, indicates the model’s ability to correctly identify negative cases, serving as a counterpart to recall for the negative class.

    $${Specificity}=\frac{{TN}}{{TN}+{FP}}$$
    (9)
  • Receiver operating characteristic curve (ROC) is a graphical representation of the trade-off between the TPR and the FPR of a binary classifier, where TPR and FPR are determined by

$${TPR}=\frac{{TP}}{{TP}+{FN}}$$
(10)
$${FPR}=\frac{{FP}}{{FP}+{TN}}$$
(11)
  • The area under the curve (AUC) quantifies the overall performance of the classifier by measuring the entire area beneath ROC. It provides a scalar value ranging from 0 to 1, indicating the model’s aggregate ability to distinguish between positive and negative classes, a perfect classifier achieves an AUC of 1, whereas random guessing yields 0.5147.

Additionally, a sensitivity analysis of the BN model was conducted to examine the primary attributes influencing SCC. Inference analyses were performed to explore SCC risks for DSSs within a range of sour conditions, comparing the results against existing literature. Ultimately, diagnostic reasoning (also termed backward analyses) was performed to infer the likely safe operating conditions for DSSs. This involved setting the desired outcome state (e.g., absence of SCC) and calculating the posterior probability distributions of the input variables consistent with that state.

Generative adversarial imputation nets

In real-life applications, datasets frequently exhibit inconsistencies, notably in the form of missing values across their attributes148. To handle this issue, one prevalent strategy is imputation, through which missing instances are estimated based on observed values within the dataset149. In this respect, deep learning (DL) methods have increasingly been employed to estimate missing values, such as denoising autoencoders, and GANs103,150. Distinctively, these techniques tend to outperform statistical techniques (e.g., logistic regression, decision trees, and predictive mean matching), as they operate without assumptions about underlying data distribution151. Moreover, DL-based methods use a robust model to estimate missing data across multiple features, thereby effectively capturing the latent structure of complex high-dimensional data152,153.

In this work, GAIN algorithm103 is employed to resolve the missing values in our dataset. This data imputation approach has been effectively applied in various domains, such as materials science, civil engineering and medical research112,154,155. Fundamentally, the GAIN method employs two main components: the generator and the discriminator110. The generator imputes the missing values based on the observed data, producing a complete data vector that resembles the real data distribution. Subsequently, the discriminator, equipped with additional hints about the missingness pattern, examines and distinguishes between observed and imputed values in the complete data vector. This adversarial process iteratively trains the generator to deceive the discriminator optimally, replicating the data’s actual distribution156.

Synthetic minority over-sampling

SMOTE is a widely accepted method for addressing class imbalance in classification datasets157. This random oversampling method generates synthetic examples within the minority class to achieve a more balanced class distribution, thereby enhancing the predictive performance of ML models. Specifically, SMOTE selects a minority class instance and then identifies its k-nearest neighbours within the feature space. Subsequently, a synthetic instance is created by interpolating the selected instance, and one or more of the nearest neighbours113. Thus, the synthetic sample, \(s\), is generated by

$$s={x}_{i}+u\times ({x}_{{ki}}-{x}_{i})$$
(12)

where, \({x}_{i}\) is a randomly chosen minority class instance, \({x}_{{ki}}\) is one the k-nearest neighbours of \({x}_{i}\), and \(u\) is a random number between 0 and 1.

In this work, we employ a variant of the original SMOTE algorithm, which is designed to handle datasets that contain both nominal (i.e., categorical) and continuous features. Referred to as SMOTE-NC113, this technique modifies the Euclidean distance calculation, required to estimate the k-nearest neighbours, by incorporating the median of the standard deviations of all continuous features from the minority class. This inclusion acts as a penalisation factor when categorical features differ between a sample and its potential k-nearest neighbours, effectively accounting for the disparity in categorical feature values. For the synthesis of new samples, SMOTE-NC employs the standard SMOTE interpolation for continuous features. Meanwhile, for categorical features, SMOTE-NC assigns the value that appears most frequently among the k-nearest neighbours, ensuring that the synthetic samples respect the distribution of the categorical data within the data frame.

Extreme gradient boosting

The XGBoost algorithm has widely gained recognition as a scalable and highly efficient ML method for classification and regression tasks in a variety of scientific fields158. Developed by Chen and Guestrin118, XGBoost stands as an optimised version of the Gradient Boosting (GB) framework proposed by Friedman159. This approach sequentially produces and updates base classifiers (weak learners) to build a robust ensemble classifier (strong learners), thereby systematically reducing errors and enhancing prediction accuracy. Meaning that, XGBoost algorithm aims to minimise prediction errors by gradually adding more learners (decision trees), while basing each update on the previous model’s prediction results160. Thus, the resulting model exhibits the minimum bias and variance during the training process114. Fundamentally, the predicted output of the XGBoost model, denoted as \({\hat{y}}_{i}\), is the sum of all scores predicted by K trees, expressed as

$${\hat{y}}_{i}=\mathop{\sum }\limits_{k=1}^{K}{f}_{k}\left({x}_{i}\right),{f}_{k}\in F$$
(13)

where K denotes the total number of trees, \(k\) represents the k-th tree, \({x}_{i}\) is the feature vector corresponding to sample \(i\), while \(F\) is the space of regression trees. To learn the set of functions, the algorithm minimises the following regularised objective function (\({obj}\))

$${obj}=\mathop{\sum }\limits_{i=1}^{n}L\left({y}_{i},{\hat{y}}_{i}\right)+\mathop{\sum }\limits_{k=1}^{K}\varOmega \left({f}_{k}\right)$$
(14)

Here, \(n\) is the number of samples, \({y}_{i}\) is the actual value of the i-th target, while \({\hat{y}}_{i}\) is the predicted value of the i-th target. The term \(L\left({y}_{i},{\hat{y}}_{i}\right)\) represents the training loss function, which quantifies the discrepancies between predictions and data points. Lastly, \(\varOmega \left({f}_{k}\right)\) is the regularization term that controls the complexity of the model to prevent overfitting118, and is defined as

$$\varOmega \left({f}_{k}\right)=\gamma T+\frac{1}{2}\lambda \mathop{\sum }\limits_{j=1}^{T}{\omega }_{j}^{2}$$
(15)

where, \(T\) represents the number of leaves, and \({\omega }_{j}\) is the score of the j-th leaf. The coefficient \(\gamma\) denotes the minimum loss reduction required to split a new leaf, and \(\lambda\) is a regularization coefficient.

Bayesian hyperparameter optimisation

The effective implementation of ML models is a multifaceted and demanding process that extends beyond selecting an adequate algorithm; it also involves hyperparameter optimisation to fine-tune the model’s configuration. This process is essential for ensemble models, such as XGBoost and DL algorithms, where hyperparameters set the training conditions and significantly impact model performance and adaptability161,162.

However, the diverse types of hyperparameters, which include continuous, discrete, and conditional values, make traditional optimisation strategies (e.g., grid and random search) less effective due to their inability to fully navigate the configuration space163,164. To overcome these problems, BHO offers a sophisticated solution. This approach constructs a probabilistic surrogate model of an objective function, leveraging information from previous evaluations to make informed decisions about which hyperparameters to explore next165,166. Thus, the surrogate model facilitates understanding the relationship between hyperparameters and model performance167.

Moreover, BHO has proven to effectively determine optimal hyperparameters, especially for high-dimensional problems165,168. In this study, we employ the tree-structured Parzen estimator (TPE) approach for implementing BHO; thus, eliminating the need for predefined initial values or training datasets169. The TPE algorithm initiates by randomly exploring the parameter space. Subsequently, it classifies the sampled parameters based on their performance according to a predetermined cost function. The TPE algorithm classifies the hyperparameters that yield the most effective outcomes into one group, and allocates the rest to a second group. This classification allows for modelling the likelihood of parameter effectiveness170. The main objective is then to identify a set of hyperparameters that probabilistically belong to the first category. Thus, the expected improvement (EI) per iteration is ref. 171

$${EI}=\frac{l(x)}{g(x)}$$
(16)

where \(l(x)\) and \(g(x)\) are the probability in the first and second groups, respectively.

Shapley additive explanation

The SHAP technique utilises a game-theoretic framework to interpret predictions from any ML model116. Based on the Shapley value concept from cooperative game theory, SHAP assigns a fair contribution value to each feature of a data instance, analogous to players in a game172. These values, referred to as Shapley values, distribute the prediction outcome among the features. In the context of SHAP, the model prediction is then expressed as ref. 173

$$f({\rm{x}})=g\left({{\rm{z}}}^{{\prime} }\right)={\phi }_{0}+\mathop{\sum }\limits_{i=1}^{M}{\phi }_{i}{z}_{i}^{{\prime} }$$
(17)

wherein \(f({\rm{x}})\) is the original model output and \(g\left({{\rm{z}}}^{{\prime} }\right)\) represents the SHAP explanation model. The base value \({\phi }_{0}\) represents the prediction with no features present, while \({\phi }_{i}\) corresponds to the SHAP value for the i-th feature, measuring the feature contribution to the difference between the actual prediction and the base value. The term \({z}_{i}^{{\prime} }\in \left\{\mathrm{0,1}\right\}\) is a binary variable that indicates whether the i-th feature is present (1) or absent (0), while M is the total number of features. Thereby, SHAP represents the prediction as a sum of binary components. In this work, we employ the SHAP scheme for tree-based ensemble models, commonly referred to as TreeSHAP174.

Bayesian networks

BNs provide a graphical framework that facilitates the representation and comprehensive analysis of uncertainty, as well as the interdependencies among multiple variables125. Fundamentally, BNs are a compact representation of a multivariate statistical distribution function, which efficiently encode the joint probability distribution (JPD) of a set of random variables, \(X=\left\{{x}_{1},{x}_{2},...,{x}_{n}\right\}\), through conditional independence statements, conditional functions and, principally, conditional probability matrices (CPMs)175,176. From this foundation, BNs permit the integration of various sources of information, coupling both probabilistic and deterministic models seamlessly40. Additionally, BNs can effectively be applied to obtain reliable inferences even when the data is ambiguous or incomplete177.

A BN consists of two primary components: a directed acyclic graph (DAG) where nodes correspond to the random variables \(X\), and a set of directed arcs (arrows) depicting the probabilistic dependencies between these variables. Accompanying the DAG are the associated CPMs for each node, specifying quantitatively the effects of dependencies among variables178. Via the concept of causal independence (i.e., d-separation), the JPD of all variables in \(X\) is given by the product of conditional probabilities, as follows179

$$\left({x}_{1},{x}_{2},...,{x}_{n}\right)=\mathop{\prod }\limits_{i=1}^{n}P\left({x}_{i}{|pa}\left({x}_{i}\right)\right)$$
(18)

where \({pa}\left({x}_{i}\right)\) denotes the set of parent variables of \({x}_{i}\), and \(P\left({x}_{i}{|pa}\left({x}_{i}\right)\right)\) is the CPM for \({x}_{i}\). Moreover, BNs can perform diagnostic analyses (also referred to as backward analyses) through a variety of inference techniques based on Bayes’ theorem in the form

$$P\left(X|E\right)=\frac{P\left({E|X}\right)P\left(X\right)}{P\left(E\right)}=\frac{P\left(E,X\right)}{{\sum }_{{X}^{{\prime} }}P\left(E,{X}^{{\prime} }\right)}$$
(19)

here, \(P\left(X|E\right)\) is the posterior probability based on the obtained evidence (\(E\)), \(P\left(X\right)\) denotes the prior probability, \(P\left(E,X\right)\) represents the conditional probability (assuming that X is true), and \(P\left(E\right)\) is the likelihood (also known as expectedness) that the evidence will be observed. Specifically, \(P\left(E\right)\) is calculated via the sum \({\sum }_{{X}^{{\prime} }}P\left(E,{X}^{{\prime} }\right)\), where \(P\left(E,{X}^{{\prime} }\right)\) represents the joint probability of the evidence E and a possible hypothesis state \({X}^{{\prime} }\), summed over all possible states \({X}^{{\prime} }\).