Abstract
The corrosion failure prediction of coating materials in diverse environments is of great significance for service performance evaluation. This work proposes a two-stage machine learning method that makes use of various data, including environmental factors, physical properties, and coating barrier performance, to accurately evaluate the corrosion degradation of coatings. In the first stage, a 1-year outdoor exposure experiment of polyurethane coatings was conducted in nine representative climatic environments. A semi-supervised collaborative training regression model is established between key environmental data and physical properties of coatings (i.e., glossiness, adhesion, water contact angle, and yellowness). In the second stage, using the predicted physical property data as inputs, a machine learning model is constructed that links physical properties to the barrier performance of coatings and develops binary classification models that can distinguish between intact and damaged coatings. This two-stage modeling strategy provides enhanced prediction accuracy and scientific interpretability by incorporating intermediate physical property parameters.
Introduction
The application of organic coatings has been widely adopted as a cost-effective and efficient means to control corrosion1,2. However, coatings are prone to degradation and subsequent failure when exposed to various atmospheric factors, such as solar irradiation, fluctuating temperatures, and varying humidity3,4. Due to the complex mechanism of coating degradation influenced by environmental factors, rapidly assessing the failure process of organic coatings and designing more durable protective coatings have emerged as critical research challenges.
Environmental factors such as UV irradiation, humidity, and temperature fluctuations can significantly affect coating performance. UV exposure induces polymer degradation, leading to changes in glossiness and yellowness, while humidity promotes hydrolysis and weakens adhesion5,6. Temperature variations cause expansion and contraction of coatings, potentially altering surface integrity7,8. More importantly, these physical properties play a critical role in the protective performance and durability of coatings under diverse environmental conditions9,10,11,12. Specifically, adhesion strength determines the coating’s ability to bond with the metal substrate, even in humid or corrosive atmospheres. Variations in water contact angle (WCA) reflect the hydrophobicity and anti-fouling characteristics of coatings, both essential for corrosion resistance in moisture-rich or chemically aggressive conditions13. Glossiness can reflect the degree of surface degradation, while variations in yellowness serve as indicators of aging induced by UV irradiation or oxidative environments14. Understanding the relationships between these physical properties and corrosion-related failure is hence essential for the design of coatings with enhanced durability. Examining the full chain of relationships, from environmental factors to physical properties and then to protective performance, is therefore of great importance. Such research provides theoretical support for understanding the impact of environmental factors on coating performance, laying the groundwork for the development of precise prediction models regarding the failure process of coatings.
Given the complexity of environmental factors and corrosion degradation mechanisms15, effectively predicting the failure process of coatings under diverse environmental conditions remains a significant challenge. Researchers typically expose organic coatings to both indoor (e.g., UV radiation and salt spray test) and outdoor (e.g., atmospheric exposure) environments, and subsequently measure specific properties after a designated period16. They have proposed parameterized models, including fitted empirical equations and the Arrhenius equation, to facilitate a time-dimensional extrapolation of the coatings’ performance. However, the scarcity of data from outdoor exposure experiments on organic coatings leads to considerable inaccuracies in empirical formulas used for predicting their aging failure17. In recent years, machine learning has emerged as a promising approach in the field of coating failure prediction. For instance, Ma et al.18 developed a lifetime prediction model for multilayer Cr/GLC coatings in deep-sea environments by integrating a mechanistic empirical model with a combined artificial neural network (ANN) and random forest (RF) model. This model utilized environmental factors such as electrochemical impedance spectroscopy (EIS) data as inputs, with coating service lifetime as the output. Hydrostatic pressure, a critical environmental factor in deep-sea conditions, compromises the mechanical integrity of coatings by exacerbating the propagation of microcracks and increasing the risk of coating delamination. Additionally, hydrostatic pressure alters the electrochemical behavior of coatings, affecting the impedance characteristics and accelerating corrosion processes. As such, incorporating hydrostatic pressure into prediction models is essential for enhancing the accuracy and reliability of coating lifetime predictions.
However, most traditional algorithms rely on large volumes of labeled data, posing significant challenges in practical applications. In contrast, semi-supervised learning can leverage unlabeled environmental data to enhance the model accuracy, making it particularly suitable for investigating the impact of environmental factors on the failure behavior of organic coatings19. Furthermore, binary classification models based on coating damage conditions—which categorize coating states as either “damaged” or “intact”—simplify the model construction process and improve predictive accuracy20,21. The practical significance of such models lies in their capability to rapidly identify coatings in need of repair, thereby facilitating timely maintenance and quality control of coatings. Binary classification approaches are widely employed in various fields requiring two-category decisions, such as disease diagnosis or spam detection. For example, Chang et al.22 proposed a machine learning method for knowledge element extraction that leveraged a classification framework with 25 semantic types. By formulating the extraction of each semantic type as a binary classification task, they integrated these tasks into a multi-classification model using the Error Correcting Output Code (ECOC) method. Through comparisons of decision trees, Support Vector Machines (SVM), and Naive Bayes classifiers, SVM emerged as the most effective base classifier, demonstrating superior performance in the multi-classification model. These findings validate the robustness and versatility of binary classification-based methods in knowledge extraction and related applications.
In this study, we used environmental data from nine representative locations, as well as information on the physical properties and corrosion failure of coatings, to establish a predictive model that links environmental data to physical properties and then to the coating failure condition23,24. First, environmental factors were employed to predict changes in the physical properties of coatings; these predicted physical properties then served as inputs to estimate the extent of corrosion-induced coating failure. This two-stage approach facilitates a more precise assessment of coating degradation behavior under varying environmental conditions, thereby improving the accuracy of service life predictions and protective performance evaluations. In the first stage, we analyzed a range of environmental factors such as temperature, humidity, atmospheric pressure, and solar irradiation across nine representative locations to establish a relationship model between environmental data and changes in the physical properties of coating materials (e.g., glossiness, adhesion, WCA, and yellowness). This model quantitatively characterizes the impact of environmental factors on the physical properties of coatings, laying the foundation for the subsequent prediction of corrosion failure of coatings. In the second stage, using the predicted physical properties derived from the first stage, we constructed a machine learning model to reflect the extent of corrosion degradation. Specifically, this model employs the predicted physical property parameters as inputs and outputs the degree of corrosion failure as indicated by the EIS impedance modulus and the number of time constants. Through this framework, the influence of physical property alterations on corrosion failure can be assessed with high precision, facilitating more accurate predictions of corrosion resistance performance under diverse environmental conditions.
Results and discussion
Development of the “Environmental Factors-Physical Properties” model
Using nine atmospheric corrosion exposure sites located in various geographical regions, the service behavior of Polyurethane (PU) varnish coating was evaluated over a 1-year period under diverse environmental conditions. As shown in Table 1, the nine atmospheric corrosion exposure sites are located in diverse geographic regions along the Belt and Road Initiative (BRI). These locations include Singapore, Guangxi, Nepal, Chennai, Kalimantan, Jeddah, Cilacap, Cairo, and Islamabad, each exhibiting distinct environmental characteristics. Singapore, located at the southern tip of the Malay Peninsula, maintains consistently warm and humid conditions. Guangxi, located in southern China, features a subtropical climate with numerous rivers. Nepal, nestled in the Himalayas, features a range of climates from tropical to alpine. Chennai, a coastal city in southern India, experiences a tropical wet climate. Kalimantan, on the Indonesian island of Borneo, represents a tropical rainforest climate rich in biodiversity. Jeddah, a key city along Saudi Arabia’s Red Sea coast, is characterized by a hot desert climate. Cairo, situated on the Nile River in Egypt, has a dry desert climate with minimal rainfall. Islamabad, in Pakistan, experiences a temperate climate. Table 2 presents the physical property data of coatings from the nine locations, including adhesion, glossiness, WCA, and yellowness. Among the tested sites, the coating exposed in Jeddah exhibited the highest adhesion strength of 6.97 MPa, while the one in Cilacap recorded the lowest adhesion value of 4.03 MPa. This discrepancy may be attributed to the contrasting humidity levels between the two regions. In areas with higher humidity, such as Cilacap, moisture infiltration can readily cause the coating to absorb water and expand, generating increased internal stress. This internal stress weakens the bond between the coating and the substrate, ultimately reducing adhesion strength25,26. 
In terms of glossiness, the coatings in Nepal exhibited the highest value of 22.14 Gu, whereas those in Cilacap displayed the lowest glossiness of 8.00 Gu. The climate in Cilacap is hot, and the site is exposed to intense direct sunlight for long periods. High-energy ultraviolet rays continuously impact the coating surface27,28, accelerating the loss of gloss, whereas Nepal’s relatively mild climate helps preserve gloss. Regarding the WCA, the coating in Cilacap recorded the lowest value at 73.20°, indicating that the coating surface is more susceptible to wetting. The dry environment in Cilacap can damage the coating’s micro-roughness, thus reducing its hydrophobicity. The yellowness analysis revealed that the coatings in Singapore had the highest yellowness value of 0.2, while those in Jeddah had the lowest at −1.2. This phenomenon suggests that the high-temperature and humid climate of Singapore may accelerate the aging and yellowing of coatings during service, in contrast to the dry, low-precipitation conditions in Jeddah. In summary, environmental factors, including humidity, temperature, UV irradiation, and precipitation, exert distinct and significant effects on the physical properties of coatings.
As illustrated in the environmental data (Fig. 1), substantial variations in climatic conditions occur across different regions. For example, desert areas such as Jeddah exhibit substantial daily temperature variations and low precipitation, whereas tropical regions like Singapore and Chennai experience high temperatures and humidity. Certain regions may face extreme environmental conditions, including elevated temperatures, low atmospheric pressure, or severe dryness (e.g., Cairo and Jeddah). These extreme conditions present considerable challenges to the durability and reliability of coating materials.
a Average Temperature; b Maximum Temperature; c Minimum Temperature; d Atmospheric Pressure; e Dew Point Temperature; f Wind Speed; g Cloud Cover; h Mean Daily Precipitation; i Annual Mean Sunshine Duration; j Average Humidity; k Yearly Average Surface Horizontal Irradiation; l Yearly Total Surface Horizontal Irradiation.
To identify the key environmental factors, it is first essential to consider the correlations among these variables. Highly correlated variables may contain redundant information, which can diminish the modeling accuracy and efficiency. The Pearson correlation coefficients were calculated to evaluate the relationships among various environmental factors, including Ave. Temp., humidity, Daily Precip., Min Temp., Dew Point, Atm. Press., Max Temp., Sun Hours, and two solar irradiance indices (YTSHI and YASHI), along with wind speed and cloud coverage, as depicted in Fig. 2. To filter the key environmental factors, variables that exhibited high correlations but low significance were removed. For instance, the correlation coefficient between YTSHI and YASHI is 1, indicating that both parameters represent weighted or combined measures of solar irradiation with identical data trends. Consequently, YTSHI was determined to be a redundant parameter and was excluded from the selection of key environmental factors.
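The redundancy screen described above can be illustrated with a minimal sketch. It assumes a simple greedy rule (keep a variable only if its absolute Pearson coefficient with every variable already kept stays below a threshold). The 0.95 threshold, the helper name `drop_redundant`, and all numerical values are illustrative assumptions, not the study's settings or measured data.

```python
import numpy as np

def drop_redundant(X, names, threshold=0.95):
    """Greedy filter: keep a column only if |r| with all kept columns is below threshold.

    The threshold is an assumed value chosen for illustration.
    """
    corr = np.corrcoef(X, rowvar=False)  # Pearson matrix, one row/column per variable
    kept = []
    for j in range(X.shape[1]):
        if all(abs(corr[j, k]) < threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept], corr

# Synthetic placeholder values for nine sites (not measured data).
yashi = np.array([180.0, 200.0, 150.0, 220.0, 170.0, 260.0, 175.0, 240.0, 210.0])
humidity = np.array([84.0, 60.0, 70.0, 75.0, 85.0, 55.0, 62.0, 80.0, 58.0])
ytshi = yashi * 8.76  # a rescaled copy of YASHI, so the two correlate perfectly

X = np.column_stack([yashi, ytshi, humidity])
kept, corr = drop_redundant(X, ["YASHI", "YTSHI", "Humidity"])
print(kept)
```

Because YASHI is listed first, the rescaled YTSHI column is the one discarded, mirroring the choice made in the text. A different column order would keep YTSHI instead, so in practice the order should reflect which variable is preferred.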
Next, the environmental factors were ranked by their significance, and the most influential factors were selected for subsequent modeling. Following the method outlined in section “Identification of key environmental factors”, the importance of 12 environmental variables to various coating physical properties was quantified. As illustrated in Fig. 3, the four most significant factors were identified as the initial candidate key environmental factors. For adhesion, the key environmental factors were average temperature, relative humidity, daily precipitation, and total solar irradiation intensity, with average temperature showing the highest importance. This highlights the pivotal role of temperature in coating corrosion failure. Higher temperatures are more likely to trigger the oxidation reaction of the coating29,30, and changes in temperature can also cause the shrinkage and expansion of the coating. In terms of glossiness, the primary environmental factors identified were relative humidity, dew point temperature, maximum temperature, and average hourly surface horizontal irradiation. Among these, relative humidity exhibited the greatest influence. High-humidity environments promote moisture uptake by the coating, disrupting surface smoothness and reducing gloss. Moisture absorption may also cause slight swelling or blistering on the coating surface, altering its uniformity of light reflection. Furthermore, elevated humidity levels can facilitate the hydrolysis or degradation of coating components, thereby intensifying gloss loss. Similarly, the main environmental factors affecting WCA were average wind speed, maximum temperature, cloud cover, and daily precipitation. Wind accelerates the evaporation of droplets from the surface, which may reduce the degradation of the coating. For yellowness, the primary influencing factors were atmospheric pressure, relative humidity, average hourly surface horizontal irradiation, and daily precipitation.
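The ranking step can be sketched as follows. This is a stand-in rather than the study's procedure: it uses scikit-learn's `permutation_importance`, which shuffles each feature on the supplied data, whereas the method described in the Methods perturbs the out-of-bag samples of each tree. The helper name `top_k_factors`, the forest settings, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

def top_k_factors(X, y, names, k=4, seed=0):
    """Rank features by permutation importance and return the top k with their shares."""
    forest = RandomForestRegressor(n_estimators=200, random_state=seed).fit(X, y)
    result = permutation_importance(forest, X, y, n_repeats=20, random_state=seed)
    scores = np.clip(result.importances_mean, 0.0, None)  # treat negative scores as zero
    share = scores / scores.sum()                          # normalise to a contribution share
    order = np.argsort(share)[::-1][:k]
    return [(names[i], float(share[i])) for i in order]

# Synthetic data: six hypothetical variables, only the first two drive the target.
rng = np.random.default_rng(0)
names = [f"f{i}" for i in range(6)]
X = rng.uniform(0.0, 1.0, size=(60, 6))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0.0, 0.05, size=60)

ranked = top_k_factors(X, y, names)
cumulative = sum(share for _, share in ranked)
for name, share in ranked:
    print(f"{name}: {share:.2f}")
print(f"cumulative share of top 4: {cumulative:.2f}")
```

The cumulative share printed at the end corresponds to the kind of check used to justify keeping only four inputs per property.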
After identifying the key environmental factors, predictive models were developed for the four physical properties of coatings: adhesion, glossiness, contact angle, and yellowness. The data from nine sites involved in the coating corrosion failure experiments were used as labeled samples, while data from 100 additional sites served as unlabeled samples. Analysis of the contributions of environmental factors to coating corrosion failure revealed that the cumulative contribution of the top four key factors exceeded 80%31. Consequently, these four factors were selected as model inputs. Using a co-training regression algorithm combined with the leave-one-out method, nine iterations were performed to construct the final model. Semi-supervised learning integrates both supervised and unsupervised learning by utilizing labeled and unlabeled samples to train a model, thereby reducing the dependence on labeled sample data16. As described in section “Cooperative training regression algorithm”, four algorithms were compared: 12-RF (a random forest model trained on the original 12-dimensional features), 4-RF (a random forest model trained on the reduced 4-dimensional features), 12-CORF (a COREG co-training model based on 12 features), and 4-CORF (a COREG co-training model based on the reduced 4-dimensional features). As shown in Fig. 4, the 12-RF model exhibited a higher Root Mean Square Error (RMSE) than the 12-CORF model, and similarly, the RMSE of the 4-RF model was higher than that of the 4-CORF model. The 4-CORF model achieved the smallest prediction error among all tested models. Figure 5 further compares the prediction results of these models for the four physical properties under different environmental conditions. Gray bars represent true values, while colored dots and red star markers indicate predicted values. The prediction results of the 4-CORF model were significantly closer to the true values across all four physical properties, particularly for glossiness and WCA.
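A simplified co-training regression loop with leave-one-out evaluation might look like the sketch below. It follows the general COREG idea of two regressors that exchange confident pseudo-labels, here using two k-nearest-neighbour regressors with different Minkowski orders. The base learners, `k`, the number of rounds, the pool size, and the synthetic data are all assumptions; the configuration actually used for the CORF models is not specified in this section.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def coreg_fit(X_lab, y_lab, X_unl, k=3, rounds=5, pool_size=10, seed=0):
    """COREG-style co-training: each regressor pseudo-labels one sample for the other."""
    rng = np.random.default_rng(seed)
    powers = (2, 5)  # two Minkowski orders give the two regressors different "views"
    sets = [(X_lab.copy(), y_lab.copy()) for _ in powers]
    unl = X_unl.copy()

    def fit(p, X, y):
        return KNeighborsRegressor(n_neighbors=k, p=p).fit(X, y)

    for _ in range(rounds):
        if len(unl) == 0:
            break
        pool = rng.choice(len(unl), size=min(pool_size, len(unl)), replace=False)
        picks = []
        for j, p in enumerate(powers):
            Xj, yj = sets[j]
            model = fit(p, Xj, yj)
            best = None  # (error reduction, index into unl, pseudo-label)
            for u in pool:
                x_u = unl[u:u + 1]
                y_u = model.predict(x_u)
                nbrs = model.kneighbors(x_u, return_distance=False)[0]
                trial = fit(p, np.vstack([Xj, x_u]), np.append(yj, y_u))
                before = (yj[nbrs] - model.predict(Xj[nbrs])) ** 2
                after = (yj[nbrs] - trial.predict(Xj[nbrs])) ** 2
                gain = float(np.sum(before - after))
                if gain > 0 and (best is None or gain > best[0]):
                    best = (gain, int(u), float(y_u[0]))
            picks.append(best)
        if all(b is None for b in picks):
            break  # neither regressor found a helpful pseudo-label
        used = set()
        for j, best in enumerate(picks):
            if best is None:
                continue
            _, u, y_u = best
            Xo, yo = sets[1 - j]  # hand the pseudo-labelled sample to the other regressor
            sets[1 - j] = (np.vstack([Xo, unl[u:u + 1]]), np.append(yo, y_u))
            used.add(u)
        unl = np.delete(unl, sorted(used), axis=0)
    return [fit(p, X, y) for p, (X, y) in zip(powers, sets)]

def coreg_predict(models, X):
    """Average the two regressors' predictions."""
    return np.mean([m.predict(X) for m in models], axis=0)

# Synthetic stand-in: 9 labeled sites, 100 unlabeled sites, 4 selected factors.
rng = np.random.default_rng(1)
X_lab = rng.uniform(0.0, 1.0, size=(9, 4))
y_lab = X_lab @ np.array([2.0, -1.0, 0.5, 0.0])
X_unl = rng.uniform(0.0, 1.0, size=(100, 4))

squared_errors = []
for i in range(len(X_lab)):  # leave-one-out over the nine labeled sites
    mask = np.arange(len(X_lab)) != i
    models = coreg_fit(X_lab[mask], y_lab[mask], X_unl)
    pred = coreg_predict(models, X_lab[i:i + 1])[0]
    squared_errors.append((pred - y_lab[i]) ** 2)
rmse = float(np.sqrt(np.mean(squared_errors)))
print(f"leave-one-out RMSE: {rmse:.3f}")
```

Each leave-one-out fold retrains on eight labeled sites plus the unlabeled pool, so the nine folds correspond to the nine iterations mentioned in the text.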
The results demonstrate that the co-training regression algorithm effectively enhances the model’s sample space by incorporating unlabeled data, thereby reducing prediction errors. The higher error observed in the 12-CORF model compared to the 4-CORF model highlights the efficacy of combining RF importance analysis with Pearson correlation analysis in extracting essential input information while eliminating redundancy. This improvement highlights the superior adaptability and generalization capability of the 4-CORF model in predicting the physical properties of coatings. The reduced accuracy of models with excessive parameters is attributed to interactions among similar input variables, which may amplify certain factors’ effects on physical properties. For instance, combinations of high temperature and low humidity under specific conditions may result in larger prediction errors. However, the 4-CORF model, by supplementing unlabeled data and mitigating the influence of outlier features, significantly improved the prediction accuracy. In conclusion, the screened co-training regression model demonstrated excellent accuracy and generalization in predicting coating physical properties under various environmental conditions.
Development of the “Physical Property-Corrosion Failure” model
In the second stage, the dataset was expanded from indoor accelerated experiments, which contained multiple cycles of ultraviolet (UV) aging and salt spray tests. The UV aging test was conducted at 50 °C with an irradiation intensity of 60 W/m² for 1.5 days, and the salt spray test was performed using a 5.0 wt.% NaCl solution at 40 °C for 0.5 days. Each cycle of UV aging and salt spray test lasted 2 days, and a total of three such cycles were completed32,33,34,35. The physical property data of coatings were used as inputs to develop two distinct models: a prediction model to explore the nonlinear relationship between coating properties and barrier performance, and a binary classification model to predict coating damage states, thereby improving the accuracy of service state predictions. The classification model complements the prediction model by providing the specific damage state of the coating.
The primary purpose of the accelerated indoor experiments is to expand the dataset by simulating the aging and damage processes of coatings under various environmental conditions. This provides a more diverse and enriched set of input data for the model, which helps improve its accuracy and generalization capability, particularly its predictive precision in practical applications. A total of 37 valid datasets were obtained from multiple cycles of UV aging and salt spray tests. Figure 6 illustrates the distribution and pairwise correlations between different physical property parameters, including glossiness, yellowness, WCA, adhesion, and the low-frequency impedance modulus (|Z|0.01Hz). The diagonal histograms (or density plots) represent the distributions of each physical property parameter, which generally exhibit a near-normal distribution trend. The graph shows varying correlations, both positive and negative, with differing linearity. The scatter plots highlight a positive correlation between glossiness, adhesion, and |Z|0.01Hz. This indicates that samples with higher glossiness and adhesion are associated with higher impedance modulus values, suggesting that these properties contribute to improved barrier properties of coatings.
Based on the coating physical property data and electrochemical impedance modulus data collected from indoor accelerated experiments, glossiness, yellowness, WCA, and adhesion were used as input variables, while the |Z|0.01Hz values served as the output. Two distinct models are used: a regression model that explores the nonlinear relationship between coating properties and barrier performance, and a binary classification model that predicts coating damage states. The classification model complements the prediction model by providing specific information about the coating’s damage state, thereby enhancing the accuracy of service state predictions. Regression and classification models were established using Support Vector Regression (SVR), ANN, and Adaptive Boosting (AdaBoost). The dataset was partitioned into training and test sets at a ratio of 4:1, taking the data distribution into account. Optimal parameter combinations were identified for multiple regression models through Grid Search. Specifically, the code iterates through each model, creating a GridSearchCV object for each. The parameter search space encompasses candidate parameter values for each model, as shown in Table 3. The code evaluates each parameter combination using 10-fold cross-validation (KFold). The performance of these models was evaluated using the Coefficient of Determination (R²) and precision metrics.
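The search loop described above can be sketched with scikit-learn as follows. The parameter grids are placeholders, since the candidate values of Table 3 are not reproduced in this section, and the data are synthetic stand-ins for the 37 samples. `MLPRegressor` is used as an assumed ANN implementation, and using R² as the cross-validation score is likewise an assumption.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

# Synthetic placeholders: 4 physical properties as inputs, a log-impedance-like target.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(37, 4))
y = 8.0 + 2.0 * X[:, 3] + X[:, 0] + rng.normal(0.0, 0.1, size=37)

# 4:1 split between training and test data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Hypothetical search spaces; the study's actual grids are listed in its Table 3.
candidates = {
    "SVR": (SVR(), {"C": [1, 10], "kernel": ["rbf", "linear"]}),
    "ANN": (MLPRegressor(max_iter=2000, random_state=0),
            {"hidden_layer_sizes": [(8,), (16,)]}),
    "AdaBoost": (AdaBoostRegressor(random_state=0),
                 {"n_estimators": [50, 100], "learning_rate": [0.1, 1.0]}),
}

cv = KFold(n_splits=10, shuffle=True, random_state=0)  # 10-fold cross-validation
results = {}
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=cv, scoring="r2")
    search.fit(X_train, y_train)
    test_r2 = r2_score(y_test, search.best_estimator_.predict(X_test))
    results[name] = (search.best_params_, float(test_r2))

for name, (params, test_r2) in results.items():
    print(name, params, f"test R2 = {test_r2:.2f}")
```

With only a few samples in each validation fold, the per-fold R² values are noisy. An error-based score such as `neg_mean_squared_error` is a reasonable alternative when the training set is this small.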
As shown in Table 4, AdaBoost demonstrated significant advantages in both regression and classification tasks, achieving an R² value of 0.83 and a precision value of 1. In the binary classification task, SVR also showed high classification accuracy but had lower prediction accuracy in regression tasks, with R² of only 0.41 and 0.44, respectively. Overall, AdaBoost exhibited exceptional accuracy, making it the optimal model for coating performance prediction and classification.
Figure 7 presents a SHapley Additive exPlanations (SHAP) bar plot illustrating the contributions of four features (adhesion, glossiness, contact angle, and yellowness) to the model output (i.e., barrier property). The horizontal axis represents the SHAP values, which reflect the impact of each feature on the prediction performance, with the color gradient (pink to green) indicating feature values from low to high. The results show that adhesion has the greatest influence on the barrier property of the coating, followed by WCA and glossiness, with yellowness having the least impact. According to the literature, higher adhesion improves bonding strength, reduces voids and defects at the interface, and significantly enhances the barrier property by preventing moisture and corrosive substances from penetrating. Water contact angle and glossiness, as measures of hydrophobicity and surface smoothness, respectively, enhance the barrier property by minimizing moisture retention and contaminant adhesion. In contrast, yellowness, which relates to coating aging and photostability, has limited direct effects on the barrier property.
Evaluation of the “Environmental Factors-Physical Property-Corrosion Failure” model
In this study, a two-stage modeling framework, “Environmental Factors-Physical Property-Corrosion Failure”, was proposed to comprehensively evaluate the anti-corrosion performance of coatings. The framework progresses systematically from environmental factors to changes in physical properties, then to electrochemical behavior, and ultimately to the corrosion failure state of coatings, thereby establishing a complete evaluation system.
The framework employs the 4-CORF model to predict the physical properties of coatings using environmental factors as inputs. Leveraging a co-training semi-supervised learning strategy, the 4-CORF model effectively integrates labeled and unlabeled samples, enhancing generalization capability while quantifying the effects of variables such as temperature and humidity on coating physical property. These physical property changes directly influence the electrochemical results of coatings. Subsequently, the optimized AdaBoost model is applied to reveal nonlinear relationships between physical property metrics and the electrochemical impedance modulus, quantifying the specific contributions of each metric to the barrier performance. Based on these predictions, binary classification methods are then employed to assess the coating’s damage state, distinguishing between intact coatings and damaged ones (Table 5), where 1 represents intact coating and 0 represents damaged coating. As shown in Fig. 8, the two-stage model exhibits relatively small prediction errors for the coating performance in both damaged and undamaged states, especially in the damaged state. This demonstrates that the proposed two-stage “Environmental Factors-Physical Property-Corrosion Failure” model significantly outperforms the direct one-stage “Environmental Factors-Corrosion Failure” model (constructed using semi-supervised algorithms), achieving lower relative errors and superior prediction accuracy. This highlights the advantages and robustness of the two-stage framework in capturing complex relationships and improving predictive performance.
The single-stage model directly fits the input environmental variables to the output barrier property results, without considering the influence of complex physical features. Therefore, the single-stage model struggles to accurately capture the nonlinear effects in evaluating the coating’s barrier performance, often leading to significant prediction errors. In contrast, the “Environmental Factors-Physical Property-Corrosion Failure” two-stage modeling strategy effectively captures the influence of physical properties on corrosion processes, enabling a more detailed analysis of barrier property change. It presents lower relative errors and higher prediction accuracy compared to the one-stage model, performing exceptionally well in distinguishing damaged coatings from intact coatings. This physical property-based approach provides a multidimensional and dynamic evaluation of the anti-corrosion performance of coatings, overcoming the limitations of traditional methods that rely solely on environmental variables. This study offers a robust and practical methodology for advancing coating performance research.
In summary, this study constructed a comprehensive predictive model framework, “Environmental Factors-Physical Property-Corrosion Failure” by integrating environmental factors, physical properties, and corrosion failure performance of coatings in different environments. Compared to the one-stage “Environmental Factors-Corrosion Failure” model, the proposed two-stage model demonstrated significant advantages. First, it achieved higher predictive accuracy by effectively reducing the nonlinear complexity between environmental factors and corrosion failure performance during modeling, with a relative error significantly lower than that of the one-stage model. Second, it provided enhanced scientific interpretability by incorporating intermediate physical property parameters. This addition clarified the mechanisms by which environmental factors influence corrosion failure, thereby improving the model’s physical relevance and practical applicability.
Methods
Outdoor exposure tests
In this study, low-carbon steel cold-rolled plates (150 mm × 100 mm × 1 mm) were employed as the substrate material. PU varnish obtained from Macklin Biochemical Technology Co., Ltd. (Shanghai, China) was used to prepare coatings. The steel surfaces were successively polished using 240-, 400-, and 800-grit sandpapers, followed by degreasing with acetone, ultrasonic cleaning, and dehydration with anhydrous ethanol. The treated substrates were then stored in a desiccator prior to use. PU coatings were sprayed onto the steel plates, which were cured at 25 °C and 40% relative humidity for 72 h. The dry film thickness of all coatings was ~90 μm.
During the outdoor exposure tests conducted from 2022 to 2023, a total of 12 environmental parameters were recorded across these nine atmospheric exposure sites. The specific types of environmental data are listed in Table 6.
Data processing and modeling
The modeling workflow adopted in this study is illustrated in Fig. 9. In the first stage, environmental variables were introduced into a key factor identification model to identify the most critical environmental factors. Subsequently, a co-training regression algorithm was employed to process unlabeled data, generate pseudo-labels, and expand the dataset, thereby establishing a predictive model that links environmental data to the physical properties of coatings. In the second stage, the dataset was expanded with indoor accelerated experiments, and different machine learning models were applied to construct predictive models that link physical properties to the barrier performance of coatings, as well as to develop binary classification models capable of distinguishing between intact and damaged coatings. Hyperparameter optimization was performed to identify the optimal models and parameters. As a result of these two stages, a comprehensive two-stage model, “Environmental Factors-Physical Properties-Corrosion Failure”, was formulated, offering a structured framework for precisely predicting coating degradation.
In the first stage, the environmental variables and physical properties of coatings were obtained from the nine regions along the BRI. The dataset consists of labeled outdoor test data from 9 sites and 100 unlabeled data points.
In the second stage, indoor accelerated experiments were conducted using the outdoor-exposed test specimens as a foundation. The dataset was obtained from these experiments, which comprised multiple cycles of ultraviolet (UV) aging and salt spray tests and generated a total of 37 datasets. Each cycle of UV aging and salt spray test lasted 2 days, and a total of three such cycles were completed. After each cycle, the samples underwent a series of physical property evaluations, including adhesion strength36, contact angle, glossiness9, and yellowness, as well as EIS measurement37. These data were then compiled to form the dataset used for the second-stage analyses. Various machine learning techniques were applied to develop the models. Specifically:
(1) Model selection: various machine learning algorithms, including Random Forest, Support Vector Machine, Gradient Boosting Tree, and Neural Networks, were tested to establish the barrier property prediction model (regression task) and the damage state classification model (binary classification task).
(2) Hyperparameter optimization: a hyperparameter search combined with 10-fold cross-validation was used to systematically tune each model38,39, aiming to enhance prediction accuracy and generalization performance.
(3) Model evaluation: for the barrier property prediction model, R² was used to evaluate performance on the test set. For the damage state classification model, precision metrics were employed to assess the classification capability40.
(4) Model comparison and optimal selection: the performance of each model was compared to identify the optimal model and corresponding hyperparameter configuration for the prediction and classification tasks, providing a robust foundation for subsequent analysis.
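Steps (2) to (4) can be illustrated with a minimal, dependency-free sketch of a 10-fold cross-validated grid search. Everything in it is an assumption made for illustration: a one-dimensional k-nearest-neighbour regressor stands in for the models listed in step (1), the data are synthetic, and the function names are hypothetical. Only the fold count (10) and the sample size (37) come from the text.

```python
import random
from statistics import mean

def k_fold_indices(n_samples, n_folds, seed=0):
    """Shuffle sample indices and split them into n_folds nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]

def knn_predict(train_x, train_y, x, k):
    """Predict y for a scalar x as the mean target of its k nearest neighbours."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return mean(train_y[i] for i in nearest)

def cv_rmse(xs, ys, k, n_folds=10):
    """Average held-out RMSE over the folds for a kNN regressor with k neighbours."""
    scores = []
    for fold in k_fold_indices(len(xs), n_folds):
        held_out = set(fold)
        tr_x = [xs[i] for i in range(len(xs)) if i not in held_out]
        tr_y = [ys[i] for i in range(len(xs)) if i not in held_out]
        sq = [(knn_predict(tr_x, tr_y, xs[i], k) - ys[i]) ** 2 for i in fold]
        scores.append(mean(sq) ** 0.5)
    return mean(scores)

def grid_search(xs, ys, candidates, n_folds=10):
    """Return the candidate with the lowest cross-validated RMSE and all scores."""
    results = {k: cv_rmse(xs, ys, k, n_folds) for k in candidates}
    best = min(results, key=results.get)
    return best, results

# Synthetic data: 37 samples, matching the size of the second-stage dataset.
rng = random.Random(1)
xs = [i / 36 for i in range(37)]
ys = [2.0 * x + rng.gauss(0.0, 0.05) for x in xs]
best_k, scores = grid_search(xs, ys, candidates=[1, 3, 5, 9])
```

With only 37 samples, each fold holds three or four points, so the per-fold scores are noisy and the averaged score is the more meaningful quantity to compare across candidates.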
Prediction model of coating physical property
A one-year outdoor exposure experiment was conducted from 2022 to 2023, during which environmental data encompassing 12 factors were collected from nine atmospheric exposure sites. Additionally, unlabeled environmental data covering the same 12 factors were acquired for 100 BRI regions from the National Centers for Environmental Information (NCEI) under the National Oceanic and Atmospheric Administration (NOAA), accessible at https://www.ncei.noaa.gov/data/global-summary-of-the-day/archive/. To identify the key environmental factors, a combination of random forest importance analysis41,42 and Pearson correlation analysis43,44 was employed.
Random Forest (RF) is an ensemble learning algorithm that combines multiple decision trees using bagging to enhance prediction stability. The input dataset consists of environmental data \(X=\left\{{X}_{1},{X}_{2},\ldots ,{X}_{M}\right\}\), where M represents the number of environmental variables. During training, the samples that are not selected for a given tree are known as out-of-bag (OOB) samples. Once training is complete, the OOB samples are used to assess the contribution of each input variable to the model. Assuming the RF model contains T CART trees, the baseline error on the OOB samples of the t-th tree is defined as \({E}_{t}\). By introducing noise to environmental variable j, a new OOB error \({{E}^{{\prime} }}_{t}\) is calculated. The average contribution of variable j across all CART trees is computed as:
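A standard permutation-importance expression consistent with these definitions is given below. It is a reconstruction under the stated definitions rather than a quotation of the original equation, and the symbol \({VI}_{j}\) is an assumed name:

$${VI}_{j}=\frac{1}{T}\mathop{\sum }\limits_{t=1}^{T}\left({{E}^{{\prime} }}_{t}-{E}_{t}\right)$$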
Finally, the average reduction in accuracy for variable j across all CART trees is calculated and normalized to quantify its contribution to the EIS low-frequency impedance of the coating.
The Pearson correlation coefficient quantifies the linear correlation between two variables by dividing their covariance by the product of their standard deviations. In this study, the environmental data are assumed to be linearly related; the Pearson correlation coefficient is therefore used to measure the correlation between pairs of environmental factors. For two datasets with a sample size of N, the Pearson correlation coefficient r is defined as follows:
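In its standard form, with \({x}_{i}\) and \({y}_{i}\) denoting the i-th samples of the two datasets (the index notation is assumed here):

$$r=\frac{{\sum }_{i=1}^{N}({x}_{i}-\bar{x})({y}_{i}-\bar{y})}{\sqrt{{\sum }_{i=1}^{N}{({x}_{i}-\bar{x})}^{2}}\sqrt{{\sum }_{i=1}^{N}{({y}_{i}-\bar{y})}^{2}}}$$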
where \(\bar{x}\) and \(\bar{y}\) represent the mean values of datasets x and y, respectively.
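The standard Pearson definition can be evaluated directly, as in the short sketch below. The function name and the temperature and humidity values are illustrative assumptions, not measurements from the study.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("r is undefined for a constant series")
    return cov / (spread_x * spread_y)

# Hypothetical monthly means for two environmental factors at one site.
temperature = [12.1, 15.3, 19.8, 24.6, 27.9, 26.4]
humidity = [71.0, 68.5, 64.2, 60.1, 58.3, 59.0]
r = pearson_r(temperature, humidity)
```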
Semi-supervised learning integrates supervised and unsupervised learning by using both labeled and unlabeled samples to train a model, thereby reducing the dependence on labeled data. The co-training regression algorithm (Co-regression, COREG) is a semi-supervised regression approach that applies co-training principles to improve model performance by incorporating unlabeled data. In COREG, the confidence of a pseudo-labeled sample is assessed by its effect on the model’s error on the labeled dataset. Specifically, before a pseudo-labeled sample is added to the training set, the model’s error on the labeled data is recalculated with that sample included. Samples that substantially reduce the error are selected and added to the training set of the other model. This process continues iteratively until the error reduction falls below a predetermined threshold or the maximum number of iterations is reached. In this study, RF is used as the underlying regression model within the co-training framework, and the resulting algorithm is referred to as Co-Regression RF (CORF).
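The co-training loop described above can be sketched as follows. This is not the authors' CORF implementation: to stay dependency-free, two k-nearest-neighbour regressors with different distance orders stand in for the RF learners, the data are synthetic, and all names, the neighbour count, and the threshold are assumptions.

```python
import random

def knn_predict(train, x, k, p):
    """Mean target of the k training samples closest to x (Minkowski order p)."""
    def dist(a, b):
        # The p-th root is omitted because it does not change the ranking.
        return sum(abs(u - v) ** p for u, v in zip(a, b))
    nearest = sorted(train, key=lambda s: dist(s[0], x))[:k]
    return sum(target for _, target in nearest) / len(nearest)

def labeled_error(train, labeled, k, p):
    """MSE on the labeled samples, each predicted with itself left out."""
    total = 0.0
    for sample in labeled:
        others = [s for s in train if s is not sample]
        total += (knn_predict(others, sample[0], k, p) - sample[1]) ** 2
    return total / len(labeled)

def coreg(labeled, unlabeled, k=3, orders=(2, 5), max_iter=20, tol=1e-6):
    """Co-train two regressors; each hands its best pseudo-label to the other."""
    trains = [list(labeled), list(labeled)]
    pool = list(unlabeled)
    for _ in range(max_iter):
        picks = [None, None]
        for m in (0, 1):
            base = labeled_error(trains[m], labeled, k, orders[m])
            best_gain = tol
            for x in pool:
                candidate = (x, knn_predict(trains[m], x, k, orders[m]))
                gain = base - labeled_error(trains[m] + [candidate], labeled, k, orders[m])
                if gain > best_gain:  # keep the sample that reduces the error most
                    best_gain, picks[m] = gain, candidate
        if picks[0] is None and picks[1] is None:
            break  # no pseudo-label improves the labeled error any further
        for m in (0, 1):
            if picks[m] is not None:
                trains[1 - m].append(picks[m])
                if picks[m][0] in pool:
                    pool.remove(picks[m][0])
    return trains

def coreg_predict(trains, x, k=3, orders=(2, 5)):
    """Final estimate: average of the two co-trained regressors."""
    return sum(knn_predict(trains[m], x, k, orders[m]) for m in (0, 1)) / 2

# Synthetic example: 9 labeled samples (as in the nine sites) and 30 unlabeled ones.
rng = random.Random(0)
def make_x():
    return (rng.random(), rng.random())
labeled = [(x, x[0] + 2.0 * x[1]) for x in (make_x() for _ in range(9))]
unlabeled = [make_x() for _ in range(30)]
trains = coreg(labeled, unlabeled)
estimate = coreg_predict(trains, (0.5, 0.5))
```

Handing each selected sample to the other learner, rather than to the learner that labeled it, is what keeps the two models from simply reinforcing their own errors.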
Prediction model of coating corrosion failure
Three models were compared for predicting the corrosion failure of coatings: an artificial neural network (ANN)45, support vector regression (SVR), and AdaBoost46,47. The ANN model, inspired by the structure and function of biological neural networks, learns complex nonlinear relationships between input and output variables through adaptive weight adjustments. SVR identifies an optimal hyperplane to fit the data, which makes it particularly effective for high-dimensional and nonlinear datasets.
AdaBoost is an ensemble learning method that improves predictive performance by iteratively combining multiple weak classifiers48, typically decision trees49. Its key principle is to focus increasingly on the samples misclassified by earlier models: after each iteration, the weights of previously misclassified samples are raised so that subsequent weak learners concentrate on the more challenging instances. The process starts from an initial training set (X0, y0), where X0 represents the physical property data of the coating and y0 represents the corrosion failure performance data. A sequence of weak models (e.g., weak model 1, weak model 2, and weak model 3) is then trained. After each iteration, a new training set ((X1, y1), …, (Xm, ym)) is generated, with the data distribution adjusted to emphasize the samples misclassified by the previous model. For test data, each weak model generates a prediction, and these predictions are passed to a combination module. The final prediction is a weighted aggregation of the weak models’ predictions, which yields a more robust and accurate model than any single weak learner.
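The reweighting and weighted-vote steps can be made concrete with a minimal sketch of discrete AdaBoost for binary labels (+1 and -1), using one-feature threshold rules (decision stumps) as weak classifiers. The stump learner, the function names, and the toy one-feature dataset are assumptions for illustration; the study's actual AdaBoost configuration is not given in the text.

```python
from math import exp, log

def fit_stump(X, y, w):
    """Return (weighted error, feature, threshold, sign) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                pred = [sign if row[j] >= thr else -sign for row in X]
                err = sum(wi for wi, p, t in zip(w, pred, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] >= thr else -sign

def adaboost_fit(X, y, n_rounds=10):
    """Train weak stumps on successively reweighted samples."""
    n = len(X)
    w = [1.0 / n] * n  # start from uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * log((1 - err) / err)           # vote of this weak model
        ensemble.append((alpha, stump))
        if stump[0] < 1e-10:
            break  # a perfect weak learner needs no further rounds
        # Raise the weights of misclassified samples, lower the others.
        w = [wi * exp(-alpha * t * stump_predict(stump, row))
             for wi, row, t in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, row):
    """Weighted vote of all weak models."""
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

# Toy data: no single threshold separates the classes, three stumps do.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [1, 1, -1, -1, 1, 1]
model = adaboost_fit(X, y, n_rounds=3)
predictions = [adaboost_predict(model, row) for row in X]
```

On this toy set the first stump misclassifies two samples; their weights double relative to the rest, which steers the second and third stumps toward thresholds the first one could not represent.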
Additionally, SHAP (SHapley Additive exPlanations) was employed to analyze the contribution of each input feature to the target variable. SHAP, grounded in game theory, provides a method for interpreting machine learning model outputs. By calculating the marginal contribution of each feature when it is added to the model, SHAP quantitatively evaluates the impact of input variables on the model’s prediction results. A positive or negative SHAP value indicates whether a given feature pushes the prediction up or down, respectively, while its absolute magnitude reflects the strength of the feature’s impact. Specifically, the model’s prediction result \({y}_{i}\) can be decomposed into the average value of the target variable \({y}_{base}\) and the sum of the SHAP values of each feature, expressed as:
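The additive form implied by this description is shown below as a reconstruction; the upper limit k, the number of input features, is an assumed symbol:

$${y}_{i}={y}_{base}+\mathop{\sum }\limits_{j=1}^{k}f({x}_{j})$$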
Here, \({y}_{base}\) is the average value of the target variable across all samples, and \(f({x}_{j})\) represents the SHAP value of feature \({x}_{j}\).
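The additive property can be checked numerically in the one case where SHAP values have a simple closed form: a linear model with independent features, where the SHAP value of feature j is its weight times the feature's deviation from its mean. The sketch below rests on that assumption. The models used in the study are nonlinear and would need a dedicated SHAP implementation, and the weights, bias, and feature values here are invented. The base value is taken as the mean model prediction over the samples.

```python
def predict(weights, bias, row):
    """Linear surrogate model: bias plus weighted sum of the features."""
    return bias + sum(w * v for w, v in zip(weights, row))

def linear_shap(weights, X, row):
    """SHAP values of a linear model with independent features."""
    means = [sum(col) / len(col) for col in zip(*X)]
    return [w * (v - m) for w, v, m in zip(weights, row, means)]

# Hypothetical samples: glossiness, adhesion, water contact angle, yellowness.
X = [
    [80.0, 5.0, 95.0, 2.0],
    [60.0, 4.0, 85.0, 6.0],
    [40.0, 3.0, 75.0, 10.0],
]
weights = [0.02, 0.5, 0.01, -0.1]  # invented coefficients
bias = 6.0

y_base = sum(predict(weights, bias, r) for r in X) / len(X)
phi = linear_shap(weights, X, X[0])
reconstructed = y_base + sum(phi)  # should equal the prediction for X[0]
```

In this example the yellowness term is positive for the first sample because its yellowness is below the mean and the assumed weight is negative, which shows how the sign of a SHAP value depends on both the model and the sample.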
Model evaluation
The two models described above were evaluated using three error measures, RMSE, R², and relative error, to assess the accuracy of the predictions made by the two-stage model. The following formulas are used, where \({y}_{i}\) is the actual observed value, \({ {\hat{y}} }_{i}\) is the predicted value, \(\bar{y}\) is the mean of the actual observed values, and n is the sample size.
(1) Root mean square error
RMSE quantifies the difference between predicted and actual values and reflects the dispersion of the prediction errors. A lower RMSE indicates a better fit between the model and the data:
$${\rm{RMSE}}=\sqrt{\frac{1}{n}\mathop{\sum }\limits_{i=1}^{n}{({\hat{y}}_{i}-{y}_{i})}^{2}}$$ (5)
(2) Coefficient of determination
The R² value measures the goodness of fit between the predicted values and the actual data. A value of 1 indicates a perfect fit, and a value of 0 corresponds to a model that performs no better than predicting the mean; R² can become negative when the model fits worse than the mean. A lower R² value suggests that the model has limited explanatory power for the variables:
$${R}^{2}=1-\frac{{\sum }_{i=1}^{n}{({y}_{i}-{\hat{y}}_{i})}^{2}}{{\sum }_{i=1}^{n}{({y}_{i}-\bar{y})}^{2}}$$ (6)
(3) Relative error
The relative error is calculated for each sample to quantitatively evaluate the accuracy of model predictions:
$${\rm{Relative\; error}}=\frac{|{y}_{i}-{ {\hat{y}} }_{i}|}{|{y}_{i}|}\times 100 \%$$ (7)
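Equations (5) to (7) translate directly into code. The sketch below is a plain-Python transcription with hypothetical function names and made-up example values; the relative error is returned per sample, in percent.

```python
from math import sqrt

def rmse(y_true, y_pred):
    """Eq. (5): root of the mean squared difference."""
    n = len(y_true)
    return sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r2_score(y_true, y_pred):
    """Eq. (6): one minus residual sum of squares over total sum of squares."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def relative_errors(y_true, y_pred):
    """Eq. (7): per-sample absolute error relative to the observed value, in %."""
    return [abs(t - p) / abs(t) * 100.0 for t, p in zip(y_true, y_pred)]

# Made-up observed and predicted values (e.g., log low-frequency impedance).
observed = [9.8, 9.1, 8.4, 7.6]
predicted = [9.6, 9.3, 8.1, 7.9]
summary = (rmse(observed, predicted),
           r2_score(observed, predicted),
           max(relative_errors(observed, predicted)))
```

Note that Eq. (6) is undefined when all observed values are identical, and Eq. (7) is undefined for an observed value of zero; the sketch does not guard against either case.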
Data availability
The data presented in this article are available upon request to the authors.
Code availability
The underlying code for this study, together with the training and validation datasets, is not publicly available but may be made available to qualified researchers on reasonable request from the corresponding author.
References
Sun, Q. et al. Preparation and properties of Mg-Nd binary alloy MAO/SiO2@α-Fe2O3 organic composite coating. npj Mater. Degrad. 8, 55 (2024).
Nazari, M. H. et al. Nanocomposite organic coatings for corrosion protection of metals: a review of recent advances. Prog. Org. Coat. 162, 106573 (2022).
Wu, J. et al. A study on microstructure evolution of MCrAlY coatings after thermal aging in Te environment. Surf. Coat. Technol. 494, 131490 (2024).
Gao, Y. et al. A simple strategy to significantly improve the anticorrosion and aging resistance of epoxy coatings by adding polyaniline modified multi-walled carbon nanotubes. Polym. Degrad. Stab. 230, 111027 (2024).
Qin, P. et al. Study on corrosion behavior of 40Cr steel in 3.5% NaCl solution under the synergistic effect of UV illumination and strain. Corros. Sci. 235, 112199 (2024).
Song, L. et al. The role of UV illumination on the initial atmospheric corrosion of 09CuPCrNi weathering steel in the presence of NaCl particles. Corros. Sci. 87, 427–437 (2014).
Xiang, Y. Effect of long-term aging treatment on the structure and oxidation resistance of Cr coatings under high-temperature steam. Corros. Sci. 212, 110923 (2023).
Miszczyk, A. et al. Accelerated ageing of organic coating systems by thermal treatment. Corros. Sci. 43, 1337–1343 (2001).
Wang, J. K. et al. Towards weathering and corrosion resistant, self-warning and self-healing epoxy coatings with tannic acid loaded nanocontainers. npj Mater. Degrad. 7, 39 (2023).
Balbay, S. & Acıkgoz, C. Anti-yellowing UV-curable hybrid coatings prepared by the sol–gel method on polystyrene. Prog. Org. Coat. 140, 105499 (2020).
Dong, W. et al. Emulsion polymerization of super-hydrophobic acrylate enabled by a novel “ferrying” strategy for developing waterborne coatings with reduced surface tension and glossiness. Prog. Org. Coat. 186, 108073 (2024).
Chen, X. et al. Microstructurally resolved electrochemical evolution of mechanical- and irradiation-induced damage in nuclear alloys. npj Mater. Degrad. 8, 84 (2024).
Li, Y. et al. Decanoate conversion layer with improved corrosion protection for magnesium alloy. Corros. Sci. 70, 229–234 (2013).
Narita, C. & Yamada, K. Development of glossy and UV-resistant urushi coatings by thermal polymerization. Prog. Org. Coat. 186, 108032 (2024).
Zhang, J. & Zheng, Y. Constructing multi-protective functional polyurethane composite coating via internal-external dual modification: achieving superhydrophobicity, enhanced barrier, corrosion inhibition, and UV aging resistance properties. Prog. Org. Coat. 194, 108540 (2024).
Li, Y. et al. Developing an atmospheric aging evaluation model of acrylic coatings: a semi-supervised machine learning algorithm. Int. J. Min. Met. Mater. 31, 1617–1627 (2024).
Lv, Y. et al. Outdoor and accelerated laboratory weathering of polypropylene: a comparison and correlation study. Polym. Degrad. Stab. 112, 145–159 (2015).
Ma, H. et al. Prediction of multilayer Cr/GLC coatings degradation in deep-sea environments based on integrated mechanistic and machine learning models. Corros. Sci. 224, 111513 (2023).
Chen, Y., Liu, Y., Lu, M., Fu, L. & Yang, F. Multi-consistency for semi-supervised medical image segmentation via diffusion models. Pattern Recogn. 161, 111216 (2025).
Pal, M. et al. Ensemble approach of deep learning models for binary and multiclass classification of histopathological images for breast cancer. Pathol. Res. Pract. 263, 155644 (2024).
Liu, Y. et al. Classification enhanced machine learning model for energetic stability of binary compounds. Comput. Mater. Sci. 244, 113277 (2024).
Chang, X. & Zheng, Q. H. Knowledge element extraction for knowledge-based learning resources organization. (ed. Leung, H.) 102–113 (Lecture Notes in Computer Science, 2007).
Sun, H., Xi, Y., Tao, Y. & Zhang, J. Facile fabrication of multifunctional transparent glass with superhydrophobic, self-cleaning and ultraviolet-shielding properties via polymer coatings. Prog. Org. Coat. 158, 106360 (2021).
Zhang, R. et al. Bayesian assessment of commonly used equivalent circuit models for corrosion analysis in electrochemical impedance spectroscopy. npj Mater. Degrad. 8, 120 (2024).
Dehri, I. & Erbil, M. The effect of relative humidity on the atmospheric corrosion of defective organic coating materials: an EIS study with a new approach. Corros. Sci. 42, 969–978 (2000).
Zhang, L. et al. Corrosion behavior of fluorinated carbonyl iron-hydrophobic composites in neutral salt spray environment. Corros. Sci. 210, 110823 (2023).
Chen, Y., Liu, R. & Luo, J. Enhancing weathering resistance of UV-curable coatings by using TiO2 particles as filler. Prog. Org. Coat. 169, 106936 (2022).
Xing, J. et al. Preparation of efficient ultraviolet protective transparent coating by using a titanium-containing hybrid oligomer. ACS Appl. Mater. Interfaces 13, 5592–5601 (2021).
Vu, D., Gigliotti, M. & Lafarie-Frenot, M. Experimental characterization of thermo-oxidation-induced shrinkage and damage in polymer–matrix composites. Compos. Part A Appl. Sci. Manuf. 43, 577–586 (2012).
Omastová, M., Podhradská, S., Prokes, J., Janigová, I. & Stejskal, J. Thermal ageing of conducting polymeric composites. Polym. Degrad. Stab. 83, 251–256 (2003).
Zhi, Y. et al. Improving atmospheric corrosion prediction through key environmental factor identification by random forest-based model. Corros. Sci. 178, 109084 (2021).
Wei, J. et al. Efficient protection of Mg alloy enabled by combination of a conventional anti-corrosion coating and a superamphiphobic coating. Chem. Eng. J. 390, 124562 (2020).
Jorcin, J., Aragon, E., Merlatti, C. & Pébère, N. Delaminated areas beneath organic coating: a local electrochemical impedance approach. Corros. Sci. 48, 1779–1790 (2006).
Finke, A., Escobar, J., Munoz, J. & Petit, M. Prediction of salt spray test results of micro arc oxidation coatings on AA2024 alloys by combination of accelerated electrochemical test and artificial neural network. Surf. Coat. Technol. 421, 127370 (2021).
Gao, J., Hu, W., Wang, R. & Li, X. Study on a multifactor coupling accelerated test method for anticorrosive coatings in marine atmospheric environments. Polym. Test. 100, 107259 (2021).
Qian, H. et al. Dual-action smart coatings with a self-healing superhydrophobic surface and anti-corrosion properties. J. Mater. Chem. A 5, 2355–2364 (2017).
Wang, J. et al. Lurid bolete-inspired damage reporting coating with simultaneous weathering and corrosion resistance: construction strategy and real-time degradation monitoring. Adv. Funct. Mater. 35, 2414620 (2024).
Liu, Z. et al. Predicting the glass transition temperature of polymer based on generative adversarial networks and automated machine learning. MGE Adv. 2, 78 (2024).
Mura, R. et al. HO-FMN: hyperparameter optimization for fast minimum-norm attacks. Neurocomputing 616, 128918 (2025).
Ao, S., Xiang, S. & Yang, J. A hyperparameter optimization-assisted deep learning method towards thermal error modeling of spindles. ISA Trans. 156, 434–445 (2024).
Thakur, D. & Biswas, S. Permutation importance based modified guided regularized random forest in human activity recognition with smartphone. Eng. Appl. Artif. Intell. 129, 107681 (2024).
Oka, K., He, J., Honda, Y. & Hijioka, Y. Random forest analysis of the relative importance of meteorological indicators for heatstroke cases in Japan based on the degree of severity and place of occurrence. Environ. Res. 263, 120066 (2024).
Santiago, J. V., Hata, H., Martinez-Noriega, E. J. & Inoue, K. Ozone trends and their sensitivity in global megacities under the warming climate. Nat. Commun. 15, 10236 (2024).
Li, S. et al. Optimal design of high-performance rare-earth-free wrought magnesium alloys using machine learning. MGE Adv. 2, 45 (2024).
Guo, Y. P. et al. Generative complex networks within a dynamic memristor with intrinsic variability. Nat. Commun. 14, 6134 (2023).
Tao, W. et al. Transformer fault diagnosis technology based on AdaBoost enhanced transferred convolutional neural network. Expert Syst. Appl. 264, 125972 (2025).
Shi, H. et al. Accurate and robust ammonia level forecasting of aeration tanks using long short-term memory ensembles: a comparative study of Adaboost and Bagging approaches. J. Environ. 371, 123173 (2024).
Kage, H. An algorithm for two-dimensional pattern detection by combining Echo State Network-based weak classifiers. MLWA 17, 100571 (2024).
Kong, K.-K. & Hong, K.-S. Design of coupled strong classifiers in AdaBoost framework and its application to pedestrian detection. Pattern Recognit. Lett. 68, 63–69 (2015).
Acknowledgements
This work was financially supported by the National Natural Science Foundation of China (No. 52371049), the Open Research Foundation of Southwest Technology and Engineering Research Institute (grant number HDHDW59CZ), and the National Key Research and Development Program of China (Grant No. 2022YFB3808800).
Author information
Contributions
W.C.: conceptualization, methodology, writing—original draft. L.M.: supervision, investigation, writing—review & editing. Y.L.: investigation, methodology. D.W.: investigation. K.Z.: investigation. J.W.: investigation, methodology. Z.C.: investigation. X.G.: investigation. Z.L.: investigation. T.C.: methodology. X.L.: writing—review & editing. D.Z.: conceptualization, writing—review & editing.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Chen, W., Ma, L., Li, Y. et al. Prediction of coating degradation based on “Environmental Factors–Physical Property–Corrosion Failure” two-stage machine learning. npj Mater Degrad 9, 67 (2025). https://doi.org/10.1038/s41529-025-00614-6