Machine learning-driven stability analysis of eco-friendly superhydrophobic graphene-based coatings on copper substrate

Mamgain, Himanshu Prasad; Diamanti, Maria Vittoria; Pati, Pravat Ranjan; Mohamed, M. E.; Pandey, Jitendra Kumar; Bhardwaj, Nitin; Vasudeva, Ankit; Kanan, Mohammad

doi:10.1038/s41598-025-18155-y

Article
Open access
Published: 03 October 2025

Himanshu Prasad Mamgain¹,
Maria Vittoria Diamanti²,
Pravat Ranjan Pati³,
M. E. Mohamed⁴,
Jitendra Kumar Pandey⁵,
Nitin Bhardwaj⁶,
Ankit Vasudeva⁷ &
Mohammad Kanan^8,9

Scientific Reports volume 15, Article number: 34602 (2025)

Abstract

This study investigates the integration of machine learning (ML) techniques with materials science to develop durable, eco-friendly superhydrophobic (SHP) graphene-based coatings for copper. We employed several ML and regression models, including XGBoost, polynomial regression, Random Forest (RF), K-Nearest Neighbours (KNN), and Support Vector Regression (SVR), to predict the stability of the contact angle (CA) under different stress conditions: NaCl immersion, abrasion cycles, tape peeling tests, sand impact, and open-air exposure. Our findings demonstrate that ensemble learning models, particularly XGBoost and Random Forest, outperform traditional regression techniques by effectively capturing nonlinear dependencies between stress parameters and CA retention. Higher-order polynomial regression models also exhibit strong predictive accuracy, making them well suited to conditions where CA follows a well-defined trend. In contrast, SVR and KNN show limited generalization due to their sensitivity to hyperparameter selection and local interpolation effects, leading to weaker performance on datasets with high variability. The ML-based algorithms predicted CA values for the tested coatings over longer time spans than those covered by the experimental tests, and underlined the beneficial effect of graphene incorporation in extending the service life of the coatings and preserving superhydrophobicity, overall reflecting the material’s resilience under mechanical stress. The study highlights the importance of advanced predictive models, such as higher-degree polynomial regression and XGBoost, in capturing the complex relationships between the variables that influence coating stability. Additionally, the integration of these models significantly accelerates the design and analysis process by reducing the reliance on time-consuming experimental testing.

Introduction

Superhydrophobic (SHP) surfaces, characterized by a sliding angle (SA) less than 10° and a contact angle (CA) greater than 150°¹, have gained significant attention owing to their remarkable properties and broad applications, including drag reduction^2,3, self-cleaning^4,5,6,7, corrosion resistance⁸, and water/oil separation^9,10,11. Inspired by natural surfaces such as lotus leaves, SHP surfaces are obtained by combining a rough surface texture with surface chemistry modification using low-surface-energy coatings. Various fabrication techniques, such as chemical vapor deposition¹², sol–gel¹³, chemical etching¹⁴, and electrodeposition¹⁵, have been developed to create SHP surfaces. However, most methods are limited by their complexity, high cost, and specialized equipment requirements. Electrodeposition has emerged as a simple and scalable technique for producing SHP coatings on conducting polymers, metal oxides, and metals. This method allows for precise control and cost-effective fabrication of robust coatings, and is particularly relevant for copper substrates, which are widely used in applications such as heat conductors, electrical power lines, and water pipelines. As copper is particularly prone to corrosion in chloride-containing solutions, enhancing the corrosion resistance of copper substrates through SHP coatings is critical for extending their lifespan. Nickel, known for its hardness and corrosion resistance, has been effectively used as a coating material on copper, offering additional benefits when integrated with SHP properties^16,17. Despite these advantages, SHP surfaces are often susceptible to external damage: wear and abrasion deteriorate the surface micro/nanostructures, resulting in low mechanical stability and a rapid loss of superhydrophobicity, particularly in corrosive environments.
Indeed, exposure to corrosive agents such as chloride ions, as well as acidic or alkaline environments, accelerates the degradation of the coatings, reducing their effectiveness in enhancing corrosion resistance. Furthermore, thermal and environmental factors, such as UV exposure and temperature fluctuations, also pose challenges to the long-term stability and durability of SHP surfaces⁶. Addressing these challenges requires the development of mechanically and chemically stable SHP coatings¹⁸. In this context, considerable attention has been dedicated to the incorporation of graphene in anti-corrosion coatings due to its strength, hydrophobicity, and chemical inertness. However, challenges such as poor substrate adhesion and ineffective hydrophobicity necessitate graphene modification, often achieved through doping with metals or non-metals.

In our previous study, Ni films and Ni-graphene composite coatings were fabricated on copper substrates using the electrodeposition technique, followed by treatment with myristic acid, a sustainable low-surface-energy compound, to create SHP coatings^19,20. Wettability, long-term durability, corrosion resistance, mechanical stability, and chemical stability were evaluated for the fabricated coatings¹⁸. To address stability challenges, machine learning (ML) techniques are employed in this investigation. ML models, including classification, regression, and clustering algorithms, are used to predict key coating properties such as long-term durability, corrosion resistance, wettability, and chemical and mechanical stability. These models analyse experimental data to uncover critical patterns and dependencies between material properties, environmental conditions, and performance metrics. The stability of SHP coatings is influenced by multiple factors, including the physicochemical properties of the surface, environmental exposure conditions, and mechanical stresses experienced during operation. Traditional experimental methods used to assess these factors are often labour-intensive, time-consuming, and costly. ML can therefore become an influential method for forecasting and assessing the stability of SHP coatings. By utilizing data-driven models, ML can reveal complex relationships between coating formulations, processing parameters, and their performance characteristics^21,22,23. Despite the growing interest in SHP coatings, only a few studies have used ML to predict the long-term stability and anticorrosion behavior of these coatings. Past research has primarily focused on developing SHP coatings with myristic acid (MA) on various substrates, such as aluminum, copper and steel^1,24,25. However, there has been a lack of comprehensive studies analysing the effects of process parameters on coating responses.
In some cases, researchers have utilized ML techniques such as random forests (RF)²⁶, artificial neural networks (ANN)²⁷, support vector machines (SVM)²⁸, extra trees (ET)²⁹, particle swarm optimization (PSO)³⁰, and genetic algorithms (GA)³¹ to predict the outcomes of process parameters. Barai et al. used ANN models to predict the anticorrosion efficiency of SHP coatings and validated their predictions against experimental data, achieving highly accurate results³². Such studies highlight the potential of ML to optimize process parameters, reduce experimental workload, and enhance the understanding of SHP coating performance, particularly in terms of durability and anticorrosion capabilities. To show the potential of ML, water contact angle data from a previous study³³ were used to validate the ML algorithms developed in this article.

This study builds upon previously published experimental data but introduces a new and significant contribution through the application of machine learning (ML) techniques to analyze and predict the stability of superhydrophobic graphene-based coatings. The novelty lies in the use of multiple ML and regression models such as XGBoost, KNN, Random Forest, SVR, and polynomial regression to assess contact angle (CA) degradation under various environmental and mechanical stress conditions. Unlike prior work, this approach provides a comparative evaluation of model performance tailored to specific degradation mechanisms and reveals valuable insights into the nonlinear and long-term behavior of the coatings. The predictive modeling confirms that graphene incorporation (Ni-G-MA) enhances long-term superhydrophobicity, particularly under mechanical stress, thus offering a new perspective not addressed in earlier studies.

This study advances the current state of research by not only applying a wide range of machine learning (ML) models to predict the degradation of superhydrophobic coatings, but also by offering a comparative, condition-specific evaluation of these models under various real-world stress scenarios, such as sand impact, tape peeling, abrasion, and long-term exposure. To the authors' knowledge, these ML models have not previously been used for hydrophobicity prediction, and this is the first time ML has been used for durability prediction under such conditions. While previous studies have applied ML in this domain, they often focus on a single model or lack detailed differentiation across degradation mechanisms.

Experimental details

Substrate production and characterization

The working electrode used in the previous work from which the data were extracted³³ is a copper plate of dimensions 20 mm × 10 mm × 3 mm. The chemicals used for substrate preparation and coating by electrodeposition include anhydrous ethanol, boric acid, sulfuric acid, nickel chloride hexahydrate, sodium hydroxide, nickel sulfate, and myristic acid. The preparation procedures are briefly summarized here. Prior to electrodeposition, copper substrates were sequentially polished using silicon carbide (SiC) abrasive papers ranging from grade 150 to 800, followed by ultrasonic cleaning in a soap solution for 10 min and brief immersion in 0.5 M H₂SO₄ for 1 min. The substrates were then rinsed with distilled water. The electrodeposition bath consisted of NiSO₄ (176 g/L), NiCl₂·6H₂O (40 g/L), and H₃BO₃ (60 g/L). Electrodeposition of nickel (Ni) films was carried out at an applied potential of 8.75 V using a platinum rod as the anode and the copper substrate as the cathode. For the fabrication of nickel-graphene (Ni-G) films, a graphite rod was used as the anode, and electrochemical exfoliation of graphene was simultaneously achieved at 10.0 V. The exfoliation process was facilitated by the generation of oxygen and hydroxyl radicals from water, enhancing intercalation and delamination of graphene sheets. After deposition, both Ni and Ni-G films were rinsed with distilled water and dried at room temperature for 24 h. Surface modification was performed by immersing the dried films in 0.01 M myristic acid for 15 min, followed by ethanol rinsing and air drying. The resulting films, Ni-MA and Ni-G-MA, were subjected to further characterization and stability analysis³³. Samples were then subjected to mechanical and chemical stress, and coating stability and adhesion were tested through wettability measurements, as previously reported.

Machine learning framework for wettability of Ni-MA and Ni-G-MA coatings

In this study, an ML framework is developed to predict the wettability of Ni-MA and Ni-G-MA coatings using experimental CA measurements. The goal is to build a predictive model that captures the relationship between various coating characteristics and the external factors that influence wettability, including environmental conditions and processing parameters.

The ML model incorporates a range of input features, including the coating deposition parameters and testing conditions such as abrasion cycles, tape peeling cycles, immersion in NaCl, and atmospheric exposure. CA is the primary target variable for the model, as it is the key indicator of the coatings' SHP properties. Data pre-processing steps, namely normalization, feature selection, and outlier detection, were applied to support the reliability of the model. A variety of ML algorithms were tested to determine their suitability for predicting the wettability of the experimentally analysed coatings. Among these, artificial neural networks (ANN), random forests (RF), support vector machines (SVM), XGBoost³⁴ and regression models³⁵ were evaluated for their ability to accurately predict CA and SA values under various conditions. To assess model performance and generalization to unseen data, cross-validation techniques were employed. These techniques help detect overfitting and check that the model remains accurate when applied to new experimental data. Once trained, the ML model can predict the wettability characteristics of coatings under different experimental conditions, such as changes in temperature, voltage, pH levels, immersion time in NaCl solution, and number of abrasion cycles, the latter referring to all mechanical damage tests performed (scratch, tape, sand impact). Additionally, to enhance model performance, data augmentation based on the introduction of Gaussian noise was applied. Models trained on the augmented data were compared with models trained on the original dataset to determine the effect of data augmentation on model accuracy and generalization capability.
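The Gaussian-noise augmentation mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `augment_with_gaussian_noise`, the number of noisy copies, and the noise level (expressed as a fraction of each column's standard deviation) are all assumptions, since the text does not state them, and the demo arrays are synthetic placeholders rather than measured values.

```python
import numpy as np

def augment_with_gaussian_noise(X, y, n_copies=3, noise_scale=0.01, seed=0):
    """Return the original samples followed by n_copies noisy replicas.

    noise_scale is a fraction of each column's standard deviation
    (an assumed convention; the source does not give the noise level).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_sigma = noise_scale * X.std(axis=0)   # per-feature noise width
    y_sigma = noise_scale * y.std()         # target noise width
    X_parts, y_parts = [X], [y]
    for _ in range(n_copies):
        X_parts.append(X + rng.normal(0.0, 1.0, size=X.shape) * x_sigma)
        y_parts.append(y + rng.normal(0.0, 1.0, size=y.shape) * y_sigma)
    return np.vstack(X_parts), np.concatenate(y_parts)

# Synthetic placeholder values, NOT measurements from the study.
X_demo = np.array([[0.0], [100.0], [200.0], [300.0], [400.0]])  # e.g. abrasion cycles
y_demo = np.array([160.0, 158.0, 155.0, 153.0, 150.0])          # e.g. CA in degrees

X_aug, y_aug = augment_with_gaussian_noise(X_demo, y_demo)
print(X_aug.shape, y_aug.shape)
```

When augmentation is combined with a train/test split or cross-validation, the noisy copies are normally generated from the training portion only, so that near-duplicates of test points do not leak into training.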

As mentioned above, the study also examines the relationship between CA and key environmental factors, including immersion time in NaCl solution, long-term durability under atmospheric exposure, pH levels, and the number of abrasion cycles. By examining these relationships, the study aims to understand the factors that govern the wettability behavior of the coatings and to predict how it changes under varying conditions. A regression model was fitted to further explore the nonlinearities and higher-order interactions between the environmental variables and the wettability of the coatings.

Machine learning using random forest and XGBoost

The XGBoost model, introduced by Chen et al.³⁴, is one of the most advanced and widely used ML algorithms given its exceptional efficiency and performance. At its core, XGBoost leverages the concept of gradient boosting, which is a sequential ensemble learning technique. It constructs multiple “weak learners” (mainly decision trees) in a stepwise manner, where each new model seeks to rectify the mistakes of its predecessors. This iterative process enhances the accuracy of predictions by focusing on the residuals (the errors) left by earlier trees.

One of the standout features of XGBoost is its ability to handle overfitting through the use of regularization. Unlike traditional gradient boosting methods, which may suffer from overfitting due to the accumulation of many weak models, XGBoost incorporates regularization terms into its objective function. This allows the model to control its complexity, balancing between fitting the training data well and maintaining the ability to generalize to new data. The objective function that XGBoost minimizes is a combination of two components: the prediction error and a regularization term that penalizes overly complex models. This is mathematically expressed as³⁴

$$L = \sum_{i} l\left(\hat{y}_{i}, y_{i}\right) + \sum_{k} \Omega\left(f_{k}\right), \quad \text{where} \quad \Omega\left(f_{k}\right) = \gamma T + \frac{1}{2}\lambda \left\|\omega\right\|^{2}$$

(1)

where $L$ is the objective function, $l$ is the loss measuring the difference between the actual value $y_{i}$ and the predicted value $\hat{y}_{i}$, and $\Omega\left(f_{k}\right)$ is the regularization term, which controls the complexity of the $k$-th tree $f_{k}$. Here $\gamma$ is a parameter that penalizes the number of leaves $T$ in a tree, $\omega$ represents the leaf weights of the tree, and $\lambda$ is the regularization parameter that controls the weight of the L₂ norm (ridge regression-like penalty) on the model parameters. By minimizing this objective function, XGBoost not only improves predictive performance, but also improves robustness by limiting overfitting.
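To make the two parts of Eq. 1 concrete, the sketch below evaluates the objective for a given set of predictions and trees. It assumes a squared-error loss for $l$ (the text leaves the loss generic) and represents each tree only by its vector of leaf weights, so that $T$ is the length of that vector; the function name `regularized_objective` and the numbers are illustrative, not taken from the study or from the XGBoost library.

```python
import numpy as np

def regularized_objective(y_true, y_pred, leaf_weights_per_tree, gamma=1.0, lam=1.0):
    """Squared-error loss plus, for every tree, gamma*T + 0.5*lam*||w||^2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    loss = np.sum((y_pred - y_true) ** 2)        # assumed form of l(y_hat, y)
    penalty = 0.0
    for w in leaf_weights_per_tree:
        w = np.asarray(w, dtype=float)
        # T is the number of leaves, i.e. the number of leaf weights.
        penalty += gamma * w.size + 0.5 * lam * np.sum(w ** 2)
    return loss + penalty

# Illustrative numbers only: two trees with 2 leaves and 1 leaf.
value = regularized_objective(
    y_true=[1.0, 2.0],
    y_pred=[1.5, 2.0],
    leaf_weights_per_tree=[[0.5, -0.5], [1.0]],
    gamma=1.0,
    lam=2.0,
)
print(value)  # 0.25 (loss) + 2.5 + 2.0 (penalties) = 4.75
```

Raising `gamma` makes additional leaves more expensive, while raising `lam` shrinks the leaf weights; both push the fit toward simpler trees.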

In contrast, the Random Forest (RF) model operates on an entirely different philosophy. Rather than optimizing a single function, RF combines the predictions of multiple individual decision trees, each trained on a random subset of the dataset. Each tree in the forest generates its own prediction for a given input, and the final prediction is obtained by averaging the predictions of all trees in the ensemble. Consider the collection of decision trees in the random forest ensemble, denoted as $\left\{T_{1}, T_{2}, \dots, T_{n}\right\}$: each tree $T_{i}$ generates a prediction $y_{i}$ for a given input $x$. In a regression problem, the final prediction of the ensemble, $y$, is usually obtained by taking the average of the predictions made by all the trees:

$$y = \frac{1}{n}\sum_{i=1}^{n} y_{i}$$

(2)

Averaging reduces the influence of any individual tree's prediction and yields a more consistent and robust overall prediction³⁵.
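The averaging in Eq. 2 can be illustrated with a small bagging sketch. This is a simplified stand-in rather than a full Random Forest: it uses depth-1 trees (stumps) on a single input variable and bootstrap resampling only, without the per-split feature sub-sampling that a complete RF adds. All function names and the step-shaped demo data are hypothetical.

```python
import numpy as np

def fit_stump(x, y):
    """Fit a depth-1 regression tree on 1-D input (best split by squared error)."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    # Fallback when no valid split exists: predict the mean on both sides.
    best = (np.inf, xs[0], ys.mean(), ys.mean())
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # cannot split between identical x values
        left, right = ys[:i], ys[i:]
        sse = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if sse < best[0]:
            best = (sse, 0.5 * (xs[i] + xs[i - 1]), left.mean(), right.mean())
    _, threshold, left_value, right_value = best
    return lambda q: np.where(np.asarray(q) <= threshold, left_value, right_value)

def fit_bagged_stumps(x, y, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (sampling with replacement)."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(x), size=len(x))
        trees.append(fit_stump(x[idx], y[idx]))
    return trees

def ensemble_predict(trees, q):
    """Eq. 2: average the predictions of the individual trees."""
    return np.mean([tree(q) for tree in trees], axis=0)

# Synthetic step-shaped data, NOT measurements from the study.
x_demo = np.linspace(0.0, 10.0, 20)
y_demo = np.where(x_demo < 5.0, 160.0, 150.0)

trees = fit_bagged_stumps(x_demo, y_demo)
queries = np.array([1.0, 9.0])
pred = ensemble_predict(trees, queries)
print(pred)
```

Because every stump predicts an average of training targets, the ensemble output always stays within the range of the observed values, which is one reason tree ensembles extrapolate conservatively beyond the tested conditions.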

Linear regression model

Linear Regression is a supervised learning algorithm used for predicting continuous numerical values. It assumes a linear relationship between the independent variable(s) (features, X) and the dependent variable (target, y). The goal is to find the best-fit line that minimizes the difference between actual and predicted values.

Polynomial regression models

Polynomial regression extends linear regression by modelling the relationship between independent variables (predictors) and the dependent variable (response) as an nth-degree polynomial. This method is especially effective for capturing non-linear relationships within the data, which can provide a more accurate fit for complex patterns that linear models might fail to capture.

In the context of this study, polynomial regression is used to model the wettability of the Ni-MA and Ni-G-MA coatings by considering the relation between the environmental and processing factors used in stressing the coatings and the resulting CA.

The polynomial regression equation with one independent variable can be expressed in the general form:

$$y = \beta_{0} + \beta_{1} x + \beta_{2} x^{2} + \ldots + \beta_{n} x^{n} + \epsilon$$

(3)

Here, $y$ is the dependent variable, $x$ is the independent variable, $\beta_{0}, \beta_{1}, \beta_{2}, \dots, \beta_{n}$ are the coefficients of the polynomial terms, $\epsilon$ is the error term and $n$ is the degree of the polynomial.

The complexity of the relationship that the model can capture is determined by the degree of the polynomial. Polynomial regression models are commonly fitted by least squares estimation, which minimizes the sum of the squared discrepancies between the predicted and observed values. The coefficients of the polynomial terms are computed to optimize the fit to the data, and they quantify the contribution of each power of the independent variable to the dependent variable.

The coefficient $\beta_{1}$ indicates the linear relation between the independent and dependent variables, $\beta_{2}$ captures the quadratic relationship, and so on for higher-order terms³⁶.
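As a minimal sketch of the least-squares fit described above, the snippet below fits polynomials of degree 1 to 3 with NumPy's `numpy.polynomial.polynomial.polyfit` and compares their training error. The data are generated from an assumed quadratic trend purely for illustration; they are not contact-angle measurements from the study, and the variable names are hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Synthetic data from an assumed quadratic trend (NOT measured values):
# CA = 160 - 0.005 * cycles - 2.5e-5 * cycles^2
cycles = np.arange(0.0, 800.0, 100.0)
ca = 160.0 - 0.005 * cycles - 2.5e-5 * cycles ** 2

def fit_and_score(x, y, degree):
    """Least-squares polynomial fit; returns (beta_0..beta_n, training MSE)."""
    beta = P.polyfit(x, y, degree)   # coefficients ordered lowest degree first
    y_hat = P.polyval(x, beta)
    return beta, float(np.mean((y - y_hat) ** 2))

results = {d: fit_and_score(cycles, ca, d) for d in (1, 2, 3)}
for degree, (beta, err) in results.items():
    print(degree, np.round(beta, 6), err)
```

On the training data the error can only stay equal or decrease as the degree grows, so the degree is better chosen on held-out data; a higher degree that merely tracks noise will extrapolate poorly to longer exposure times.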

Support vector regression (SVR)

Support Vector Regression (SVR) is an ML model designed for regression tasks, aiming to predict continuous values. It builds upon the principles of Support Vector Machines (SVM), which are primarily used for classification but are adapted in SVR to forecast real-valued outputs. The fundamental concept of SVR is to identify a function that represents the data while maintaining a specified margin of tolerance (ε), meaning that deviations within this margin are not penalized. The model is determined by its support vectors, which are the data points lying on or outside the ε-margin, and the function is optimized to be as flat as possible while minimizing prediction errors beyond the margin. To address non-linear relationships, SVR employs kernel functions, including radial basis function (RBF), polynomial, and linear kernels. These functions implicitly map the data into higher-dimensional spaces, where a linear relationship can be identified. SVR is effective in high-dimensional spaces and is relatively robust against overfitting. However, it can be computationally intensive, particularly when dealing with large datasets. Additionally, it requires careful tuning of hyperparameters, including the regularization parameter (C) and ε. Despite these challenges, SVR is widely used in applications such as time series forecasting and stock market prediction, where the relationships between variables are often complex and non-linear³⁷.

The goal of SVR is to find a function f(x) that predicts the target variable with minimal deviation from the actual data points, within a margin. The general form of the regression function is:

$$f\left(x\right) = w \cdot \theta\left(x\right) + b$$

(4)

where $w$ is the weight vector (the coefficients of the regression model), $\theta\left(x\right)$ is the mapping function that transforms the data into a higher-dimensional space through the kernel function, and $b$ is the bias term (the intercept).
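The margin of tolerance around the function in Eq. 4 corresponds to the ε-insensitive loss, sketched below. The example takes the mapping $\theta(x)$ to be the identity (a linear kernel), which is an assumption made for brevity; the weights, ε value and data points are illustrative and not from the study.

```python
import numpy as np

def linear_svr_predict(X, w, b):
    """Eq. 4 with theta(x) = x (linear kernel assumed): f(x) = w . x + b."""
    return np.asarray(X, dtype=float) @ np.asarray(w, dtype=float) + b

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.5):
    """Zero inside the epsilon tube, linear in the excess deviation outside it."""
    residual = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.maximum(0.0, residual - epsilon)

# Illustrative numbers only.
X_demo = np.array([[0.0], [1.0], [2.0]])
y_true = np.array([150.0, 152.0, 155.0])
y_pred = linear_svr_predict(X_demo, w=[2.0], b=150.5)   # [150.5, 152.5, 154.5]
losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.5)
print(losses)

# Points with a deviation of at least epsilon are the candidates for support vectors.
on_or_outside_tube = np.abs(y_true - y_pred) >= 0.5
print(on_or_outside_tube)
```

A larger ε widens the tube and leaves fewer points contributing to the fit, which is one reason SVR performance is sensitive to this hyperparameter.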

Multilayer perceptron (MLP)

A Multilayer Perceptron (MLP) is a type of artificial neural network designed for supervised learning tasks, including classification and regression. It comprises several layers of interconnected nodes, or neurons, where each neuron in one layer is linked to every neuron in the subsequent layer. The architecture consists of an input layer that receives the data, one or more hidden layers that process the information, and an output layer that produces the final predictions. In the hidden layers, each neuron takes inputs from the preceding layer, calculates a weighted sum, adds a bias, and applies an activation function (such as ReLU or sigmoid) to introduce non-linearity. The output layer then produces the final prediction using an appropriate activation function, such as softmax for classification tasks or a linear activation for regression. During the training phase, the network learns its weights and biases through backpropagation, in which the discrepancy between predicted and actual outputs is propagated back through the network. Gradient descent is then used to adjust the weights so as to minimize this error. The process involves forward propagation, loss calculation, and backward propagation, which are repeated until the model converges. The MLP's ability to model complex relationships comes from its non-linear hidden layers, which enable it to capture non-linear patterns in data³⁸.
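The forward-propagation step described above (weighted sum, bias, activation, then a linear output for regression) can be sketched in a few lines. The layer sizes, the ReLU choice and the helper names `init_params` and `mlp_forward` are assumptions for illustration; the text does not specify the architecture used in the study, and training by backpropagation is omitted.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def init_params(layer_sizes, seed=0):
    """Small random weights and zero biases for each consecutive layer pair."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]

def mlp_forward(X, params):
    """ReLU hidden layers followed by a linear output layer (regression)."""
    a = np.asarray(X, dtype=float)
    for W, b in params[:-1]:
        a = relu(a @ W + b)          # weighted sum + bias, then activation
    W_out, b_out = params[-1]
    return a @ W_out + b_out         # linear activation for regression

# Hand-checkable example: one input, two hidden neurons, one output.
tiny_params = [
    (np.array([[1.0, -1.0]]), np.array([0.0, 0.0])),
    (np.array([[2.0], [3.0]]), np.array([1.0])),
]
tiny_out = mlp_forward(np.array([[2.0]]), tiny_params)
print(tiny_out)  # hidden = relu([2, -2]) = [2, 0]; output = 2*2 + 3*0 + 1 = 5

# Randomly initialised network with an assumed 2-8-1 architecture.
demo_out = mlp_forward(np.zeros((4, 2)), init_params([2, 8, 1]))
print(demo_out.shape)
```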

Experimental dataset

The electrodeposited coatings on copper exhibited distinct micro-nano structures, with Ni-MA forming dendritic patterns and Ni-G-MA displaying a cauliflower-like morphology. SEM images of the coatings are reported with permission from³³ in Supplementary Figure S1. Contact angle (CA) measurements revealed that both Ni-G-MA and Ni-MA achieved superhydrophobicity (162° and 159°, respectively), compared to pure copper (58°) and nickel-coated copper (24°). Water contact angle (WCA) images of the coatings are reported with permission from³³ in Supplementary Figure S2. This behavior aligns with the Cassie-Baxter model, where micro- and nano-scale structuring enhances WCA.

Machine learning results

Linear regression, XGBoost, random forest

This study investigates the correlation between superhydrophobicity and several influencing factors using machine learning models. The input variables included: number of days immersed in NaCl solution, pH levels, environmental exposure duration, number of tape peeling cycles, sand impact and number of abrasion cycles. The contact angle (CA) served as the sole output variable.

Various ML algorithms were applied to analyse the data, including XGBoost, Random Forest (RF), polynomial regression, multilayer perceptron (MLP), gradient boosting, support vector regression (SVR), and k-nearest neighbours (KNN). The experimental datasets, taken from the previous research¹ and detailed in Tables S1–S12, were used to train separate models for each dataset. Additionally, the importance of each input variable in predicting the CA was evaluated and discussed, highlighting their influence on the SHP performance of the coatings²¹. Linear regression is initially used to capture any linear trends, providing a simple and direct approach to modelling the data. Polynomial regression models are then applied to better capture nonlinear patterns, offering flexibility to model more complex relationships between CA and abrasion cycles. Additionally, we incorporate two advanced machine learning models: XGBoost, which is particularly effective in handling large datasets and complex relationships, and Random Forest, which combines the predictions of multiple decision trees to improve accuracy.

The study initially utilized the dataset related to abrasion cycles in Tables S1, S2. The objective was to develop regression models using the XGBoost and Random Forest (RF) algorithms to predict CA based on the provided input. The dataset was randomly divided into training and testing subsets using an 80:20 split. This approach ensured that a substantial portion of the data was available for model training while preserving a separate subset for evaluation. Both the XGBoost and Random Forest (RF) models were trained using the training dataset, and their performance was evaluated with the test dataset.
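A random 80:20 split of this kind can be sketched as follows. The helper name `train_test_split_indices`, the fixed seed and the sample count are assumptions for illustration; the text does not state how the random split was seeded or implemented.

```python
import numpy as np

def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle the sample indices and reserve test_fraction of them for testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = max(1, int(round(test_fraction * n_samples)))
    return order[n_test:], order[:n_test]   # (train indices, test indices)

# Example with 10 samples: 8 go to training and 2 to testing.
train_idx, test_idx = train_test_split_indices(10)
print(sorted(train_idx.tolist()), sorted(test_idx.tolist()))
```

With small datasets, the metrics obtained from a single split depend noticeably on which points fall in the test set, so repeating the split with several seeds (or using cross-validation) gives a steadier estimate.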

Figure 1a shows the distribution of experimental values (in blue) and the values predicted using the linear regression, Random Forest and XGBoost models for Ni-MA coatings (red, green and yellow, respectively). Key performance metrics, namely the mean squared error (MSE) and R-squared (R²) values, were then extracted and reported in Table 1. The MSE measures the average squared difference between the predicted and actual CA values, serving as an indicator of prediction accuracy, while the R² value represents the proportion of variance in CA that the model explains. Performance metrics for both models are summarized in Table 2, highlighting their effectiveness in forecasting the CA.
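For reference, the two metrics can be computed as in the sketch below. The input values are made up to keep the arithmetic easy to follow and are not results from the study.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot: share of the variance in y_true explained by the model."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Made-up contact angles in degrees, for illustration only.
ca_true = [150.0, 155.0, 160.0]
ca_pred = [151.0, 155.0, 159.0]
mse_value = mean_squared_error(ca_true, ca_pred)   # (1 + 0 + 1) / 3
r2_value = r_squared(ca_true, ca_pred)             # 1 - 2 / 50
print(mse_value, r2_value)
```

R² can be negative on a test set when the model predicts worse than simply using the mean of the observed values, so it should be read together with the MSE.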

Table 1 MSE and R² values obtained by the linear regression, XGBoost and Random Forest models for Ni-MA.
