Introduction

Corrosion is a widespread and interdisciplinary phenomenon that results in the degradation of metals, posing significant economic and safety challenges across various industries. It undermines the reliability, efficiency, and longevity of structural and functional components in critical sectors such as power generation, oil and gas, and infrastructure. Traditional methods for analyzing corrosion, such as visual inspection, potentiodynamic polarization, and electrochemical impedance spectroscopy, offer valuable insights into the overall corrosion process, particularly in materials like carbon steel1,2,3,4. However, they fall short in revealing detailed surface characteristics like porosity, which play a vital role in the initiation and progression of localized corrosion.

Pores on metal surfaces act as entry points for corrosive agents, significantly accelerating degradation through increased surface area exposure and the formation of micro-environments conducive to corrosion. Studies have shown that pore size, distribution, and inter-connectivity influence the kinetics of corrosion reactions5. Therefore, precise characterization of porosity is essential for predictive corrosion modeling. Advanced imaging techniques such as optical microscopy (OM), X-ray computed tomography, and scanning electron microscopy have made it possible to correlate surface microstructure with corrosion susceptibility6. In industrial applications, such insights facilitate more effective preventive maintenance strategies7, especially in complex systems like steam generators and pipelines.

The advent of digital monitoring and machine learning (ML) offers new opportunities for automating corrosion analysis, reducing reliance on subjective manual inspection. ML enables the analysis of large, complex datasets and the discovery of hidden patterns, making it an ideal tool for predicting corrosion rates, characterizing degradation, and optimizing mitigation strategies8,9. While supervised ML approaches have shown promise in predicting corrosion inhibitor performance and material degradation trends10,11,12, they are often limited by the need for labeled datasets and the risk of overfitting13. Unsupervised ML, by contrast, circumvents these limitations by identifying intrinsic data patterns without requiring prior annotations. This makes it especially valuable in corrosion science, where labeled data are scarce and experimental conditions are variable14.

Recent studies have begun to explore the role of unsupervised ML in materials science, such as in the classification of aluminum alloys15 and quantification of nanopores in oxide films16. However, its application to corrosion characterization remains limited. In particular, under-deposit corrosion (UDC), a localized form of corrosion prevalent in high-pressure steam generators, has not been extensively studied through unsupervised learning. UDC arises due to the accumulation of magnetite deposits that trap chlorides and other ionic contaminants beneath porous surface layers1,17,18,19. These deposits alter local chemistry, such as pH and chloride concentration, and drive aggressive corrosion processes that are difficult to monitor using conventional methods20,21,22,23,24,25.

Common industrial practices for mitigating UDC include reducing chloride concentration in steam generator water, monitoring deposit accumulation, and performing chemical cleaning before critical thresholds are reached26,27,28. However, eliminating chloride is often infeasible in large-scale systems, where trace chloride concentrations in the parts-per-billion range are intentionally maintained to promote uniform magnetite layer formation that protects steam generator tubes29. As a result, precise and continuous monitoring of deposit characteristics becomes critical to balance protective deposition against the risk of corrosion. Experimental systems for simulating magnetite deposits have been developed30, yet their real-world durability and behavior remain uncertain. Imaging and quantification tools such as ImageJ31 and MATLAB32 have been applied to assess corrosion features at high temperatures. In this study, MATLAB was selected for image processing due to its compatibility with machine learning workflows, although future adaptations of the algorithm could also be developed as plugins for ImageJ to support wider community use.

Manual classification of UDC stages based on optical microscopy images has been proposed to address this issue, defining four corrosion stages based on deposit layering and porosity33,34. However, this approach is labor-intensive, prone to interobserver variability, and challenging to scale across industrial operations. Automated image analysis, especially when combined with unsupervised ML techniques, offers a more robust, scalable, and objective method for staging corrosion and enabling predictive maintenance. Parameters such as surface texture, deposit thickness, and porosity, all readily extracted from OM images, can serve as inputs for ML algorithms that estimate the extent and severity of corrosion33,35.

Recent studies on the progression of under-deposit corrosion (UDC) using optical microscopy have proposed a staging system based on the morphological features of corrosion deposits, particularly porosity and layer structure33,34. This classification allows for more effective preventive maintenance and monitoring in pipeline systems. The UDC stages, as described in ref. 33, are visually interpreted as follows:

  • Stage 1 involves an unfractured scale with a thin protective magnetite layer and a nonporous barrier adjacent to the steel surface;

  • Stage 2 is marked by a porous layer forming on the magnetite, resulting in a double-layer structure with pores concentrated on the outer surface;

  • Stage 3 is characterized by the presence of multilayer corrosion product scales; and

  • Stage 4 exhibits multilayer corrosion products with visible fractures between layers.

In this study, we present an unsupervised, image-based methodology for the automated staging of under-deposit corrosion. The proposed system computes two key surface parameters, local porosity and deposit thickness, from optical microscopy images of ex-service steam generator samples. These parameters are further used to estimate the chloride concentration factor and associated pH, both critical indicators of corrosion severity. Our findings demonstrate that deposit thickness and porosity not only reflect the underlying chemical environment but also correlate well with expert-defined corrosion stages. Furthermore, we propose a fully automated staging algorithm based on these metrics, advancing the digitalization of corrosion diagnostics in industrial settings.

The main contributions of this study are as follows:

  • Development and validation of an automated algorithm for thickness measurement and porosity computation, which shows good agreement with manual measurements and domain expertise calculations.

  • Analysis of chloride concentration factor and pH as a function of deposit thickness, revealing a correlation between increasing deposit thickness, higher chloride concentration, and decreasing pH.

  • Demonstration of the relationship between the progression of UDC and the increase in the chloride concentration factor along with the decrease in pH values.

  • Quantification of pH values for different stages of under-deposit corrosion (UDC), particularly for stages 3 and 4, with stage 3 showing minimum pH values between 2.8 and 3.5, and stage 4 between 1 and 1.5.

  • Identification of a potential threshold for transition from stage 3 to stage 4 UDC, based on pH values between 2.8 and 3.

  • Development of a fully automated algorithm for corrosion staging from OM images, using the deposit thickness and local porosity computed with the unsupervised machine learning method.

These findings contribute to a better understanding of the UDC process and provide quantitative data for predicting and monitoring corrosion stages in industrial applications.

Results

All implementations were performed using MATLAB on a Linux workstation with a 10-core Intel i9-9900X processor (3.3 GHz) and 128 GB of RAM. Figure 1 shows the input OM image with the calibrated thickness obtained using Eq. (8), the local porosity map obtained using Eq. (6), and the local porosity as a function of thickness obtained using Eq. (7), together with the attributes computed for the sample OM image. As expected, the porosity values were lower on the metal side (x) than on the oxide interface (y). The local porosity map estimated using the proposed method correlates directly with the visual assessment of porosity throughout the width of the tube deposit. Representative images from each stage are shown in Fig. 2, along with the calibrated thickness obtained using the proposed algorithm.

Fig. 1: Representative results obtained using the proposed automated pore characterization on sample OM image.

a Input image with calibrated thickness, b local porosity map, and c local porosity computed as a function of thickness and attributes obtained.

Fig. 2: Representative results obtained using the proposed automated pore characterization on sample OM images from each of the four stages.

In each column, the top row corresponds to the input image with calibrated thickness shown in the insets. The attributes such as porosity (Por), tortuosity (Tor), local porosity (LPor) and local tortuosity (LTor) corresponding to the respective images are also shown for reference.

Figure 3 shows the confusion matrices corresponding to (a) porosity, (b) local porosity, (c) thickness, and (d) UDC stage classifications based on the voting method. These results correspond to the 48 sample OM images used by Abitha et al.33. The true stage of the UDC is the stage determined by domain expertise33. The diagonal elements in each matrix represent the number of correctly classified samples and the off-diagonal elements represent the number of misclassified samples. Because the thickness-based criterion yielded the maximum number along the diagonal, with 72.9% accuracy, it was used for classification in the results presented in this study. The classification methods based on porosity and local porosity exhibit relatively lower accuracy, and the voting-based approach tends to produce conservative predictions, often assigning a corrosion stage that is equal to or greater than the actual stage. Given the critical implications of missing a true case of corrosion, the proposed approach was intentionally designed to be conservative, favoring false positives over false negatives. This aligns with our broader goal of ensuring robust detection, even at the cost of overestimation, to enhance reliability in early-stage corrosion monitoring.
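As a reading aid for the confusion matrices, the accuracy is the sum of the diagonal divided by the number of samples, and the conservative tendency can be checked by counting how many errors fall above the diagonal. The following is a minimal Python sketch (Python is used here for illustration only; the study used MATLAB). The matrix `toy` and the helper names are hypothetical and are not the data of Fig. 3; the row/column layout (rows true, columns predicted) is an assumption.

```python
def accuracy(confusion):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def overestimated_share(confusion):
    """Among misclassified samples, the share predicted at a later stage.

    Assumed layout: rows index the true stage, columns the predicted stage.
    """
    n = len(confusion)
    over = sum(confusion[i][j] for i in range(n) for j in range(i + 1, n))
    under = sum(confusion[i][j] for i in range(n) for j in range(i))
    return over / (over + under) if (over + under) else 0.0


# Hypothetical 4-stage confusion matrix (NOT the values shown in Fig. 3).
toy = [
    [4, 1, 0, 0],
    [0, 3, 2, 0],
    [1, 0, 4, 0],
    [0, 0, 0, 5],
]
toy_accuracy = accuracy(toy)         # 16 correct out of 20 samples
toy_over = overestimated_share(toy)  # 3 of the 4 errors overestimate the stage

# Arithmetic check: 35 correct out of 48 images rounds to the reported 72.9%.
reported_percent = round(100 * 35 / 48, 1)
```

For 48 images, 72.9% corresponds to 35 correctly staged samples, since 34 and 36 correct would give 70.8% and 75.0%, respectively.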

Fig. 3: Confusion matrices showing the performance of different methods for predicting Under-Deposit Corrosion (UDC) stages.

(a) porosity-based, (b) local porosity-based, (c) thickness-based, and (d) voting method-based classifications. These results correspond to the 48 sample OM images used in ref. 33. The true UDC stage is the stage determined in ref. 33.

Furthermore, in the proposed approach, the initial segmentation is performed using an unsupervised machine learning method, k-means clustering, which does not rely on predefined class labels. This segmentation is then used to compute local porosity and deposit thickness. The final classification into UDC stages is carried out based on quantitative thresholds of deposit thickness, rather than through a supervised learning model trained directly on labeled class data. As such, the class distribution arises naturally from the underlying physical measurements, and no supervised loss function or class-based model training was applied. Therefore, class imbalance in the traditional supervised learning sense does not influence the classification performance.

To understand how the deposit thickness and local porosity vary with the UDC stage, both quantities are plotted against the UDC stage in Fig. 4a, b, respectively. As expected, the deposit thickness and local porosity increased with increasing corrosion stage. Furthermore, the plots pinpoint the thresholds at which the transition from Stage 3 to Stage 4 UDC occurred. The deposit thickness was measured as the maximum reading for each sample and was free from variation due to spallation. The highest reading in Stage 3 was ~90 μm and the lowest reading in Stage 4 was ~95 μm. Similarly, for Stage 3, the median porosity was ~30%, which is consistent with the subject-matter expert analysis reported by Abitha et al.33. Figure 5 shows a 2D histogram detailing the distribution of OM images with respect to the deposit thickness and local porosity. As shown in Fig. 5, both the deposit thickness and the local porosity increase as the UDC progresses. In Stage 1, the OM images exhibited the lowest porosity and thickness, corresponding to the green peak. Similarly, the peaks corresponding to the subsequent stages shifted toward higher values of deposit thickness and porosity as the UDC progressed.

Fig. 4: Relationship between deposit characteristics and UDC stages.

Plots showing (a) deposit thickness and (b) local porosity versus UDC stage. As expected, the deposit thickness and local porosity increased during the later corrosion stages.

Fig. 5: Intensity distributions of OM images across deposit thickness and local porosity.

2D histogram detailing the distribution of OM images with respect to deposit thickness and local porosity.

Figure 6 shows the Cl concentration factor (Fig. 6a) and pH at 286 °C (Fig. 6b) as a function of the deposit thickness. For deposit thicknesses <30 μm, the chloride concentration factor under the deposits was not high, and the local pH remained within the range of 5.5 to 5.8, close to the pH of the bulk water (5.8). As the deposit thickness increased, however, the concentration factor continued to increase and the pH continued to decrease (Fig. 6). The top row of Fig. 7 shows the variation of the Cl concentration factor (Fig. 7a) and pH at 286 °C (Fig. 7b) across the stages, obtained using manual thickness measurement and porosity computation with domain expertise33; the bottom row shows the Cl concentration factor (Fig. 7c) and pH at 286 °C (Fig. 7d) computed using the proposed automated algorithm. The comparison shows that the automated algorithm agrees well with the manual thickness measurement and the porosity computation with domain expertise. The minimum pH value for Stage 3 was between approximately 2.8 and 3.5 (Fig. 7d), and that for Stage 4 was between 1 and 1.5 (Fig. 7d). As shown in Fig. 7, the Cl concentration factor increases and the pH decreases as the UDC stage progresses. Although both measures show a spread across stages, a stable threshold for the transition from Stage 3 to Stage 4 lies at a pH of approximately 2.8 to 3.

Fig. 6: Chloride concentration factor and pH variation with deposit thickness.

Graphs depicting (a) the chloride concentration factor and (b) the corresponding pH (at 286 °C) as a function of the deposit thickness.

Fig. 7: Chloride concentration factor and pH variation across UDC stages.

The top row shows (a) the chloride concentration factor versus UDC stage and (b) pH versus UDC stage using manual thickness measurement and porosity computation with domain expertise, as reported by Abitha et al.33. The bottom row shows (c) the chloride concentration factor versus UDC stage and (d) pH versus UDC stage obtained using the proposed automated algorithm.

Discussion

This study demonstrated an unsupervised machine-learning-based automated image processing algorithm for corrosion staging in optical microscopy images. The algorithm automatically computed the deposit thickness and local porosity directly, which facilitated the characterization of the corrosion stage. Furthermore, the calculated porosity facilitates the quantitative analysis of the chloride concentration factor and associated pH value. In particular, the observed deposit thickness transition between Stages 3 and 4 aligned well with the estimated pH threshold. This alignment is significant because it suggests a potential link between the deposit morphology, local chemistry, and the progression of the corrosion stage, offering valuable insight into the mechanisms of UDC.

This work further demonstrates ~73% accuracy in the characterization of the UDC stage in ex-service samples. Although this level of accuracy is promising, future work will focus on improving the robustness and performance of the algorithm for more complex corrosion morphologies. In contrast to traditional manual analysis and to supervised machine-learning approaches, which depend on the repetitive and subjective generation of labeled data9, the proposed unsupervised algorithm removes this source of human subjectivity, paving the way for a broader application of unsupervised techniques in image analysis. Specifically, it demonstrated the feasibility of extracting meaningful quantitative information from complex corrosion images without relying on manual annotation. Using unsupervised machine learning and automated characterization techniques, this study presents a valuable tool for the assessment of UDC, which will ultimately contribute to improved maintenance strategies in large-scale settings.

Methods

Computation of local porosity

The method for computing the local porosity is divided into three parts: (i) pore detection, (ii) estimation of the local porosity, and (iii) characterization of the porosity. The process of characterizing the porosity throughout the thickness of the boiler deposit is illustrated in Figure 8.

Fig. 8: Key steps in characterizing local porosity across boiler deposit thickness.

Schematic showing the key steps involved in the characterization of local porosity across the thickness of the boiler deposit.

K-means36, an unsupervised clustering algorithm that groups data points according to a similarity measure (here, the squared Euclidean distance to the cluster centroid), is used for pore detection. In contrast to supervised machine learning methods, unsupervised learning reduces the dependency on labeled data, which are very rare in corrosion datasets. For a given image, a random set of cluster centroids is initially selected, and the centroids are refined over multiple iterations until the convergence criterion is satisfied. As shown in Fig. 9a, the data points can be grouped into three clusters: (i) the waterside region, (ii) the deposit, and (iii) the metal pipe. Each pixel (data point) has three intensity values corresponding to the Red, Green, and Blue (RGB) color model. An in-depth breakdown of the detection of pores from the optical microscopy (OM) images is shown in Fig. 9.

Fig. 9: Pore detection process in optical microscopy (OM) images.

Illustration of the important steps involved in pore detection: (a) original OM image (clusters C1: sample support, C2: deposit region, and C3: metal tube), (b) segmented image obtained after K-means clustering, (c) blob-filled tube deposit, (d) tube deposit with pores, and (e) detected pores. The operation performed at each step is indicated at the bottom of the corresponding panel.

Consider an OM image consisting of N pixels, where each pixel xn (n = 1, …, N) is represented by a 3 × 1 vector. The objective is to divide the data into three clusters such that the distance within each cluster is minimized and the distance between clusters is maximized. For a given data point xn and centroid μk of the kth cluster, the binary variable gnk ∈ {0, 1} indicates whether xn belongs to cluster k. In other words, if xn is assigned to cluster k, gnk takes a value of 1; otherwise, it takes a value of 0. The clustering process aims to minimize the following:

$$J=\sum_{n=1}^{N}\sum_{k=1}^{3}{g}_{nk}\,\lVert {x}_{n}-{\mu }_{k}\rVert^{2}$$
(1)
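The objective in Eq. (1) is commonly minimized by alternating between the assignments gnk and the centroids μk. The following is a minimal pure-Python sketch of that alternation on a handful of RGB triples; it is an illustration under stated assumptions, not the MATLAB implementation used in the study. The function names, the optional `init` argument, the seeded random initialization, and the toy pixel values are all hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two RGB triples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(pixels, k=3, init=None, max_iter=100, seed=0):
    """Alternating minimization of J in Eq. (1); returns (labels, centroids)."""
    rng = random.Random(seed)
    # Random initial centroids drawn from the data unless `init` is supplied.
    centroids = list(init) if init is not None else rng.sample(pixels, k)
    labels = None
    for _ in range(max_iter):
        # Step 1: centroids fixed, assign each pixel to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(x, centroids[j])) for x in pixels
        ]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        # Step 2: assignments fixed, move each centroid to its cluster mean.
        for j in range(k):
            members = [x for x, lab in zip(pixels, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return labels, centroids


def objective(pixels, labels, centroids):
    """Value of J in Eq. (1) for a given assignment."""
    return sum(sq_dist(x, centroids[lab]) for x, lab in zip(pixels, labels))


# Hypothetical pixels: two dark, two mid-gray, two bright (illustrative values).
pixels = [(10, 10, 10), (12, 11, 9), (120, 110, 100),
          (125, 115, 105), (240, 240, 235), (250, 245, 240)]
labels, centroids = kmeans(pixels, k=3, init=[pixels[0], pixels[2], pixels[4]])
J = objective(pixels, labels, centroids)
```

The explicit `init` is used here only to make the toy run reproducible; with random initialization, k-means can settle in a local minimum, which is why several restarts are often used in practice.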

The algorithm starts by assigning random values to each cluster centroid, μk. Initially, objective function J in Eq. (1) is minimized with respect to gnk while keeping μk fixed. Then, J is minimized with respect to μk while keeping the value of gnk fixed. This two-stage process is repeated until convergence is achieved. Subsequently, blob removal37 was performed to fill the small black/white regions in the segmented image S(r, c) (Fig. 9b) to obtain the tube deposit image T(r, c) (Fig. 9c). Here, (r, c) corresponds to the rth row and cth column of the OM image. This is followed by a logical AND operation37 to obtain tube deposit with pores Tp(r, c) (Eq. (2)) as shown in Fig. 9d,

$${T}_{p}(r,c)=\begin{cases}1 & \text{if } S(r,c)=1 \text{ and } T(r,c)=1\\ 0 & \text{otherwise.}\end{cases}$$
(2)

Finally, pores P(r, c) were extracted using an XOR operation with T(r, c) (Fig. 9c) and Tp(r, c) (Fig. 9d) as shown in Eq. (3). Equation (3) produces an output consisting of the pores present within the deposit region, as shown in Fig. 9e.

$$P(r,c)=\begin{cases}1 & \text{if } T(r,c)=1 \text{ and } {T}_{p}(r,c)=0\\ 1 & \text{if } T(r,c)=0 \text{ and } {T}_{p}(r,c)=1\\ 0 & \text{otherwise.}\end{cases}$$
(3)
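Equations (2) and (3) amount to two elementwise logical operations on binary masks. The sketch below applies them to small hypothetical masks in pure Python; the blob-filling step that produces T is not implemented here, so `T` is simply given as an input, and all mask values and function names are illustrative assumptions.

```python
def logical_and(a, b):
    """Elementwise AND of two binary images (Eq. 2)."""
    return [[1 if x == 1 and y == 1 else 0 for x, y in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def logical_xor(a, b):
    """Elementwise XOR of two binary images (Eq. 3)."""
    return [[1 if x != y else 0 for x, y in zip(ra, rb)]
            for ra, rb in zip(a, b)]


# Hypothetical segmented image S: 1 = deposit-class pixel, 0 = anything else.
# The stray 1 in the top-right corner mimics segmentation noise outside the deposit.
S = [
    [0, 0, 0, 0, 1],
    [1, 1, 0, 1, 1],
    [1, 0, 1, 1, 1],
    [1, 1, 1, 1, 1],
]
# Hypothetical blob-filled deposit mask T (pores filled, noise removed).
T = [
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
]

Tp = logical_and(S, T)  # deposit with pores left as holes
P = logical_xor(T, Tp)  # pores: inside the filled deposit but not in Tp

# Global porosity as the ratio of pore pixels to deposit pixels.
porosity = sum(map(sum, P)) / sum(map(sum, T))
```

Because Tp is a subset of T by construction, the XOR in Eq. (3) reduces to "in T but not in Tp", so the second case of Eq. (3) never fires for these inputs.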

In the local porosity estimation process, a localized porosity map is initially generated by aggregating the porosity values within a predefined neighborhood relative to each pore. To compute the porosity within this neighborhood, a square matrix denoted by M was employed, with all its entries set to 1 and a size of 55 × 55, equivalent to a physical area of 5 μm × 5 μm. This matrix M is then convolved with the pore image P(r, c) and tube deposit image T(r, c), as indicated in Eq. (4), and Eq. (5), respectively.

$${V}_{v}^{local}(r,c)=\mathop{\sum }\limits_{dr=-a}^{a}\mathop{\sum }\limits_{dc=-b}^{b}M(dr,dc)P(r-dr,c-dc)$$
(4)
$${V}_{T}^{local}(r,c)=\mathop{\sum }\limits_{dr=-a}^{a}\mathop{\sum }\limits_{dc=-b}^{b}M(dr,dc)T(r-dr,c-dc),$$
(5)

Here, a and b are the half-widths of M along the rows and columns (a = b = 27 for the 55 × 55 window). The edges of P and T are zero padded, and \({V}_{v}^{local}(r,c)\) and \({V}_{T}^{local}(r,c)\) are cropped to the same size as P. This step is performed using the two-dimensional convolution function (conv2) in MATLAB32. A sample of accumulated pores (\({V}_{v}^{local}(r,c)\)) and accumulated tube deposits (\({V}_{T}^{local}(r,c)\)) in the considered local neighborhood is shown in Fig. 10a, b, respectively. Then the local porosity map/image (Fig. 10c) is computed as

$${\phi }^{local}(r,c)=\begin{cases}\dfrac{{V}_{v}^{local}(r,c)}{{V}_{T}^{local}(r,c)} & \text{if } P(r,c)=1\\ 0 & \text{otherwise.}\end{cases}$$
(6)
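Equations (4) to (6) can be illustrated with a zero-padded box sum over each neighborhood followed by an elementwise ratio at pore pixels. The following pure-Python sketch uses a 3 × 3 window (half-width 1) on hypothetical 4 × 5 masks so that the numbers can be checked by hand; the study's window is 55 × 55. Function names and mask values are assumptions for illustration.

```python
def box_sum(img, half):
    """Sum of img over a (2*half+1)-square window centred on each pixel.

    Out-of-image positions count as zero (zero padding), and the output has
    the same size as the input, as in Eqs. (4) and (5) with M all ones.
    """
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0
            for dr in range(-half, half + 1):
                for dc in range(-half, half + 1):
                    rr, cc = r - dr, c - dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += img[rr][cc]
            out[r][c] = total
    return out


def local_porosity_map(P, T, half):
    """Eq. (6): ratio of accumulated pores to accumulated deposit at pore pixels."""
    Vv = box_sum(P, half)
    Vt = box_sum(T, half)
    return [[Vv[r][c] / Vt[r][c] if P[r][c] == 1 and Vt[r][c] > 0 else 0.0
             for c in range(len(P[0]))] for r in range(len(P))]


# Hypothetical pore mask P and filled deposit mask T (illustrative values).
P = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
T = [
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
]
phi_local = local_porosity_map(P, T, half=1)
```

For the pore at row 1, column 2 (zero-based), the window holds 2 pore pixels and 6 deposit pixels, giving 1/3; for the pore at row 2, column 1 it holds 2 pore pixels and 9 deposit pixels, giving 2/9. The direct quadruple loop is chosen for readability; a summed-area table or a library convolution would be the usual choice for full-size images.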

Finally, the averaged local porosity is computed as the mean of the local porosity map given in Eq. (6). Tortuosity was computed using the empirical relation given in ref. 33. Similarly, the local tortuosity was computed using the same empirical relation, but with the porosity replaced by the averaged local porosity.

Fig. 10: Local porosity map generation.

Example case (from left to right) showing (a) accumulated pores, (b) accumulated tube deposits, and (c) local porosity map obtained using Eq. (6).

To characterize the local porosity along the thickness of the deposit, ϕth(r), the map ϕlocal(r, c) in Eq. (6) is averaged across each row r of the local porosity image,

$${\phi }^{th}(r)=\frac{1}{{n}_{C}}\mathop{\sum }\limits_{c=1}^{C}{\phi }^{local}(r,c),$$
(7)

where nC is the number of pore pixels in the rth row and C is the number of columns in the image. The thickest portion of the deposit in each sample image is identified and the deposit width at that location is estimated as

$$\text{Thickness}={n}_{p}\times \text{resolution},$$
(8)

where np is the number of pixels spanning the deposit at that location and resolution is the side length of one pixel expressed in μm. The pixel resolution in this case is 0.11 μm.
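Equations (7) and (8) can be sketched as a per-row average and a pixel count scaled by the resolution. In the pure-Python illustration below, two choices are assumptions rather than details confirmed by the text: rows without pore pixels are given a profile value of 0, and the "thickest portion" is taken as the image column containing the most deposit pixels. The masks and values are hypothetical.

```python
def porosity_profile(phi_local, P):
    """Eq. (7): mean of phi_local over the n_C pore pixels of each row.

    Rows with no pore pixels return 0.0 (an assumed convention).
    """
    profile = []
    for phi_row, p_row in zip(phi_local, P):
        n_c = sum(p_row)
        profile.append(sum(phi_row) / n_c if n_c else 0.0)
    return profile


def max_thickness_um(T, resolution_um=0.11):
    """Eq. (8) at the thickest location: n_p pixels times the pixel size.

    Assumption: thickness runs along image rows, so n_p is the largest number
    of deposit pixels found in any single column of the deposit mask T.
    """
    n_p = max(sum(col) for col in zip(*T))
    return n_p * resolution_um


# Hypothetical pore mask, local porosity map, and deposit mask.
P = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
phi_local = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1 / 3, 0.0, 0.0],
    [0.0, 2 / 9, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
T = [
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
]

profile = porosity_profile(phi_local, P)
thickness = max_thickness_um(T)  # 3 pixels at 0.11 um per pixel
```

As a scale check, a deposit spanning 820 pixels at 0.11 μm per pixel would correspond to about 90 μm.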

Criteria for corrosion stage classification

This study further evaluated the importance of the calculated deposit thickness and local porosity by using them to classify the UDC stages defined in ref. 33. The first classification was based on the porosity, computed as the ratio of total pores to total tube deposit. Based on the experimental studies reported by Abitha et al.33, each OM image was assigned a UDC stage according to its porosity value, as listed in Table 1. The same porosity criteria were used for the averaged local-porosity-based classification. In the thickness-based classification, the UDC stage was assigned according to the thickness values listed in Table 1 (ref. 33). For the voting method, if two of these three criteria agreed, the corresponding stage was assigned to the OM image; if no two agreed, the thickness-based criterion was used.
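The voting rule described above can be written compactly. The sketch below takes the three per-criterion stage assignments as inputs and deliberately leaves out the mapping from porosity or thickness to a stage, since that depends on the thresholds in Table 1; the function name and argument names are hypothetical.

```python
def vote_stage(stage_porosity, stage_local_porosity, stage_thickness):
    """Majority vote over three stage assignments, thickness as tie-breaker.

    If at least two criteria agree, their common stage is returned; if all
    three differ, the thickness-based stage is returned.
    """
    votes = [stage_porosity, stage_local_porosity, stage_thickness]
    for stage in votes:
        if votes.count(stage) >= 2:
            return stage
    return stage_thickness


# Illustrative calls with hypothetical per-criterion stages.
agree_on_two = vote_stage(2, 2, 3)      # porosity and local porosity agree
thickness_joins = vote_stage(1, 3, 3)   # local porosity and thickness agree
all_differ = vote_stage(1, 2, 3)        # no agreement: fall back to thickness
```

With three voters and a single fallback, the rule always returns exactly one stage, so no image is left unclassified.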

Table 1 Criteria for UDC stage classification based on percentage of porosity and thickness

Four boilers operating with all-volatile treatment (AVT) implementation produced 12 coil pieces, each measuring approximately 1 m in length. The OM images used in this study were acquired from a group of boilers operating simultaneously and theoretically identically, all suffering from UDC. Typically, boilers have a helical coil design38,39 with boiling water on the exterior and hot process gas inside. The details of the image acquisition and corrosion stage characterization for UDC are provided in Abitha et al.33.