Abstract
The stability of the arc bubble is a crucial indicator of the underwater wet welding process. However, limited research exists on detecting arc bubble edges in such environments, and traditional algorithms often produce blurry and discontinuous results. To address these challenges, we propose a novel arc bubble edge detection method based on deep transfer learning for processing underwater wet welding images. The proposed method integrates two training stages: pre-training and fine-tuning. In the pre-training stage, a large source domain dataset is used to train VGG16 as a feature extractor. In the fine-tuning stage, we introduce the Attention-Scale-Semantics (ASS) model, which consists of a Convolutional Block Attention Module (CBAM), a Scale Fusion Module (SCM), and a Semantic Fusion Module (SEM). The ASS model is further trained on a small target domain dataset specific to underwater wet welding to fine-tune the model parameters. The CBAM adaptively weights the feature maps, focusing on the most salient features to better capture edge information. The SCM training method maximizes feature utilization and simplifies training by combining multi-scale features. Additionally, the skip structure of the SEM effectively mitigates semantic loss in the high-level network, enhancing the accuracy of edge detection. On the BSDS500 dataset and a self-constructed underwater wet welding dataset, the ASS model was evaluated against conventional edge detection models—Richer Convolutional Features (RCF), Fully Convolutional Network (FCN), and UNet—as well as the state-of-the-art models LDC and TEED. In terms of Mean Absolute Error (MAE), accuracy, and other evaluation metrics, the ASS model consistently outperforms these models, demonstrating effective and stable detection of arc bubble edges in underwater wet welding images.
Introduction
Underwater welding is an essential process in the installation and maintenance of underwater steel structures. Among various types of underwater welding, underwater wet welding offers advantages such as simpler equipment, lower costs, and broader applicability compared to underwater dry welding and underwater localized cavity welding1. The stability of the arc bubble is a crucial factor affecting the underwater wet welding process and joint quality. Therefore, the analysis and edge detection of arc bubble images are critical for enhancing the quality of underwater wet welding.
Tsai et al. discovered the relationship between the heat loss of underwater wet welding pools and the movement of dynamic underwater bubbles2. A bubble growth model was developed to predict its dynamic behavior. Yurioka et al. proposed that the thermal cycles applied during welding could alter the mechanical properties of the base metal and that the heat-affected zone of the weld would typically increase in hardness and decrease in toughness3. Guo et al. studied underwater wet arc stability using the X-ray transmission method4, demonstrating that a larger bubble size can improve the stability of the welding process. Wang et al. analyzed the effects of controllable arc bubbles on the stability of the welding process and the microstructural evolution of the joint5. The results suggest that by controlling arc bubble detachment, a more stable wet welding process can be achieved, thereby providing a new method for improving the underwater wet welding process. Wang et al. further analyzed the relationships between bubble geometries and welding heat input using a high-speed camera and a dysprosium lamp to study the growth and detachment of bubbles6. This research offers a novel approach to advancing wet welding technology. However, their experiment in an open tank did not replicate the deep-water environment in a closed pressure tank. Moreover, the underwater welding images were only processed offline using a simple image processing method, and no online detection method for underwater bubble edges was proposed.
Deep learning techniques have achieved significant advancements in the field of image processing, enabling computers to learn and understand features within images, thereby achieving outstanding performance in tasks such as image classification, object detection, and semantic segmentation. For example, Liang et al. proposed an innovative low-light image enhancement approach utilizing an Adaptive Frequency Decomposition Network (AFDN)7, which effectively addresses visibility and quality issues in low illumination conditions by adaptively decomposing and enhancing the frequency components of the images, leading to significant improvements in both visual clarity and detail recovery. Lu et al. developed a denoising model to remove multiple noises in images, enhancing detection accuracy8. They utilized gradient denoising to locate defects more accurately and employed a generative diffusion model for higher-quality image reconstruction. Chen et al. introduced an image inpainting approach that utilizes a multi-scale feature module combined with an attention mechanism to effectively fill in missing parts of an image, enhancing both the accuracy and visual quality of the reconstruction9. Chen et al. also presented a deep neural network-based image inpainting algorithm that leverages an attention mechanism to focus on significant image features, resulting in improved inpainting performance and more realistic image completion10. Additionally, Chen et al. proposed a super-resolution method for images that employs a multi-level information compensation strategy alongside a U-net architecture, achieving higher resolution images by effectively upscaling and enhancing details11.
Image edge detection methods based on deep learning, such as HED12, DeepEdge13, and EGNet14, represent a research hotspot in the field of image edge detection. Wang et al. proposed a rectangular convolution pyramid and edge enhancement network called RENet for accurate and robust pavement crack detection15. Experiments conducted on two pavement crack datasets demonstrate that the proposed framework outperforms other state-of-the-art algorithms in terms of robustness and universality. Xiao et al. adopted the principle of convolutional neural networks (CNNs) and a Laplacian filter to propose an edge detection model with improved performance16. The proposed CNN model successfully detects fuzzy defects in noisy X-ray images and presents better structural similarity of the detected information. In the field of welding image processing, due to the influence of arc light, dust, and splashes during the welding process, welding images often have issues such as uneven gray levels and blurred edges, making it difficult to extract edge features using conventional methods. However, image edge detection methods based on deep learning have achieved promising results17. Ma et al. utilized a combination of a CNN and laser vision to detect typical weld defects on galvanized steel sheets18, but the types of detectable defects were limited. Wang et al. developed a deep learning algorithm for monitoring the width of the welding sedimentary layer19. This control algorithm can effectively monitor and control weld width online. Ganguly et al. proposed an edge detection method for workpieces to be welded, in which operators guide a robotic tool along the surface of the workpieces through a haptic-visual interface, the positions of the tool tip are recorded, and the workpiece edge is detected automatically using a RANSAC (Random Sample Consensus)-type algorithm20. However, the method may require further adjustments to accommodate different types of welding joints. Zou et al. proposed an improved Canny algorithm for detecting welding pool edges in complex environments21. This method utilizes a hybrid filter that combines dark channel prior defogging with bilateral filtering for image denoising and employs an improved quadratic Otsu algorithm to obtain dual thresholds, thereby achieving more accurate extraction of welding pool edge information. However, the method requires further parameter adjustments to maintain robustness and accuracy in various actual welding scenarios.
Deep learning models typically require large datasets for training. The scarcity of data can lead to overfitting during the training process, limiting the model’s ability to generalize well and perform effectively on new, unseen data5. Consequently, ensuring a sufficient volume of data is crucial for developing a robust and reliable deep learning model. Numerous public datasets exist for common visual tasks in natural images, such as pedestrian detection and license plate recognition. However, due to the difficulty of acquiring high-quality welding images in an industrial setting and the limited number of researchers in welding image analysis, there is currently no publicly available dataset in this field. Therefore, in deep learning-based weld image processing, only limited sample data is typically available. Transfer learning, a popular research topic in machine learning, is applied to the training of neural networks to address the small-sample problem of welding image datasets. For instance, combining transfer learning and MobileNet, Pan et al. developed a new welding defect model22, achieving a prediction accuracy of 97.69%. Kumaresan et al. developed a method based on CNN and transfer learning that enhanced the training performance of neural networks, shortened training time, and enabled the classification of welding defects23. Jiao et al. introduced an algorithm based on Residual Neural Networks (ResNet) using transfer learning24, which decreased the training time and improved the accuracy of weld penetration prediction. Wang et al. applied the MobileNetV2-based transfer learning model to fit a custom dataset and recognize weld penetration states25. Bahador et al. presented a novel application of transfer learning for tool wear detection in turning processes using a one-dimensional (1D) CNN26. The transfer learning model increased the tool wear classification accuracy with a single-axis, low-cost MEMS accelerometer from 58 to 85%. Moreover, the proposed model using transfer learning significantly reduced the amount of data required for model development.
In recent years, numerous researchers have processed images containing bubbles. Yang et al. introduced a vision detection technology to identify sapphire bottom bubbles27, offering a technical solution for obtaining high-quality sapphire crystals. However, since the primary purpose of this method was to discern the presence of bubbles in sapphire, the precision of bubble edge detection was not high. Fu et al. proposed a Bubble Generative Adversarial Network (BubGAN)28, which significantly improved the accuracy of bubble image processing compared to the traditional GAN method. Zhao et al. designed a bubble machine vision detection system for the transparent layer of a quartz crucible, employing a morphological contour extraction algorithm to detect bubbles, achieving higher detection efficiency29. However, this method was only applicable to bubbles with more regular shapes. Saha et al. developed an algorithm to detect bubbles from pseudo-occlusion30. The algorithm employed for bubble detection comprises Adaptive Threshold-based Image Segmentation, which includes segmentation, edge detection, segment grouping, and ellipse fitting. Sadeghi et al. proposed a wavelet transform-based method for segmenting and quantifying bubble regions in small bowel capsule endoscopy images, effectively identifying and measuring bubbles, thereby improving the diagnostic accuracy of small intestine diseases31. However, the method was sensitive to the selection of wavelet transform parameters, which can affect the precision of segmentation. At present, there is no research focusing on the edge detection of arc bubbles in underwater wet welding.
In conclusion, there is currently a lack of research on using deep learning for edge detection of underwater wet welding bubbles, and there are no publicly available datasets for underwater wet welding bubble images. Additionally, traditional algorithms often produce edge detection results that are blurry and discontinuous. This study successfully detects bubble edges in the arc area of underwater wet welding using deep learning-based image edge detection. The findings of this study are significant for understanding the arc mechanism involved in underwater wet welding, controlling the welding process, optimizing the process parameters, and ultimately improving the overall quality of underwater wet welding.
The main contributions of this paper are as follows:
1. The proposed ASS model consists of a CBAM, an SCM, and an SEM, which together effectively detect the edges of arc bubbles in the underwater wet welding process.

2. The CBAM within the model enhances the perception of local details, thereby better capturing the details of arc bubble edges and improving edge clarity and accuracy.

3. The SCM within the model extracts and trains multi-scale convolution features, maximizing the use of feature data and simplifying the training process.

4. The SEM within the model effectively resolves the problem of semantic loss in the high-level network during training and improves the accuracy of edge detection.
Underwater wet welding image acquisition system
The underwater wet welding image acquisition system is a comprehensive setup designed to replicate the challenging conditions encountered in underwater welding environments. This system comprises several key components to ensure the precision and effectiveness of the welding process, as well as the clarity of the visual data captured.
Central to this system is the underwater pressure simulation cabin, as depicted in Fig. 1. This robust chamber, engineered to withstand a design pressure of 1.2 MPa, can accurately simulate the substantial pressures experienced at depths of up to 100 m below the water’s surface. With a volume of 6.5 cubic meters, the cabin provides sufficient space to accommodate the intricate components of both the visual and welding systems, ensuring their optimal performance under simulated underwater conditions. The welding system includes a welding power source, a wire feeder, and underwater self-protected flux-cored wire. The welding power source used is a Panasonic YD-500KR, capable of delivering a maximum welding voltage of 60 V and a peak welding current of 500 A, suitable for a wide range of welding requirements.
Regarding the visual system, as illustrated in Fig. 2, the core component is the NIT MC1003 industrial camera, which boasts a logarithmic dynamic range of up to 140 decibels. This camera is designed to capture images with exceptional clarity, even under the challenging lighting conditions typical of underwater environments. To facilitate its operation and ensure optimal image capture, the camera is paired with a versatile actuator providing three degrees of freedom, allowing for precise positioning and adjustment to accommodate various angles and distances, thus capturing the entire welding process comprehensively.
A crucial aspect of the visual system’s design is the choice of material for the water tank’s inner wall. Since the camera is positioned outside the tank and images are captured through the tank’s inner wall, the material’s transparency and optical properties are of paramount importance. Ordinary glass lacks the clarity necessary for meeting stringent experimental requirements. To address this, the inner wall is crafted from high-quality quartz glass, known for its excellent optical conductivity over a wide wavelength range. This choice ensures that the captured images are not only clear but also accurately represent the underwater welding process.
In summary, the underwater wet welding image acquisition system is a meticulously designed and engineered solution that serves as a reliable and efficient tool for simulating and studying underwater welding conditions.
Figure 3 illustrates the process of bubble formation and bursting during underwater wet welding. The positions of the wire, bubbles, and arc during the welding process are shown in the figure. The underwater welding bubble forms between the arc and the end of the welding wire, which serves as the target for edge detection. Initially, bubbles form at the welding point due to the high temperature and ionization effects of the arc, as well as chemical reactions that occur during the welding process. As the arc continues to generate heat, the bubbles gradually increase in size and rise away from the welding point. Eventually, the bubbles burst as internal pressure increases; however, new bubbles continue to form due to the ongoing welding process.
Bubble edge detection method
Transfer learning is an advanced machine learning approach that leverages existing knowledge to address problems in different but related fields32. Based on their technical characteristics, transfer learning methods can be categorized into feature selection-based transfer, instance-based transfer, and model parameter-sharing-based transfer33. The parameter-sharing-based transfer learning method exploits the correlation between source domain model parameters and target domain tasks. Consequently, the structure and parameter values in the source domain model are used as initial conditions for the target domain model, enabling the target model to achieve better generalization with a limited sample size of target domain data.
The flowchart of the arc bubble edge detection method based on deep transfer learning is shown in Fig. 4. In this study, VGG16 is used as the source domain model, while the ASS model serves as the target domain model. The large public dataset ImageNet is employed for pre-training the source domain model, and its parameters are subsequently transferred to the target domain model. After image pre-processing, the acquired underwater welding images are input into the target domain model, which incorporates prior domain knowledge for training. Image pre-processing steps include edge enhancement (linear enhancement) and image denoising (bilateral filtering). The preprocessed images are fed into the model for prediction, resulting in the final edge extraction.
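In code, the parameter-sharing step amounts to loading pre-trained VGG16 weights and reusing the convolutional stack as the target-domain backbone; a minimal PyTorch sketch, assuming torchvision's ImageNet weights stand in for the paper's own pre-training run:

```python
import torchvision.models as models

# Pre-training stage stand-in: VGG16 weights learned on ImageNet.
vgg16 = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# Parameter sharing: discard the classifier head and keep the feature
# extractor as the backbone of the target-domain (ASS) model.
backbone = vgg16.features

# Fine-tuning stage: the shared parameters serve as initial conditions and
# are updated on the small underwater wet welding dataset.
for p in backbone.parameters():
    p.requires_grad = True
```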
Image pre-processing
To achieve optimal results in bubble edge detection, the edges in the welding images are enhanced through image pre-processing before being input into the training model. First, the RGB images are converted to grayscale. Then, a piecewise linear transformation is applied to enhance the bubble edges. However, this transformation not only emphasizes the bubble region but also amplifies noise. To mitigate this, a bilateral filtering algorithm is applied to denoise the image34. As shown in Fig. 5, after pre-processing, the upper and lower edges of the bubbles in the welding image become more pronounced, while the noise is significantly reduced.
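A sketch of this pipeline with OpenCV follows; the function name and the piecewise breakpoints (r1, s1), (r2, s2) are illustrative assumptions, not the paper's reported values:

```python
import cv2
import numpy as np

def preprocess(path, r1=80, s1=40, r2=170, s2=215):
    """Grayscale -> piecewise linear stretch -> bilateral filtering."""
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY).astype(np.float32)

    # Piecewise linear transform: compress the dark and bright ranges and
    # stretch the mid-range where the bubble edges lie.
    out = np.where(gray < r1, gray * s1 / r1,
          np.where(gray <= r2, (gray - r1) * (s2 - s1) / (r2 - r1) + s1,
                   (gray - r2) * (255 - s2) / (255 - r2) + s2))

    # Bilateral filter: suppress the noise amplified by the stretch while
    # preserving edges. d and the sigmas are typical defaults.
    return cv2.bilateralFilter(out.astype(np.uint8), d=9,
                               sigmaColor=75, sigmaSpace=75)
```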
Deep transfer learning model
Building on prior work in deep learning35,36, we designed an edge extraction model for underwater wet welding images using deep learning and transfer learning techniques. The structure of this model is depicted in Fig. 6. The ASS model utilizes a pruned VGG16 as the backbone, incorporates a Convolutional Block Attention Module (CBAM)37 at the shallow layers, and constructs a Scale Fusion Module (SCM) consisting of five different scale features. Additionally, a Semantic Fusion Module (SEM), composed of four skip structures, is employed to fuse all features. From low to high levels, the SCM outputs are labeled o1, o2, o3, o4, and o5, with the model outputs labeled as Predict. During model training, the backbone is pre-trained, and then the constructed deep transfer learning model undergoes fine-tuning.
The CBAM integrates channel and spatial attention mechanisms to enhance the feature representation capability of CNN. In the channel attention module, each channel of the feature map is treated as a feature detector, and the model learns channel weights to determine the importance of different channels for specific features. This mechanism focuses more on channels that are beneficial for the current task, thereby extracting more discriminative features. Conversely, spatial attention emphasizes the “where” aspect of effective information within the feature map, placing greater weight on spatial positions that contribute to the task. CBAM infers attention maps along two independent dimensions (channel and spatial) and then multiplies these maps with the input feature map for adaptive feature optimization. Since the model can only extract basic low-level features such as edges and textures in the shallow layers, CBAM is applied at the shallow layers to enhance the perception of local details, thereby better capturing arc bubble edges and improving edge clarity and accuracy.
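For concreteness, a minimal PyTorch sketch of the CBAM block as described, following the standard formulation of Woo et al.37; the reduction ratio and kernel size are common defaults, not the paper's reported settings:

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Channel attention followed by spatial attention."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.mlp = nn.Sequential(  # shared MLP for channel attention
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False))
        self.spatial = nn.Conv2d(2, 1, kernel_size,
                                 padding=kernel_size // 2, bias=False)

    def forward(self, x):
        # Channel attention: squeeze spatial dims by avg- and max-pooling.
        mc = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True)) +
                           self.mlp(x.amax((2, 3), keepdim=True)))
        x = x * mc
        # Spatial attention: pool across channels, then a 7x7 convolution.
        ms = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * ms
```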
The SCM simultaneously extracts and trains multi-scale convolution features, which maximizes the use of feature data and simplifies the training process. This module employs direct fusion of the same-scale feature maps, whereas RCF fuses all features after convolution. In RCF, all convolution features undergo another convolution operation with 21 channels, making the model overly complex and requiring a large number of model parameters to be trained. An overly complex model often results in insufficient training, especially when training samples are limited, making it challenging to improve accuracy. Furthermore, when there are too many parameters to train, the benefits of transfer learning based on prior knowledge cannot be fully realized, leading to slow or difficult convergence. Therefore, SCM avoids these issues through direct fusion. The architecture proceeds as follows: a Concatenate operation fuses the backbone’s convolution features of the same scale; a Convolution operation reduces the fused features to a single channel; a Deconvolution operation recovers the features to their original scale; and the resulting scale-specific feature maps are supervised by a loss function.
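Under these assumptions, the SCM path can be sketched as follows; bilinear upsampling stands in for the deconvolution, and the stage channel counts follow VGG16:

```python
import torch.nn as nn
import torch.nn.functional as F

class SCM(nn.Module):
    """Per stage: fuse same-scale features, reduce to one channel,
    and recover the original resolution."""
    def __init__(self, stage_channels=(64, 128, 256, 512, 512)):
        super().__init__()
        self.reduce = nn.ModuleList(
            [nn.Conv2d(c, 1, kernel_size=1) for c in stage_channels])

    def forward(self, stage_feats, out_size):
        # stage_feats[k] is the concatenation of stage k's conv features.
        outs = []
        for feat, conv in zip(stage_feats, self.reduce):
            o = conv(feat)                          # 1-channel fusion map
            o = F.interpolate(o, size=out_size,     # recover original scale
                              mode='bilinear', align_corners=False)
            outs.append(o)                          # side outputs o1..o5
        return outs
```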
Visualization is accomplished by outputting feature maps at various scales, as shown in Fig. 7. The VGG16 model introduces multiple pooling layers, which lead to substantial spatial information loss due to the large step size during deconvolution, resulting in low-resolution output from the network’s high layers. This loss of resolution significantly impacts image quality. In contrast, the high-level feature map of a CNN retains more semantic features, enabling better filtering of texture edges within the target area compared to low-level features, resulting in outputs that are more focused on the target contour. To leverage the strengths of both low-level and high-level semantic information, the ASS model employs the SEM, consisting of four skip structures. This module allows for local prediction while maintaining global prediction, addressing the high-level semantic loss in the skeleton network, and generating high-quality images. Before each scale pooling, the module extracts convolution layer features as input. The construction sequence involves applying Convolution, Batch Normalization, and Activation to the high-level convolution features. Then, a Concatenate operation is performed with the next layer’s convolution features to iterate the process until the convolution features are restored to the original scale. The recovered features are then convolved with a channel number of 1, fused with the SCM output features, and fed into the loss function. The model is iteratively trained to produce the optimal prediction results.
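A sketch of one SEM skip step under these assumptions; the channel sizes are illustrative and bilinear upsampling stands in for the scale-recovery step:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SEMBlock(nn.Module):
    """One skip structure: refine high-level features (Conv + BN + ReLU),
    recover their scale, and concatenate with the next shallower stage."""
    def __init__(self, high_ch, out_ch=64):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(high_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True))

    def forward(self, high, low):
        h = self.refine(high)                        # Conv -> BN -> ReLU
        h = F.interpolate(h, size=low.shape[2:],     # restore spatial scale
                          mode='bilinear', align_corners=False)
        return torch.cat([h, low], dim=1)            # fuse with lower level
```

Iterating this block four times restores the original scale; the result is then reduced to one channel and fused with the SCM output, as described above.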
Training method of deep transfer learning model
The overall training process of the ASS model is shown in Fig. 8 and consists of two stages. First, the VGG16 feature extractor is pre-trained on a large source domain dataset, ImageNet. During this stage, the activation value \(X_b\) of the source domain dataset is input into the feature extraction function \(f(\theta)\). The extracted features are then input into the classifier function \(c(\cdot \mid \omega_n)\). The output image activation value \(X_{pred}\) is fed into the loss function \(L_{pred}\), and the model is trained iteratively to obtain the optimal parameters. By sharing the feature extractor parameters with the ASS model, the training time of the model is significantly reduced and its generalization ability is improved. In the second stage, the pre-trained feature extractor is combined with the CBAM, SCM, and SEM to construct a deep transfer learning model, which is then trained on a small target domain dataset of underwater wet welding images. During this stage, the activation value \(X_n\) of the target domain dataset is input into the feature extraction function \(f(\theta)\), and CBAM determines the features that are most useful for arc bubble edge recognition. The extracted features \(X^k\) and \(X^z\) are then input into the SCM and the SEM, respectively. Finally, the fused feature activation value \(X^{fuse}\) from the two modules is input into the loss function \(L(W)\), and the model is trained iteratively to obtain the optimal parameters.
The training process of the ASS model mainly includes forward propagation and backward propagation. The forward propagation involves extracting convolutional features from VGG16, optimizing and selecting the features using CBAM, and then inputting them to the SCM and SEM. The outputs of both modules are then combined to predict the edges of underwater welding bubbles. In the backward propagation, the results of forward propagation predictions are compared with the labels, and the global minimum of the loss function is continuously sought using the Adam optimizer.
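A minimal sketch of this loop in PyTorch, assuming the model returns the five SCM side outputs plus the fused prediction; the hyper-parameters and schedule are illustrative, not the paper's reported settings:

```python
import torch

def fine_tune(model, loader, loss_fn, epochs=50, lr=1e-4):
    # Adam seeks the minimum of the loss; the step scheduler implements the
    # gradually decreasing learning rate mentioned above (schedule assumed).
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    sched = torch.optim.lr_scheduler.StepLR(opt, step_size=10, gamma=0.5)
    for _ in range(epochs):
        for img, label in loader:
            side_outs, fuse = model(img)          # forward propagation
            # Supervise every side output o1..o5 and the fused prediction.
            loss = sum(loss_fn(o, label) for o in side_outs)
            loss = loss + loss_fn(fuse, label)
            opt.zero_grad()
            loss.backward()                       # backward propagation
            opt.step()
        sched.step()
```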
CBAM weights the extracted features in both the channel and spatial dimensions. It generates a one-dimensional channel attention map \(M_c \in \mathbb{R}^{C \times 1 \times 1}\) and a two-dimensional spatial attention map \(M_s \in \mathbb{R}^{1 \times H \times W}\). Here, C is the number of channels, H is the height of the feature map, and W is its width. The input features are denoted as F and the output features as F′, as shown in Eqs. (1), (2), and (3), respectively.
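Eqs. (1)–(3) follow the standard CBAM formulation37 and are reconstructed here in that form:

$$M_c(F) = \sigma\bigl(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\bigr) \quad (1)$$

$$M_s(F) = \sigma\bigl(f^{7\times 7}\bigl([\mathrm{AvgPool}(F);\, \mathrm{MaxPool}(F)]\bigr)\bigr) \quad (2)$$

$$F' = M_s\bigl(M_c(F) \otimes F\bigr) \otimes \bigl(M_c(F) \otimes F\bigr) \quad (3)$$

where \(\sigma\) is the sigmoid function, \(f^{7\times 7}\) denotes a 7 × 7 convolution, and \(\otimes\) denotes element-wise multiplication.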
SCM extracts convolution features from VGG16, performs successive Concatenate, Convolution, and Deconvolution operations, and then inputs the feature map into the Loss function. Thus, the forward propagation formula of the module is as follows:
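Schematically, and following the operation order just described (a reconstruction; \(X_{i,j}^{k}\) and \(W_{i,j}^{k}\) are defined below):

$$X^{k} = trans\Bigl\{ conv\Bigl\{ concat\bigl\langle X_{i,j}^{k} \cdot W_{i,j}^{k} \bigr\rangle \Bigr\} \Bigr\}, \qquad i = 1, \ldots, |l_{k}|, \quad k = 1, \ldots, 5$$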
where \(|l_k|\) is the number of pixels in the k-th scale feature map and k indexes the feature scales (k = 1, …, 5 here); k = 1 indicates that the output image scale equals the original image scale. \(X^{k}\) is the activation value of all neurons in the k-th scale output feature map; \(X_{i,j}^{k}\) is the activation value of the i-th neuron of the extracted j-th feature map; and \(W_{i,j}^{k}\) are the learnable parameters of that neuron. \(trans\{\cdot\}\), \(conv\{\cdot\}\), and \(concat\langle\cdot\rangle\) denote the deconvolution, convolution, and fusion (concatenation) functions, respectively.
Before the Pooling layers, SEM extracts convolution features from VGG16 and performs Deconvolution, Batch Normalization, Activation, and Concatenation operations in succession. The output features of the SEM are then fused with the SCM features, and the result is input into the loss function to obtain the model prediction. The forward propagation formula of the module is as follows:
where z is the number of skip connections (equal to 4 here). \(|l_z|\) is the number of pixels in the input feature map of the z-th skip connection, and \(X_i^{z}\) and \(W_i^{z}\) are the activation value and learnable parameters of the i-th neuron of that feature map, respectively. Similarly, \(|l_{z+1}|\) is the number of pixels in the input feature map of the (z+1)-th skip connection, and \(X_i^{z+1}\) and \(W_i^{z+1}\) are the activation value and learnable parameters of the i-th neuron of that feature map. \(X^{s}\) is the activation value of all neurons in the semantic fusion output feature map; \(bn\{\cdot\}\) is the batch normalization function and \(relu\{\cdot\}\) is the ReLU activation function; \(X^{fuse}\) is the activation value of the fused output feature map combining all features.
Edge detection can be formulated as a pixel-level binary classification problem38. Because the proportion of edge pixels is much lower than that of non-edge pixels, a standard loss function would bias the supervised learning process toward the large proportion of non-edge pixels, resulting in model instability and poor training outcomes. Moreover, the disparity between the numbers of edge and non-edge pixels is even greater in underwater welding image edge detection. To further balance the edge and non-edge pixels, a parameter λ is introduced. Consequently, the ASS model calculates the loss of each pixel relative to the pixel label at each side-output layer and then performs backward propagation. The formula for backward propagation is as follows:
where \(|Y^{+}|\) and \(|Y^{-}|\) denote the numbers of edge and non-edge pixels in the image, respectively. The hyper-parameter λ balances edge and non-edge pixels. \(X_i\) is the activation value at pixel i; \(Y_i\) is the probability that pixel i is an edge point in the label; W denotes the learnable parameters of the neural network; and \(P\{\cdot\}\) is the sigmoid activation function.
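Written out in the class-balanced cross-entropy form used by HED/RCF-style detectors, and consistent with the symbols defined here, the per-pixel loss, total loss, and parameter update can be reconstructed as (the paper's exact weighting may differ):

$$l(X_i; W) = \begin{cases} -\lambda \cdot \dfrac{|Y^{+}|}{|Y^{+}|+|Y^{-}|} \cdot \log\bigl(1 - P(X_i; W)\bigr), & Y_i = 0 \\[6pt] -\dfrac{|Y^{-}|}{|Y^{+}|+|Y^{-}|} \cdot \log P(X_i; W), & \text{otherwise} \end{cases}$$

$$L(W) = \sum_{i=1}^{|l|} \left( \sum_{k=1}^{5} l\bigl(X_i^{(k)}; W\bigr) + l\bigl(X_i^{fuse}; W\bigr) \right), \qquad W_i^{n} = W_i^{o} - \alpha \cdot \frac{\partial L(W)}{\partial W_i^{o}}$$

The last expression abbreviates the Adam update; its moment estimates are omitted for brevity.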
Here, \(X_i^{(k)}\) is the activation value of the i-th pixel in the output image of the k-th scale feature, and \(X_i^{fuse}\) is the activation value of the i-th pixel in the fused output image of all features. \(|l|\) is the total number of pixels per image. \(W_i^{n}\) is the i-th pixel’s learnable parameter in the feature map after the backward-propagation update, and \(W_i^{o}\) is the corresponding parameter before the update. α is the learning rate, which is managed by the Adam optimizer and gradually decreases during training.

Model test and analysis
The training and testing were conducted on an Intel i9-9900K CPU with 64 GB RAM and an NVIDIA GeForce RTX 3080Ti GPU.
Comparison with other models
To comprehensively evaluate the performance of our proposed model, we compared it with several commonly used models in edge detection—RCF, FCN8s, and UNet—as well as state-of-the-art (SOTA) models LDC39 and TEED40. We employed a self-constructed underwater wet welding dataset for training, selecting images from the entire bubble growth process as the detection targets. Figure 9 presents the detection results of our model compared to others.
Figure 9 clearly shows that, compared to the standard models, UNet often failed to accurately extract bubble edges in most cases; FCN8s was able to detect edges but often produced unclear, serrated edges that did not accurately represent the true shape of the bubbles; RCF could identify correct edges to some extent but its performance degraded in transitional areas affected by arc light, small bubbles, and impurities, particularly during the early stages of bubble formation. In comparison with SOTA models, LDC’s edge detection results contained many artifacts and unclear edges; TEED performed well in identifying the upper edges of bubbles, but struggled with the lower edges, especially in the later stages of bubble growth, where new small bubbles and arc light caused misidentification of these features as continuous edges.
In contrast, our model excels in detecting bubble edges with improved continuity and clarity. Specifically, in challenging areas where arc light intersects with bubbles, our model demonstrates superior recognition performance. To address discontinuities in the lower edges of bubbles detected by the model, we employed least squares fitting to refine the lower edges in the images, achieving more precise results. In the final detection images, the upper edges of the underwater welding bubbles are marked with blue lines, and the lower edges are marked with red lines.
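The least-squares refinement can be as simple as a polynomial fit over the detected lower-edge pixel coordinates; a sketch follows, where the quadratic degree is an assumption rather than the paper's stated choice:

```python
import numpy as np

def refine_lower_edge(xs, ys, degree=2):
    """Fit the scattered lower-edge detections with a least-squares
    polynomial and return a smooth, continuous edge curve."""
    coeffs = np.polyfit(xs, ys, degree)     # minimizes squared residuals
    return np.polyval(coeffs, xs)

# Example with a noisy, bubble-like lower edge:
xs = np.arange(100)
ys = 0.01 * (xs - 50) ** 2 + 40 + np.random.normal(0, 1.5, xs.size)
smooth_ys = refine_lower_edge(xs, ys)
```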
In summary, this experiment confirms the excellent performance of our ASS model in detecting arc bubble edges in underwater wet welding. It also indicates that even models that perform well under normal conditions may encounter challenges in specific domains, requiring further targeted optimization and adjustment.
Four standard metrics were utilized to quantitatively assess the edge detection capability of the proposed method: F-measure, Mean Absolute Error (MAE), Accuracy, and Pixel Error (PE).
The F-measure is the harmonic mean of average precision and average recall. Based on the mathematical relationship between precision, recall, TP, FP, and FN, the F-measure is computed as follows:
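In the standard weighted form, consistent with the definitions below:

$$F_{\beta} = \frac{(1+\beta^{2}) \cdot Precision \cdot Recall}{\beta^{2} \cdot Precision + Recall}, \qquad Precision = \frac{TP}{TP + FP}, \qquad Recall = \frac{TP}{TP + FN}$$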
where \(\beta^2\) is set to 0.3; TP is the number of correctly identified edge pixels; FP is the number of non-edge pixels incorrectly identified as edge pixels; and FN is the number of edge pixels that are missed. A high F-measure indicates good edge detection ability.
MAE41 measures the average difference between the prediction and the ground-truth map. A low MAE suggests high method stability. Let P and R denote the predicted saliency map and the ground truth, respectively. The formula for MAE is:
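In standard form, using the symbols defined here:

$$MAE = \frac{1}{row \times col} \sum_{i=1}^{row} \sum_{j=1}^{col} \bigl|\, P(i,j) - R(i,j) \,\bigr|$$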
Here, row and col denote the height and width of the image in pixels, respectively. The accuracy is calculated as:
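In standard form, where TN denotes the number of correctly identified non-edge pixels:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$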
PE represents the number of incorrectly predicted pixels divided by the total number of pixels. A lower PE indicates that the model’s prediction is closer to the true mask, suggesting better model performance.
As shown in Table 1, we evaluated and compared the ASS model with other edge detection models using MAX F-measure (MAXF), MAE, Accuracy, and PE. Our method demonstrates strong performance on underwater wet welding images (228 images and masks). Even the SOTA models LDC and TEED did not outperform our model on this dataset. Other models achieved a MAXF of approximately 0.91, with precision not exceeding 0.9. However, our model achieved a MAXF of 0.95 and a precision of 0.93, both significantly higher than the other models, indicating optimal edge extraction. Our model also obtained the best scores for MAE and PE, with 0.116 and 0.077, respectively.
The accuracy of the other models and our model on the underwater wet welding image dataset during training and validation are presented in Fig. 10. The change curve of the ASS model over 50 epochs is consistently higher than the other curves. When the model accuracy curve stabilizes, the ASS model accuracy is approximately 10% higher than FCN8s, 8% higher than RCF, 6% higher than UNet, 7% higher than LDC, and 9% higher than TEED. During model training, the ASS model demonstrated high stability and accuracy, with a rapid convergence speed. The overall training process of the ASS model is superior to other models.
Ablation study
We conducted ablation experiments to analyze our model. The detection results of our model without using CBAM and without using SEM are shown in Fig. 11, and the corresponding evaluation metrics are presented in Fig. 12.
Without the inclusion of SEM, the model that only performs multi-scale fusion loses significant high-level semantic information, resulting in inaccurate detection of arc bubbles. After adding SEM, the regions of arc bubbles can be roughly identified, but further improvement is needed for detection accuracy in detailed areas. The integration of CBAM allows the model to focus on more critical edge features, thereby enhancing the accuracy of edge detection in complex scenes. The precision curve graph also demonstrates that the utilization of SEM and CBAM improves the model’s accuracy by approximately 10%.
Generalization validation
We used the Berkeley Segmentation Data Set 500 (BSDS500) to validate the generalizability of the model. BSDS500 is a standard benchmark for contour detection. This dataset is designed for evaluating natural edge detection that includes not only object contours but also object interior boundaries and background boundaries. We compared our ASS model with other edge detection models and calculated common evaluation metrics, as shown in Table 2. The performance of the ASS model is significantly higher than that of the commonly used UNet and RCF, with better MAE, accuracy, and PE scores. However, compared to the SOTA models LDC and TEED, our model performs slightly lower. Therefore, while our model’s generalization is higher than that of common edge detection models, it is still below that of SOTA models and requires further improvement.
Additionally, our model has a total parameter count of 14,096,688, with a size of approximately 53.74 MB. The average prediction time per frame for the ASS model is 93 ms, which meets real-time requirements while maintaining high accuracy.
Conclusion
In this paper, we propose an advanced transfer learning model—ASS model—specifically designed for detecting arc bubble edges in underwater wet welding processes. This innovative method effectively addresses the issues of scarce underwater wet welding image samples and unclear bubble edge detection, helping researchers more accurately grasp and monitor the dynamic changes in the underwater wet welding process, including the stages of bubble generation, growth, and detachment.
Our ASS model utilizes a transfer learning approach based on model-sharing parameters to pre-train feature extractors and acquire valuable prior knowledge. This strategy not only mitigates the issue of insufficient training samples but also significantly enhances edge feature recognition accuracy through the incorporation of an attention mechanism. Specifically, the model integrates CBAM along with the proposed innovative SCM and SEM, which together maximize the utilization of feature information and simplify the training process. Notably, the skip connections within the SEM effectively integrate low-level and high-level semantic information, addressing the problem of traditional networks losing high-level semantic details.
In the experimental evaluation, we compared the performance of the ASS model with other commonly used edge detection models—RCF, FCN8s, and UNet—as well as state-of-the-art models LDC and TEED. The results demonstrate that the ASS model achieves superior performance across two different datasets, providing edge detection capabilities that are both effective and stable compared to other models. The detected arc bubble edges are not only more precise but also exhibit clearer image quality, which is critical for a deeper understanding of the arc mechanism and bubble behavior in underwater wet welding processes. Moreover, the ASS model achieves a rapid detection speed of 93 milliseconds per frame, meeting the requirements for real-time monitoring applications. This capability not only highlights the model’s potential for practical applications but also provides robust technical support for real-time monitoring and quality control of underwater wet welding processes.
While our model has demonstrated significant success in arc bubble edge detection, there remains room for improvement in measuring bubble size and location, as well as enhancing the model’s generalizability. In the future, we plan to focus on improving the model’s performance in these areas to enable more comprehensive analysis of the underwater wet welding process and more precise assessment of welding quality.
Overall, the findings of this study advance the technology of underwater wet welding arc bubble edge detection and offer new perspectives and tools for optimizing and automating underwater wet welding processes. With continuous technological advancements and broader applications, we anticipate that the ASS model will play a crucial role in improving the quality and efficiency of underwater wet welding.
Data availability
The datasets generated and analyzed during the current study are not publicly available due to the confidentiality of the data but are available from the corresponding author on reasonable request.
References
Rowe, M., Liu, S. & Reynolds, T. J. The effect of ferro-alloy additions and depth on the quality of underwater wet welds. Weld. J.81, 156S–166S (2002).
Tsai, C. L. & Masubuchi, K. Mechanisms of Rapid cooling and their design considerations in underwater welding. J. Pet. Technol.32, 1825–1833 (1980).
Yurioka, N. & Horii, Y. Recent developments in repair welding technologies in Japan. Sci. Technol. Weld. Join.11, 255–264 (2006).
Guo, N., Du, Y., Feng, J., Guo, W. & Deng, Z. Study of underwater wet welding stability using an X-ray transmission method. J. Mater. Process. Technol.225, 133–138 (2015).
Wang, J. et al. Characterization of the underwater welding arc bubble through a visual sensing method. J. Mater. Process. Technol.251, 95–108 (2018).
Wang, J., Sun, Q., Zhang, T., Xu, P. & Feng, J. Experimental study of arc bubble growth and detachment from underwater wet FCAW. Weld. World. 63, 1747–1759 (2019).
Liang, X. et al. Low-light image enhancement via adaptive frequency decomposition network. Sci. Rep.13, 14107 (2023).
Lu, F., Yao, X., Fu, C. W. & Jia, J. Removing anomalies as noises for industrial defect localization. in IEEE/CVF International Conference on Computer Vision (ICCV) 16120–16129 (2023). https://doi.org/10.1109/ICCV51070.2023.01481
Chen, Y., Xia, R., Yang, K. & Zou, K. Image inpainting via multi-scale feature module with attention module. Comput. Vis. Image Underst.238, 103883 (2024).
Chen, Y., Xia, R., Yang, K. & Zou, K. DNNAM: image inpainting algorithm via deep neural networks and attention mechanism. Appl. Soft Comput.154, 111392 (2024).
Chen, Y., Xia, R., Yang, K. & Zou, K. Image super-resolution via multi-level information compensation and U-net. Expert Syst. Appl.245, 123111 (2024).
Xie, S. & Tu, Z. Holistically-nested edge detection. in IEEE International Conference on Computer Vision (ICCV) 1395–1403 (2015). https://doi.org/10.1109/ICCV.2015.164
Bertasius, G., Shi, J. & Torresani, L. DeepEdge: A multi-scale bifurcated deep network for top-down contour detection. in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 4380–4389 (2015). https://doi.org/10.1109/CVPR.2015.7299067
Zhao, J. et al. EGNet: Edge guidance network for salient object detection. in IEEE/CVF International Conference on Computer Vision (ICCV) 8778–8787 (2019). https://doi.org/10.1109/ICCV.2019.00887
Wang, Y. et al. RENet: Rectangular convolution pyramid and edge enhancement network for salient object detection of pavement cracks. Measurement. 170, 108698 (2021).
Xiao, Z., Song, K. Y. & Gupta, M. M. Development of a CNN edge detection model of noised X-ray images for enhanced performance of non-destructive testing. Measurement. 174, 109012 (2021).
Elharrouss, O., Hmamouche, Y., Idrissi, A. K., El Khamlichi, B. & El Fallah-Seghrouchni, A. Refined edge detection with cascaded and high-resolution convolutional network. Pattern Recognit.138, 109361 (2023).
Ma, G., Yu, L., Yuan, H., Xiao, W. & He, Y. A vision-based method for lap weld defects monitoring of galvanized steel sheets using convolutional neural network. J. Manuf. Process.64, 130–139 (2021).
Wang, Y. et al. Active disturbance rejection control of layer width in wire arc additive manufacturing based on deep learning. J. Manuf. Process.67, 364–375 (2021).
Ganguly, S. & Khatib, O. Accurate edge detection for robotic welding through tactile exploration. in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 7323–7330 (2022). https://doi.org/10.1109/IROS47612.2022.9981824
Zou, L., Zheng, Y. & Lu, J. An Edge Detection Method for Welding Pool based on an Improved Canny Algorithm. J. Phys. Conf. Ser.2785, 012013 (2024).
Pan, H., Pang, Z., Wang, Y., Wang, Y. & Chen, L. A new image recognition and classification method combining transfer learning Algorithm and MobileNet Model for Welding defects. IEEE Access.8, 119951–119960 (2020).
Kumaresan, S., Aultrin, K. S. J., Kumar, S. S. & Anand, M. D. Transfer learning with CNN for classification of Weld defect. IEEE Access.9, 95097–95108 (2021).
Jiao, W., Wang, Q., Cheng, Y. & Zhang, Y. End-to-end prediction of Weld penetration: a deep learning and transfer learning based method. J. Manuf. Process.63, 191–197 (2021).
Wang, Z. et al. Recognition of GTAW Weld penetration based on the lightweight model and transfer learning. Weld. World. 67, 251–264 (2023).
Bahador, A., Du, C., Ng, H. P., Dzulqarnain, N. A. & Ho, C. L. Cost-effective classification of tool wear with transfer learning based on tool vibration for hard turning processes. Measurement. 201, 111701 (2022).
Yang, X., Qiao, T., Zhang, H., Chen, H. & Hao, G. Research on image recognition and detection method of sapphire bubbles. J. Instrum.14, P12013 (2019).
Fu, Y. & Liu, Y. BubGAN: Bubble generative adversarial networks for synthesizing realistic bubbly flow images. Chem. Eng. Sci.204, 35–47 (2019).
Zhao, Q., Li, R. R. & Qian, Q. U. Research on statistical detection method of micro bubbles in transparent layer of quartz crucible based on image processing. J. Cryst. Growth. 556, 125966 (2021).
Saha, P. K., Pal, R., Sarkar, S. & Mukhopadhyay, A. A novel image processing technique for detection of pseudo occluded bubbles and identification of flow regimes in a bubble column reactor. Measurement. 189, 110568 (2022).
Sadeghi, V., Vard, A., Sharifi, M., Mir, H. & Mehridehnavi, A. Segmentation and region quantification of bubbles in small bowel capsule endoscopy images using wavelet transform. Inf. Med. Unlocked. 42, 101364 (2023).
Zhuang, F.-Z. et al. Survey on transfer learning research. J. Softw.26, 26–39 (2014).
Zhang, X. & Zhuang, Y. Status and development of transfer learning based category-level object recognition and detection. Acta Autom. Sin. 45, 1224–1243 (2019).
Tomasi, C. & Manduchi, R. Bilateral filtering for gray and color images. in Sixth International Conference on Computer Vision (ICCV) 839–846 (1998). https://doi.org/10.1109/ICCV.1998.710815
Liu, Y. et al. Richer Convolutional features for Edge Detection. IEEE Trans. Pattern Anal. Mach. Intell.41, 1939–1946 (2019).
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90
Woo, S., Park, J., Lee, J. Y. & Kweon, I. S. CBAM: Convolutional block attention module. in Computer Vision – ECCV 2018 (eds Ferrari, V., Hebert, M., Sminchisescu, C. & Weiss, Y.) 3–19 (Springer, Cham, 2018). https://doi.org/10.1007/978-3-030-01234-2_1
Kaganami, H. G. & Beiji, Z. Region-based segmentation versus edge detection. in Fifth International Conference on Intelligent Information Hiding and Multimedia Signal Processing 1217–1221 (2009). https://doi.org/10.1109/IIH-MSP.2009.13
Soria, X., Pomboza-Junez, G. & Sappa, A. D. LDC: Lightweight dense CNN for Edge Detection. IEEE Access.10, 68281–68290 (2022).
Soria, X., Li, Y., Rouhani, M. & Sappa, A. D. Tiny and efficient model for the edge detection generalization. in IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) 1356–1365 (2023). https://doi.org/10.1109/ICCVW60793.2023.00147
Wang, X., Zhang, Y., Xie, X. & Li, Y. Salient object detection via double sparse representations under visual attention guidance. in IGARSS 2018 – IEEE International Geoscience and Remote Sensing Symposium 3631–3634 (2018). https://doi.org/10.1109/IGARSS.2018.8518425
Acknowledgements
This work was supported in part by Jiangxi Provincial Natural Science Foundation under grant number 20224BAB202033, in part by Jiangxi Provincial Key Research and Development Program under grant number 20192BBE50015, and in part by Science and Technology Project of Jiangxi Provincial Education Department under grant number GJJ201912.
Author information
Authors and Affiliations
Contributions
Bo Guo: Conceptualization, Methodology, Funding acquisition, Writing – original draft, Software, Writing – review & editing, Supervision. Xu Li: Methodology, Formal analysis, Software, Data curation, Writing – original draft, Writing – review & editing.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Guo, B., Li, X. Arc bubble edge detection method based on deep transfer learning in underwater wet welding. Sci Rep 14, 22628 (2024). https://doi.org/10.1038/s41598-024-73516-3