Abstract
In recent years, steel surface defect detection based on machine vision has attracted significant attention and has emerged as a research hotspot. However, several challenges remain. In practical industrial scenarios, deep learning-based detection methods often involve high computational complexity, which limits their applicability for real-time defect monitoring. Moreover, due to the complex and noisy background of steel surfaces, conventional deep learning networks frequently suffer from the loss of critical defect features during the feature extraction process. To address these challenges, this paper proposes a novel latent-space attention multi-scale YOLOv10n model (LAM-YOLOv10n). First, a lightweight ghost module is integrated to significantly reduce the model’s parameter count and computational cost. Second, a spatial multi-scale attention (SMA) module is designed to enhance the extraction of discriminative features related to steel surface defects. Finally, a multi-branch feature fusion network (MFFN) is introduced to improve the effectiveness of multi-scale feature aggregation, thereby enhancing the model’s detection performance for various defect types. Experimental results demonstrate that the proposed LAM-YOLOv10n model achieves a 3.47% improvement in precision compared with the baseline YOLOv10n network, outperforming several state-of-the-art object detection models in both accuracy and efficiency. These findings indicate the effectiveness and practicality of the proposed method for real-time steel surface defect detection in complex industrial environments.
Introduction
Steel, as a critical industrial material, is widely used in infrastructure construction, industrial manufacturing, transportation and logistics, and shipbuilding, serving as the cornerstone of national economic growth and technological progress1,2,3. Due to its excellent mechanical properties, weldability, durability, and cost-effectiveness, steel plays a crucial role in ensuring the reliability and service life of critical structures and systems. The performance and quality of steel components significantly impact the structural integrity and financial outcomes of engineering projects, affecting not only operational safety but also long-term maintenance costs and lifecycle efficiency. However, during complex multi-stage manufacturing processes—such as rolling, heat treatment, and surface treatment—various defects may appear on the steel surface, as shown in (Fig. 1). These defects can act as stress concentration points, reducing fatigue strength and ultimately affecting the mechanical properties of steel products. If not detected and addressed promptly, such defects may lead to serious safety hazards, equipment failures, or even structural collapse in high-risk applications. Therefore, achieving precise, rapid, and automated identification of surface defects is of critical importance to the mechanical manufacturing industry4,5,6.
Approaches for identifying steel surface defect types can generally be divided into three main categories: human visual inspection7,8,9, conventional photoelectric techniques10,11,12, and modern machine vision-based detection systems13,14,15. Among these, the first two—although relatively simple—face challenges such as elevated labor expenses and considerable inconsistencies due to human subjectivity. These shortcomings result in extended inspection durations, decreased operational efficiency, higher instances of misclassification, and diminished detection precision.
With the development of deep learning, an increasing number of researchers have turned to steel surface defect detection. However, most existing models suffer from high computational complexity and insufficient feature fusion capability, which makes classifying steel surface defects with computer vision challenging. As a recent member of the YOLO series, YOLOv1016 achieves a better balance between speed and accuracy through NMS-free training with consistent dual label assignments and an efficiency- and accuracy-driven architecture design. Therefore, this paper proposes a reliable and accurate defect type detection algorithm that uses YOLOv10 as the baseline network. The main contributions of this paper are as follows:
(1) This paper proposes LAM-YOLOv10n, a steel defect detection method that combines the original YOLOv10n network with the Ghost module, the SMA module, and the MFFN module. The model can extract defect type features against the complex background of steel surfaces and meets the detection requirements of practical industrial scenarios.
(2) To address the low defect detection precision caused by the complex background of steel surfaces, this paper proposes the SMA module. It is used in the backbone feature extraction network to extract defect features on the steel surface, and its multi-scale connections avoid the loss of defect features during extraction, which improves the overall performance and recognition precision of the network.
(3) To address the loss of target features during model training, this paper introduces the MFFN module, which is designed to fuse fine-grained features related to steel surface defects. This strengthens the model's understanding of defect scenarios on steel surfaces and improves both the accuracy and the robustness of recognition.
Related work
A substantial body of research has investigated the use of conventional machine learning methods for detecting defects on steel surfaces. Xu et al.17 employed a multiscale geometric processing framework that segmented steel surface imagery into directional elements across various resolutions. From these, high-dimensional descriptors were extracted and subsequently transformed into compact representations using graph-based dimensionality reduction, thereby enabling effective defect categorization. Hu et al.18 developed an integrated strategy that fused backpropagation neural networks with support vector machines. In their approach, defect images underwent binarization to simplify the extraction of relevant features and assist in defect classification. Liu et al.19 presented a fully trainable extreme learning machine architecture for recognizing defects, with local binary patterns serving as the primary feature representation. The framework autonomously aggregated results from multiple independent submodules to determine defect types. Experimental results demonstrated that this method outperformed several traditional techniques in steel surface defect recognition tasks.
Driven by the swift evolution of computer vision, digital imaging, and artificial intelligence, steel surface defect detection is steadily moving toward greater levels of automation and intelligence. The integration of ultra-high-definition cameras, sophisticated visual processing techniques, and deep neural networks has significantly advanced the automatic identification and categorization of surface imperfections in steel. Soukup et al.20 utilized a convolutional neural network (CNN) trained in a fully labeled environment to boost detection accuracy, further optimizing performance through regularization methods. Yi et al.21 introduced an approach that segments faulty regions using a symmetric surrounded saliency mechanism, combined with a CNN. This framework bypasses the need for manual feature extraction commonly seen in earlier techniques, thus improving both speed and reliability. Damacharla et al.22 implemented a hybrid deep learning design by embedding ResNet and DenseNet modules into the encoder of a UNet architecture, resulting in enhanced precision in identifying surface anomalies on steel products. Uraon et al.23 developed a specialized neural network aimed at identifying multiple defect types within intricate background environments. Experimental findings indicated that the model delivered strong performance in the context of steel surface defect recognition. Bouguettaya et al.24 designed a composite architecture by merging MobileNet-V2 with Xception under a transfer learning paradigm. Utilizing a deep ensemble methodology, their model preserved the advantage of rapid inference while mitigating issues related to the typically large size of conventional deep learning networks, achieving promising outcomes in defect identification tasks. Akhyar et al.25 incorporated deformable convolution layers and adaptive RoI pooling into a cascaded R-CNN framework, enhancing the network’s responsiveness to irregular object geometries. 
Furthermore, the application of stochastic and limit-based scaling mechanisms refined the model’s precision in handling detailed target characteristics. Xia et al.26 advanced the YOLOv5s framework by introducing a novel large-kernel C3 module, which strengthened the system’s capacity for perceptual efficiency and detailed feature representation in visually complex textures. Additionally, a tailored training approach was adopted, leveraging multi-scale feature maps aligned with convolution kernels of varying dimensions, thus improving the model’s flexibility in detecting defects with diverse morphological attributes. Raj et al.27 presented the YOLOv7-CSF model, which integrates a lightweight and cost-effective coordinated attention module into the prediction head of the YOLOv7 framework. In addition, the implementation of the SCYLLA-IOU loss function contributed to improved detection performance and computational effectiveness in steel surface defect identification. Huang et al.28 proposed an enhanced architecture named WFE-YOLOv8s, derived from the YOLOv8s model, where the traditional C2F module was substituted with a novel CFN design. This replacement led to a reduction in both model parameters and computational complexity (measured by GFLOPs), while the incorporation of EMA attention further boosted detection precision. He et al.29 developed an improved defect detection network also built upon the YOLOv8s backbone. Their method utilized a foundational convolutional network to extract hierarchical feature representations at different layers. These representations were then aggregated via a multiscale fusion mechanism, followed by the application of a proposal generation module to localize potential defect regions. Experimental validation showed that the approach excelled in accurately identifying a wide range of surface anomalies on steel. 
Amin et al.30 advanced the detection of steel defects by constructing a machine learning framework capable of recognizing hierarchical defect patterns from sample steel plate imagery and categorizing them into appropriate classes, thereby enhancing overall classification performance. Tabernik et al.31 introduced a segmentation-driven deep neural network tailored for identifying and isolating surface anomalies, which demonstrates strong learning capabilities even when trained on limited datasets. Demir et al.32 presented an innovative deep learning technique for pinpointing and categorizing defect types arising during the steel manufacturing process. Their approach employs concurrent training of residual blocks alongside attention modules to capture rich, discriminative features. Li et al.33 proposed a neural model that embeds a multi-scale representation learning mechanism for improved defect identification. By combining hierarchical feature extraction with a streamlined fusion strategy, the model significantly boosts detection accuracy while keeping the number of parameters relatively small.
Overall structure of LAM-YOLOv10n model
The LAM-YOLOv10n model is developed for high-precision detection of steel surface defects in images captured from a top-down perspective. Its primary objective is to accurately localize defect regions and classify the various types of surface anomalies, thereby reducing false positives and enhancing the overall reliability of defect recognition in real industrial environments. To achieve this goal, the LAM-YOLOv10n model focuses on fully mining and utilizing the image feature information in and around the defect region to improve its ability to differentiate defect types. By jointly analyzing local defect characteristics and global surface textures, the model enhances its discriminative capability, enabling precise classification among subtle and visually similar defect types. In addition, the LAM-YOLOv10n model adopts an optimized network structure and feature extraction scheme so that it maintains high detection precision and real-time performance in complex industrial environments.
The overall network framework, as illustrated in Fig. 2, highlights the synergistic interaction among key modules, including the Backbone module, Neck module, and Head module. These components collectively contribute to robust and real-time steel surface defect detection. Among them, the Backbone module serves to extract the defect type feature information in the steel surface image; the Neck module realizes the effective fusion of the deep and shallow feature information of the defect type; and the Head module is responsible for analyzing the steel image data and detecting the defects.
In actual steel production, lighting, contrast, and other factors often blur the visual characteristics of defects, which makes defect classification and detection more difficult. In addition, defects on steel surfaces are often small, and complex production backgrounds and harsh environmental conditions make them harder to detect and localize. To address these challenges, the LAM-YOLOv10n model is optimized in several ways. First, the data preprocessing module augments the industrial steel surface defect data to provide more sufficient and complete training data for the network. Second, the model introduces the SMA module, which lets steel surface defect features interact across channels and is designed to focus on tiny defect features on the steel surface. Finally, the MFFN structure is added to the outputs of the Neck module so that features of different depths are fused more effectively, which improves the detection of both deep and shallow defect features and therefore the localization and classification of steel surface defects. Together, these improvements increase the accuracy and stability of defect detection and address practical problems of defect recognition on steel surfaces.
Introducing ghost module
Inspired by the literature34, and to keep the parameter scale of the original YOLOv10n model in check, this paper uses the Ghost module to optimize the feature extraction network. The Ghost module, shown in Fig. 3, balances the retention of spatial information against network complexity, yielding a faster and more efficient image recognition model with better performance and usability in practical applications.
Specifically, the Ghost module builds on the observation that the feature maps produced by a convolution contain many similar, redundant maps, so generating all of them with full convolution filters wastes parameters and computation. For an input of width w, height h, and c channels, the computational cost of generating the feature maps with an ordinary convolution is approximately h × w × c × convolution kernel size × number of output channels. The Ghost module instead obtains only part of the feature maps through the convolution and then applies a cheap linear transformation to them to obtain the remaining feature maps, as shown in Eq. (1):
where \(\varphi\) is the linear transformation and \({\text{y}}_{i}^{\prime }\) is the i-th feature map obtained by the ordinary convolution operation. The linear transformation produces as many feature maps as the ordinary convolution does. That is, for the same number of output feature maps, the Ghost module requires far less computation than an ordinary convolution.
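For reference, the Ghost formulation in the cited literature34 is usually written in the form below. This is a sketch consistent with the symbols defined above, not a copy of the paper's Eq. (1): the output symbol \(y_{ij}\), the indices i and j, the number of intrinsic maps m, and the expansion factor s are notation assumed here. With s = 2, each intrinsic map yields one additional map, which matches the statement that the linear transformation produces as many maps as the ordinary convolution.

```latex
y_{ij} = \varphi_{i,j}\left(y_{i}^{\prime}\right), \qquad i = 1, \ldots, m, \quad j = 1, \ldots, s
```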
The GhostBottleneck structure consists mainly of two stacked Ghost modules and a residual connection, and it comes in two variants. As shown in Fig. 3, the GhostBottleneck structure with a stride of 1 is composed of two Ghost modules connected in series. The first Ghost module expands the number of feature channels, and the second compresses it again so that the number of output channels matches the number of input channels. The residual connection then combines the input feature map with the output feature map so that input features are carried through to the output. The GhostBottleneck structure with a stride of 2 is based on the stride-1 structure and adds a convolution module with a stride of 2.
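The saving claimed for the Ghost module can be checked with simple arithmetic. The following is a minimal sketch, assuming the cost model of the cited Ghost literature34: a fraction 1/s of the output maps comes from an ordinary k × k convolution and the rest from a cheap d × d depthwise operation. The function names, the layer sizes, and the choices s = 2 and d = 3 are illustrative assumptions, not values taken from LAM-YOLOv10n.

```python
def conv_macs(h_out, w_out, c_in, n_out, k):
    """Multiply-accumulate count of an ordinary k x k convolution."""
    return h_out * w_out * c_in * k * k * n_out


def ghost_macs(h_out, w_out, c_in, n_out, k, s=2, d=3):
    """Multiply-accumulate count of a Ghost module (assumed cost model).

    n_out // s intrinsic maps come from an ordinary convolution; each of
    them then yields (s - 1) extra maps through a cheap d x d depthwise
    operation.
    """
    m = n_out // s  # number of intrinsic feature maps
    primary = h_out * w_out * c_in * k * k * m
    cheap = h_out * w_out * d * d * m * (s - 1)
    return primary + cheap


if __name__ == "__main__":
    # Hypothetical layer: 80 x 80 output, 64 input and 128 output channels.
    layer = dict(h_out=80, w_out=80, c_in=64, n_out=128, k=3)
    ordinary = conv_macs(**layer)
    ghost = ghost_macs(**layer, s=2, d=3)
    print(f"ordinary: {ordinary:,}  ghost: {ghost:,}  ratio: {ordinary / ghost:.2f}")
```

For this hypothetical layer the ratio is close to s, which is the behaviour the text describes: about half the computation for the same number of output feature maps.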
Introducing SMA module
To locate and detect the target features of steel surface defects more accurately, this paper proposes the SMA module. The module establishes dependencies among the features of steel surface defect types through three multi-scale parallel branches and fuses the outputs of these branches through cross-spatial learning and dot-product operations. Each parallel branch lets the steel surface defect features interact across channels.
As illustrated in Fig. 4, the SMA structure is composed of three parallel processing branches. In the first branch, the input features are initially passed through a 3 × 3 convolutional layer, followed by both global maximum pooling (GMP) and global average pooling (GAP). The outputs from these pooling operations are then concatenated and fed into a 1 × 1 convolution layer to compress the feature dimensions. Subsequently, a Softmax function is applied to generate attention weights, which are then dot-multiplied with the initial convolutional output to produce the branch’s final output features. The second branch performs average pooling separately along the height and width dimensions of the feature map. These pooled outputs are concatenated and sent through a 1 × 1 convolution layer to reduce dimensionality, after which the processed data enters the cross-spatial attention unit. Similarly, the third branch applies a 3 × 3 convolution to the input feature map before feeding it into the same cross-spatial attention mechanism. Within this module, feature representations from the second and third branches are interactively refined. Finally, the refined output is element-wise multiplied with the results from the first branch to generate the overall output of the SMA module.
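The attention step of the first branch can be illustrated with a small pure-Python sketch. It is a simplification under stated assumptions, not the authors' implementation: the initial 3 × 3 convolution is omitted, and the learned 1 × 1 convolution that compresses the concatenated GAP and GMP descriptors is replaced by two fixed illustrative constants, `w_avg` and `w_max`. All function names are hypothetical.

```python
import math


def global_pools(fmap):
    """Return per-channel (average, maximum) lists for a [C][H][W] feature map."""
    gap, gmp = [], []
    for channel in fmap:
        values = [v for row in channel for v in row]
        gap.append(sum(values) / len(values))
        gmp.append(max(values))
    return gap, gmp


def softmax(xs):
    """Softmax over a list, shifted by the maximum for numerical stability."""
    shift = max(xs)
    exps = [math.exp(x - shift) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def branch_one(fmap, w_avg=0.5, w_max=0.5):
    """Reweight channels with Softmax attention built from GAP and GMP.

    w_avg and w_max stand in for the learned 1 x 1 convolution; they are
    illustrative constants, not trained parameters.
    """
    gap, gmp = global_pools(fmap)
    scores = [w_avg * a + w_max * m for a, m in zip(gap, gmp)]
    weights = softmax(scores)
    out = [[[w * v for v in row] for row in ch] for w, ch in zip(weights, fmap)]
    return weights, out


if __name__ == "__main__":
    # Two 2 x 2 channels: one with strong responses, one empty.
    fmap = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
    weights, out = branch_one(fmap)
    print(weights)
```

The channel with the stronger pooled response receives the larger weight, which is the intended effect of multiplying the Softmax weights with the convolutional output.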
The SMA module and the lightweight Ghost module together form the feature extraction network for the LAM-YOLOv10n model. This is designed to improve the extraction of defect type features from the complex background of steel surfaces.
MFFN-based neck module
Traditional feature fusion approaches include both serial and parallel strategies. The serial fusion approach directly concatenates two sets of identical or different input features to generate a new feature vector whose dimension equals the sum of the dimensions of the two input sets. The parallel fusion approach fuses multiple sets of feature information into a composite vector. With the development of deep learning, two feature fusion approaches, FPN and PANet, have been widely adopted, but they tend to lose detail in the input features and to introduce redundant information.
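The difference between the two traditional strategies can be shown on plain feature vectors. The sketch below assumes one common reading of the parallel strategy, in which two feature vectors are combined element by element into a single complex-valued vector and the shorter one is zero-padded; the text above only says "composite vector", so this interpretation and both function names are assumptions.

```python
def serial_fusion(x, y):
    """Concatenate two feature vectors; output length is len(x) + len(y)."""
    return list(x) + list(y)


def parallel_fusion(x, y):
    """Combine two feature vectors into one complex vector z = x + i*y.

    The shorter vector is zero-padded, so the output length is
    max(len(x), len(y)). This is an assumed reading of "composite vector".
    """
    n = max(len(x), len(y))
    xp = list(x) + [0.0] * (n - len(x))
    yp = list(y) + [0.0] * (n - len(y))
    return [complex(a, b) for a, b in zip(xp, yp)]


if __name__ == "__main__":
    a, b = [1.0, 2.0], [3.0]
    print(serial_fusion(a, b))    # length 3: dimensions add up
    print(parallel_fusion(a, b))  # length 2: dimensions do not add up
```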
To address the above challenges, this study introduces an MFFN module, which is applied at the three output layers of the Neck component, as illustrated in Fig. 5. This unit is designed to integrate fine-grained features related to steel surface defects, thereby strengthening the model's interpretative capability in complex defect scenarios. By refining the contextual understanding of the recognition system, the proposed module contributes to enhanced detection accuracy and improved consistency in performance.
The input features of the MFFN module pass through two branches. The first branch applies GAP followed by a 3 × 3 convolution to aggregate the spatial information of the feature map, obtains the corresponding weights through a Softmax operation, and multiplies these weights with the input features. The second branch applies local average pooling (LAP) and local maximum pooling (LMP) and concatenates the two results to obtain a complete feature representation, which is passed through a 3 × 3 convolution and a Softmax operation and then multiplied with the input features. Finally, the outputs of the two branches are concatenated and reduced in dimensionality by a 1 × 1 convolution to give the output of the MFFN module, as shown in Eqs. (2–4):
where \({F_{input}}\) is the input feature information of the MFFN module, \(GAP\) is the global average pooling operation, \(LMP\) is the local maximum pooling operation, \(LAP\) is the local average pooling operation, \({f^{3 \times 3}}\) is the 3 × 3 convolution operation, \({f^{1 \times 1}}\) is the 1 × 1 convolution operation, and \({F_{output}}\) is the output feature information of the MFFN module.
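The two-branch computation described for the MFFN module can be written compactly. The following is a reconstruction from that description using the symbols defined above, offered as a plausible form of Eqs. (2–4) rather than the authors' exact equations. The branch outputs \(F_{1}\) and \(F_{2}\), the bracket \([\cdot\,;\cdot]\) for concatenation, and \(\otimes\) for element-wise multiplication are notation assumed here.

```latex
\begin{align}
F_{1} &= \mathrm{Softmax}\left(f^{3 \times 3}\left(GAP\left(F_{input}\right)\right)\right) \otimes F_{input} \\
F_{2} &= \mathrm{Softmax}\left(f^{3 \times 3}\left(\left[LAP\left(F_{input}\right);\, LMP\left(F_{input}\right)\right]\right)\right) \otimes F_{input} \\
F_{output} &= f^{1 \times 1}\left(\left[F_{1};\, F_{2}\right]\right)
\end{align}
```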
Experimental results and analysis
Evaluation indicators and experimental parameters
To evaluate the LAM-YOLOv10n model on steel surface defect type detection, this paper uses metrics commonly applied in object detection: precision (P), average precision (AP), mean average precision (mAP), recall (R), and frames per second (FPS), as shown in Eqs. (5–9):
where TP denotes true positives, FP denotes false positives, FN denotes false negatives, \(\theta\) denotes the confidence score, and \(\tau\) denotes the IoU threshold.
Here t is the time the model takes to process all the test data, \(Per\_frame\) is the total number of images to be detected, and N is the total number of detection categories.
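The metrics above follow standard definitions, which the sketch below spells out. It assumes the usual forms P = TP / (TP + FP), R = TP / (TP + FN), AP as the area under the precision-recall curve, mAP as the mean of AP over the N categories, and FPS as the number of images divided by the processing time t. The AP routine uses a simple non-interpolated sum, which may differ from the interpolation used in the paper's evaluation code; all function names are illustrative.

```python
def precision(tp, fp):
    """P = TP / (TP + FP)."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """R = TP / (TP + FN)."""
    return tp / (tp + fn) if tp + fn else 0.0


def average_precision(detections, num_gt):
    """Non-interpolated area under the precision-recall curve for one class.

    detections: list of (confidence, is_true_positive) pairs, where a
    detection counts as a true positive when its IoU with an unmatched
    ground-truth box reaches the IoU threshold.
    num_gt: number of ground-truth boxes of this class.
    """
    if num_gt == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    ap = 0.0
    prev_recall = 0.0
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        r = tp / num_gt
        p = tp / (tp + fp)
        ap += p * (r - prev_recall)  # rectangle under the curve for this step
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class):
    """mAP = mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


def frames_per_second(num_frames, seconds):
    """FPS = number of processed images / elapsed time in seconds."""
    return num_frames / seconds


if __name__ == "__main__":
    dets = [(0.9, True), (0.8, False), (0.7, True)]
    print(average_precision(dets, num_gt=2))
```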
All experiments in this paper were conducted on a computer with an Intel i7-9700K processor and an NVIDIA GeForce GTX 1060Ti GPU. All models were trained for 100 epochs with a batch size of 16, a learning rate of 0.001, the Adam optimizer, and a momentum of 0.9. The confidence score was set to 0.35 and the IoU threshold to 0.7.
Dataset and preprocessing
The data for this experiment come from the steel surface defect dataset produced by Kechen Song's team at Northeastern University35, which contains six types of defects, namely crazing, inclusion, patches, pitted_surface, rolled-in_scale, and scratches, as shown in Fig. 6.
To enrich the steel surface defect dataset, this paper applies a variety of image processing techniques, including image flipping, image cropping, brightness adjustment, contrast adjustment, and noise addition. These operations increase the diversity of the defect dataset and improve the adaptability of the network to the complex noise found on steel surfaces. The processed dataset is named PRO-DataSet. Image flipping mirrors images horizontally or vertically. Image cropping randomly selects regions of interest in the original image to produce multiple image fragments of different sizes and positions. Adding Gaussian noise simulates the visual interference that may exist in actual industrial environments, which improves the model's adaptability to complex industrial scenarios.
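The augmentation operations listed above can be sketched in a few lines of pure Python on a grayscale image stored as a list of rows. This is an illustration only: the function names, the 0 to 255 value range, the mid-grey pivot of 128 for contrast, and all parameter values are assumptions, and the paper does not state which settings were used to build PRO-DataSet.

```python
import random


def hflip(img):
    """Mirror a grayscale image (list of rows) left to right."""
    return [row[::-1] for row in img]


def vflip(img):
    """Mirror a grayscale image top to bottom."""
    return [row[:] for row in img[::-1]]


def random_crop(img, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position."""
    h, w = len(img), len(img[0])
    top = rng.randint(0, h - crop_h)    # randint is inclusive at both ends
    left = rng.randint(0, w - crop_w)
    return [row[left:left + crop_w] for row in img[top:top + crop_h]]


def clip(value):
    """Round and clamp a pixel value to the 0..255 range."""
    return max(0, min(255, int(round(value))))


def adjust(img, brightness=0.0, contrast=1.0):
    """Scale contrast around mid-grey (128), then shift brightness."""
    return [[clip((v - 128) * contrast + 128 + brightness) for v in row]
            for row in img]


def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    return [[clip(v + rng.gauss(0.0, sigma)) for v in row] for row in img]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    image = [[0, 64, 128], [192, 255, 32]]
    print(hflip(image))
    print(adjust(image, brightness=10))
    print(add_gaussian_noise(image, sigma=5.0, rng=rng))
```

Bounding-box annotations would need the same geometric transformation as the image for flips and crops; that bookkeeping is left out of this sketch.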
Table 1 lists the details of the PRO-DataSet steel surface defect dataset. The augmentation does not change the defect types; it only increases the number of defect samples. The crazing type has 402 targets and 921 target frames; the inclusion type has 510 targets and 896 target frames; the patches type has 323 targets and 1020 target frames; the rolled-in_scale type has 478 targets and 1020 target frames; the pitted_surface type has 501 targets and 965 target frames; and the scratches type has 536 targets and 1089 target frames.
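The per-class counts quoted above can be totalled to get a quick view of dataset size and class balance. The snippet below only sums the numbers given in the text; the dictionary layout and function names are illustrative, and "targets" and "target frames" are used exactly as the text uses them.

```python
# (targets, target frames) per defect type, as quoted in the text.
PRO_DATASET = {
    "crazing": (402, 921),
    "inclusion": (510, 896),
    "patches": (323, 1020),
    "rolled-in_scale": (478, 1020),
    "pitted_surface": (501, 965),
    "scratches": (536, 1089),
}


def totals(dataset):
    """Return (total targets, total target frames) over all defect types."""
    targets = sum(t for t, _ in dataset.values())
    frames = sum(f for _, f in dataset.values())
    return targets, frames


def frames_per_target(dataset):
    """Average number of target frames per target for each defect type."""
    return {name: f / t for name, (t, f) in dataset.items()}


if __name__ == "__main__":
    print(totals(PRO_DATASET))
    for name, ratio in frames_per_target(PRO_DATASET).items():
        print(f"{name}: {ratio:.2f}")
```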
Ablation experiment
The LAM-YOLOv10n network structure combines the GhostConv module, the SMA module, and the MFFN module on the basis of the original YOLOv10n model. To verify the contribution of each module to steel surface defect detection, the ablation experiments in this section add the GhostConv module, the SMA module, and the MFFN module to the original YOLOv10n model one by one. The results are shown in Table 2. In the table, the GC scheme adds the GhostConv module, the GC-SMA scheme adds the GhostConv module and the SMA module, and the GC-SMA-MFFN scheme adds the GhostConv module, the SMA module, and the MFFN module. The evaluation indexes of the ablation experiment and the convergence of network training are shown in Fig. 7a–d.
The experimental results show that the LAM-YOLOv10n design, which combines the GhostConv module, the SMA attention module, and the MFFN fusion method, clearly improves the localization and detection of steel surface defect types. The ablation experiments show that each added module raises the detection precision of the network. After adding the GhostConv module, the detection precision improves by 0.24 percentage points. After adding the SMA module, it improves by a further 0.13 percentage points, indicating that the interaction between the SMA module and the GhostConv module helps to extract more accurate features of steel surface defects. Finally, with the MFFN module added, the precision gain over the baseline YOLOv10n reaches 3.47 percentage points, which shows that fusing the deep and shallow features of steel surface defect types further improves detection and localization precision.
Comparison experiment
To verify the detection performance of LAM-YOLOv10n, the small-target detection algorithm for steel surface defect types proposed in this paper, experiments were carried out on the PRO-DataSet steel surface defect dataset. The experimental results are shown in Fig. 8.
We first compared LAM-YOLOv10n with the different versions of the baseline YOLOv10 model, namely YOLOv10n, YOLOv10s, YOLOv10m, YOLOv10b, YOLOv10l, and YOLOv10x. In the steel surface defect type detection task, the precision rates of YOLOv10n, YOLOv10s, YOLOv10m, YOLOv10b, YOLOv10l, and YOLOv10x are 93.49, 94.88, 95.12, 95.68, 97.34, and 97.79%, respectively. The precision of the LAM-YOLOv10n network is 3.47 percentage points higher than that of YOLOv10n, 2.08 higher than that of YOLOv10s, 1.84 higher than that of YOLOv10m, and 1.28 higher than that of YOLOv10b, while it is 0.38 percentage points lower than that of YOLOv10l and 0.83 lower than that of YOLOv10x. Although LAM-YOLOv10n does not reach the precision of the two largest versions, YOLOv10l and YOLOv10x, it improves on its own baseline, YOLOv10n, by 3.47 percentage points. This result supports the effectiveness of the proposed algorithm for steel surface defect type detection.
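The precision gaps quoted above are mutually consistent, which can be verified with a short calculation. The sketch assumes a LAM-YOLOv10n precision of 96.96%, which is implied by the YOLOv10n figure of 93.49% plus the reported gain of 3.47 percentage points; the text does not state this value directly, and the variable and function names are illustrative.

```python
# Precision (%) of the YOLOv10 variants, as quoted in the text.
YOLOV10_PRECISION = {
    "YOLOv10n": 93.49,
    "YOLOv10s": 94.88,
    "YOLOv10m": 95.12,
    "YOLOv10b": 95.68,
    "YOLOv10l": 97.34,
    "YOLOv10x": 97.79,
}

# Assumed: implied by 93.49 + 3.47, not stated directly in the text.
LAM_PRECISION = 96.96


def precision_gaps(reference, others):
    """Difference in percentage points between reference and each model."""
    return {name: round(reference - p, 2) for name, p in others.items()}


if __name__ == "__main__":
    for name, gap in precision_gaps(LAM_PRECISION, YOLOV10_PRECISION).items():
        print(f"{name}: {gap:+.2f} percentage points")
```

Positive values mean LAM-YOLOv10n is more precise than the listed variant; the two negative values correspond to YOLOv10l and YOLOv10x.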
Unlike the larger YOLOv10 variants, the LAM-YOLOv10n model proposed in this paper aims to balance real-time performance and precision, improving the detection of steel surface defect types without significantly affecting detection speed. This balance is particularly important for deployment in real industrial environments, where both high precision and fast inference are needed for timely and reliable defect identification. To further validate the effectiveness of the LAM-YOLOv10n model, we conducted comparative experiments against several representative object detection networks; the results are shown in Table 3.
As Table 3 shows, the LAM-YOLOv10n model proposed in this paper has clear advantages. In terms of precision, the LAM-YOLOv10n model achieves the highest value, outperforming YOLOv12n and RT-DETR by 2.6 and 2.31%, respectively. Although its FPS is slightly lower than that of YOLOv12n, the model gives up little speed in exchange for the gain in accuracy. This indicates that, among the compared models, the LAM-YOLOv10n model performs best in detection accuracy for steel surface defect types.
In addition, to visualize the detection results, the LAM-YOLOv10n model and five other models with relatively high detection precision were selected for a visual comparison, as shown in Fig. 9.
As can be seen in Fig. 9, the comparison algorithms show missed detections to varying degrees and produce lower confidence scores than the LAM-YOLOv10n model proposed in this paper. We attribute this difference to the SMA module and the Ghost module, which focus on the features of defect types on the steel surface, especially against complex steel surface backgrounds. In addition, the MFFN module fuses steel surface defect features at multiple scales, which helps the LAM-YOLOv10n model achieve the best detection performance in this comparison.
Discussion
In this paper, we introduce a model named LAM-YOLOv10n, developed to enhance the effectiveness of detecting defects on industrial steel surfaces. In practical application scenarios, the deep learning algorithms have high computational loads, which makes it difficult to realize real-time monitoring of defect types. In addition, due to the complex background of the steel surface, the existing deep learning network is prone to the loss of target feature information during the feature extraction process. The model firstly introduces the Ghost lightweight module, which effectively reduces the number of parameters of the model; secondly, the SMA module is designed, which focuses on the feature information extraction of the defect types on the steel surface; finally, the MFFN module is utilized to further enhance the multi-scale feature fusion effect of defect targets. The experimental results show that the LAM-YOLOv10n algorithm proposed in this paper improves the precision by 3.47% compared with the original network, which significantly outperforms the existing target detection models.
This paper studies industrial steel surface defect detection both theoretically and experimentally, and analyzes the mainstream deep learning algorithms applied to this task in recent years. Because detection algorithms in real industrial scenarios must combine low latency with high precision, we select YOLOv10 as the baseline model. On this basis, a feature extraction network is constructed by introducing the GhostConv module and the SMA attention mechanism, which balances spatial information retention against network complexity. The SMA mechanism uses a multi-scale parallel module to establish dependencies among the features of steel surface defects, and merges the outputs of its three parallel modules through cross-spatial learning and pointwise multiplication. Each parallel module exchanges defect feature information across channels, which yields a faster and more efficient recognition model for industrial steel surface defects and improves its usability in practice. In addition, the MFFN module strengthens the fusion of defect feature information by dynamically adjusting the contribution of each input feature during fusion, which leads to higher precision. The experiments demonstrate that the LAM-YOLOv10n model achieves high detection precision with a small number of parameters.
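One common way to adjust the contribution of each input feature during fusion is a normalized weighted sum with learnable, non-negative weights. The sketch below illustrates that idea on plain Python lists. It is an assumed stand-in to show the principle, not the MFFN fusion rule itself, which this section does not specify; the function name, the clamping of negative weights to zero, and the epsilon value are all illustrative choices.

```python
def fuse(features, raw_weights, eps=1e-4):
    """Fuse equally shaped feature vectors with normalized weights.

    features:    list of equal-length lists of floats (one per input branch)
    raw_weights: one scalar per branch; in a trained network these would be
                 learnable parameters (assumption for this sketch)
    """
    if len(features) != len(raw_weights):
        raise ValueError("one weight is required per input feature")
    # Clamp to non-negative so the result stays a weighted average.
    weights = [max(0.0, w) for w in raw_weights]
    total = sum(weights) + eps
    length = len(features[0])
    return [
        sum(w * f[i] for w, f in zip(weights, features)) / total
        for i in range(length)
    ]
```

With equal weights the result is close to a plain average of the inputs, while a branch whose weight is driven to zero is effectively dropped from the fused output, which is how such a scheme can suppress less informative scales.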
This study also has limitations that leave room for improvement. First, the experiments were conducted only on experimental datasets, where the model performed well; future work can evaluate detection performance on steel surface defect images collected in real industrial scenarios. Second, because data acquired in real industrial scenarios contain interference, we recommend placing edge computing hardware such as an FPGA at the front end of the image acquisition module to preprocess steel surface defect images in real time, which prevents redundant information from propagating into subsequent transmission and processing. Finally, a feature fusion network with stronger attention could be designed to concentrate on fusing the target features of steel surface defects, suppressing redundant feature information and improving detection performance.
Beyond the work above, the detection of industrial steel surface defects could be further improved in the following ways:
(1) Using image segmentation to focus on the edge features of steel surface defects, so that defects in industrial video streams are delineated more accurately. Because segmentation operates at the pixel level, it can improve the precision of defect detection.
(2) Using infrared detection to identify defects in industrial steel video streams under varying light intensity. Because images captured by the acquisition module in real scenes are affected by texture, noise, and light intensity, the data can first be processed with infrared detection and then passed to classification or detection techniques to improve performance.
Conclusions
This paper proposes LAM-YOLOv10n, a steel surface defect detection model built on the YOLOv10n baseline network. The model first introduces the lightweight Ghost module, which reduces the number of model parameters and thus the computational complexity. Second, the SMA module is designed to extract and enhance the key features of steel surface defects so that fine defect targets are captured accurately against complex backgrounds. Finally, the MFFN module is adopted to further improve the detection of defect targets at different scales. Experiments verify that the proposed LAM-YOLOv10n achieves high detection precision. The model improves the recognition rate of steel surface defects while meeting the dual requirements of real-time operation and precision in industrial inspection, and it offers a more efficient and reliable solution for defect detection in steel production.
Data availability
The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.
References
Zhou, X., Fang, H., Fei, X. & Zhang, J. Edge-aware multi-level interactive network for salient object detection of strip steel surface defects. IEEE Access. 9, 149465–149476 (2021).
Yu, J., Cheng, X. & Li, Q. Surface defect detection of steel strips based on anchor-free network with channel attention and bidirectional feature fusion. IEEE T Instrum. Meas. 71, 1–10 (2021).
Qiao, J., Sun, C., Cheng, X., Yang, J. & Chen, N. Stainless steel cylindrical pot outer surface defect detection method based on cascade neural network. Meas. Sci. Technol. 35, 036201 (2023).
Neogi, N., Mohanta, D. K. & Dutta, P. K. Review of vision-based steel surface inspection systems. J. Image. Video. Proc. 1–19 (2014).
Tang, B., Chen, L., Sun, W. & Lin, Z. Review of surface defect detection of steel products based on machine vision. IET Image Process. 17, 303–322 (2023).
Yeung, C. C. & Lam, K. M. Efficient fused-attention model for steel surface defect detection. IEEE T Instrum. Meas. 71, 1–11 (2022).
Chu, M., Gong, R., Gao, S. & Zhao, J. Steel surface defects recognition based on multi-type statistical features and enhanced twin support vector machine. Chemometr Intell. Lab. 171, 140–150 (2017).
Liu, K. et al. Steel surface defect detection using a new Haar–Weibull-variance model in unsupervised manner. IEEE T Instrum. Meas. 66, 2585–2596 (2017).
Ghorai, S., Mukherjee, A., Gangadaran, M. & Dutta, P. K. Automatic defect detection on hot-rolled flat steel products. IEEE T Instrum. Meas. 62, 612–621 (2012).
Wang, G. et al. Multifrequency AC magnetic flux leakage testing for the detection of surface and backside defects in thick steel plates. IEEE Magn. Lett. 13, 1–5 (2022).
Jing, X., Yang, X. Y., Xu, C. H., Chen, G. & Ge, S. Infrared thermal images detecting surface defect of steel specimen based on morphological algorithm. J. China Univ. Pet. 36, 146–150 (2012).
Meng, X., Lu, M., Yin, W., Bennecer, A. & Kirk, K. J. Evaluation of coating thickness using lift-off insensitivity of eddy current sensor. Sensors 21, 419 (2021).
Luo, Q., Fang, X., Liu, L., Yang, C. & Sun, Y. Automated visual defect detection for flat steel surface: A survey. IEEE T Instrum. Meas. 69, 626–644 (2020).
Huang, X., Zhu, J. & Huo, Y. SSA-YOLO: an improved YOLO for hot-rolled strip steel surface defect detection. IEEE T Instrum. Meas. 73, 5040017 (2024).
Li, Z., Tai, Y., Huang, Z., Peng, T. & Zhang, Z. MPFANet: a multipath feature aggregation network for steel surface defect detection. Meas. Sci. Technol. 35, 045409 (2024).
Wang, A. et al. YOLOv10: Real-time end-to-end object detection. ArXiv Preprint arXiv:2405.14458 (2024).
Xu, K., Ai, Y. & Wu, X. Application of multi-scale feature extraction to surface defect classification of hot-rolled steels. Int. J. Min. Metall. Mater. 20, 37–41 (2013).
Hu, H. J., Li, Y. X., Liu, M. F. & Liang, W. H. Steel strip surface defects classification based on machine learning. Comput. Eng. Des. 35, 620–624 (2014).
Liu, Y., Jin, Y. & Ma, H. Surface defect classification of steels based on ensemble of extreme learning machines. WRC Symp. Adv. Robot. Autom. (WRC SARA). 203–208 (2019).
Soukup, D. & Huber-Mörk, R. Convolutional neural networks for steel surface defect detection from photometric stereo images. Int. Symp. Vis. Comput. 8887, 668–677 (2014).
Yi, L., Li, G. & Jiang, M. An end-to‐end steel strip surface defects recognition system based on convolutional neural networks. Steel Res. Int. 88, 1600068 (2017).
Damacharla, P., Rao, A., Ringenberg, J. & Javaid, A. Y. TLU-net: a deep learning approach for automatic steel surface defect detection. Int Conf. Appl. Artif. Intell. (ICAPAI). 1–6 (2021).
Uraon, P. K., Verma, A. & Badholia, A. Steel sheet defect detection using feature pyramid network and RESNET. Int Conf. Edge Comput. Appl. (ICECAA) 1543–1550 (2022).
Bouguettaya, A., Mentouri, Z. & Zarzour, H. Deep ensemble transfer learning-based approach for classifying hot-rolled steel strips surface defects. Int. J. Adv. Manuf. Technol. 125, 5313–5322 (2023).
Akhyar, F., Liu, Y., Hsu, C. Y., Shih, T. K. & Lin, C. Y. FDD: a deep learning–based steel defect detectors. Int. J. Adv. Manuf. Technol. 126, 1093–1107 (2023).
Xia, K. et al. Mixed receptive fields augmented YOLO with multi-path spatial pyramid pooling for steel surface defect detection. Sensors 23, 5114 (2023).
Raj, G. D. & Prabadevi, B. Steel strip quality assurance with YOLOv7-CSF: a coordinate attention and SIoU fusion approach. IEEE Access. 11, 129493–129506 (2023).
Huang, Y., Tan, W., Li, L. & Wu, L. WFRE-YOLOv8s: a new type of defect detector for steel surfaces. Coatings 13, 2011 (2023).
He, Y., Song, K., Meng, Q. & Yan, Y. An end-to-end steel surface defect detection approach via fusing multiple hierarchical features. IEEE T Instrum. Meas. 69, 1493–1504 (2019).
Amin, D. & Akhter, S. Deep learning-based defect detection system in steel sheet surfaces. IEEE Reg. 10 Symp. (TENSYMP). 444–448 (2020).
Tabernik, D., Šela, S., Skvarč, J. & Skocaj, D. Segmentation-based deep-learning approach for surface-defect detection. J. Intell. Manuf. 31, 759–776 (2020).
Demir, K., Ay, M., Cavas, M. & Demir, F. Automated steel surface defect detection and classification using a new deep learning-based approach. Neural Comput. Appl. 35, 8389–8406 (2023).
Li, Z., Wei, X., Hassaballah, M., Li, Y. & Jiang, X. A deep learning model for steel surface defect detection. Complex. Intell. Syst. 10, 885–897 (2024).
Han, K. et al. GhostNet: More features from cheap operations. Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recogn. 1580–1589 (2020).
Song, K. & Yan, Y. A noise robust method based on completed local binary patterns for hot-rolled steel strip surface defects. Appl. Surf. Sci. 285, 858–864 (2013).
Liu, W. et al. SSD: single shot multibox detector. Eur. Conf. Comput. Vis. (ECCV). 9905, 21–37 (2016).
Redmon, J., Divvala, S., Girshick, R. & Farhadi, A. You only look once: Unified, real-time object detection. Proc. IEEE Conf. Comput. Vis. Pattern Recogn. 779–788 (2016).
Girshick, R. Fast R-CNN. Proc. IEEE Int. Conf. Comput. Vis. 1440–1448 (2015).
Cheng, P. et al. Tiny-YOLOv7: tiny object detection model for drone imagery. Int. Conf. Image Graph. 14357, 53–65 (2023).
Reis, D., Kupec, J., Hong, J. & Daoudi, A. Real-time flying object detection with YOLOv8. ArXiv Preprint arXiv:2305.09972 (2023).
Khanam, R. & Hussain, M. YOLOv11: an overview of the key architectural enhancements. ArXiv Preprint arXiv:2410.17725 (2024).
Tian, Y., Ye, Q. & Doermann, D. YOLOv12: Attention-centric real-time object detectors. ArXiv Preprint arXiv:2502.12524 (2025).
Zhao, Y. et al. DETRs beat YOLOs on real-time object detection. Proc. IEEE Conf. Comput. Vis. Pattern Recognit. 16965–16974 (2024).
Funding
This work is supported by the Higher Education Key Scientific Research Program funded by Henan Province under Grants 22A520023 and 24A520008, the Doctoral Cultivation Fund Project of Henan University of Engineering under Grants D2022030 and D2022032, and the Henan Province Science and Technology Research Projects under Grant 232102210118.6.
Author information
Contributions
Conceptualization, Laomo Zhang; methodology, Laomo Zhang; software, Zhike Wang; validation, Ying Ma and Guowei Li; investigation, Zhike Wang; data curation, Zhike Wang; writing (original draft preparation), Laomo Zhang; writing (review and editing), Guowei Li; visualization, Ying Ma.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Zhang, L., Wang, Z., Ma, Y. et al. Steel surface defect detection algorithm based on improved YOLOv10. Sci Rep 15, 32827 (2025). https://doi.org/10.1038/s41598-025-16725-8
DOI: https://doi.org/10.1038/s41598-025-16725-8