Abstract
Accurate, real-time detection of metal surface defects under complex backgrounds and large appearance variations remains a critical challenge in intelligent manufacturing. Existing lightweight detectors often underperform because they apply the same feature refinement strategy at every network depth, which limits their ability to balance fine-grained representation against computational efficiency. To address this issue, we propose HDR-YOLO, a hierarchical depth-aware refinement framework that explicitly aligns feature enhancement mechanisms with the distinct roles of shallow and deep representations. Specifically, a Query-Focused Convolution (QFC) block is introduced in the shallow layers to enhance high-resolution texture and edge information, while a Query-Based Fusion (QBF) block is employed in the deeper layers to improve global semantic modeling through adaptive feature interaction. This design enables more effective detection of small-scale defects and irregular fine-grained patterns. Extensive experiments on the NEU-DET and GC10-DET datasets show that HDR-YOLO improves mAP@0.5 over the baseline by 3.92% and 7.67%, respectively, while maintaining competitive inference efficiency. These results indicate that depth-aware refinement is an effective strategy for enhancing lightweight defect detection under real-time industrial constraints.
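For readers unfamiliar with the reported metric, the following is a minimal sketch of how mAP@0.5 is commonly computed: a prediction counts as a true positive when its intersection over union (IoU) with a ground-truth box is at least 0.5, and average precision (AP) is the area under the resulting precision-recall curve; mAP is the mean of AP over classes. The function names (`iou`, `average_precision`), the corner box format, the greedy matching rule, and the single-image simplification are assumptions made for illustration; this is not the evaluation code used in the study.

```python
from typing import List, Tuple

# Hypothetical box format for this sketch: (x1, y1, x2, y2) corners.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: List[Tuple[float, Box]],
                      gts: List[Box],
                      thr: float = 0.5) -> float:
    """AP for one class, treating all boxes as one image (a simplification).

    preds holds (confidence, box) pairs; gts holds ground-truth boxes.
    """
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    tp_flags = []
    # Visit predictions from most to least confident.
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        # Greedily pick the best still-unmatched ground truth.
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if matched[j]:
                continue
            value = iou(box, gt)
            if value > best_iou:
                best_iou, best_j = value, j
        if best_j >= 0 and best_iou >= thr:
            matched[best_j] = True
            tp_flags.append(True)
        else:
            tp_flags.append(False)

    # Cumulative precision and recall after each prediction.
    precisions, recalls, tp = [], [], 0
    for k, flag in enumerate(tp_flags, start=1):
        tp += int(flag)
        precisions.append(tp / k)
        recalls.append(tp / len(gts))

    # Make precision non-increasing from right to left (interpolation).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum the area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


if __name__ == "__main__":
    ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
    predictions = [
        (0.9, (0, 0, 10, 10)),     # exact match
        (0.8, (50, 50, 60, 60)),   # false positive
        (0.7, (20, 20, 30, 28)),   # IoU 0.8 with the second box
    ]
    print(round(average_precision(predictions, ground_truth), 4))
```

In a multi-class setting such as NEU-DET or GC10-DET, the same computation would be repeated per defect class, with matching done per image, and the per-class AP values averaged to give mAP@0.5.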
Data availability
The datasets used in this study are publicly available. The NEU-DET dataset can be obtained from its original source, and the GC10-DET dataset is available from the corresponding public repository. The source code and additional data generated during the current study are available from the corresponding author upon reasonable request.
Acknowledgement
This research work was supported by Universiti Malaya under Project Number UMREG025-2025.
Funding
This research work was supported by Universiti Malaya under Project Number UMREG025-2025.
Author information
Contributions
Q.Q. conceived and designed the study, developed the methodology, conducted all experiments, analyzed the results, and wrote the manuscript. A.S.B.M.K. supervised the research and provided critical feedback on the manuscript. N.B.I., H.L., L.F., Z.Y., J.L., and C.Z. provided guidance and advice on the research design and data interpretation. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Qin, Q., Khairuddin, A.S.M., Idros, N. et al. Hierarchical depth aware YOLO for efficient metal surface defect detection. Sci Rep (2026). https://doi.org/10.1038/s41598-026-46074-z
DOI: https://doi.org/10.1038/s41598-026-46074-z


