Abstract
Underwater object detection is a key technology that enables underwater robots to explore aquatic environments, and it presents challenges not found in most other object detection settings. The task is affected by unavoidable factors, including poor image quality, high environmental randomness, and the concealment of fish, which make it difficult to perceive and detect fish underwater. To address these issues, this paper proposes a fish detection model named YOLO-Starfish and an Underwater Freshwater Fish Dataset (UFFD). The UFFD contains 16,904 non-duplicate images of 19 species, photographed in a variety of complex underwater environments. YOLO-Starfish builds upon YOLOv8 and introduces two modules: the C2Star module and the Attention-driven Enhancement Module (ADEM). C2Star leverages the “star operation” (element-wise multiplication) to achieve high-dimensional feature distribution in a physically consistent manner, mimicking the modulation characteristics of underwater optical degradation. Meanwhile, ADEM mitigates the impact of image channel imbalance by adaptively enhancing channel-driven features, thereby improving the model’s robustness in underwater environments. Experimental results demonstrate that YOLO-Starfish not only performs well on underwater object detection datasets (RUOD and our UFFD) but also achieves excellent performance on the general object detection benchmark COCO2017. The source code is available at https://github.com/Sdafah/YOLO-Starfish.
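As a rough illustration of the “star operation” mentioned above, the following sketch multiplies two linear projections of the same input element by element. It is not the authors' C2Star module, which operates on convolutional feature maps inside YOLOv8; the function names (`linear`, `star`, `pairwise_expansion`) and the toy weights are assumptions made only for this example. With zero bias, each output equals a weighted sum over all input pairs x_i * x_j, which is one way to see why the operation is described as reaching a high-dimensional feature space.

```python
def linear(weights, bias, x):
    """Dense layer: one output per row of `weights` (hypothetical helper)."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def star(x, w1, b1, w2, b2):
    """Star operation: element-wise product of two linear branches of x."""
    branch_a = linear(w1, b1, x)
    branch_b = linear(w2, b2, x)
    return [a * b for a, b in zip(branch_a, branch_b)]


def pairwise_expansion(x, w1_row, w2_row):
    """The same output written as a sum over all pairs x_i * x_j (zero bias)."""
    return sum(a * b * xi * xj
               for a, xi in zip(w1_row, x)
               for b, xj in zip(w2_row, x))


# Toy input and weights, chosen only for illustration.
x = [1.0, 2.0]
w1 = [[1.0, 0.0], [0.5, 0.5]]
w2 = [[0.0, 1.0], [2.0, -1.0]]
zero_bias = [0.0, 0.0]

out = star(x, w1, zero_bias, w2, zero_bias)
expanded = [pairwise_expansion(x, r1, r2) for r1, r2 in zip(w1, w2)]
print(out, expanded)
```

Both lists hold the same values, since the product of two weighted sums expands into a weighted sum of pairwise products of the inputs.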
Data availability
The code of YOLO-Starfish is available at https://github.com/Sdafah/YOLO-Starfish. The UFFD data are available; readers should contact the corresponding author for details. The RUOD and COCO2017 datasets are available online.
Acknowledgements
This work was supported in part by the Scientific Research Fund of the Hunan Provincial Education Department of China under Grant 25C1710.
Author information
Contributions
Rongrong Gong and Jihan Xu wrote the main manuscript text. Zhixiang Zheng contributed to the development and fine-tuning of the algorithm. Dengyong Zhang assisted with manuscript writing and revisions. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Gong, R., Xu, J., Zheng, Z. et al. YOLO-Starfish: fish object detection learning complex underwater features. Sci Rep (2026). https://doi.org/10.1038/s41598-026-44187-z
DOI: https://doi.org/10.1038/s41598-026-44187-z


