YOLO-Starfish: fish object detection learning complex underwater features

Gong, Rongrong; Xu, Jihan; Zheng, Zhixiang; Zhang, Dengyong

doi:10.1038/s41598-026-44187-z

Article
Open access
Published: 18 March 2026

Scientific Reports (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Abstract

Underwater object detection is a key technology that enables underwater robots to explore aquatic environments, and it poses challenges that many other object detection domains do not. The task is affected by unavoidable factors, including poor image quality, high environmental randomness, and the concealment of fish, which together make fish difficult to perceive and detect underwater. To address these issues, this paper proposes a fish detection model named YOLO-Starfish and an Underwater Freshwater Fish Dataset (UFFD). The UFFD contains 16,904 unduplicated images of 19 species, photographed in a variety of complex underwater environments. YOLO-Starfish builds upon YOLOv8 and introduces two modules: the C2Star module and the Attention-driven Enhancement Module (ADEM). C2Star leverages the “star operation” (element-wise multiplication) to achieve high-dimensional feature distribution in a physically consistent manner, mimicking the modulation characteristics of underwater optical degradation. ADEM mitigates the impact of image channel imbalance by adaptively enhancing channel-driven features, thereby improving the model’s robustness in underwater environments. Experimental results demonstrate that YOLO-Starfish performs well not only on the underwater object detection datasets RUOD and UFFD but also on the general object detection benchmark COCO2017. The source code is available at https://github.com/Sdafah/YOLO-Starfish.
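The “star operation” named in the abstract is an element-wise multiplication of two feature branches. The sketch below is a minimal illustration of that operation on a single feature vector, assuming two bias-free linear branches; it is not the C2Star module and is not taken from the YOLO-Starfish repository. All names (`linear`, `star`, `star_expanded`, `distinct_pairwise_terms`) are hypothetical. It checks numerically that multiplying two linear projections equals a weighted sum over all pairwise products x_i * x_j, which is one way to read the claim that the operation reaches a higher-dimensional feature space without widening the layer.

```python
import random


def linear(weights, x):
    """Bias-free linear map: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def star(w1, w2, x):
    """Star operation: element-wise product of two linear branches."""
    branch_a = linear(w1, x)
    branch_b = linear(w2, x)
    return [a * b for a, b in zip(branch_a, branch_b)]


def star_expanded(w1, w2, x):
    """Same result written as an explicit sum over pairwise terms x_i * x_j."""
    d = len(x)
    out = []
    for row1, row2 in zip(w1, w2):
        total = 0.0
        for i in range(d):
            for j in range(d):
                total += row1[i] * row2[j] * x[i] * x[j]
        out.append(total)
    return out


def distinct_pairwise_terms(d):
    """Number of distinct monomials x_i * x_j with i <= j."""
    return d * (d + 1) // 2


# Hypothetical sizes chosen only for illustration.
rng = random.Random(0)
d, channels = 4, 3
x = [rng.uniform(-1.0, 1.0) for _ in range(d)]
w1 = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(channels)]
w2 = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(channels)]

fused = star(w1, w2, x)
expanded = star_expanded(w1, w2, x)
max_gap = max(abs(p - q) for p, q in zip(fused, expanded))

print("star output channels:", len(fused))
print("largest difference from the expanded form:", max_gap)
print("distinct pairwise terms for d = 4:", distinct_pairwise_terms(d))
```

In this toy setting each output channel mixes d(d+1)/2 distinct second-order terms while the layer itself only stores two d-wide weight rows per channel. A convolutional version would apply the same product per spatial position.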

Data availability

The code for YOLO-Starfish is available at https://github.com/Sdafah/YOLO-Starfish. The UFFD data are available on request; readers should contact the corresponding author for details. The RUOD and COCO2017 datasets are available online.

Acknowledgements

This work was supported in part by the Scientific Research Fund of the Hunan Provincial Education Department of China under Grant 25C1710.

Author information

These authors contributed equally: Rongrong Gong and Jihan Xu.

Authors and Affiliations

School of Software, Changsha Social Work College, Changsha, 410004, China
Rongrong Gong
School of Computer Science and Technology, Changsha University of Science and Technology, Changsha, 410114, China
Jihan Xu, Zhixiang Zheng & Dengyong Zhang

Contributions

Rongrong Gong and Jihan Xu wrote the main manuscript text. Zhixiang Zheng contributed to the development and fine-tuning of the algorithm. Dengyong Zhang assisted with manuscript writing and revisions. All authors reviewed the manuscript.

Corresponding author

Correspondence to Dengyong Zhang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Gong, R., Xu, J., Zheng, Z. et al. YOLO-Starfish: fish object detection learning complex underwater features. Sci Rep (2026). https://doi.org/10.1038/s41598-026-44187-z

Received: 26 November 2025
Accepted: 10 March 2026
Published: 18 March 2026
DOI: https://doi.org/10.1038/s41598-026-44187-z