Abstract
Underwater object detection plays a crucial role in applications such as marine ecological monitoring and underwater rescue operations, but limited underwater data availability and low scene diversity hinder detection accuracy. In this paper, we propose the Underwater Layout-Guided Diffusion Framework (ULGF), a diffusion model-based framework for augmenting underwater detection datasets. Unlike conventional methods that generate underwater images by integrating in-air information, ULGF operates exclusively on a small set of underwater images and their corresponding labels, requiring no external data. Our approach enables the generation of high-fidelity, diverse, and theoretically unlimited underwater images, substantially enhancing object detection performance in real-world underwater scenarios. Furthermore, we evaluate the quality of the generated underwater images, demonstrating that ULGF produces images with a smaller domain gap to real underwater scenes than existing generation approaches. We have publicly released the ULGF source code and the generated dataset for further research.
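The released repository is the authoritative reference for ULGF itself. Purely as an illustration of the layout-guided idea described above (and not of the authors' method), the sketch below conditions a generic off-the-shelf ControlNet diffusion pipeline on a rasterized bounding-box layout, so that the boxes driving generation can be reused as detection labels for the synthesized image. Every identifier here — the pipeline, the pretrained checkpoints, the class palette, and the layout_to_condition helper — is an assumption made for this example, not part of ULGF.

```python
# Illustrative sketch only: NOT the ULGF implementation. It demonstrates
# layout-conditioned generation with a generic ControlNet pipeline from
# Hugging Face diffusers; model IDs and class palette are assumptions.
import torch
from PIL import Image, ImageDraw
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Hypothetical palette for typical underwater detection categories.
PALETTE = {"holothurian": (255, 0, 0), "echinus": (0, 255, 0),
           "scallop": (0, 0, 255), "starfish": (255, 255, 0)}

def layout_to_condition(boxes, classes, size=(512, 512)):
    """Rasterize bounding boxes into a color-coded layout image used as
    the spatial conditioning signal for the diffusion model."""
    cond = Image.new("RGB", size, (0, 0, 0))
    draw = ImageDraw.Draw(cond)
    for (x0, y0, x1, y1), cls in zip(boxes, classes):
        draw.rectangle([x0, y0, x1, y1], fill=PALETTE[cls])
    return cond

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-seg", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")

boxes, classes = [(60, 80, 200, 220), (300, 260, 430, 400)], ["echinus", "scallop"]
cond = layout_to_condition(boxes, classes)
image = pipe("an underwater photograph of an echinus and a scallop on the seabed",
             image=cond, num_inference_steps=30).images[0]
image.save("synthetic_underwater.png")
# Because generation is conditioned on the layout, the input boxes can be
# reused directly as detection labels for the synthesized image.
```

The key design point this sketch captures is that layout-conditioned synthesis sidesteps re-annotation: the geometric prompt and the ground-truth boxes are the same object, so every generated image arrives already labeled.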
Data availability
The datasets and models generated or used in this study are available as follows: the RUOD dataset can be accessed at https://github.com/dlut-dimt/RUOD, and the UDD dataset at https://github.com/chongweiliu/udd_official. The underwater image generation dataset, the pre-trained detection model, and the weights related to ULGF are available from the corresponding author on reasonable request.
Code availability
The code for ULGF can be found at the following URL: https://github.com/maxiaoha666/ULGF.
References
Wu, Z. et al. Self-supervised underwater image generation for underwater domain pre-training. IEEE Trans. Instrum. Meas. 73, 5012714 (2024).
Zhou, J. et al. AMSP-UOD: when vortex convolution and stochastic perturbation meet underwater object detection. Proc. AAAI Conf. Artif. Intell. 38, 7659–7667 (2024).
Zeng, L., Sun, B. & Zhu, D. Underwater target detection based on Faster R-CNN and adversarial occlusion network. Eng. Appl. Artif. Intell. 100 (2021).
Sun, B., Zhang, W., Xing, C. & Li, Y. Underwater moving target detection and tracking based on enhanced You Only Look Once and deep simple online and real-time tracking strategy. Eng. Appl. Artif. Intell. 143 (2025).
Cao, J. et al. Unveiling the underwater world: CLIP perception model-guided underwater image enhancement. Pattern Recognit. 162, 111395 (2025).
Fu, Z., Wang, W., Huang, Y., Ding, X. & Ma, K. Uncertainty inspired underwater image enhancement. In Proceedings of the European Conference on Computer Vision, 465–482 (Springer, 2022).
Zhang, W. et al. Underwater image enhancement via minimal color loss and locally adaptive contrast enhancement. IEEE Trans. Image Process. 31, 3997–4010 (2022).
Peng, L., Zhu, C. & Bian, L. U-shape transformer for underwater image enhancement. IEEE Trans. Image Process. 32, 3066–3079 (2023).
Fu, C. et al. Rethinking general underwater object detection: datasets, challenges, and solutions. Neurocomputing 517, 243–256 (2023).
Chen, K. et al. GeoDiffusion: text-prompted geometric control for object detection data generation. In Proceedings of International Conference on Learning Representations, 1–12 (ICLR, 2024).
Redmon, J. et al. You only look once: unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 779–788 (IEEE, 2016).
Liu, H., Song, P. & Ding, R. WQT and DG-YOLO: towards domain generalization in underwater object detection. Preprint at https://arxiv.org/abs/2004.06333 (2020).
Li, C. et al. YOLOv6: a single-stage object detection framework for industrial applications. Preprint at https://arxiv.org/abs/2209.02976 (2022).
Zhao, Y. et al. DETRs beat YOLOs on real-time object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 16965–16974 (IEEE, 2024).
Li, C. et al. An underwater image enhancement benchmark dataset and beyond. IEEE Trans. Image Process. 29, 4376–4389 (2019).
Zhang, F. et al. Atlantis: enabling underwater depth estimation with stable diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 11852–11861 (IEEE, 2024).
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B. & Hochreiter, S. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Adv. Neural Inf. Process. Syst. 30, 6626–6637 (2017).
Li, C., Anwar, S. & Porikli, F. Underwater scene prior inspired deep underwater image and video enhancement. Pattern Recognit. 98, 107038 (2020).
Wang, N. et al. UWGAN: Underwater GAN for real-world underwater color restoration and dehazing. Preprint at https://arxiv.org/abs/1912.10269 (2019).
Ye, T. et al. Underwater light field retention: neural rendering for underwater imaging. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 488–497 (IEEE, 2022).
Salimans, T. et al. Improved techniques for training GANs. Adv. Neural Inf. Process. Syst. 29, 2234–2242 (2016).
Szegedy, C. et al. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1–9 (IEEE, 2015).
Wang, Z. et al. Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13, 600–612 (2004).
Yan, J. et al. Underwater image enhancement via multiscale disentanglement strategy. Sci. Rep. 15, 6076 (2025).
Liu, K. et al. A maneuverable underwater vehicle for near-seabed observation. Nat. Commun. 15, 10284 (2024).
Xu, S. et al. A systematic review and analysis of deep learning-based underwater object detection. Neurocomputing 527, 204–232 (2023).
Rombach, R. et al. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10684–10695 (IEEE, 2022).
Zhang, L., Rao, A. & Agrawala, M. Scaling in-the-wild training for diffusion-based illumination harmonization and editing by imposing consistent light transport. In Proceedings of the 13th International Conference on Learning Representations (ICLR, 2025).
Radford, A. et al. Learning transferable visual models from natural language supervision. In Proceedings of the International Conference on Machine Learning, 8748–8763 (PMLR, 2021).
Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. Preprint at https://arxiv.org/abs/1312.6114 (2013).
Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 33, 6840–6851 (2020).
Gao, H. et al. SCP-Diff: spatial-categorical joint prior for diffusion-based semantic image synthesis. In Proceedings of the European Conference on Computer Vision, 37–54 (Springer, 2024).
Ronneberger, O., Fischer, P. & Brox, T. U-Net: convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, 234–241 (Springer, 2015).
Acknowledgements
This research was supported in part by the National Natural Science Foundation of China (62403108, 42301256), the Liaoning Provincial Natural Science Foundation Joint Fund (2023-MSBA-075), the Aeronautical Science Foundation of China (20240001042002), the Scientific Research Foundation of Liaoning Provincial Education Department (LJKQR20222509), the Fundamental Research Funds for the Central Universities (N2426005), and the Science and Technology Planning Project of Liaoning Province (2023JH1/11200011 and 2024JH2/10240049).
Author information
Authors and Affiliations
Contributions
Y.Z.: Data management, annotation of training datasets, data analysis, and interpretation. L.M.: Data management and preparation, algorithm development and validation, comparative experiments, data analysis and interpretation, and manuscript drafting. J.L.: Comparative experiments and manuscript drafting. Y.X.: Algorithm design and implementation, experimental validation, and data visualization. B.C.: Method design and supervision of the research process. L.L.: Project management, funding acquisition, resource provision, and manuscript review. C.W.: Validation of experimental results and critical feedback on the manuscript. W.C.: Data curation and model performance evaluation. Z.L.: Investigation and assistance with manuscript revision.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Communications Engineering thanks Fenglei Han, Ajisha Mathias and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Philip Coatsworth.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Zhuang, Y., Ma, L., Liu, J. et al. A diffusion model-based image generation framework for underwater object detection. Commun. Eng. (2025). https://doi.org/10.1038/s44172-025-00579-z
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s44172-025-00579-z