  • Article
  • Open access
  • Published: 08 March 2026

Dynamic background motion object semantic segmentation algorithm based on generative adversarial network and transformer collaboration

  • YiQiang Li1,
  • ZhenBao Luo1,
  • Tao Chen1,
  • XinJun Huang2,
  • ChaoZe Zhong1,
  • Ge Zhu1,
  • DaiZhong Jin1,
  • Chen Cheng1,
  • Yi Zhang1,
  • JingTong Zhao1 &
  • PengCheng Gao1 

Scientific Reports (2026)


We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Engineering
  • Mathematics and computing

Abstract

Semantic segmentation of moving objects against dynamic backgrounds faces core challenges such as background interference and blurred target features. This study proposes an architecture that integrates a generative adversarial network (GAN) with a Transformer. The GAN module improves adaptability to dynamic backgrounds through adversarial training, while the Transformer's self-attention mechanism captures long-range semantic dependencies; a gated fusion strategy dynamically balances the two feature streams. The method employs a conditional GAN to generate dynamic-background samples with variations in illumination and motion blur, a Transformer-based encoder-decoder to model global contextual relationships, and a temporal attention module that incorporates motion vector fields to improve temporal consistency. In addition, a Kullback-Leibler (KL) divergence-constrained semantic consistency loss regularizes the plausibility of the generated samples. Experiments are conducted on a multi-dimensional simulated dataset and the real-world KITTI dataset. Results show that the proposed model achieves a mean Intersection over Union (mIoU) of 85.6% in standard dynamic scenes, outperforming DeepLabv3+ by 9.2 percentage points. In low-light and high-speed-motion scenarios, the robustness index reaches 92.0%, 8.5 points higher than the baseline models. Ablation studies show that removing the Transformer causes a 6.7% drop in mIoU, while removing the feature fusion module reduces robustness by 4.0%, confirming that both components are necessary. Temporal analysis shows that the model maintains stable performance of 84.5-86.5% over 20-frame sequences, with fluctuation reduced by 63% relative to the baseline. Adversarial training improves the model's adaptability to lighting changes by 5.3%, the multi-head self-attention (MSA) mechanism reduces long-range misclassification by 6.7%, and the gated fusion strategy lowers the false-positive rate in background-disturbed regions by 12.8%. The framework optimizes segmentation through a generator-segmenter feedback loop, effectively balancing dynamic background noise suppression against semantic fidelity. The contributions are threefold: (1) the first semantic segmentation framework to deeply integrate GANs and Transformers; (2) a theoretical model for dynamic feature gating and semantic consistency constraints; and (3) a standardized evaluation system covering 10 dynamic-background types and five illumination gradients. This study provides key technical support for real-time environmental perception in autonomous driving and intelligent surveillance, advancing both the theoretical and practical frontiers of dynamic scene understanding.
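
The abstract describes the gated fusion strategy and the KL-constrained semantic consistency loss only at a high level, and the paper's code is not reproduced on this page. The following is a minimal PyTorch sketch of how such components are commonly implemented; the class name GatedFusion, the per-pixel sigmoid gate, the tensor shapes, and the direction of the KL term are illustrative assumptions, not the authors' released implementation.

    # Minimal sketch (assumed design, not the authors' code) of two components
    # named in the abstract: (1) gated fusion that dynamically balances
    # GAN-branch and Transformer-branch features, and (2) a KL-divergence
    # semantic consistency loss between predictions on an original frame and
    # on its GAN-perturbed counterpart.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GatedFusion(nn.Module):
        """Per-pixel gated fusion of two feature streams."""
        def __init__(self, channels: int):
            super().__init__()
            # The gate is predicted from the concatenated branches and
            # squashed to [0, 1], so the output is a convex blend.
            self.gate = nn.Sequential(
                nn.Conv2d(2 * channels, channels, kernel_size=1),
                nn.Sigmoid(),
            )

        def forward(self, f_gan: torch.Tensor, f_tr: torch.Tensor) -> torch.Tensor:
            # f_gan, f_tr: (B, C, H, W) features from the GAN-enhanced branch
            # and the Transformer encoder, respectively.
            g = self.gate(torch.cat([f_gan, f_tr], dim=1))
            return g * f_gan + (1.0 - g) * f_tr

    def kl_semantic_consistency(logits_orig: torch.Tensor,
                                logits_aug: torch.Tensor) -> torch.Tensor:
        """KL(P_orig || P_aug): penalizes prediction drift on a GAN-perturbed
        copy of a frame (illumination shift, motion blur). Summed over classes
        and pixels, averaged over the batch."""
        p = F.log_softmax(logits_orig, dim=1)  # log-probabilities, (B, K, H, W)
        q = F.log_softmax(logits_aug, dim=1)
        # Both tensors are log-probabilities, hence log_target=True.
        return F.kl_div(q, p, log_target=True, reduction="batchmean")

    if __name__ == "__main__":
        # Toy usage with arbitrary shapes (batch 2, 256 channels, 19 classes).
        fuse = GatedFusion(channels=256)
        fused = fuse(torch.randn(2, 256, 64, 64), torch.randn(2, 256, 64, 64))
        loss = kl_semantic_consistency(torch.randn(2, 19, 64, 64),
                                       torch.randn(2, 19, 64, 64))
        print(fused.shape, loss.item())

One plausible reading of the "dynamic balancing" described above is that a per-pixel gate lets the network fall back on Transformer context wherever the GAN branch is unreliable, for example under heavy motion blur.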

Data availability

Data are provided within the manuscript or its supplementary information files.


Funding

This study received no funding.

Author information

Authors and Affiliations

  1. Norla Institute of Technical Physics, No. 7, Section 4, Renmin South Road, Wuhou District, Chengdu City, 610095, Sichuan Province, China

    YiQiang Li, ZhenBao Luo, Tao Chen, ChaoZe Zhong, Ge Zhu, DaiZhong Jin, Chen Cheng, Yi Zhang, JingTong Zhao & PengCheng Gao

  2. Jiangxi Hongdu Aviation Industry Group Co., Ltd., Building 16, Zone 8, Hongdu Aviation Industry Group, Xinxiqiao Road, Qingyunpu District, Nanchang City, 330024, Jiangxi Province, China

    XinJun Huang


Contributions

Conceptualization, Y.Q.L., Z.B.L., T.C. and X.J.H.; methodology, C.Z.; software, G.Z.; validation, D.Z.J., C.C. and Y.Q.L.; formal analysis, Z.B.L.; investigation, Y.Z. and J.T.Z.; resources, X.J.H., P.C.G. and G.Z.; data curation, T.C.; writing (original draft preparation), Y.Q.L. and Z.B.L.; writing (review and editing), Y.Q.L. and Z.B.L.; visualization, X.J.H., G.Z. and D.Z.J.; supervision, P.C.G.; project administration, J.T.Z. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to YiQiang Li.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Material 1 (XLSX)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article


Cite this article

Li, Y., Luo, Z., Chen, T. et al. Dynamic background motion object semantic segmentation algorithm based on generative adversarial network and transformer collaboration. Sci Rep (2026). https://doi.org/10.1038/s41598-026-39249-1


  • Received: 10 September 2025

  • Accepted: 03 February 2026

  • Published: 08 March 2026

  • DOI: https://doi.org/10.1038/s41598-026-39249-1


Keywords

  • Dynamic background
  • Semantic segmentation
  • Transformer
  • Generative adversarial network
  • Dynamic characteristics

Associated content

Collection

Deep learning for real-time object detection
