Abstract
Semantic segmentation of moving objects against dynamic backgrounds faces core challenges such as background interference and blurred target features. This study proposes an architecture that integrates Generative Adversarial Networks (GANs) with Transformers: the GAN module enhances adaptability to dynamic backgrounds through adversarial training, the Transformer's self-attention mechanism captures long-range semantic dependencies, and a gated fusion strategy dynamically balances the two streams of multimodal features. Specifically, a conditional GAN generates dynamic background samples with variations in illumination and motion blur, a Transformer-based encoder-decoder models global contextual relationships, a temporal attention module incorporates motion vector fields to improve temporal consistency, and a Kullback-Leibler (KL) divergence-constrained semantic consistency loss regularizes the plausibility of generated samples. Experiments are conducted on a multi-dimensional simulated dataset and the real-world KITTI dataset. The proposed model achieves an average Intersection over Union (IoU) of 85.6% in standard dynamic scenes, outperforming DeepLabv3+ by 9.2 percentage points. In low-light and high-speed motion scenarios, its robustness index reaches 92.0%, 8.5 points higher than the baseline models. Ablation studies show that removing the Transformer causes a 6.7% drop in mean IoU (mIoU), while excluding the feature fusion module reduces robustness by 4.0%, confirming that both components are necessary. Temporal analysis reveals that the model maintains stable performance of 84.5–86.5% over 20-frame sequences, with fluctuation reduced by 63% relative to the baseline. Adversarial training improves adaptability to lighting changes by 5.3%, the multi-head self-attention (MSA) mechanism reduces long-range misclassification by 6.7%, and the gated fusion strategy lowers the false-positive rate in background-disturbed regions by 12.8%. The framework optimizes segmentation through a generator-segmenter feedback loop, effectively balancing dynamic background noise suppression against semantic fidelity. The contributions are threefold: (1) the first semantic segmentation framework to deeply integrate GANs and Transformers; (2) a theoretical model for dynamic feature gating and semantic consistency constraints; and (3) a standardized evaluation system covering ten dynamic background types and five illumination gradients. This study provides key technical support for real-time environmental perception in autonomous driving and intelligent surveillance, advancing both the theoretical and practical frontiers of dynamic scene understanding.
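As a concrete illustration of the gated fusion strategy, the following PyTorch sketch shows one plausible realization: a 1×1 convolution followed by a sigmoid produces per-pixel, per-channel gate values from the concatenated GAN-branch and Transformer-branch features, which are then blended convexly. The class name, tensor shapes, and gate design are illustrative assumptions, not the paper's exact module.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Hypothetical gated fusion of GAN-branch and Transformer-branch features."""

    def __init__(self, channels: int):
        super().__init__()
        # Gate learned from both feature maps; sigmoid keeps values in (0, 1)
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, f_gan: torch.Tensor, f_tr: torch.Tensor) -> torch.Tensor:
        # f_gan, f_tr: (B, C, H, W) feature maps from the two branches
        g = self.gate(torch.cat([f_gan, f_tr], dim=1))
        # Convex blend: g -> 1 favours the GAN branch, g -> 0 the Transformer branch
        return g * f_gan + (1.0 - g) * f_tr
```

A decoder would then consume `GatedFusion(256)(f_gan, f_tr)` in place of either branch alone, letting the network suppress whichever stream is unreliable in a given region.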
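The temporal attention module that incorporates motion vector fields could, under one reading of the abstract, be cross-attention from current-frame tokens to motion-compensated past-frame tokens. The sketch below assumes the motion field has already been applied upstream (e.g. by warping past features); all names and shapes are hypothetical.

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Hypothetical temporal attention over motion-compensated frame features."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, current: torch.Tensor, past: torch.Tensor) -> torch.Tensor:
        # current: (B, N, D) tokens of the current frame
        # past:    (B, T*N, D) tokens of T past frames, assumed already
        #          warped according to the motion vector field
        out, _ = self.attn(query=current, key=past, value=past)
        # Residual connection preserves the current frame's own semantics
        return current + out
```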
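Similarly, the KL-divergence-constrained semantic consistency loss can be read as a supervised cross-entropy term plus a KL penalty that pulls the segmenter's predictions on a GAN-generated variant toward its predictions on the corresponding real frame. The weight `lam` and the choice to detach the real-frame distribution as a fixed reference are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def semantic_consistency_loss(logits_real: torch.Tensor,
                              logits_gen: torch.Tensor,
                              labels: torch.Tensor,
                              lam: float = 0.1) -> torch.Tensor:
    # Supervised segmentation term on the real frame
    ce = F.cross_entropy(logits_real, labels)
    # KL(P_real || P_gen), averaged over the batch; the real-frame
    # distribution is detached so it serves as the fixed reference
    log_p_gen = F.log_softmax(logits_gen, dim=1)
    p_real = F.softmax(logits_real, dim=1).detach()
    kl = F.kl_div(log_p_gen, p_real, reduction="batchmean")
    return ce + lam * kl
```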
Data availability
Data is provided within the manuscript or supplementary information files.
Funding
This study received no funding.
Author information
Contributions
Conceptualization, Y.Q.L., Z.B.L., T.C. and X.J.H.; methodology, C.Z.; software, G.Z.; validation, D.Z.J., C.C. and Y.Q.L.; formal analysis, Z.B.L.; investigation, Y.Z. and J.T.Z.; resources, X.J.H., P.C.G. and G.Z.; data curation, T.C.; writing—original draft preparation, Y.Q.L. and Z.B.L.; writing—review and editing, Y.Q.L. and Z.B.L.; visualization, X.J.H., G.Z. and D.Z.J.; supervision, P.C.G.; project administration, J.T.Z. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Li, Y., Luo, Z., Chen, T. et al. Dynamic background motion object semantic segmentation algorithm based on generative adversarial network and transformer collaboration. Sci Rep (2026). https://doi.org/10.1038/s41598-026-39249-1