Object tracking algorithm based on deformable attention mechanism

Liu, Qiaoling; Yu, Na; Cheng, Jinfu

doi:10.1038/s41598-026-43147-x

Article
Open access
Published: 06 March 2026

Qiaoling Liu¹,², Na Yu¹ & Jinfu Cheng³

Scientific Reports (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Abstract

Occlusion, sudden illumination changes, and rapid motion in complex scenes severely degrade the robustness of existing object tracking methods. To address this issue, this paper proposes an object tracking algorithm that integrates a deformable attention mechanism. First, a deformable attention module is embedded into the ResNet-18 feature extraction network to adaptively enhance key target features. Second, an improved Bidirectional Feature Pyramid Network (BiFPN) serves as the feature fusion module, strengthening the representation of multi-scale features. Finally, a dynamic Kalman filtering prediction module improves the algorithm’s adaptability to changes in the target’s motion state and its ability to track continuously. Experimental results show that, on the GOT-10k dataset, the improved feature extraction network achieves an average overlap rate of 61.5% and a success rate of 68.4%, with a computational load of only 1.96 GFLOPs and an increase of only 0.23 M parameters. On the MOT20 dataset, the proposed object tracking network achieves a Multiple Object Tracking Accuracy (MOTA) of 77.5% and an Identity F1 Score (IDF1) of 77.0%, with 54.6% Mostly Tracked and 12.5% Mostly Lost trajectories, surpassing the compared object tracking algorithms. These results support the efficacy of the deformable attention mechanism and offer a robust solution for complex dynamic tracking scenarios.
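
For readers unfamiliar with the evaluation metrics quoted above, the following is a minimal sketch of how an average overlap, a success rate, and MOTA are commonly computed. It is not the authors' evaluation code. The function names, the (x, y, width, height) box format, and the 0.5 overlap threshold are assumptions made for illustration; the abstract does not state which threshold was used, and the official GOT-10k protocol averages over sequences rather than over a flat list of frames.

```python
from typing import Sequence, Tuple

# Assumed box format for this sketch: (x, y, width, height).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def average_overlap(pred: Sequence[Box], gt: Sequence[Box]) -> float:
    """Mean IoU between predicted and ground-truth boxes (simplified, per frame)."""
    overlaps = [iou(p, g) for p, g in zip(pred, gt)]
    return sum(overlaps) / len(overlaps)


def success_rate(pred: Sequence[Box], gt: Sequence[Box],
                 threshold: float = 0.5) -> float:
    """Fraction of frames whose IoU exceeds the (assumed) threshold."""
    overlaps = [iou(p, g) for p, g in zip(pred, gt)]
    return sum(o > threshold for o in overlaps) / len(overlaps)


def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_gt: int) -> float:
    """MOTA = 1 - (FN + FP + IDSW) / GT, summed over all frames."""
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


# Toy data, not taken from the paper.
pred_boxes = [(0, 0, 10, 10), (0, 0, 10, 10)]
gt_boxes = [(0, 0, 10, 10), (5, 0, 10, 10)]
ao = average_overlap(pred_boxes, gt_boxes)
sr = success_rate(pred_boxes, gt_boxes)
score = mota(false_negatives=10, false_positives=5, id_switches=5, num_gt=100)
```

In this toy case the first frame overlaps perfectly and the second overlaps by one third, so only one of the two frames counts as a success at the assumed threshold.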

Data availability

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Funding

This research was supported by the Sichuan Province Science and Technology Plan Project "Research on Nonlinear System Model Identification and Optimal Control Method under Weak Continuous Incentive Conditions" (2025ZNSFSC1513).

Author information

Authors and Affiliations

School of Electronic Information and Electrical Engineering, Chengdu University, Chengdu, 610106, China
Qiaoling Liu & Na Yu
Entrepreneurship College, Chengdu University, Chengdu, 610106, China
Qiaoling Liu
Department Engineering, Datong Vocational and Technical College of Coal, Datong, 037000, China
Jinfu Cheng

Authors

Qiaoling Liu
Na Yu
Jinfu Cheng

Contributions

Q.L.L. processed the numerical attribute linear programming of communication big data and extracted the mutual information feature quantity of the communication big data numerical attributes using the cloud extended distributed feature fitting method. N.Y. combined fuzzy C-means clustering with linear regression analysis to carry out the statistical analysis of big data numerical attribute feature information, and constructed the associated attribute sample set of the communication big data numerical attribute cloud grid distribution. J.F.C. performed the experiments, recorded the data, and drafted the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Qiaoling Liu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Liu, Q., Yu, N. & Cheng, J. Object tracking algorithm based on deformable attention mechanism. Sci Rep (2026). https://doi.org/10.1038/s41598-026-43147-x

Received: 30 September 2025
Accepted: 02 March 2026
Published: 06 March 2026
DOI: https://doi.org/10.1038/s41598-026-43147-x