Abstract
Dynamic object detection and tracking are essential components of intelligent video surveillance systems, enabling real-time monitoring and early identification of anomalous activities. Existing approaches often rely on either spatial appearance modeling or temporal sequence analysis, which limits robustness in crowded and dynamically evolving scenes. This study first evaluates representative spatial and temporal baseline models for theft detection, including an EfficientNetV2B0–HOG framework and a ConvLSTM-based temporal model, which achieve F1-scores of 0.86 and high recall but suffer from limited temporal consistency and sensitivity to data imbalance. To address these limitations, we propose an attention-guided spatio-temporal hybrid framework, referred to as HybridModel-1, which integrates object-level spatial detection with temporal motion modeling. The proposed model incorporates an Adaptive Feature Fusion Module (AFFM) to dynamically emphasize salient spatial features and a Temporal Confidence Reweighting Loss to suppress temporally inconsistent predictions. Evaluated on large-scale surveillance benchmarks including UCF-Crime, ShanghaiTech, and DCSASS, the proposed framework achieves an accuracy of 87.6%, a precision of 95.6%, a recall of 77.1%, and a ROC–AUC of 0.96, outperforming standalone spatial and temporal baselines. Ablation studies further confirm the effectiveness of the proposed fusion and temporal consistency mechanisms, demonstrating the model’s suitability for real-time surveillance applications.
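The abstract reports precision and recall for the proposed framework but gives F1-scores only for the baselines. As a rough cross-check, the F1-score implied by the reported precision (95.6%) and recall (77.1%) can be computed from the standard harmonic-mean definition. The sketch below assumes both figures come from the same evaluation split; the function name is illustrative and the result is a derived value, not a figure reported by the authors.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the F1-score (harmonic mean of precision and recall).

    Hypothetical helper for illustration; inputs are fractions in [0, 1].
    """
    if precision + recall == 0:
        # Convention: F1 is defined as 0 when both precision and recall are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract for the proposed framework.
reported_precision = 0.956
reported_recall = 0.771

implied_f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(f"Implied F1-score: {implied_f1:.3f}")  # prints "Implied F1-score: 0.854"
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two inputs, here the recall.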
Data availability
The DCSASS (Dynamic Crime and Security Anomaly Surveillance System) dataset used in this study is publicly available on Kaggle at: https://www.kaggle.com/datasets/mateohervas/dcsass-dataset. The UCF-Crime dataset is also publicly available for academic research at: https://www.crcv.ucf.edu/projects/real-world/. Both datasets are open-access and do not involve direct interaction with human subjects. All the data used in this research were obtained from publicly available repositories and are fully anonymized.
Funding
Open access funding provided by Vellore Institute of Technology.
Author information
Contributions
S.D.N.: conceptualization, methodology design, supervision, manuscript writing, and corresponding-author responsibilities. S.J.: model development and dataset preparation. K.V.: framework implementation and result analysis. A.V.: model development and implementation. V.V.S.: experimental validation. M.S.P.: structural design of the research framework, technical validation, and proofreading. P.P.: implementation of the ConvLSTM model, data preprocessing, and performance evaluation.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethical approval and informed consent
All methods were carried out in accordance with relevant guidelines and regulations. The datasets (DCSASS and UCF-Crime) used in this study consist entirely of publicly available, anonymized surveillance video footage that does not contain identifiable human subjects. Therefore, this research did not require ethical approval or informed consent, as no human participants or personal data were involved.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Nivethika, S.D., Joshi, S., Verma, K. et al. Attention-guided spatio-temporal feature fusion for robust video surveillance anomaly detection. Sci Rep (2026). https://doi.org/10.1038/s41598-026-36130-z
DOI: https://doi.org/10.1038/s41598-026-36130-z