Scientific Reports
Integrating psychological profiling with deep learning for enhanced boxing action recognition
  • Article
  • Open access
  • Published: 09 January 2026


  • Taiping Li1,
  • Yanqing Yan1 &
  • Lingtao Wen2 

Scientific Reports (2026)


We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Computational science
  • Information technology

Abstract

Recognizing human physical activity, especially in sports such as boxing, is a difficult problem that has been addressed mainly by conventional video-based models that ignore psychological dynamics. Yet mental states such as anxiety, confidence, and focus are known to strongly affect performance, and they remain underused in current deep learning frameworks. This study proposes a multimodal deep learning framework that combines psychological profiling with video-based boxing action recognition. The approach is designed to overcome a shortcoming of existing visual analysis models, which fail to distinguish mechanically similar actions that are performed in different contexts. The proposed framework combines 3D-ResNet for spatiotemporal feature extraction from boxing videos with a BERT-based encoder for athlete psychological profiles, and the resulting representations are fused at the feature level for classification. Experiments were conducted using the HMDB51-Boxing subset and the newly constructed PsyBox-20 dataset, which links psychological states with action instances through standardized self-report scales. Results demonstrate that the multimodal model achieves an accuracy of 91.2% and an F1-score of 90.9%, outperforming video-only and psychology-only baselines as well as several state-of-the-art unimodal methods. Further analysis shows that psychological features are especially useful for distinguishing visually similar actions, e.g., jab and hook, where context and cognitive state play a key role in action execution. The current framework does not support real-time deployment, which is left for future work. Nevertheless, the results support the hypothesis that psychological profiling improves recognition accuracy and provides useful information for AI-driven sports analytics and coaching.
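The feature-level fusion step described above can be illustrated with a minimal sketch. It assumes fusion by plain concatenation followed by a single linear layer and softmax; the abstract does not specify the fusion operator, the embedding sizes, or the classifier head, so the function names, the dimensions (8 and 4), and the random untrained weights below are all illustrative assumptions. The five class names are taken from the PsyBox-20 description in Appendix A.

```python
import math
import random

ACTIONS = ["jab", "hook", "uppercut", "block", "footwork"]

def concat_fusion(video_feat, psych_feat):
    """Feature-level fusion, assumed here to be simple concatenation."""
    return list(video_feat) + list(psych_feat)

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(fused, weights, bias):
    """One linear layer followed by softmax over the action classes."""
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)

rng = random.Random(0)
# Stand-ins for real encoder outputs (hypothetical sizes, random values).
video_feat = [rng.gauss(0, 1) for _ in range(8)]  # would come from a 3D-ResNet
psych_feat = [rng.random() for _ in range(4)]     # would come from a BERT-based encoder

fused = concat_fusion(video_feat, psych_feat)
# Untrained weights: the output is a valid distribution, not a meaningful prediction.
weights = [[rng.gauss(0, 0.1) for _ in fused] for _ in ACTIONS]
bias = [0.0] * len(ACTIONS)
probs = classify(fused, weights, bias)
```

In a real pipeline the two stand-in vectors would be replaced by the encoder outputs and the linear layer would be trained jointly with them; the sketch only shows where the two modalities meet.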


Data availability

The experimental data can be obtained by contacting the corresponding author.


Author information

Authors and Affiliations

  1. Department of Physical Education, Guangdong University of Science & Technology, Dongguan, 523000, Guangdong, China

    Taiping Li & Yanqing Yan

  2. Department of Physical Education, Guangzhou Huashang College, Guangzhou, 511300, Guangdong, China

    Lingtao Wen


Contributions

Conceptualization, Lingtao Wen; methodology, Taiping Li; investigation, Taiping Li; resources, Yanqing Yan; writing—original draft preparation, Yanqing Yan; writing—review and editing, Lingtao Wen; supervision, Lingtao Wen. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Lingtao Wen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethical approval

This study was conducted in accordance with the ethical standards of Guangzhou Huashang College. The research protocol was reviewed and approved by the Institutional Review Board of Guangzhou Huashang College with the approval number 20250116. All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.

Consent to participate

Informed consent was obtained from all individual participants included in the study. All participants were provided with a full explanation of the study’s purpose, procedures, and potential risks and benefits, and they were assured of the confidentiality of their responses and the voluntary nature of their participation.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Material 1

Appendices

Appendix

Appendix A. Detailed specification of the PsyBox-20 dataset

The PsyBox-20 dataset was created to capture psychological conditions associated with boxing actions under controlled training conditions. The dataset comprises profiles of 20 trained athletes who performed standard boxing actions (jab, hook, uppercut, block, and footwork). Each action set is paired with psychological measures taken beforehand. This appendix describes the psychological measures employed, the structure of the participants’ profiles, and the scoring methods used to derive model-ready features.

Psychological assessment instruments

Competitive state anxiety inventory-2 (CSAI-2)

CSAI-2 is a 27-item questionnaire widely used in sport psychology to assess three performance-relevant constructs:

  • Cognitive Anxiety (CA): worries, intrusive thoughts, impaired concentration.

  • Somatic Anxiety (SA): perceived physiological activation (e.g., muscle tension, heart rate).

  • Self-Confidence (SC): belief in one’s ability to execute effectively under pressure.

Each subscale has nine items, rated on a 4-point Likert scale. Scores are conventionally summed within each subscale (range 9–36), because the individual items are relatively redundant and are meaningful only in aggregate. Higher CA/SA scores indicate higher anxiety; higher SC scores indicate greater preparedness and perceived control. These subscale scores were normalized to [0, 1] and added as three distinct features in PsyBox-20.
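The scoring just described can be sketched as follows. The text states that subscale sums were normalized to [0, 1] but does not say how; the example assumes min-max scaling against the theoretical range of 9 to 36, and the function names are hypothetical.

```python
ITEMS_PER_SUBSCALE = 9
LIKERT_MIN, LIKERT_MAX = 1, 4

def score_subscale(responses):
    """Sum nine Likert responses (each 1..4) into a raw subscale score (9..36)."""
    if len(responses) != ITEMS_PER_SUBSCALE:
        raise ValueError("a CSAI-2 subscale has exactly nine items")
    if any(r < LIKERT_MIN or r > LIKERT_MAX for r in responses):
        raise ValueError("responses must lie on the 4-point scale")
    return sum(responses)

def normalize_subscale(raw_score):
    """Min-max scale a raw subscale score to [0, 1] (assumed normalization)."""
    lowest = ITEMS_PER_SUBSCALE * LIKERT_MIN   # 9
    highest = ITEMS_PER_SUBSCALE * LIKERT_MAX  # 36
    return (raw_score - lowest) / (highest - lowest)

# Made-up responses for one athlete's Cognitive Anxiety subscale.
ca_raw = score_subscale([2, 2, 2, 2, 2, 2, 2, 2, 2])  # 18
ca_norm = normalize_subscale(ca_raw)                  # (18 - 9) / 27 = 1/3
```

Scaling against the theoretical range rather than the observed sample range keeps the features comparable across athletes and sessions, which is why it is used in this sketch.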

Sport confidence inventory (SCI)

The SCI assesses an athlete’s confidence in competitive contexts. It includes items measuring:

  • confidence in skills and decision-making.

  • confidence in physical conditioning.

  • confidence relative to opponents.

Responses were summed to produce a single confidence index, later normalized to [0, 1]. This measure complements CSAI-2 by capturing stable rather than state-dependent confidence.

Aggression questionnaire (AQ)

The AQ evaluates four dimensions of aggression:

  • Physical Aggression.

  • Verbal Aggression.

  • Anger.

  • Hostility.

These subcomponents are behavioral orientations that can affect an athlete’s risk tolerance and responses in combat sports. As with the other scales, the subscale scores were normalized to [0, 1]. Both the aggregated aggression index and the separate subcomponent scores are included in the final PsyBox-20 profile to preserve nuance.

Dataset structure and feature composition

Each PsyBox-20 entry corresponds to one athlete × one action batch, producing a structured profile with the following elements:

  • Participant ID.

  • Action Label (jab, hook, uppercut, block, footwork).

  • CSAI-2 Subscale Scores (CA, SA, SC).

  • SCI Score.

  • AQ Subscale Scores (physical, verbal, anger, hostility).

  • Optional biometric indicators (e.g., heart rate, reaction time), recorded when available.

All features were scaled to the [0, 1] range. The final psychological vector comprises a small number of semantically meaningful scores rather than raw questionnaire items. This avoids redundancy and keeps the vector compatible with feature-level fusion.
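One way to represent such an entry in code is sketched below. The field names, the eight-element ordering of the vector, and the sample values are assumptions made for illustration; only the list of elements and the [0, 1] range come from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional

ACTIONS = ("jab", "hook", "uppercut", "block", "footwork")

@dataclass
class PsyBoxEntry:
    """One athlete x one action batch (hypothetical field names)."""
    participant_id: str
    action: str
    ca: float            # CSAI-2 cognitive anxiety
    sa: float            # CSAI-2 somatic anxiety
    sc: float            # CSAI-2 self-confidence
    sci: float           # Sport Confidence Inventory index
    aq_physical: float   # AQ subscales
    aq_verbal: float
    aq_anger: float
    aq_hostility: float
    heart_rate: Optional[float] = None  # optional biometric, recorded when available

    def psych_vector(self) -> List[float]:
        """Return the scaled psychological scores in a fixed order."""
        return [self.ca, self.sa, self.sc, self.sci,
                self.aq_physical, self.aq_verbal,
                self.aq_anger, self.aq_hostility]

    def __post_init__(self) -> None:
        if self.action not in ACTIONS:
            raise ValueError(f"unknown action label: {self.action!r}")
        if any(not 0.0 <= v <= 1.0 for v in self.psych_vector()):
            raise ValueError("all psychological scores must lie in [0, 1]")

# Made-up values, not taken from the dataset.
entry = PsyBoxEntry("P01", "jab", 0.33, 0.25, 0.80, 0.70, 0.40, 0.20, 0.30, 0.10)
vector = entry.psych_vector()
```

Validating the label and the score range at construction time catches unscaled questionnaire sums before they reach the fusion stage.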

Data collection and alignment

Psychological tests were administered immediately before the participants performed their assigned action sets. This timing captures the mental state relevant to the upcoming physical performance. In model training, each profile was then paired with video sequences of the same action class.

Important clarification

Because HMDB51-Boxing does not contain psychological labels, PsyBox-20 profiles were matched to HMDB51 clips by action class, not by the presumed psychological condition of the people in the HMDB51 clips. This is a modeling approach that investigates whether recognition performance can be improved by leveraging psychological tendencies associated with the execution of particular actions. It is not claimed that the HMDB51 actors were experiencing the same psychological conditions.
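A minimal sketch of class-based matching is given below. The text states only that profiles and clips are matched by action class; choosing a random profile within the class, the fixed seed, the dictionary keys, and the sample identifiers are all assumptions for illustration.

```python
import random
from collections import defaultdict

def index_profiles_by_action(profiles):
    """Group profile records by their action label."""
    by_action = defaultdict(list)
    for profile in profiles:
        by_action[profile["action"]].append(profile)
    return by_action

def pair_clips_with_profiles(clips, profiles, seed=0):
    """Pair each clip with one profile of the same action class.

    Clips whose class has no profile are skipped. The within-class choice
    is random (an assumption; the source does not specify the rule).
    """
    rng = random.Random(seed)
    by_action = index_profiles_by_action(profiles)
    pairs = []
    for clip in clips:
        candidates = by_action.get(clip["action"])
        if not candidates:
            continue
        chosen = rng.choice(candidates)
        pairs.append((clip["clip_id"], chosen["participant_id"], clip["action"]))
    return pairs

# Hypothetical records; the identifiers are not from either dataset.
clips = [
    {"clip_id": "c1", "action": "jab"},
    {"clip_id": "c2", "action": "hook"},
    {"clip_id": "c3", "action": "wave"},  # no matching profile, so it is skipped
]
profiles = [
    {"participant_id": "P01", "action": "jab"},
    {"participant_id": "P02", "action": "jab"},
    {"participant_id": "P03", "action": "hook"},
]
pairs = pair_clips_with_profiles(clips, profiles)
```

Because the pairing depends only on the class label, the profile carries no information about the individual shown in the clip, which is consistent with the clarification above.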

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article


Cite this article

Li, T., Yan, Y. & Wen, L. Integrating psychological profiling with deep learning for enhanced boxing action recognition. Sci Rep (2026). https://doi.org/10.1038/s41598-025-34771-0


  • Received: 26 May 2025

  • Accepted: 31 December 2025

  • Published: 09 January 2026

  • DOI: https://doi.org/10.1038/s41598-025-34771-0


Keywords

  • Deep learning
  • Psychological profiling
  • Multimodal fusion
  • BERT
  • Sports analytics
  • Feature-level fusion