A novel multi-module neural networks strategy of human emotion recognition in the human-robot interaction
  • Article
  • Open access
  • Published: 28 February 2026

  • Khalid Zaman1,2,3,4,
  • Ammad Ul Islam5,
  • Gan Zengkang3,
  • Muhammad Bilal6,
  • Ayman Alharbi7,
  • Sayyed Mudassar Shah8,
  • Sohail Asghar9 &
  • Hongzhao Wang1,2 

Scientific Reports, Article number: (2026)


We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Computational biology and bioinformatics
  • Engineering
  • Mathematics and computing

Abstract

New technologies in human emotion recognition (HER) have drawn considerable attention for use in security, intelligent customer service, healthcare, education, human-robot interaction (HRI), and adaptive system training. To identify human emotions, our model incorporates MobileNetV3, Vision Transformer (ViT), RegNet, and SE-ResNeXt into a unique deep ensemble classification structure. This research designs a novel Multi-Module Neural Networks (MMNNs) architecture for HER in practical applications, whose main purpose is to identify human emotions. It investigates an innovative approach that improves HER performance by integrating the MMNNs with Transfer Learning (TL) to train the CNNs. The MMNNs classification model is trained by combining features from the four CNN models through feature pooling. The key novelty of the model is the DEtection TRansformer (DETR), which enhances the CNN learning block: a CNN learns a low-dimensional feature representation, an encoder-decoder transformer attends over it, and a simple Feed-Forward Network (FFN) outputs the final detection prediction, which ultimately boosts face recognition efficiency and accuracy. The MMNNs results are validated on AffectNet, CK+, and a custom-made dataset (CMD), achieving accuracies of 91.07%, 87.03%, and 96.98%, respectively, which data augmentation further increases to 95.09%, 89.15%, and 98.13%.
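The feature-pooling design described above can be made concrete with a short sketch. The following is a minimal PyTorch illustration, not the authors' implementation: it assumes the timm library and uses its MobileNetV3, ViT, RegNet, and SE-ResNeXt variants (mobilenetv3_large_100, vit_base_patch16_224, regnety_016, seresnext50_32x4d) as hypothetical stand-ins for the four CNN modules, concatenates their pooled feature vectors, and classifies the combined representation with a small head.

    import torch
    import torch.nn as nn
    import timm  # assumption: timm supplies the four backbone variants

    # Hypothetical stand-ins for the paper's four modules.
    BACKBONES = ["mobilenetv3_large_100", "vit_base_patch16_224",
                 "regnety_016", "seresnext50_32x4d"]

    class FeaturePoolingEnsemble(nn.Module):
        def __init__(self, num_emotions: int = 7):
            super().__init__()
            # num_classes=0 makes each timm model return pooled features
            # instead of class logits; set pretrained=True for TL weights.
            self.backbones = nn.ModuleList(
                timm.create_model(name, pretrained=False, num_classes=0)
                for name in BACKBONES
            )
            feat_dim = sum(m.num_features for m in self.backbones)
            self.head = nn.Sequential(
                nn.Linear(feat_dim, 512), nn.ReLU(), nn.Dropout(0.5),
                nn.Linear(512, num_emotions),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Feature pooling: concatenate the four pooled feature vectors.
            feats = torch.cat([m(x) for m in self.backbones], dim=1)
            return self.head(feats)

    model = FeaturePoolingEnsemble(num_emotions=7)
    logits = model(torch.randn(2, 3, 224, 224))  # shape (2, 7)

The DETR block follows the three-part recipe the abstract names: a CNN that learns a compact feature map, an encoder-decoder transformer, and FFNs for the final predictions. A schematic sketch under the same caveats (DETR's positional encodings and bipartite-matching loss are omitted for brevity):

    import torch
    import torch.nn as nn
    from torchvision.models import resnet50

    class MiniDETR(nn.Module):
        def __init__(self, num_classes: int = 7, hidden: int = 256,
                     num_queries: int = 16):
            super().__init__()
            backbone = resnet50()  # random init; pass weights= for pretrained
            self.cnn = nn.Sequential(*list(backbone.children())[:-2])  # C5 map
            self.proj = nn.Conv2d(2048, hidden, kernel_size=1)  # low-dim features
            self.transformer = nn.Transformer(
                d_model=hidden, nhead=8, num_encoder_layers=3,
                num_decoder_layers=3, batch_first=True)
            self.queries = nn.Embedding(num_queries, hidden)  # object queries
            self.class_ffn = nn.Linear(hidden, num_classes + 1)  # +1 "no object"
            self.box_ffn = nn.Sequential(
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 4), nn.Sigmoid())  # (cx, cy, w, h)

        def forward(self, x: torch.Tensor):
            f = self.proj(self.cnn(x))              # (B, hidden, H, W)
            tokens = f.flatten(2).transpose(1, 2)   # (B, H*W, hidden)
            q = self.queries.weight.unsqueeze(0).expand(x.size(0), -1, -1)
            h = self.transformer(tokens, q)         # (B, num_queries, hidden)
            return self.class_ffn(h), self.box_ffn(h)

    cls_logits, boxes = MiniDETR()(torch.randn(1, 3, 224, 224))

The augmentation step credited with the final accuracy gains is not specified in this early-access version; a plausible (assumed, not the authors' recipe) torchvision pipeline for face images would be:

    from torchvision import transforms

    augment = transforms.Compose([
        transforms.RandomHorizontalFlip(),
        transforms.RandomRotation(10),
        transforms.ColorJitter(brightness=0.2, contrast=0.2),
        transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    ])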


Data availability

The dataset used in this research is available at https://github.com/123456789khalid/Human-Emotion-HE-.git.

Abbreviations

  • MMNNs: Multi-Module Neural Networks
  • HER: Human Emotion Recognition
  • HRI: Human-Robot Interaction
  • HCI: Human-Computer Interaction
  • ER: Emotion Recognition
  • DNNs: Deep Neural Networks
  • CNNs: Convolutional Neural Networks
  • TL: Transfer Learning
  • DETR: DEtection TRansformer
  • FFN: Feed-Forward Network
  • CMD: Custom-Made Dataset
  • IIMT Lab: Institute of Intelligent Manufacturing Technology Laboratory
  • ViLT: Vision-and-Language Transformer
  • NMS: Non-Maximum Suppression
  • DACL: Deep Attentive Center Loss
  • STN: Spatial Transformer Network
  • FER: Facial Expression Recognition
  • MLP: Multilayer Perceptron
  • RNNs: Recurrent Neural Networks
  • LSTM: Long Short-Term Memory
  • BDBNs: Boosted Deep Belief Networks
  • DCNN: Deep Convolutional Neural Network
  • DTN: Deep Temporal Network
  • DSN: Deep Spatial Network
  • SE: Squeeze-and-Excitation
  • GRU: Gated Recurrent Units
  • IRB: Inverted Residual Block
  • CK+: Extended Cohn-Kanade


Acknowledgements

The authors extend their appreciation to Umm Al-Qura University, Saudi Arabia for funding this research work through grant number: 26UQU4290339GSSR01.

Funding

This research work was funded by Umm Al-Qura University, Saudi Arabia under grant number: 26UQU4290339GSSR01.

Author information

Authors and Affiliations

  1. School of Software, Northwestern Polytechnical University, Xi’an, Shaanxi 710129, China

    Khalid Zaman & Hongzhao Wang

  2. China Aviation International and Investment Co., Ltd., Beijing, China

    Khalid Zaman & Hongzhao Wang

  3. Institute of Intelligent Manufacturing Technology, Shenzhen Polytechnic University, Shenzhen, 518000, Guangdong, China

    Khalid Zaman & Gan Zengkang

  4. Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, Guangdong, China

    Khalid Zaman

  5. Institute for Advanced Study in Nuclear Energy & Safety, College of Physics and Optoelectronic Engineering, Shenzhen University, Shenzhen, China

    Ammad Ul Islam

  6. School of Engineering, Nanfang College Guangzhou, Guangzhou, 510970, China

    Muhammad Bilal

  7. Computer and Network Engineering Department, College of Computing, Umm Al-Qura University, Mecca, 24231, Saudi Arabia

    Ayman Alharbi

  8. School of Civil and Transportation Engineering, Shenzhen University, Shenzhen, Guangdong, China

    Sayyed Mudassar Shah

  9. Institute for Advanced Study in Nuclear Energy & Safety, College of Physics, Shenzhen University, Shenzhen, China

    Sohail Asghar


Contributions

Khalid Zaman, Ammad Ul Islam and Sayyed Mudassar Shah contributed equally to this work and share first authorship.

Corresponding authors

Correspondence to Gan Zengkang, Muhammad Bilal or Hongzhao Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Informed consent

Written informed consent was obtained from all participants for the publication of identifiable images in all figures in this manuscript, and informed consent was obtained from all subjects and/or their legal guardian(s) prior to their participation in the study.

Compliance with Guidelines and Regulations

All methods involving human participants and/or human tissue samples were carried out in accordance with the relevant ethical guidelines and regulations. The study was approved by the Institute of Intelligent Manufacturing Technology, Shenzhen Polytechnic University, Shenzhen, Guangdong 518000, China (supported by the Post-Doctoral Foundation Project of Shenzhen Polytechnic University, Grant No. 6024331021K). All participants provided informed consent prior to inclusion in the study.

Approval by institutional committee

We have provided details of the institutional committee that approved the experimental protocols, including the name of the committee and any relevant approval numbers.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article


Cite this article

Zaman, K., Islam, A.U., Zengkang, G. et al. A novel multi-module neural networks strategy of human emotion recognition in the human-robot interaction. Sci Rep (2026). https://doi.org/10.1038/s41598-026-40798-8


  • Received: 10 August 2025

  • Accepted: 16 February 2026

  • Published: 28 February 2026

  • DOI: https://doi.org/10.1038/s41598-026-40798-8


Keywords

  • Multi-Module Neural Networks (MMNNs)
  • Human-Robot Interaction (HRI)
  • DEtection TRansformer (DETR)
  • Custom-Made Dataset (CMD)
  • Human-Computer Interaction (HCI)