An explainable hybrid CNN–transformer model for sign language recognition on edge devices using adaptive fusion and knowledge distillation
  • Article
  • Open access
  • Published: 03 February 2026

  • Ismail Lamaakal1,
  • Chaymae Yahyati1,
  • Yassine Maleh2,
  • Khalid El Makkaoui1 &
  • Ibrahim Ouahbi1

Scientific Reports (2026)


We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note that errors may be present which affect the content, and all legal disclaimers apply.

Subjects

  • Engineering
  • Mathematics and computing

Abstract

Despite recent advances in deep learning (DL) for sign language recognition (SLR), most existing systems remain limited to monolingual datasets, lack interpretability, and are too computationally intensive for real-time edge deployment. With the growing need for inclusive, real-time communication technologies, efficient and deployable SLR systems are of critical importance. This paper presents TinyMSLR, an explainable, lightweight framework designed for isolated-sign (gloss) classification on resource-constrained devices. TinyMSLR combines a ConvNeXt-Tiny encoder for fine-grained local visual cues with a Swin Transformer encoder for long-range spatio-temporal context, and integrates an adaptive fusion gate to balance the two streams. To further improve accuracy under strict compute and memory budgets, we introduce a dual-teacher knowledge distillation (KD) scheme that transfers complementary spatial and contextual knowledge from high-capacity CNN and Transformer teachers to the compact student model. We evaluate TinyMSLR in a controlled multilingual setting using two public datasets (DGS RWTH-PHOENIX-Weather 2014T and Mandarin CSL) by constructing a shared subset of 20 semantically aligned sign classes and segmenting the RWTH continuous sequences into single-gloss clips. Accordingly, all reported results correspond to isolated-sign recognition rather than continuous sentence-level multilingual CSLR. On this benchmark, TinyMSLR achieves 99.28% training accuracy and 99.01% validation accuracy, with an F1-score of 98.96%, while keeping the parameter count under 2.7M. Inference latency is 24 ms on standard CPUs and under 13.5 ms on edge GPUs. Overall, TinyMSLR demonstrates a practical accuracy–efficiency–explainability trade-off, well suited to deployment-ready multilingual isolated-sign systems on the edge.
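The abstract names two architectural mechanisms: an adaptive fusion gate that balances the ConvNeXt-Tiny and Swin Transformer feature streams, and a dual-teacher KD loss that transfers knowledge from CNN and Transformer teachers to the compact student. Since the paper body is not reproduced on this page, the following is only a minimal PyTorch sketch of how such components are commonly built; the module names, feature dimension, temperature T, and loss weights alpha/beta are illustrative assumptions, not the authors' released implementation.

    # Illustrative sketch only -- not the authors' released code.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AdaptiveFusionGate(nn.Module):
        """Learns a per-sample scalar gate that weighs a CNN feature
        vector against a transformer feature vector of the same size."""
        def __init__(self, dim: int):
            super().__init__()
            self.gate = nn.Sequential(
                nn.Linear(2 * dim, dim), nn.GELU(),
                nn.Linear(dim, 1), nn.Sigmoid())

        def forward(self, f_cnn: torch.Tensor, f_tr: torch.Tensor) -> torch.Tensor:
            # alpha in (0, 1): how much the fused feature trusts the CNN stream.
            alpha = self.gate(torch.cat([f_cnn, f_tr], dim=-1))
            return alpha * f_cnn + (1.0 - alpha) * f_tr

    def dual_teacher_kd_loss(student_logits, cnn_t_logits, tr_t_logits, labels,
                             T: float = 4.0, alpha: float = 0.5, beta: float = 0.25):
        """Cross-entropy on ground-truth labels plus temperature-scaled KL terms
        distilling from a CNN teacher and a Transformer teacher (teacher logits
        are assumed to be computed under torch.no_grad())."""
        ce = F.cross_entropy(student_logits, labels)
        log_p_student = F.log_softmax(student_logits / T, dim=-1)
        kd_cnn = F.kl_div(log_p_student, F.softmax(cnn_t_logits / T, dim=-1),
                          reduction="batchmean") * (T * T)
        kd_tr = F.kl_div(log_p_student, F.softmax(tr_t_logits / T, dim=-1),
                         reduction="batchmean") * (T * T)
        return alpha * ce + beta * (kd_cnn + kd_tr)

    # Example: fuse two 256-d feature vectors for a batch of 8 clips.
    gate = AdaptiveFusionGate(dim=256)
    fused = gate(torch.randn(8, 256), torch.randn(8, 256))  # -> (8, 256)

Under this kind of gating, the network can shift weight toward the CNN stream when local hand-shape cues dominate and toward the transformer stream when longer-range spatio-temporal context matters, which matches the motivation given in the abstract.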

Data availability

The original data presented in this study are openly available. The RWTH-PHOENIX-Weather 2014 Dataset: https://www-i6.informatik.rwth-aachen.de/koller/RWTH-PHOENIX/, accessed on 16 September 2025. The Chinese Sign Language (CSL) Dataset: https://service.tib.eu/ldmservice/dataset/chinese-sign-language, accessed on 16 September 2025.

Funding

This research received no external funding.

Author information

Authors and Affiliations

  1. Multidisciplinary Faculty of Nador, Mohammed Premier University, Oujda, Morocco

    Ismail Lamaakal, Chaymae Yahyati, Khalid El Makkaoui & Ibrahim Ouahbi

  2. Laboratory LaSTI, ENSAK, Sultan Moulay Slimane University, Khouribga, Morocco

    Yassine Maleh

Contributions

Ismail Lamaakal, Chaymae Yahyati: Conceptualization, Methodology, Software, Writing - Original Draft. Ismail Lamaakal, Chaymae Yahyati, Yassine Maleh, Khalid El Makkaoui, Ibrahim Ouahbi: Methodology, Investigation, Writing - Review & Editing.

Corresponding author

Correspondence to Ismail Lamaakal.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Lamaakal, I., Yahyati, C., Maleh, Y. et al. An explainable hybrid CNN–transformer model for sign language recognition on edge devices using adaptive fusion and knowledge distillation. Sci Rep (2026). https://doi.org/10.1038/s41598-026-38478-8

  • Received: 11 December 2025

  • Accepted: 29 January 2026

  • Published: 03 February 2026

  • DOI: https://doi.org/10.1038/s41598-026-38478-8

Associated content

Collection

Applications of artificial intelligence in video- and audio-signal processing
