ADAT novel time-series-aware adaptive transformer architecture for sign language translation
  • Article
  • Open access
  • Published: 28 January 2026

  • Nada Shahin1,2 &
  • Leila Ismail1,2,3 

Scientific Reports (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Computer science
  • Scientific data

Abstract

Current sign language machine translation systems rely on recognizing hand movements, facial expressions, and body postures, combined with natural language processing, to convert signs into text. While recent approaches use Transformer architectures to model long-range dependencies via positional encoding, they lack accuracy in recognizing fine-grained, short-range temporal dependencies between gestures captured at high frame rates. Moreover, their quadratic attention complexity leads to inefficient training. To mitigate these issues, we introduce ADAT, an Adaptive Transformer architecture that combines convolutional feature extraction, log-sparse self-attention, and an adaptive gating mechanism to efficiently model both short- and long-range temporal dependencies in sign language sequences. We evaluate ADAT on three datasets: the benchmark RWTH-PHOENIX-Weather-2014T (PHOENIX14T), ISL-CSLTR, and the newly introduced MedASL, a medical-domain American Sign Language corpus. In sign-to-gloss-to-text translation, ADAT outperforms state-of-the-art baselines, improving BLEU-4 by at least 0.1% and reducing training time by an average of 21% across datasets. In sign-to-text translation, ADAT consistently surpasses transformer-based encoder-decoder baselines, achieving a minimum of 0.5% gains in BLEU-4 and an average training speedup of 21.8% across datasets. Compared to the encoder-only and decoder-only baselines in sign-to-text, ADAT is at least 0.7% more accurate, despite being up to 12.1% slower due to its dual-stream structure.
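
To make the architecture concrete, the sketch below illustrates the two mechanisms the abstract names: a log-sparse self-attention pattern, which lets each frame attend to exponentially spaced history (roughly O(L log L) work instead of the O(L^2) of dense attention), and an adaptive gate that fuses a convolutional short-range stream with the attention-based long-range stream. This is a minimal PyTorch illustration under assumed layer names and tensor shapes, not the authors' released implementation; see the Data availability section for the official repository.

```python
import torch
import torch.nn as nn

def log_sparse_mask(seq_len: int, device=None) -> torch.Tensor:
    """Boolean mask: frame i attends to itself and to frames
    i-1, i-2, i-4, i-8, ... (exponentially spaced history)."""
    mask = torch.zeros(seq_len, seq_len, dtype=torch.bool, device=device)
    for i in range(seq_len):
        mask[i, i] = True
        step = 1
        while i - step >= 0:
            mask[i, i - step] = True
            step *= 2
    return mask

class AdaptiveGatedBlock(nn.Module):
    """Illustrative block (not the released ADAT code): a convolutional
    short-range stream and a log-sparse attention long-range stream,
    fused by a learned sigmoid gate."""
    def __init__(self, d_model: int, n_heads: int = 8, kernel_size: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=kernel_size // 2)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) per-frame visual features
        local = self.conv(x.transpose(1, 2)).transpose(1, 2)
        # PyTorch masks *disallowed* positions, hence the inversion
        allowed = log_sparse_mask(x.size(1), device=x.device)
        global_ctx, _ = self.attn(x, x, x, attn_mask=~allowed)
        g = torch.sigmoid(self.gate(torch.cat([local, global_ctx], dim=-1)))
        return g * local + (1 - g) * global_ctx

# Example: 2 clips of 64 frames with 512-dim features
block = AdaptiveGatedBlock(d_model=512)
out = block(torch.randn(2, 64, 512))   # -> (2, 64, 512)
```

The gate makes the trade-off between streams input-dependent: gate values near 1 favor the convolutional stream for fine-grained, high-frame-rate gestures, while values near 0 favor the sparse attention stream for sentence-level context.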

Data availability

The MedASL dataset and the ADAT implementation proposed in this study are publicly available at the INDUCE Lab GitHub: https://github.com/INDUCE-Lab.

Acknowledgements

We would like to thank the anonymous reviewers for their valuable comments and feedback, which helped us improve the paper. This work was supported by the Emirates Center for Mobility Research, United Arab Emirates University, under Grant 12R126.

Author information

Authors and Affiliations

  1. Intelligent Distributed Computing and Systems (INDUCE) Lab, Department of Computer Science and Software Engineering, College of Information Technology, United Arab Emirates University, Al Ain, Abu Dhabi, United Arab Emirates

    Nada Shahin & Leila Ismail

  2. National Water and Energy Center, United Arab Emirates University, Al Ain, Abu Dhabi, United Arab Emirates

    Nada Shahin & Leila Ismail

  3. Emirates Center for Mobility Research, United Arab Emirates University, Al Ain, Abu Dhabi, United Arab Emirates

    Leila Ismail

Authors

  1. Nada Shahin
  2. Leila Ismail

Contributions

NS: Writing (original draft preparation); NS and LI: Investigation, Design, Analysis, Revisions; LI: Conceptualization, Methodology, Supervision, Funding acquisition, Writing (review and editing). All authors reviewed the manuscript.

Corresponding author

Correspondence to Leila Ismail.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Shahin, N. & Ismail, L. ADAT novel time-series-aware adaptive transformer architecture for sign language translation. Sci Rep (2026). https://doi.org/10.1038/s41598-026-36293-9

  • Received: 21 April 2025

  • Accepted: 12 January 2026

  • Published: 28 January 2026

  • DOI: https://doi.org/10.1038/s41598-026-36293-9

Keywords

  • Artificial intelligence (AI)
  • Natural language processing (NLP)
  • Neural machine translation
  • Neural network
  • Sign language translation
  • Time-series models
  • Transformers