Abstract
Current sign language machine translation systems rely on recognizing hand movements, facial expressions, and body postures, combined with natural language processing, to convert signs into text. While recent approaches use Transformer architectures to model long-range dependencies via positional encoding, they lack accuracy in recognizing fine-grained, short-range temporal dependencies between gestures captured at high frame rates. Moreover, their quadratic attention complexity makes training inefficient. To mitigate these issues, we introduce ADAT, an Adaptive Transformer architecture that combines convolutional feature extraction, log-sparse self-attention, and an adaptive gating mechanism to efficiently model both short- and long-range temporal dependencies in sign language sequences. We evaluate ADAT on three datasets: the benchmark RWTH-PHOENIX-Weather 2014T (PHOENIX14T), ISL-CSLTR, and the newly introduced MedASL, a medical-domain American Sign Language corpus. In sign-to-gloss-to-text translation, ADAT outperforms state-of-the-art baselines, improving BLEU-4 by at least 0.1% and reducing training time by an average of 21% across datasets. In sign-to-text translation, ADAT consistently surpasses Transformer-based encoder-decoder baselines, achieving gains of at least 0.5% in BLEU-4 and an average training speedup of 21.8% across datasets. Compared to encoder-only and decoder-only baselines in sign-to-text, ADAT is at least 0.7% more accurate, despite being up to 12.1% slower due to its dual-stream structure.
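To make the two ingredients named above concrete, the sketch below shows one plausible way to build a log-sparse attention mask (each frame attends to itself and to frames at exponentially increasing distances, giving O(L log L) attended pairs instead of O(L²)) and a learned gate that mixes a convolutional short-range stream with an attention long-range stream. This is a minimal illustration under our own assumptions; the mask construction, the `AdaptiveGate` module, and the fusion rule are hypothetical and do not reproduce the ADAT implementation, which is available at the repository listed under Data availability.

```python
# Hypothetical sketch (not the authors' code): a log-sparse attention mask
# plus an adaptive gate fusing convolutional and attention streams.
import torch
import torch.nn as nn


def log_sparse_mask(seq_len: int) -> torch.Tensor:
    """Boolean mask where frame i may attend to itself and to frames
    i - 2^k (k = 0, 1, ...), i.e. O(L log L) pairs rather than O(L^2)."""
    mask = torch.eye(seq_len, dtype=torch.bool)
    for i in range(seq_len):
        k = 0
        while i - 2 ** k >= 0:
            mask[i, i - 2 ** k] = True
            k += 1
    return mask


class AdaptiveGate(nn.Module):
    """Sigmoid gate mixing short-range (conv) and long-range (attention)
    representations of the same frame sequence."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, conv_feats: torch.Tensor, attn_feats: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.proj(torch.cat([conv_feats, attn_feats], dim=-1)))
        return g * conv_feats + (1.0 - g) * attn_feats


# Toy usage on a batch of 2 clips, 16 frames, 64-dim features.
x = torch.randn(2, 16, 64)
mask = log_sparse_mask(16)                       # True = allowed to attend
attn = nn.MultiheadAttention(64, 4, batch_first=True)
attn_out, _ = attn(x, x, x, attn_mask=~mask)     # PyTorch convention: True = blocked
conv = nn.Conv1d(64, 64, kernel_size=3, padding=1)
conv_out = conv(x.transpose(1, 2)).transpose(1, 2)
fused = AdaptiveGate(64)(conv_out, attn_out)     # (2, 16, 64)
```

The gate lets the model weight the convolutional stream more heavily where fine-grained, short-range motion dominates and the attention stream where long-range context matters, which is the intuition behind combining the two.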
Data availability
The MedASL dataset and the ADAT implementation proposed in this study are publicly available at the INDUCE Lab GitHub: https://github.com/INDUCE-Lab.
Acknowledgements
We would like to thank the anonymous reviewers for their valuable comments and feedback, which helped us improve the paper. This work was supported by the Emirates Center for Mobility Research, United Arab Emirates University, under Grant 12R126.
Author information
Contributions
NS: Writing—original draft preparation; NS and LI: Investigation, Design, Analysis, Revisions; LI: Conceptualization, Methodology, Supervision, Funding acquisition, Writing—review & editing. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Shahin, N., Ismail, L. ADAT: novel time-series-aware adaptive transformer architecture for sign language translation. Sci Rep (2026). https://doi.org/10.1038/s41598-026-36293-9
DOI: https://doi.org/10.1038/s41598-026-36293-9


