Sign4all: a Spanish Sign Language dataset
  • Data Descriptor
  • Open access
  • Published: 23 February 2026

  • Francisco Morillas-Espejo (ORCID: orcid.org/0000-0001-9195-6822)1 &
  • Ester Martinez-Martin1

Scientific Data (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Developing world
  • Quality of life

Abstract

Sign Language Recognition (SLR) is a critical component of human-machine interaction, enabling more inclusive technologies for the deaf and hard-of-hearing community. However, current datasets often suffer from data sparsity and a bias toward right-handed signs. To address these limitations, we present Sign4all, a Spanish Sign Language (LSE) dataset specifically designed for Isolated Sign Language Recognition (ISLR). The dataset comprises 7,756 high-resolution RGB video recordings and their corresponding skeletal keypoints, covering 24 signs related to daily activities, with a vocabulary centered on the catering domain. Unlike sparse lexicons, Sign4all adopts a high-density approach, providing an average of 323 samples per sign to support data-intensive deep learning models. Moreover, the dataset is handedness-balanced, with equal representation of left- and right-handed executions of every sign to promote handedness invariance. Each sample was manually segmented, temporally normalized, and spatially normalized to guarantee consistency and compatibility with different deep learning pipelines. Technical validation using Transformer-based and skeletal models demonstrates the dataset's integrity and the need for pre-computed augmentation splits. All data is distributed in widely supported file formats (AVI for video, HDF5 for keypoints), enabling direct use in machine learning frameworks such as TensorFlow or PyTorch.


Data availability

The complete Sign4all dataset is available at Science Data Bank [43]. Because the dataset contains identifiable facial and body features, including characteristics from which gender may be inferred, participants are exposed to a potential risk of re-identification. For this reason, access to the dataset is restricted and subject to manual request. To obtain the data, researchers must agree to a Data Usage Agreement (DUA) and provide contact information, including their name, email address, and affiliation details. Once the request is verified, access is granted via a secure download link sent by email.

Code availability

None of the six variations of the proposed dataset require custom code for access or processing, since all data is provided in widely supported formats, as mentioned above. The dataset was processed with Blender 4.0 as video editing software, MediaPipe 0.10.1 for keypoint extraction, and TensorFlow 2.12.0 for model training and testing, all running under an Arch Linux operating system with Python 3.8. For the dataset recording, Azure Kinect SDK 1.3 [45] with PyKinect Azure [46] as a Python wrapper was used under Ubuntu 18.04 LTS.
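Since the keypoint files are plain HDF5, they can be read with any generic HDF5 library such as h5py. The sketch below is illustrative only: the internal dataset name ("keypoints") and the array layout (frames × landmarks × coordinates, with 543 landmarks as produced by MediaPipe Holistic) are assumptions, not the dataset's documented schema; it writes a synthetic stand-in file first so the example is self-contained.

```python
import numpy as np
import h5py

# ASSUMPTION: a per-sample HDF5 file containing a single "keypoints" dataset
# of shape [frames, landmarks, 3]. Check the actual Sign4all files (e.g. with
# `h5ls`) for the real group/dataset names before adapting this.
def load_keypoints(path, dataset_name="keypoints"):
    """Read a keypoint array from an HDF5 file into a NumPy array."""
    with h5py.File(path, "r") as f:
        return f[dataset_name][:]

# Create a synthetic stand-in file: 30 frames x 543 MediaPipe Holistic
# landmarks x (x, y, z) coordinates, stored as float32.
demo = np.random.rand(30, 543, 3).astype(np.float32)
with h5py.File("sample.h5", "w") as f:
    f.create_dataset("keypoints", data=demo)

kp = load_keypoints("sample.h5")
print(kp.shape)  # (30, 543, 3)
```

An array loaded this way can be passed directly to `tf.data.Dataset.from_tensor_slices` or wrapped in a PyTorch `Dataset`, which is what the abstract's claim of framework compatibility amounts to in practice.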

References

  1. Duda, R. O., Hart, P. E. & Stork, D. G. Pattern Classification (2nd Edition) (Wiley-Interscience, USA, 2000).

  2. Tao, T., Zhao, Y., Liu, T. & Zhu, J. Sign language recognition: A comprehensive review of traditional and deep learning approaches, datasets, and challenges. IEEE Access PP, 1–1, https://doi.org/10.1109/ACCESS.2024.3398806 (2024).

  3. Adaloglou, N. et al. A comprehensive study on deep learning-based methods for sign language recognition. IEEE Transactions on Multimedia 24, 1750–1762, https://doi.org/10.1109/tmm.2021.3070438 (2022).

  4. Koller, O., Forster, J. & Ney, H. Continuous sign language recognition: Towards large vocabulary statistical recognition systems handling multiple signers. Computer Vision and Image Understanding 141, 108–125, https://doi.org/10.1016/j.cviu.2015.09.013 (2015).

  5. Camgoz, N. C., Hadfield, S., Koller, O., Ney, H. & Bowden, R. Neural sign language translation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018).

  6. Duarte, A. et al. How2sign: A large-scale multimodal dataset for continuous american sign language 2008.08143 (2021).

  7. Sanabria, R. et al. How2: A large-scale dataset for multimodal language understanding 1811.00347 (2018).

  8. Zhou, H., Zhou, W., Qi, W., Pu, J. & Li, H. Improving sign language translation with monolingual data by sign back-translation 2105.12397 (2021).

  9. Armstrong, D. F., Stokoe, W. C. & Wilcox, S. E. Gesture and the nature of language. Cambridge University Press (1995).

  10. Li, D., Opazo, C. R., Yu, X. & Li, H. Word-level deep sign language recognition from video: A new large-scale dataset and methods comparison 1910.11006 (2020).

  11. Caselli, N., Sehyr, Z., Cohen-Goldberg, A. & Emmorey, K. Asl-lex: A lexical database of american sign language. Behavior Research Methods 49, https://doi.org/10.3758/s13428-016-0742-0 (2016).

  12. Sehyr, Z. S., Caselli, N., Cohen-Goldberg, A. M. & Emmorey, K. The asl-lex 2.0 project: A database of lexical and phonological properties for 2,723 signs in american sign language. The Journal of Deaf Studies and Deaf Education 26, 263–277, https://doi.org/10.1093/deafed/enaa038 (2021).

  13. Joze, H. R. V. & Koller, O. Ms-asl: A large-scale data set and benchmark for understanding american sign language. arXiv:1812.01053 (2018).

  14. Jin, P. et al. A large dataset covering the chinese national sign language for dual-view isolated sign language recognition. Scientific Data 12 (2025).

  15. ASL Alphabet. Kaggle. https://www.kaggle.com/dsv/29550, https://doi.org/10.34740/KAGGLE/DSV/29550.

  16. Sign Language MNIST. Kaggle. https://www.kaggle.com/datasets/datamunge/sign-language-mnist Accessed: January 2025 (2018).

  17. Al-Barham, M. et al. Rgb arabic alphabets sign language dataset, https://doi.org/10.48550/arXiv.2301.11932 (2023).

  18. Sincan, O. M. & Keles, H. Y. Autsl: A large scale multi-modal turkish sign language dataset and baseline methods. IEEE Access 8, 181340–181355, https://doi.org/10.1109/access.2020.3028072 (2020).

  19. Kumwilaisak, W., Pannattee, P., Hansakunbuntheung, C. & Thatphithakkul, N. American sign language fingerspelling recognition in the wild with iterative language model construction. APSIPA Transactions on Signal and Information Processing 11, https://doi.org/10.1561/116.00000003 (2022).

  20. Sunuwar, J., Borah, S. & Kharga, A. Nsl23 dataset for alphabets of nepali sign language. Data in Brief 53, 110080, https://doi.org/10.1016/j.dib.2024.110080 (2024).

  21. Kumar, P., Saini, R., Roy, P. P. & Dogra, D. P. A position and rotation invariant framework for sign language recognition (slr) using kinect. Multimedia Tools Appl. 77, 8823–8846, https://doi.org/10.1007/s11042-017-4776-9 (2018).

  22. Boulesnane, A., Bellil, L. & Ghiri, M. Aslad-190k: Arabic sign language alphabet dataset, https://doi.org/10.31219/osf.io/n236q (2024).

  23. El Kharoua, R. & Jiang, X. Deep learning recognition for arabic alphabet sign language rgb dataset. Journal of Computer and Communications 12, 32–51, https://doi.org/10.4236/jcc.2024.123003 (2024).

  24. Wikipedia. List of sign languages. https://en.wikipedia.org/wiki/List_of_sign_languages Accessed: March 2025 (2025).

  25. Wikipedia. List of sign languages by number of native signers. https://en.wikipedia.org/wiki/List_of_sign_languages_by_number_of_native_signers Accessed: March 2025 (2025).

  26. Ethnologue. Ethnologue. https://www.ethnologue.com Accessed: March 2025 (2025).

  27. Ethnologue. Spanish Sign Language by Ethnologue. https://www.ethnologue.com/language/ssp/ Accessed: March 2025 (2025).

  28. Docío-Fernández, L. et al. LSE_UVIGO: A multi-source database for Spanish Sign Language recognition. In Efthimiou, E.et al. (eds.) Proceedings of the LREC2020 9th Workshop on the Representation and Processing of Sign Languages: Sign Language Resources in the Service of the Language Community, Technological Challenges and Application Perspectives, 45–52 (European Language Resources Association (ELRA), Marseille, France, 2020).

  29. Vázquez Enríquez, M., Castro, J. L. A., Fernandez, L. D., Jacques Junior, J. C. S. & Escalera, S. Eccv 2022 sign spotting challenge: Dataset, design and results. In Karlinsky, L., Michaeli, T. & Nishino, K. (eds.) Computer Vision – ECCV 2022 Workshops, 225–242 (Springer Nature Switzerland, Cham, 2023).

  30. Rodríguez-Moreno, I., Martinez-Otzeta, J. M. & Sierra, B. A Hierarchical Approach for Spanish Sign Language Recognition: From Weak Classification to Robust Recognition System, 37–53 (2022).

  31. Morillas-Espejo, F. & Martinez-Martin, E. A real-time platform for spanish sign language interpretation. Neural Computing and Applications (2024).

  32. Martinez-Martin, E. & Morillas-Espejo, F. Deep learning techniques for spanish sign language interpretation. Computational Intelligence and Neuroscience https://doi.org/10.1155/2021/5532580 (2021).

  33. Stokoe, W. C. Sign Language Structure: An Outline of the Visual Communication Systems of the American Deaf. Studies in Linguistics, Occasional Papers (University of Buffalo, 1960).

  34. Azure Kinect DK official page. https://azure.microsoft.com/es-es/products/kinect-dk/ Accessed: February 2025 (2021).

  35. Blender’s page. https://www.blender.org/ Accessed: May 2025 (2025).

  36. Sarge, V., Andersch, M., Fabel, L., Micikevicius, P. & Tran, J. Tips for Optimizing GPU Performance Using Tensor Cores. https://developer.nvidia.com/blog/optimizing-gpu-performance-tensor-cores/ Accessed: February 2025 (2019).

  37. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition 1512.03385 (2015).

  38. Tan, M. & Le, Q. V. Efficientnet: Rethinking model scaling for convolutional neural networks 1905.11946 (2020).

  39. Arnab, A. et al. Vivit: A video vision transformer 2103.15691 (2021).

  40. Tan, M., Pang, R. & Le, Q. V. Efficientdet: Scalable and efficient object detection 1911.09070 (2020).

  41. Lugaresi, C. et al. Mediapipe: A framework for building perception pipelines 1906.08172 (2019).

  42. Papakipos, Z. & Bitton, J. Augly: Data augmentations for robustness 2201.06494 (2022).

  43. Morillas-Espejo, F. & Martinez-Martin, E. Sign4all: a spanish sign language dataset, https://doi.org/10.57760/sciencedb.28304 (2025).

  44. Dosovitskiy, A. et al. An image is worth 16 × 16 words: Transformers for image recognition at scale 2010.11929 (2021).

  45. Azure Kinect Sensor SDK official documentation. https://microsoft.github.io/Azure-Kinect-Sensor-SDK/master/index.html. Accessed: February 2025.

  46. Gorordo, I. PyKinectAzure GitHub page. https://github.com/ibaiGorordo/pyKinectAzure?tab=readme-ov-file Accessed: February 2025 (2020).

  47. Joo, H. et al. Panoptic studio: A massively multiview system for social interaction capture 1612.03153 (2016).

  48. Athitsos, V. et al. The american sign language lexicon video dataset. 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops 0, 1–8, https://doi.org/10.1109/CVPRW.2008.4563181 (2008).

  49. Mavi, A. & Dikle, Z. A new 27 class sign language dataset collected from 173 individuals 2203.03859 (2022).

  50. Albanie, S. et al. Bbc-oxford british sign language dataset 2111.03635 (2021).

  51. Efthimiou, E. et al. Sign language recognition, generation, and modelling: A research effort with applications in deaf communication. In Stephanidis, C. (ed.) Universal Access in Human-Computer Interaction. Addressing Diversity, 21–30 (Springer Berlin Heidelberg, Berlin, Heidelberg 2009).

  52. Bhatia, P. & Wadhawan, A. Deep learning-based sign language recognition system for static signs. Neural Computing and Applications https://doi.org/10.1007/s00521-019-04691-y (2021).

  53. Chai, X., Wang, H. & Chen, X. The devisign large vocabulary of chinese sign language database and baseline evaluations (2014).

  54. Radakovic, M. et al. The serbian sign language alphabet: A unique authentic dataset of letter sign gestures. Mathematics 12, 525, https://doi.org/10.3390/math12040525 (2024).

  55. Kapitanov, A., Karina, K., Nagaev, A. & Elizaveta, P. Slovo: Russian Sign Language Dataset, 63–73 (Springer Nature Switzerland, 2023).

  56. Azure Kinect DK hardware specifications. https://learn.microsoft.com/en-us/previous-versions/azure/kinect-dk/hardware-specification#depth-camera-supported-operating-modes Accessed: February 2025 (2021).


Acknowledgements

This work has been partially funded by a PhD grant under the reference UAFPU21-78 from the University of Alicante (Spain). In addition, this work has been funded by the Spanish State Research Agency (AEI) and ERDF/EU under grant: GEMELIA PID2024-161711OB-I00.

Author information

Authors and Affiliations

  1. RoViT Lab, Department of Computer Science and Artificial Intelligence, University of Alicante, Carretera de San Vicente del Raspeig s/n, E-03690, Alicante, Spain

    Francisco Morillas-Espejo & Ester Martinez-Martin


Contributions

F.M.E. and E.M.M. defined the vocabulary; F.M.E. recorded, filtered, and processed the data, and performed the technical validation; E.M.M. supervised the experiments. All authors reviewed the manuscript.

Corresponding author

Correspondence to Francisco Morillas-Espejo.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article


Cite this article

Morillas-Espejo, F., Martinez-Martin, E. Sign4all: a Spanish Sign Language dataset. Sci Data (2026). https://doi.org/10.1038/s41597-026-06872-6


  • Received: 07 August 2025

  • Accepted: 09 February 2026

  • Published: 23 February 2026

  • DOI: https://doi.org/10.1038/s41597-026-06872-6


Scientific Data (Sci Data)

ISSN 2052-4463 (online)
