Abstract
Arabic Sign Language (ArSL) is a visual-manual language that facilitates communication among Deaf people in Arabic-speaking nations. Recognizing ArSL is crucial for a variety of reasons, including its impact on the Deaf population, education, healthcare, and society as a whole. Previous approaches to Arabic sign language recognition have limitations, especially in terms of accuracy and their capability to capture the detailed features of the signs. To overcome these challenges, a new model named DeepArabianSignNet is proposed, which incorporates DenseNet, EfficientNet, and an attention-based Deep ResNet. The model uses a newly introduced G-TverskyUNet3+ to detect regions of interest in preprocessed Arabic sign language images. In addition, a novel metaheuristic algorithm, the Crisscross Seed Forest Optimization Algorithm, which combines the Crisscross Optimization and Forest Optimization algorithms, is employed to determine the best features from the extracted texture, color, and deep learning features. The proposed model is assessed on two databases with training rates of 70% and 80%; Database 2 was exceptional, with an accuracy of 0.97675 for 70% of the training data and 0.98376 for 80%. The results presented in this paper prove that DeepArabianSignNet is effective in improving Arabic sign language recognition.
Introduction
Gestures have long been a fundamental mode of communication, and Deaf and Hard of Hearing (DHH) individuals today are the primary users of recognized sign languages1,2,3. These languages contain not only manual features such as hand gestures4 but also non-manual features, including facial actions and body language5. Although sign language assists DHH people in their communication, the gap between DHH and hearing people has not been closed. Worldwide, 466 million people have hearing impairments and face communication problems every day. It is therefore important to recognize that sign language users are not a linguistic minority that can be disregarded6. Different sign languages have different signs for the letters of the alphabet, and these signs mostly take the form of the letters themselves7. However, sign language differs from one country to another, and there are 144 sign languages in the world, including Arabic Sign Language (ArSL), which has 30 alphabet signs specific to the Arabic region6,8,9,10.
Arabic Sign Language detection is essential for ensuring that hearing-impaired individuals in Arabic-speaking countries can communicate easily with other people11,12,13. Considering the cultural and linguistic heterogeneity within the Arab world, such a system can significantly improve communication in daily practice, education, and work14. For people who use ArSL as their main method of communication, sign language detection technologies can translate signs into text or speech, allowing a person with hearing impairment to interact with the rest of society with minimal interference15,16,17,18. This technology is most beneficial in government services, healthcare, education, and social communication, where sign language interpreters may not always be available19,20.
The benefits that can be derived from ArSL detection are numerous. It improves the chances of easy communication and integration of deaf people into society, so that they do not feel locked out because of language issues. With the development of machine learning and computer vision, gesture recognition accuracy and response speed have improved19,21, making such systems more stable. However, some issues still need to be solved. Arabic sign language is a complex language with multiple gestures and regional dialect variations that can affect the consistency of detection22. Furthermore, some systems may require costly hardware or high computational power, which may not be feasible in some settings23.
The uses of ArSL detection are varied. It can be incorporated into mobile applications, allowing people to use their smartphones to interact in real time, or deployed in public service facilities such as airports, banks, and government offices for the benefit of deaf people24. In education, it can help deaf students communicate with teachers and classmates. Furthermore, the technology can be applied in the health sector to improve interaction between patients and medical practitioners, specifically to avoid communication barriers for deaf patients25.
It is worth mentioning that different approaches have been used for the detection of Arabic Sign Language (ArSL), each with its pros and cons. A Faster R-CNN method employs deep learning models such as VGG-16 and ResNet-1826 for gesture recognition but suffers from high computational cost27. Another approach uses a 3D CNN skeleton network for real-time sign detection but is very sensitive to environmental conditions, especially lighting. Systems employing KNN and SVM with the Leap Motion Controller exhibit high accuracy for double-hand gestures, while single-hand recognition is less reliable28. Vision-based techniques perform well in terms of accuracy for certain gestures but suffer from dataset dependency23. Because of these limitations, a new method is developed in this paper to overcome these drawbacks and improve ArSL detection.
Contribution
The contributions of the proposed work are as follows:
- To introduce a new G-TverskyUNet3+ based segmentation model for accurate identification of Region of Interest (ROI) areas from the pre-processed images.
- To select the optimal features from the extracted multi-features (texture, color, and deep learning) using a new hybrid optimization model referred to as CSFOA, a combination of FOA and CSO.
- To introduce a new DeepArabianSignNet model based on DenseNet-EfficientNet and an attention-based Deep ResNet for accurate recognition of Arabic sign language.
Organization
Section “Introduction” presents the importance of Arabic sign language recognition and the goals of the research. Section “Literature review” reviews the literature on sign language recognition and identifies the gaps the study intends to fill. Section “Proposed methodology for Arabian sign language (ArSL) detection” focuses on the DeepArabianSignNet model, describing its architecture, the pre-processing of the input data, and the training process. Section “Result and discussion” presents the results, assesses the performance of the proposed model, compares it with other methods, and finally concludes on the overall significance of the suggested approach.
Literature review
The recent improvements in ArSL recognition have motivated the development of numerous novel systems based on deep learning, computer vision, and wearable technologies. Alawwad et al.26 used Faster R-CNN with ResNet-18 and VGG-16 to achieve 93% accuracy in recognizing ArSL alphabets under varying backgrounds. Bencherif et al.29 proposed a video-based model using both 2D and 3D CNNs for dynamic and static sign recognition. Hisham and Hamouda30 used the Leap Motion Controller with machine learning models such as SVM, KNN, and DTW to reach 92% accuracy on average. Tharwat et al.31 presented a vision-based system to recognize Quranic letters with nearly 99% accuracy. Alani and Cosma32 proposed ArSL-CNN based on the ArSL2018 dataset and further improved recognition accuracy using SMOTE to reach up to 97.29%. Bansal et al.33 applied mRMR-PSO for optimal feature selection using HOG features, showing better recognition accuracy on several sign language datasets than other methods. Miah et al.34 defined BenSignNet for Bengali Sign Language, giving an accuracy of more than 94% on three datasets. Sharma and Singh35 created an ISL dataset and designed a CNN model that achieved good performance with little processing time. Alyami et al.36 introduced a pose-based Transformer utilizing MediaPipe keypoints for the hand and face, increasing the recognition rate by 4%. Sharma and Singh35 also designed SISLA, a speech-to-sign avatar system with multilingual capabilities and up to 91% accuracy. Abdul Ameer et al.12 presented an attention-based LSTM with MediaPipe for temporal gesture recognition and achieved an accuracy of more than 85%. AlKhuraym et al.37 used EfficientNet-Lite0 along with label smoothing to obtain 94% accuracy under background-noise conditions. Shanableh38 proposed a two-stage temporal segmentation model with CNN transfer learning, beating prior models with 97.3% word and 92.6% phrase recognition. Rwelli et al.39 proposed a CNN-based wearable sensor system using DG5-V gloves with real-time vocalized output; the system achieved an accuracy of 90%, enhancing accessibility for the hearing impaired.
Table 1 summarizes the benefits and drawbacks of state-of-the-art approaches to sign language recognition.
Proposed methodology for Arabian sign language (ArSL) detection
In this research work, a novel ArSL model is introduced with the assistance of AI-based approaches and a hybrid meta-heuristic optimization model. The overall architecture of the proposed model is shown in Fig. 1.
Image acquisition
The proposed approach for detecting ArSL includes a detailed image acquisition process based on the following datasets.
Arabic Sign Language ArSL dataset
Dataset 1 (https://www.kaggle.com/datasets/sabribelmadoui/arabic-sign-language-augmented-dataset), used for image acquisition, is well prepared for developing a reliable Arabic sign language recognition system. It consists of 290 test images, 13,926 training images, and 870 validation images, for a total of 15,086 images, each with a dimension of 416 × 416 pixels. These images were captured in different settings, with different backgrounds and different hand angles, using a cell phone camera.
RGB Arabic Alphabets Sign Language Dataset
Dataset 2 (https://www.kaggle.com/datasets/muhammadalbrham/rgb-arabic-alphabets-sign-language-dataset) is a valuable resource for image acquisition and model training. The RGB Arabic Alphabets Sign Language (AASL) dataset includes 7857 raw, fully labelled RGB images of Arabic sign language alphabets and, to our knowledge, was the first public RGB database of its kind. It is intended to assist anyone interested in building realistic Arabic sign language classification models for real-world scenarios. AASL was gathered from more than 200 individuals under various conditions of background, illumination, image orientation, size, and resolution. To guarantee a high-quality dataset, the gathered photos were supervised, verified, and filtered by subject-matter experts.
KArSL database
Dataset 3 can be downloaded from https://www.kaggle.com/datasets/yousefdotpy/karsl-502. KArSL is one of the largest video databases for word-level Arabic sign language. The database is built around 502 isolated sign words collected with the Microsoft Kinect V2. Each sign in the database is performed by three professional signers, and each signer repeated each sign 50 times, resulting in a total of 75,300 samples (502 × 3 × 50).
Pre-processing
Image resizing
The collected RGB images were resized to a standard size of 224 × 224 pixels for all images in the dataset. This resizing is important to achieve a uniform input size, which the machine learning models require for processing. Figure 2 shows the resized images for database 1 and database 2.
L*a*b* color space conversion
After resizing, the images go through L*a*b* color space conversion, a very important step in image processing and computer vision. The space consists of three main components: L*, the lightness; a*, the green–red axis; and b*, the blue-yellow axis. In this 3D color space, L* ranges from 0 to 100, corresponding to black and white respectively, while a* and b* range from −128 to +128 and represent the green–red and blue-yellow opponent axes, respectively. The L*a*b* color space analysis for dataset 1 and dataset 2 is presented in Fig. 3.
RGB is converted directly into L*a*b* by utilizing Eq. (1)
Here Q is obtained as per Eq. (2),
In the above matrix, \({P}_{k}={p}_{k}/{m}_{k}\), \({M}_{k}=1\), \({N}_{k}=\left(1-{p}_{k}-{m}_{k}\right)/{m}_{k}\left(k\in \left\{r,g,b\right\}\right)\) and [H] is obtained as per Eq. (3):
After the values of \(P\), \(M,\) and \(N\) are acquired, they are used to find the values of \({L}^{*}\), \({a}^{*}\) and \({b}^{*}\). Equation (4) provides the set of formulas to obtain \({L}^{*}{a}^{*}{b}^{*}\).
Here, \({P}_{c}, {M}_{c},\) and \({N}_{c}\) represent the coordinates of the white reference illuminant. The function \(f\) is defined as per Eq. (5).
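Since Eqs. (1)-(5) are not reproduced in the extracted text, the following NumPy sketch illustrates the standard RGB → tristimulus → L*a*b* pipeline they describe; the transformation matrix and the D65 white reference are common values assumed here, not taken from the paper, and gamma linearization is omitted for brevity.

```python
import numpy as np

# Assumed sRGB-to-XYZ matrix and D65 white reference (Xn, Yn, Zn); the paper's
# Eqs. (1)-(3) define the corresponding [H] matrix and the P, M, N values.
RGB_TO_XYZ = np.array([[0.4124, 0.3576, 0.1805],
                       [0.2126, 0.7152, 0.0722],
                       [0.0193, 0.1192, 0.9505]])
WHITE = np.array([0.9505, 1.0000, 1.0890])

def f(t):
    """Nonlinear compression of the normalized tristimulus values (cf. Eq. (5))."""
    delta = 6 / 29
    return np.where(t > delta ** 3, np.cbrt(t), t / (3 * delta ** 2) + 4 / 29)

def rgb_to_lab(rgb):
    """Convert an H x W x 3 image with values in [0, 1] to L*a*b* (cf. Eq. (4))."""
    xyz = rgb.reshape(-1, 3) @ RGB_TO_XYZ.T / WHITE   # normalized tristimulus values
    fx, fy, fz = f(xyz[:, 0]), f(xyz[:, 1]), f(xyz[:, 2])
    L = 116 * fy - 16          # lightness, 0-100
    a = 500 * (fx - fy)        # green-red axis
    b = 200 * (fy - fz)        # blue-yellow axis
    return np.stack([L, a, b], axis=1).reshape(rgb.shape)
```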
Image augmentation
Following L*a*b* color space conversion, the images were subjected to image augmentation to increase dataset variability and resilience. The techniques used in this procedure include rotation, flipping, and scaling. Rotation is performed at specific angles, such as 45, 90, and 180 degrees, producing several images of the same object with different orientations and making the model rotationally invariant. Flipping is done along both the horizontal (left-right) and vertical (top-bottom) axes, enabling the model to recognize objects in any orientation. Scaling, the process of enlarging or reducing the size of the images, helps the model learn objects at different sizes and distances. Augmented images of dataset 1 and dataset 2 are depicted in Fig. 4. The augmented image is the pre-processed image from which the ROI regions are identified in the segmentation stage.
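As a concrete illustration of the augmentation step above, the sketch below applies rotation (45, 90, and 180 degrees), horizontal and vertical flips, and rescaling with SciPy and NumPy; the specific zoom factors are illustrative assumptions, not values reported in the paper.

```python
import numpy as np
from scipy import ndimage

def augment(image):
    """Return augmented variants of an H x W x C image: rotations, flips, rescaling."""
    variants = []
    for angle in (45, 90, 180):                               # rotational invariance
        variants.append(ndimage.rotate(image, angle, reshape=False, mode="nearest"))
    variants.append(np.fliplr(image))                         # horizontal (left-right) flip
    variants.append(np.flipud(image))                         # vertical (top-bottom) flip
    for factor in (0.8, 1.2):                                 # illustrative scale factors
        variants.append(ndimage.zoom(image, (factor, factor, 1), order=1))
    return variants
```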
G-TverskyUNet3+: proposed architecture
The 3D U-Net provides the basis of G-TverskyUNet3+: the newly trained GNeXt backbone conserves computing power, an attention gate module works as a noise filter, and the new skip-connection architecture of U-Net 3+ functions as a low-level feature extractor. Together, these components form a new attention 3D U-Net with numerous skip connections that is utilized to segment the Arabic sign language images. The architecture of the model is shown in Fig. 5.
For image segmentation, the 3D U-Net model employs an encoder-decoder architecture, in which convolutional layers gradually increase feature channels and decrease spatial size. In order to blend high-level features from deeper layers with fine-grained information from early layers, skip connections are used. This architecture maintains spatial details throughout the upsampling process, guaranteeing precise segmentation. It works very well with 3D input graphics, including gesture data from sign language. For high-quality output, the network’s topology helps to preserve both coarse and fine features.
New GNeXt
A novel backbone called GNeXt combines Ghost Convolution and ConvNeXt to effectively extract features while lowering computational overhead. ConvNeXt optimizes for accuracy and scalability by building on ResNet with improvements from Transformer models. Ghost Convolution improves processing performance by using inexpensive convolutions to generate ghost features, which lowers the model’s parameters and effort. The GNeXt backbone maintains low processing costs while handling 3D data well. The lightweight design of the model’s framework enhances accuracy and model convergence.
A three-dimensional U-Net model constitutes the basis of this framework, which includes the primary encoder and decoder elements. The use of the new GNeXt backbone in the encoder, together with the numerous skip connections and attention modules, constitutes the primary distinction between the suggested model and the original 3D U-Net. Figure 6 depicts the full architecture.
The model supports 3D Arabic sign language images of up to 240 × 240 × 155 voxels. On the left, the encoder blocks execute convolution operations, ReLU activations, and batch normalization. Up to the final encoder block F, the input image's spatial size progressively decreases and its channel count rises over the course of five stages. The convolution block in the final encoder stage receives the input, processes it as in the previous blocks, and then passes it, via transposed convolution, towards the decoder blocks for upsampling. The encoder's upper blocks extract low-level semantic information from the input image, while the lower blocks extract high-level features. The decoder then carries out the reverse operation, reconstructing the image's original size using upsampling. In this procedure, skip connections help backpropagate the outcomes to compute the loss while giving the network access to high-level semantic image information. Skip connections were added in this model to maximize the flow of features between the encoder and decoder blocks. To conserve a substantial amount of computational resources and increase accuracy, the attention gate modules filter out noisy information and pass only essential features.
GNeXt
The recently developed GNeXt combines the ConvNeXt backbone and group convolutions. On the ImageNet dataset, ConvNeXt, one of the most advanced backbone models, obtained a top-1 accuracy of 87–88%. The model consists entirely of convolutional networks, retaining the simplicity and efficiency of a standard CNN while outperforming Transformers in terms of scalability and accuracy. ConvNeXt builds upon the basic ResNet architecture and gradually enhances the model by adopting design choices from the Swin Transformer. Additionally, a 3D version of the novel GNeXt backbone is used here as the encoder block, together with Ghost convolution, a cost-effective linear operation for building feature maps that efficiently reduces model parameters and computing workload. As shown in Fig. 7, this design is an enhanced version of MobileNetv1, with shortcut connections between bottlenecks and linear bottlenecks between layers.
The block first performs a 1 × 1 × 1 convolution with ReLU6 and batch normalization on a feature set defined by width, height, and depth. In the second layer, following the original architecture, the input is processed by a 3 × 3 × 3 depth-wise convolution with batch normalization and ReLU6, which in the 2D case convolves the three RGB channels. However, volumetric data is employed in this instance; consequently, since the input does not consist of three RGB channels, 3D depth-wise convolution is applied to the voxels of the 3D input image.
An additional 1 × 1 × 1 convolution, without an activation function, forms the block's final layer. After that, the output is added to the earlier input. Because of the high processing overhead associated with 3D convolutions, this block helps the model remain small. This allows low-cost depth-wise convolution to extract as many features as possible from the input image at every stage. Additionally, the new GNeXt is employed with 3D pretrained weights, and a transfer learning technique is applied. This approach improved the accuracy slightly while enabling the model to converge more quickly and smoothly than previous models.
Ghost convolution
Ghost convolution, introduced in GhostNet, is a linear operation that generates feature maps at a low cost and dramatically reduces model parameters and computational workload.
The design of Ghost convolution is shown in Fig. 8, where the conventional convolution operation is split into two parts: the primary convolution and the cheap convolution. The primary convolution restricts the total number of kernels to significantly fewer than in traditional convolution, but is otherwise practically the same as conventional convolution. The cheap convolution, on the other hand, performs group convolution on the feature map produced by the primary convolution, producing redundant feature maps referred to as Ghost feature maps. Group convolution drastically lowers the complexity of the model, using less computation and operating more quickly than ordinary convolution. To create the output feature maps needed for feature extraction, the primary and Ghost feature maps are combined; with this method the primary and Ghost feature maps are kept at the same size. Ghost convolution is used several times throughout the network to keep the detector from producing an excessive number of parameters.
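A minimal PyTorch sketch of the Ghost convolution split described above (a primary convolution followed by a cheap group/depth-wise convolution, with the two outputs concatenated); the channel ratio and kernel sizes are illustrative assumptions, and a 2D module is shown for brevity rather than the paper's exact 3D configuration.

```python
import torch
import torch.nn as nn

class GhostConv2d(nn.Module):
    """Primary convolution + cheap group convolution, concatenated into one output."""
    def __init__(self, in_ch, out_ch, ratio=2, kernel_size=1, cheap_kernel=3):
        super().__init__()
        primary_ch = out_ch // ratio       # channels produced by the primary convolution
        ghost_ch = out_ch - primary_ch     # channels produced by the cheap convolution
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, primary_ch, kernel_size, padding=kernel_size // 2, bias=False),
            nn.BatchNorm2d(primary_ch), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(
            nn.Conv2d(primary_ch, ghost_ch, cheap_kernel, padding=cheap_kernel // 2,
                      groups=primary_ch, bias=False),   # group (depth-wise) convolution
            nn.BatchNorm2d(ghost_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        primary = self.primary(x)
        ghost = self.cheap(primary)        # "Ghost" feature maps from the cheap branch
        return torch.cat([primary, ghost], dim=1)

# e.g. GhostConv2d(32, 64)(torch.randn(1, 32, 56, 56)).shape -> torch.Size([1, 64, 56, 56])
```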
The ConvNeXt blocks in each stage are arranged in 1:1:9:1 ratios. In addition, the first downsampling module creates the Patchify layer using a convolutional layer with a 4 × 4 kernel size. Compared to ResNet, the ConvNeXt block has an inverted bottleneck structure and a larger 7 × 7 kernel size. The layer normalization (LN) layer and the Gaussian error linear unit (GELU) function of Eq. (6), which are employed in Transformers, replace the batch normalization (BN) and ReLU activation widely used in CNN micro-architectures, resulting in fewer layers. Only one activation layer remains between the two 1 × 1 convolution layers in the block, and one LN layer precedes the convolution layers. Additionally, a separate 2 × 2 convolutional downsampling layer with a stride of 2 is employed, and an LN layer is added afterwards to stabilize training.
In Eq. (6), Φ(x) denotes the cumulative distribution function of the Gaussian distribution.
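Equation (6) is not reproduced in the extracted text; for reference, the standard GELU definition it refers to is \(\mathrm{GELU}(x)=x\,\Phi(x)=\tfrac{x}{2}\left[1+\operatorname{erf}\!\left(x/\sqrt{2}\right)\right]\), with \(\Phi(x)\) the standard Gaussian cumulative distribution function.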
Multiple skip connections
The problem of vanishing gradients in deep networks is addressed by multiple skip connections, which enhance information flow between encoder and decoder layers. These connections preserve fine-grained details by allowing low-level information from the encoder to be sent straight to the decoder. By capturing both coarse and fine information through skip links throughout the network, the model ensures accurate segmentation. Thanks to this strategy, deeper networks avoid losing important information during the upsampling process. Skip connections improve the accuracy of the model, particularly for intricate tasks such as gesture recognition. A deep 3D architecture is employed, which enables the modules to extract more high-level features with high accuracy; however, deeper networks suffer from less efficient backpropagation, which affects how the loss is computed. For the ResNet and U-Net algorithms, skip connections have been successfully offered as a solution to this issue. To address the vanishing-gradient problem and the deteriorating accuracy of ResNet, skip connections skip over one or more layers and their associated operations. A traditional neural network layer first sums the weighted input with the output of the preceding layer, \({X}_{0}\), and then applies the activation function. The procedure is then typically repeated twice, as per Eq. (7).
With skip connections, the same procedure is repeated with one additional mechanism: \({X}_{0}\) is passed forward and added to \({Z}_{2}\) before the second activation function, as represented by Eq. (8).
If all the values are positive, the activation function directly outputs the sum of the two inputs it received. When \({Z}_{2}\) is negative, only \({X}_{0}\) is output, as illustrated by Eq. (9).
In contrast, without skip connections, a \({Z}_{2}\) value of 0 would simply be deactivated. By avoiding outputs in the 0 range, the skip connection operation allows the network to perform convolution in deeper networks without loss. Nearly identical operations are needed in the U-Net architecture: rather than going only to subsequent encoder layers, the output of preceding encoder layers is transferred to the decoder layers. Furthermore, U-Net uses a concatenation procedure in place of an addition operation. Later on, redesigned versions of skip connections were suggested in U-Net3+ and U-Net++. This study uses U-Net3+'s skip connection model to improve low-level feature extraction and feature flow. The output of the first encoder block is shared, as a skip connection, with the top three decoder blocks. Starting from the second encoder block, the output is shared with the two decoder blocks below, and this procedure is repeated until the fourth encoder block. The decoder blocks go through the same procedure, this time proceeding from the bottom decoder block towards the final decoder block, with the outputs shared with the blocks above.
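To make Eqs. (7)-(9) concrete, the following PyTorch sketch shows a residual block in which the block input \({X}_{0}\) is added back before the second activation; the 3D convolutions and single channel count are assumptions chosen to match the 3D U-Net setting, not the paper's exact layer configuration.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two convolutions whose output is added to the block input (cf. Eqs. (7)-(9))."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv3d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv3d(channels, channels, 3, padding=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x0):
        z1 = self.act(self.conv1(x0))    # Eq. (7): first convolution + activation
        z2 = self.conv2(z1)              # Eq. (8): second convolution, no activation yet
        return self.act(z2 + x0)         # Eq. (9): skip connection preserves x0 when z2 <= 0
```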
Figure 9 demonstrates how several encoder and decoder blocks provide multiple inputs to a single decoder block, D. By transmitting the pre-processed feature maps to each convolution block to extract additional distinct features, these numerous skip connections help protect the fine features extracted by every encoder and decoder block. Inside the network, each decoder block combines the large feature maps generated by decoder blocks with the smaller, same-scale feature maps from encoder blocks to capture both coarse-grained semantics and fine-grained features at full scale.
Attention models
One of the primary modules in this study is the attention module. The attention gate's core is the score function: it receives a query and a key as inputs and outputs a score for each input. The final result is then processed using a value to establish the relative importance of each component. Overall, the attention mechanism computes a weighted average of the components that depends on the key and the query. After the successful application of attention in sequential language tasks, the CNN community developed the corresponding module, referred to as the Attention Gate. The attention mechanism concentrates on the most significant aspects of the input material by eliminating extraneous information. To assess the importance of each component of the input image, the Attention Gate assigns a score. This enables the model to ignore extraneous details and concentrate on key aspects. The attention mechanism improves feature extraction, which raises the effectiveness and performance of the model. It is especially useful for applications where some attributes are more crucial than others, such as sign language recognition.
The explainability of attention mechanisms is one of its most notable features. It is challenging to comprehend the logic behind predictions using traditional deep learning models, especially convolutional or recurrent neural networks, which are frequently referred to as “black-box” models. On the other hand, attention methods offer transparency by indicating which aspects of the input the model is considering while making decisions. Attention heatmaps, which show the areas of the input data that the model considers most significant, can be used to display this. These heatmaps, for instance, show the parts of the hand or body that the model concentrates on when identifying a gesture in ArSL identification. Because it clarifies for stakeholders why the model generated a specific forecast, this visual depiction increases confidence in the model.
Additionally, attention mechanisms make the model more reliable and make debugging easier. Attention visualization makes it simple to spot instances where the model is concentrating on unimportant aspects, like background or noise, which can lead to changes in the training data or model’s architecture. Attention mechanisms are particularly good at capturing long-range relationships between various sequence pieces, which might be difficult for traditional models to do in sequential data applications like sign language recognition. Convolutional Neural Networks (CNNs) and attention are combined in this study to improve the model’s performance by enabling it to prioritize significant gesture elements, increasing classification accuracy while preserving computational economy34.
Integrating CNNs with attention processes enhances explainability and model performance in the context of ArSL recognition. By assisting the network in concentrating on the most important components of a gesture, the attention mechanism improves recognition accuracy. Additionally, the model’s behavior may be better understood and interpreted thanks to its transparency, which is particularly helpful in applications involving assistive devices for the hard of hearing. In addition to producing a more successful recognition system, the combination of CNNs and attention enhances the model’s dependability and credibility, which eventually increases its applicability in real-world situations.
Unet group convolution block
Group convolution divides the input feature map into smaller groups and applies convolution to each group independently, lowering the computing cost. This method speeds up the model and uses less memory by reducing the number of operations needed for processing. It handles large inputs effectively, particularly in settings with limited resources. Group convolution preserves the quality of feature extraction while allowing for faster processing. Lightweight models frequently employ this method to increase efficiency without compromising performance. Group convolution was first implemented in AlexNet to work around limited video memory. It is currently utilized in multiple lightweight modules to reduce the number of operations and parameters, as demonstrated by Fig. 10.
This approach divides the input feature map evenly into several groups along the channel dimension and then performs a standard convolution on every group. Assume an input feature map \(X\in {R}^{C\times H\times W}\), where \(C\) indicates the number of channels and \(H\) and \(W\) indicate the height and width of the input feature map, respectively. Similarly, for the output feature map \(Y\in {R}^{{C}^{\prime}\times {H}^{\prime}\times {W}^{\prime}}\), \({C}^{\prime}\) indicates the number of channels and \({H}^{\prime}\) and \({W}^{\prime}\) indicate the height and width of the output feature map, respectively. The computational cost of conventional convolution is evaluated by Eq. (10):
where \(k\) indicates the height and width of convolution kernel.
The evaluation of group convolution is represented as per Eq. (11):
where \(g\) indicates the number of groups the input feature map is divided into, \(\frac{C}{g}\) indicates the number of channels in every group of the input feature map, and \(\frac{{C}^{\prime}}{g}\) represents the number of channels in every group of the output feature map. Group convolution decreases both the computation and the number of parameters of conventional convolution to \(\frac{1}{g}\). It is crucial to remember that the convolution kernel of each group only convolves its own group of the input feature map; it does not convolve with feature maps from other groups. Figure 11 shows the segmented images of dataset 1 and dataset 2.
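The 1/g reduction implied by Eqs. (10)-(11) can be checked directly; the sketch below compares the parameter counts of a standard and a grouped 2D convolution in PyTorch (the channel sizes and group count are arbitrary illustrative values).

```python
import torch.nn as nn

c_in, c_out, k, g = 64, 128, 3, 4

standard = nn.Conv2d(c_in, c_out, k, padding=1)             # ordinary convolution (Eq. (10))
grouped  = nn.Conv2d(c_in, c_out, k, padding=1, groups=g)   # group convolution (Eq. (11))

params = lambda m: sum(p.numel() for p in m.parameters())
print(params(standard))   # c_out * c_in * k * k + c_out        = 73,856
print(params(grouped))    # c_out * (c_in / g) * k * k + c_out  = 18,560  (roughly 1/g of the weights)
```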
Feature extraction
A feature extraction module is proposed to refine the significant features from the segmented images. It includes:
Multi-Scale Feature Extraction with Deep Convolutional Layers: Multi-Scale Feature Extraction is typically achieved by employing convolutional layers with different receptive fields within the same network. A convolutional layer applies a filter \(\mathcal{F}\) of size \(s\times s\) to the input feature map \(X\), producing an output feature map \(Y\) as given in Eq. (12), in which \({\mathcal{F}}_{j}\) indicates filter weights, \({X}_{i+j}\) stands for input pixels, and \(b\) defines bias.
Convolutions with different kernel sizes are applied to capture features at multiple scales, as shown in Eq. (13), in which \(Con{v}_{s\times s}\) represents convolution with a kernel size of \(s\times s\). Specifically, smaller kernels capture fine details, while larger kernels capture coarser features.
Dilated convolutions are used to expand the receptive field without increasing the number of parameters, as expressed in Eq. (14), in which \(Con{v}_{s\times s}^{d}\) stands for convolution with dilation rate \(d\), which spaces out the kernel elements, effectively enlarging the receptive field.
Pooling layers down-sample the feature maps, allowing the model to capture broader contextual information as stated in Eq. (15), in which \(Poo{l}_{p\times p}\) signifies pooling operation with a window size of \(p\times p\).
The outputs from the different convolutional layers are combined to form the final multi-scale feature representation, as given in Eq. (16), in which the combination operation applied is summation, which merges the multi-scale features.
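The sketch below is one possible PyTorch realization of the multi-scale extraction in Eqs. (13)-(16): parallel convolutions with different kernel sizes, a dilated convolution, and a pooled branch, merged by summation. The kernel sizes, dilation rate, and pooling window are illustrative assumptions, and even spatial dimensions are assumed so the pooled branch restores the input size.

```python
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    """Parallel convolutions at several scales, merged by summation (cf. Eqs. (13)-(16))."""
    def __init__(self, channels):
        super().__init__()
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1)               # fine details
        self.conv5 = nn.Conv2d(channels, channels, 5, padding=2)               # coarser features
        self.dil3  = nn.Conv2d(channels, channels, 3, padding=2, dilation=2)   # enlarged receptive field (Eq. (14))
        self.pool  = nn.AvgPool2d(2)                                           # broader context (Eq. (15))
        self.up    = nn.Upsample(scale_factor=2, mode="nearest")

    def forward(self, x):
        pooled = self.up(self.pool(x))                                   # pooled context restored to input size
        return self.conv3(x) + self.conv5(x) + self.dil3(x) + pooled     # summation merge (Eq. (16))
```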
(1) LBP Texture Features: LBP captures the local texture information by comparing each pixel with its neighborhood and encoding this information into a binary number. Given an input image \({I}_{segment}(m,n)\) and a pixel at position \(\left({m}_{c},{n}_{c}\right)\) with intensity \(I({m}_{c},{n}_{c}),\) the LBP value is computed by comparing the intensity of the central pixel with each of its neighbors \(\left({m}_{g},{n}_{g}\right)\) as defined in Eq. (17), in which \(S\left(\cdot \right)\) indicates a sign function that outputs a binary result based on the intensity comparison.
Equation (18) gives the LBP value for the central pixel, obtained by summing the binary results weighted by powers of 2 according to the position of the neighbour, in which \(N\) refers to the number of neighbours.
After computing the LBP value for each pixel in the image, the next step is to construct a histogram \(H\) that represents the frequency of each possible LBP value, as specified in Eq. (19), in which \(\delta \left(m,n\right)\) denotes the Kronecker delta function, which is 1 if \(m=n\) and 0 otherwise, and \(v\) ranges over \(\left[0,{2}^{N}-1\right]\).
The LBP histogram serves as a feature descriptor that represents the texture of the entire image.
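A compact NumPy sketch of Eqs. (17)-(19) for an 8-neighbour LBP on a grayscale image follows; border pixels are simply skipped, which is a simplification rather than the paper's exact handling.

```python
import numpy as np

def lbp_histogram(gray, n_neighbors=8):
    """LBP code per pixel (Eqs. (17)-(18)) and its normalized histogram (Eq. (19))."""
    # 8 neighbour offsets around the central pixel, clockwise
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = gray.shape
    center = gray[1:-1, 1:-1]
    codes = np.zeros((h - 2, w - 2), dtype=np.int32)
    for g, (dy, dx) in enumerate(offsets):
        neighbour = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes += (neighbour >= center).astype(np.int32) << g   # sign function weighted by 2^g
    hist, _ = np.histogram(codes, bins=2 ** n_neighbors, range=(0, 2 ** n_neighbors))
    return hist / hist.sum()                                   # normalized texture descriptor
```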
(2) Color-Based Features using Color Moments: These are statistical measures used to capture the distribution of color intensities in an image. Let the color image be represented by a pixel matrix, where each pixel \(\rho\) in the image has three color channels \(\complement\) \(\left(\complement \epsilon \left\{R,G,B\right\}\right)\) for an RGB image. The moments are computed for each channel. The mean of a color channel \(\complement\) is the average color value in that channel, as shown in Eq. (20), in which \({\rho }_{\complement }\left(i\right)\) denotes the color value of the \({i}\)th pixel in channel \(\complement\), and \(n\) indicates the number of pixels in the image.
Equation (21) defines the variance which measures the spread of color values around the mean.
The skewness \({\varphi }_{\complement }\) measures the asymmetry of the distribution of color values as given in Eq. (22).
These moments are calculated for each color channel, providing a compact yet informative feature vector for image analysis. The final feature vector \({F}_{ext}\) is the combination of the features from the deep convolutional layers, LBP, and color moments, as shown in Eq. (23).
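The color moments of Eqs. (20)-(22) can be computed per channel as in the NumPy sketch below; the skewness here uses the common variance-normalized form, which is an assumption since the paper's exact normalization is not shown.

```python
import numpy as np

def color_moments(rgb):
    """Mean, variance and skewness per channel (cf. Eqs. (20)-(22)), concatenated into one vector."""
    feats = []
    for c in range(3):                                    # R, G, B channels
        values = rgb[..., c].astype(np.float64).ravel()
        mean = values.mean()                                           # Eq. (20)
        var = np.mean((values - mean) ** 2)                            # Eq. (21)
        skew = np.mean((values - mean) ** 3) / (var ** 1.5 + 1e-12)    # Eq. (22), normalized form
        feats.extend([mean, var, skew])
    return np.array(feats)                                # 9-dimensional color descriptor
```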
Feature selection
A feature selection module is developed to choose the optimal features from the extracted feature set \({F}_{ext}\). It is developed using:
FOA: It is inspired by the natural processes in forests, particularly seed dispersion and tree growth. The primary idea is to simulate the random dispersion of seeds and the growth of trees to explore the solution space effectively.
(i) Initialize trees
The Forest Optimization Algorithm (FOA) views possible solutions as trees, each of which has a unique age and set of variable values. To regulate the number of trees in the forest, the age of each tree is set to '0' for newly generated trees and then grows by '1' following each local seeding step, excluding new trees. A tree is considered as an array of length \(1\times ({N}_{var}+1)\) in Eq. (24), where \({N}_{var}\) represents the dimension of the problem and “Age” indicates the age of the related tree.
The ‘life time’ parameter is a predetermined parameter that establishes the maximum allowable age of a tree. When a tree reaches this age, it is removed from the forest and added to the candidate population; this is decided at the beginning of the process. While a small value results in older trees being excluded at the start of the competition, decreasing the likelihood of local searches, a large value raises the age of trees.
(ii) Local seeding of the trees
In the natural world, seeds sprout into young trees when they fall close to trees. Trees that have better growing conditions like sunshine and location compete with one another to survive. Local seeding adds neighbors to trees that are 0 years old in an effort to mimic this process, making all trees older than new ones by 1. The algorithm raises the age of promising trees to regulate the number of trees in a forest. If a tree shows promise, it is reset to '0' so that neighbors can be added via local seeding. As they get older, unpromising trees eventually die. The 'Local Seeding Changes’ (or ‘LSC’) parameter of the algorithm controls how many seeds drop next to trees and become neighbors. The dimension of the problem domain should be used to determine this parameter.
A local seeding operator is applied to every tree in the algorithm, which begins with all trees having an age of 0. New trees are added for every zero-aged tree. As iterations continue, fewer trees are introduced, since older trees do not take part in the local seeding step. By simulating local search and truncating values that fall below the lower bound or exceed the upper bound, the technique keeps variable values within their associated boundaries. In this way the algorithm controls the number of trees.
(iii) Population limiting
Two criteria are employed to stop the spread of the forest: “life time” and “area limit”. Trees whose age exceeds the “life time” threshold are removed and form the candidate population. If there are more trees than the forest's “area limit”, the extra trees are also added to the candidate population. The number of initial trees is regarded as the same as the “area limit” parameter. A portion of the candidate population is subjected to the global seeding stage following population limiting.
(iv) Global seeding of the trees
Numerous tree species are found in forests, and animals feeding on the seeds and fruits of these trees expand their habitats. Natural forces like wind and water also aid in the distribution of seeds across the forest, maintaining the presence of many tree species in various locations. The global seeding step uses a predetermined percentage of the candidate population as a parameter to replicate the dispersal of tree seeds in the forest. The global seeding operator chooses trees from the candidate population, selects variables at random from each tree, and swaps their values with other randomly generated values; a tree with age 0 is then introduced to the forest. The number of variables whose values are altered, referred to as Global Seeding Changes (GSC), controls this global search.
(v) Updating the best so far tree
After the trees have been sorted based on their fitness values, the tree with the highest fitness value is chosen as the best tree. To prevent the best tree from aging as a result of the local seeding stage, its age is thereafter set to 0. Because local seeding is performed on trees that are '0' years old, the best tree is able to locally optimize its location through the local seeding operator.
The dispersion of seeds around each tree is simulated, and the seeds represent potential new solutions, as illustrated in Eq. (25), in which \({S}_{i,j}\) represents the position of the \(j\)th seed of the \(i\)th tree, \({C}_{i,j}\) stands for the current position of the \(i\)th tree, \(r\) denotes the dispersal radius, and \(rand\left(-1,1\right)\) generates a random number in \(\left[-1,1\right]\).
The fitness of all seeds is evaluated and the best seed is selected to replace the corresponding tree; the tree's position is updated to the best seed's position as given in Eq. (26).
(vi) Stop condition
Three stop conditions are considered: (1) reaching the initial number of iterations; (2) the optimal tree's fitness value remaining constant over multiple iterations; (3) reaching the designated level of accuracy.
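As a minimal illustration of the FOA seeding step in Eqs. (25)-(26), the sketch below disperses random seeds around a tree within a dispersal radius and keeps the best candidate; the number of seeds, the radius, and the greedy acceptance are hypothetical parameters and simplifications, not the paper's full FOA procedure.

```python
import numpy as np

def local_seeding(tree, radius, n_seeds, fitness):
    """Disperse seeds around one tree (Eq. (25)) and keep the best one (Eq. (26))."""
    seeds = tree + radius * np.random.uniform(-1, 1, size=(n_seeds, tree.size))
    scores = np.array([fitness(s) for s in seeds])
    best = seeds[scores.argmax()]                       # best seed replaces the tree if it is fitter
    return best if fitness(best) > fitness(tree) else tree
```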
CSO: It mimics the pattern of crisscrossing, where solutions are combined and exchanged between different points to enhance convergence towards the global optimum40.
(i) Horizontal crossover
A horizontal crossover is an arithmetic crossover applied in every dimension between two distinct individuals. If the horizontal crossover operation is performed in the dth dimension by the ith parent individual \(X(i)\) and the jth parent individual \(X(j)\), Eqs. (27) and (28) are utilized to reproduce their offspring:
Here, \({r}_{1}\) and \({r}_{2}\) are uniformly distributed random values between 0 and 1, \({c}_{1}\) and \({c}_{2}\) are uniformly distributed random values between −1 and 1, and \({MS}_{hc}\left(i,d\right)\) and \({MS}_{hc}\left(j,d\right)\) represent the moderation solutions that are the offspring of \(X\left(i,d\right)\) and \(X\left(j,d\right).\) Equations (27) and (28) state that the horizontal crossover in a multidimensional solution space searches for new solutions (i.e., \({MS}_{hc}(i)\)) with a higher probability inside a hypercube that has the two paired parent individuals (i.e., \(X(i)\) and \(X(j)\)) as its diagonal vertices. In the meantime, to reduce the blind region that the parent individuals are unable to search, the horizontal crossover may sample new locations on the hypercube's periphery with a lower probability. The horizontal crossover's cross-border search technique sets it apart from the genetic algorithm.
The horizontal crossover search is a technique for locating better solutions within an iteration. Individuals are paired using a random permutation of the numbers from \(1\) to \(M\). \({MS}_{hc}(no1)\) and \({MS}_{hc}(no2)\) are the moderation solutions produced by the selected individuals \(X(no1)\) and \(X(no2)\). To find as many solutions as feasible, the horizontal crossover probability (\({P}_{1}\)) is usually set to 1. An individual's search scope is greatly influenced by the expansion coefficient (\({c}_{1}\) or \({c}_{2}\)). After the moderation solutions are generated, \({MS}_{hc}\) and its parent population \(X\) engage in a competitive operation; only the competition winners survive and are kept in the \({DS}_{hc}\) matrix.
(ii) Vertical crossover
A vertical crossover is an arithmetic crossover applied to every individual between two distinct dimensions. Assume that the individual's \(d1\)th and \(d2\)th dimensions are used; Eq. (29) then reproduces the child \({MS}_{vc}(i)\) through the vertical crossover procedure.
where \(r\) is a uniformly distributed random value between 0 and 1 and \({MS}_{vc}\left(i,d1\right)\) denotes the offspring of \(X\left(i,d1\right)\) and \(X\left(i,d2\right)\) (i.e., \({DS}_{hc}(i,d1)\) and \({DS}_{hc}(i,d2)\)). The vertical crossover search ensures that individuals stay inside the boundaries of each dimension by normalizing the population of dominant solutions (\({DS}_{hc}\)) from the horizontal crossover. Because it takes place between distinct dimensions of the same individual, it keeps swarm dimensions from being trapped in local minima. Stagnant dimensions can escape local optima without destroying another globally optimal dimension, since each vertical crossover operation produces a single offspring. The probability of vertical crossover is lower than that of horizontal crossover because only a small number of dimensions are caught in local minima.
(iii) Competitive operator
The competitive operator facilitates competition between the parent population and the offspring population. In the horizontal crossover, for instance, the offspring individual (i.e., the moderation solution \({MS}_{hc}(i)\)) survives as the dominant solution only when it performs better than its parent individual \(X(i)\); the same holds after the vertical crossover, where the winner is preserved in \({DS}_{vc}(i)\). Otherwise, the parent individual survives. This simple competitive process moves the population quickly to search regions with better fitness and accelerates the convergence rate to the global optimum. The crisscross pattern is applied to exchange information between solutions as described in Eq. (30), in which \({C}_{i,j}^{\prime}\) indicates the updated position of the \(i\)th tree, \({C}_{k,j}\) and \({C}_{i,j}\) refer to the positions of two other trees in the population, and \(\rho\) denotes the crisscross factor.
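For concreteness, the sketch below implements the standard CSO horizontal and vertical crossover formulas that Eqs. (27)-(29) describe; it follows the usual formulation of the algorithm and is not taken verbatim from the paper.

```python
import numpy as np

def horizontal_crossover(xi, xj):
    """Eqs. (27)-(28): arithmetic crossover of two individuals in every dimension."""
    r1, r2 = np.random.rand(xi.size), np.random.rand(xi.size)
    c1, c2 = np.random.uniform(-1, 1, xi.size), np.random.uniform(-1, 1, xi.size)
    ms_i = r1 * xi + (1 - r1) * xj + c1 * (xi - xj)   # offspring of X(i)
    ms_j = r2 * xj + (1 - r2) * xi + c2 * (xj - xi)   # offspring of X(j)
    return ms_i, ms_j

def vertical_crossover(x, d1, d2):
    """Eq. (29): arithmetic crossover between two dimensions of the same individual."""
    r = np.random.rand()
    child = x.copy()
    child[d1] = r * x[d1] + (1 - r) * x[d2]
    return child
```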
Proposed CSFOA: For optimal feature selection, the hybrid optimizer CSFOA selects the most relevant features by simulating biological and natural search processes. Its update mechanism is designed to balance exploration and exploitation effectively by alternating between the seed dispersal strategy of FOA and the crisscross pattern update of CSO using a weighted combination, as shown in Eq. (31), in which \({C}_{i,j}^{new}\) denotes the newly updated position of the \(i\)th tree in the \(j\)th dimension.
Also, \(a\) and \(b\) denote weighting factors that balance the contributions of the FOA and CSO components, respectively. These are dynamically adjusted depending on the iteration count to control exploration and exploitation, as expressed in Eq. (32), in which \(t\) and \({M}_{t}\) indicate the current and maximum iteration, respectively.
Early on, \(a\) is high, promoting exploration. As \(t\) approaches \({M}_{t}\), \(b\) increases, focusing more on exploitation. The fitness \(fit(x)\) is calculated based on the objective of accuracy \(A\) maximization as given in Eq. (33).
The fitness of the updated solution \({C}_{i,j}^{new}\) is evaluated using the objective function \(fit\left({C}_{i,j}^{new}\right)\). If the new position yields a better fitness value, the tree is updated to this new position, as stated in Eq. (34).
The fitness evaluation ensures that only improvements are retained, guiding the algorithm toward the global optimum. Algorithm 1 explains the pseudocode of developed CSFOA, and its hyper-parameters are manifested in Table 2.
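Since Algorithm 1 is not reproduced here, the following sketch only indicates how the weighted FOA/CSO update of Eqs. (31)-(32) and the greedy acceptance of Eq. (34) could be combined; the exact form of the FOA and CSO terms, the dispersal radius, and the choice of partner trees are assumptions rather than the paper's exact procedure.

```python
import numpy as np

def csfoa_update(tree, partner_a, partner_b, radius, t, max_t, fitness):
    """Weighted FOA + CSO update (cf. Eqs. (31)-(32)); keep the new position only if fitter (Eq. (34))."""
    a = 1 - t / max_t       # exploration weight, decays over iterations (Eq. (32))
    b = t / max_t           # exploitation weight, grows over iterations
    foa_step = tree + radius * np.random.uniform(-1, 1, tree.size)        # FOA-style seed dispersal
    cso_step = tree + np.random.rand(tree.size) * (partner_a - partner_b) # crisscross-style exchange
    candidate = a * foa_step + b * cso_step                               # weighted combination (Eq. (31))
    return candidate if fitness(candidate) > fitness(tree) else tree      # greedy acceptance (Eq. (34))
```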
ArabSignNet-based detection
A new ArabSignNet using DenseNet-EfficientNet and attention-based Deep ResNet is proposed for enhanced detection. The architecture of the proposed model is shown in Fig. 15.
(3) Hybrid DenseNet and EfficientNet: DenseNet-12141 connects every layer to every other layer in a feed-forward fashion. EfficientNet-b042 scales the network's width, depth, and resolution based on a compound scaling method, achieving higher accuracy with fewer parameters. The input \({F}_{opt}\) is processed through DenseNet and EfficientNet in parallel to extract diverse and rich feature representations. The features from DenseNet, \({F}_{densenet}\left(x\right)\), and EfficientNet, \({F}_{efficientnet}\left(x\right)\), are concatenated to form a unified feature map \({F}_{DE}\left(x\right)\), as formulated in Eq. (35), in which \(x\) indicates the input. The architectures of the proposed hybrid DenseNet and EfficientNet are shown in Figs. 12 and 13, respectively.
(4) Attention-Based Deep ResNet: The optimal features \({F}_{opt}\) are fed into a deep ResNet3 with integrated attention mechanisms. This stage focuses on refining the features by emphasizing the most informative parts of the image, as stated in Eq. (36), in which \({F}_{att}\left(x\right)\) indicates the attention-weighted feature map, \({\varphi }_{i}\) stands for the weights assigned by the attention mechanism to different features, and \({F}_{resnet}^{i}\left(x\right)\) denotes the feature maps processed by ResNet. The architecture of the model is shown in Fig. 14.
This DL-based detection architecture focuses on both accuracy and computational efficiency.
(A) Ensemble Learning (EL) with Model Averaging.
An EL approach is implemented by combining the predictions from multiple models (DenseNet, EfficientNet, and ResNet) using model averaging to enhance the final detection accuracy. DenseNet, EfficientNet, and ResNet are trained separately on the same detection task, and each model learns to extract features and make predictions based on its unique architecture. After training, each model generates its prediction for a given input image as \({F}_{DE}\left(x\right)\) and \({F}_{att}\left(x\right)\). The predictions from the three models are combined by a concatenation layer, as shown in Eq. (37).
The final prediction \({F}_{final}\left(x\right)\) is used to make the detection decision. For classification, this involves selecting the class with the highest probability. Each model \({M}_{i}\) outputs a probability distribution over the classes, as defined in Eq. (38), in which \({P}_{{M}_{i}}\left(x\right)\) is used for the final decision.
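A minimal sketch of the model-averaging step: the per-class probabilities of the three branches are averaged and the arg-max class is returned. The function name and the simple mean are assumptions; the paper's Eq. (37) additionally concatenates the feature maps before this decision.

```python
import numpy as np

def ensemble_predict(prob_densenet, prob_efficientnet, prob_resnet):
    """Average the per-class probabilities of the three branches and pick the arg-max class."""
    probs = np.stack([prob_densenet, prob_efficientnet, prob_resnet])   # shape: (3, n_classes)
    averaged = probs.mean(axis=0)                                       # model averaging (cf. Eq. (38))
    return int(averaged.argmax()), averaged
```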
Table 3 summarizes the hyperparameter settings of suggested method.
Result and discussion
Python was employed to implement the proposed technique for Arabian sign language detection. Two databases, the Arabic Sign Language (ArSL) dataset (Database 1) and the RGB Arabic Alphabets Sign Language Dataset (Database 2), were utilized to verify the method. The performance of the suggested method was compared with multiple existing techniques, namely CNN35, ArSL-CNN32, AArSLRS31, and SVM30. Two different partition types were utilized on each database to assess the method. In the first setup, 30% of the data was employed for testing and 70% for training. In the second setup, 20% of the data was employed for testing and the remaining 80% for training. The method's performance metrics, including FPR, FNR, NPV, MCC, F-measure, accuracy, precision, sensitivity, and specificity, were evaluated. Table 4 outlines the database sizes, their impact on model accuracy, and potential overfitting risks during model training.
Analysis of the proposed model for Dataset 1 (70% variation in training rate)
The proposed model is compared with the existing models SVM30, AArSLRS31, ArSL-CNN32, and CNN35, as shown in Table 5. The above analyses are verified using 70% of the data from Database 1 for training. The comparison is evaluated employing significant performance metrics such as MCC, sensitivity, and accuracy. From the results, it can be determined that the newly proposed method surpasses the existing methods. In terms of accuracy, the proposed approach achieved a score of 98.14%, which is substantially higher than the 95.58% of SVM30 and an even larger advantage over the other approaches, which produce lower results. For sensitivity, the proposed model obtains a similar outcome, scoring 97.85%, which is better than the other competitors. In addition, the proposed model gained an MCC of 98.32%, surpassing SVM30 at 94.48%, AArSLRS31 at 93.82%, ArSL-CNN32 at 95.49%, and CNN35 at 94.01%. Therefore, the outcomes demonstrate that the suggested method surpasses the existing models for the Arabian sign language recognition process (Fig. 15).
Analysis of the proposed model for Dataset 1 (80% variation in training rate)
Table 6 compares the proposed method with SVM30, AArSLRS31, ArSL-CNN32, and CNN35, utilizing 80% of the data from Database 1 for training. The overall comparison concentrates on four major evaluation metrics: FPR, NPV, accuracy, and precision. The findings illustrate that the proposed method surpasses the existing models. The proposed method (99.92%) exceeds SVM30 (96.28%), ArSL-CNN (96.34%), and the other approaches in terms of accuracy. The precision results show an identical pattern, with the proposed model exceeding all other models with a score of 99.96%. With an NPV of 98.05%, the proposed method exceeded traditional methods such as SVM30, which generated an NPV of 95.82%. In addition, the CNN method35 recorded the highest FPR value of 0.058, while the proposed approach achieved the lowest value of 0.018. The findings demonstrate that the proposed approach beats existing techniques in accurately identifying Arabic sign language. Figure 16 provides a visual representation of the evaluation analysis.
Analysis of the suggested model for Dataset 2 (70% variation in training rate)
As demonstrated in Table 7, the proposed approach has been assessed against existing methods such as SVM30, AArSLRS31, ArSL-CNN32, and CNN35. To this end, 70% of Database 2's data is utilized to train the proposed Arabian sign language detection technique. Crucial performance metrics such as accuracy, specificity, F-measure, and FNR were utilized in the comparison study. It is obvious that the proposed approach performs significantly better than the methods presently in use. The results indicate that the proposed method accomplished an accuracy of 97.67%, which is higher than the accuracy rates of comparable models such as SVM30 at 95.24% and AArSLRS31 at 96.09%. In addition, the proposed method's specificity produced an optimal outcome of 97.91%. Also, the F-measure of the proposed method is 97.21%, which is greater than the values of SVM30 (95.22%) and AArSLRS31 (94.27%). The proposed approach surpassed existing methods, as demonstrated by the lowest FNR of 0.01357 obtained with the developed method. Results like these illustrate how accurate and effective the approach is in recognizing Arabian sign language. The visual representation of the evaluation is presented in Fig. 17.
Analysis of the proposed model for Dataset 2 (80% variation in training rate)
The proposed method is evaluated in Table 8, utilizing 80% of the data from Database 2 for training and comparing it with existing methods, including SVM30, AArSLRS31, ArSL-CNN32, and CNN35. Evaluation criteria including precision, accuracy, MCC, and FPR are employed in the comparative analysis. It is evident from the findings that the proposed method performed significantly more efficiently than the existing methods. In comparison with SVM30 at 97.13%, ArSL-CNN32 at 97.38%, and CNN35 at 95.06%, the proposed method achieves an accuracy of 98.37%. The proposed approach obtains 98.98% precision, which is the highest of all methods evaluated. It also achieved an MCC of 98.91%, exceeding SVM30 at 97.10%. In comparison with CNN35, whose maximum FPR is 0.0485, the minimum FPR of the proposed method is 0.0137. These findings demonstrate that the proposed approach exceeds all other models presently in use with respect to the detection parameters for Arabian sign language.
The training time and scalability metrics for various models on two datasets at different training rates (70% and 80%) are shown in Table 9. The suggested model outperforms other models like SVM (training time: 200 s, scalability: 56.02) and CNN (training time: 300 s, scalability: 64.06) for Dataset 1, with the lowest training time of 120 s and the highest scalability of 89.07 at 70% training. Likewise, the suggested model retains its effectiveness at 80% training with a 150-s training duration and 87.08 scalability. The proposed approach again records the greatest performance for Dataset 2 at 70% training, with a training time of 130 s and a scalability of 80. At 80% training, it records a training time of 160 s and a scalability of 88. In contrast to existing methods like SVM, AArSLRS, ArSL-CNN, and CNN, these findings show the suggested model’s greater efficiency and scalability, underscoring its applicability for Arabic sign language recognition.
Analysis of meta-heuristic algorithms with the proposed model
In the comparative analysis, several meta-heuristic algorithms are compared, namely Atom Search Optimization (ASO)43, Deer Hunting Optimization (DHO)44, and Particle Swarm Optimization (PSO)33. The performance in terms of several metrics, such as accuracy, precision, sensitivity, specificity, MCC, NPV, F-measure, FPR, and FNR, is evaluated and analysed in Tables 10, 11, 12 and 13.
Table 11 compares the proposed model with the meta-heuristic techniques (ASO, DHO, and PSO) on Database 1 with 70% training. The proposed model performs best, attaining the highest specificity (98.58%), accuracy (98.14%), and precision (99.20%). ASO comes second with an accuracy of 96.58%, while DHO and PSO trail with 96.05% and 95.48%, respectively. The proposed model also exhibits the lowest false-negative rate (1.67%) and false-positive rate (2.23%), demonstrating exceptional robustness. Compared with the meta-heuristic methods, it offers the best overall balance of accuracy, sensitivity, and specificity.
Table 12 assesses the proposed model and the meta-heuristic techniques (ASO, DHO, and PSO) on Database 1 with 80% training. The proposed model performs exceptionally well, with the lowest FPR (1.83%) and FNR (0.96%) along with the best accuracy (99.93%), precision (99.96%), and specificity (98.93%). ASO follows with an accuracy of 97.58%, ahead of DHO (97.04%) and PSO (96.48%). The proposed model also shows strong results across the remaining criteria, including the F-measure (98.76%) and sensitivity (98.17%). Overall, it offers the best trade-off between accuracy and error minimization.
Table 13 evaluates the proposed model against the meta-heuristic techniques (ASO, DHO, and PSO) on Database 2 with 70% training. In addition to having the lowest false-positive rate (2.23%) and false-negative rate (1.34%), the proposed model has the highest accuracy (97.68%), precision (97.69%), and specificity (97.91%). With an accuracy of 96.23%, ASO outperforms DHO (95.64%) and PSO (95.18%). The proposed model performs best overall, leading all others in MCC (97.30%) and F-measure (97.22%) while minimizing errors.
The proposed model is compared with the meta-heuristic techniques (ASO, DHO, and PSO) on Database 2 with 80% training in Table 12. Along with the lowest FPR (1.37%) and FNR (0.98%), the proposed model has the highest accuracy (98.38%), precision (98.99%), and specificity (98.54%). ASO performs well, although it lags behind the proposed model with an accuracy of 97.56%. DHO outperforms PSO on the majority of metrics, though their results are close. Overall, the proposed model provides the best trade-off between accuracy, sensitivity, and error reduction.
Comparative evaluation of the proposed model on the KArSL dataset against existing Arabic sign language recognition methods
The evaluation results (shown in Table 14) clearly show that the proposed model outperforms the existing models on practically all criteria. It reaches a peak accuracy of 99.20%, roughly 2.82% above the nearest competitor, ArSL-CNN32, and nearly 9% above SVM30, demonstrating its superiority in learning complex and variable sign forms. Its precision, recall, and F1-score are consistently above 99%, reflecting high discriminative power and balanced performance across all classes. MCC and Kappa values close to 1 strongly indicate agreement and robustness of the classification, even in imbalanced data scenarios. The proposed approach also shows very little variation between training and testing performance, suggesting that it benefits from a regularized architecture and from advanced augmentation techniques, a combination that largely avoids overfitting. In contrast, SVM30 is highly prone to overfitting, likely because it cannot generalize well to highly repetitive visual inputs involving only a few signers. Although SVM30 has the fastest training time and smallest model size, its lower classification precision makes it unsuitable for a production-grade Arabic Sign Language recognition system. The proposed model, while slightly larger and slower to train, compensates with high precision, fast inference, and lower error rates. Hence, the proposed architecture is suitable for real-time deployment, offering sufficient efficiency, generalization, and reliability in recognizing Arabic Sign Language from the KArSL dataset.
Comparative analysis of the proposed model with meta-heuristic optimization techniques for KArSL dataset
The proposed model, enhanced with the new meta-heuristic optimization approach, performs considerably better than conventional meta-heuristic schemes such as ASO, DHO, and PSO. As shown in Table 15, it achieves the best accuracy (99.20%), log loss (0.012), and convergence rate (63 iterations), demonstrating both its effectiveness and its computational efficiency.
- With reference to ASO: ASO shows good exploration, but its slow convergence leads to lower generalization, probably owing to instability in a high-dimensional parameter space. It is also more sensitive to hyperparameters, which can diminish its robustness.
- With reference to DHO: DHO delivers good accuracy and stability and follows the proposed method quite closely, especially in MCC and F1-score, but its convergence speed and optimization cost lag behind, making the proposed approach better suited to real-time or large-scale applications.
- With reference to PSO: PSO performs moderately on most metrics but falls short in accuracy and convergence time because of its tendency to become trapped in local optima, which limits its optimization capacity in complex deep learning parameter spaces.
The newly proposed optimization method strikes a better balance between exploration and exploitation, making it more dependable for navigating the loss landscape of deep learning models trained on complex visual-spatial gesture data such as KArSL.
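To make the exploration/exploitation trade-off concrete, the following minimal sketch shows a generic population-based wrapper for binary feature selection. It is not the paper’s Crisscross Seed Forest Optimization Algorithm; it only imitates the two ingredients discussed above (a crossover-style exploitation step toward the best mask and a random bit-flip exploration step) on synthetic data, with a stand-in KNN classifier as the fitness function.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=40, n_informative=10, random_state=0)

def fitness(mask):
    """Validation accuracy of a simple classifier on the selected features."""
    if mask.sum() == 0:
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

pop_size, n_iter, n_feat = 12, 20, X.shape[1]
pop = rng.integers(0, 2, size=(pop_size, n_feat))
scores = np.array([fitness(ind) for ind in pop])

for _ in range(n_iter):
    best = pop[scores.argmax()]
    for i in range(pop_size):
        # Exploitation: copy part of the current best mask (crossover-like step).
        child = pop[i].copy()
        cross = rng.random(n_feat) < 0.5
        child[cross] = best[cross]
        # Exploration: random bit flips (seed-dispersal-like step).
        flip = rng.random(n_feat) < 0.05
        child[flip] ^= 1
        s = fitness(child)
        if s > scores[i]:
            pop[i], scores[i] = child, s

print("best accuracy:", scores.max(), "features kept:", int(pop[scores.argmax()].sum()))
```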
Statistical analysis for Database 1, Database 2 and Database 3
The proposed model has been rigorously assessed against state-of-the-art methods on three databases, and the outcomes are presented in Tables 16, 17 and 18, respectively. On Database 1, the proposed model outperformed all competitors, with a mean accuracy of 98.14% against 95.58% for SVM, 94.81% for AArSLRS, 95.18% for ArSL-CNN, and 93.35% for CNN. A small standard deviation (± 0.23) and a tight 95% confidence interval (97.91–98.37) confirm that the model performs stably and consistently. The high t-test values and extremely low p values (e.g., 0.00005 vs CNN) indicate that the improvement is statistically significant. A similar pattern appears on Database 2, where the proposed model yields an accuracy of 97.67% with very limited deviation (± 0.18), surpassing methods such as AArSLRS (96.09%) and SVM (95.24%); p values well below 0.01 again confirm that the difference is statistically significant and reaffirm the model’s superiority across domains. Most notably, the model obtained 99.20% accuracy on Database 3, far ahead of all other approaches (e.g., SVM at 90.45% and ArSL-CNN at 96.38%). The low variance (± 0.12) and extremely low p values (e.g., < 0.00001) attest to the model’s stability and strong generalizability on highly complex datasets.
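A statistical comparison of this kind can be reproduced along the following lines. The per-run accuracies below are hypothetical placeholders, and it is assumed that an independent two-sample (Welch) t-test and a Student-t confidence interval are appropriate, since the paper does not specify the exact test variant.

```python
import numpy as np
from scipy import stats

# Hypothetical per-run accuracies (%); the paper reports means with +/- deviations.
proposed = np.array([98.3, 97.9, 98.1, 98.2, 98.0])
baseline = np.array([95.7, 95.4, 95.6, 95.5, 95.7])   # e.g., an SVM baseline

t_stat, p_value = stats.ttest_ind(proposed, baseline, equal_var=False)

mean = proposed.mean()
sem = stats.sem(proposed)
ci_low, ci_high = stats.t.interval(0.95, len(proposed) - 1, loc=mean, scale=sem)

print(f"t = {t_stat:.2f}, p = {p_value:.5f}")
print(f"mean = {mean:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```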
Ablation study
The ablation findings (presented in Tables 19, 20 and 21) further emphasize the contribution of each component of the architecture. The full model consistently leads every ablated variant on every dataset, indicating synergy between the modules. Removing the attention module causes a clear drop in accuracy: from 98.14% to 96.72% on Database 1 and from 99.20% to 98.14% on Database 3, underlining how much attention contributes to feature focus and discriminative learning. Data augmentation is also crucial: its removal reduces accuracy by 1.25–1.57%, highlighting its contribution to generalization. Removing the metaheuristic optimization module severely degrades performance, indicating that this optimizer promotes more efficient feature selection and convergence. Finally, the residual connections greatly help stabilize deep model training, since their removal triggers the sharpest drops in accuracy (up to 3.64% on Database 3), corroborating their role in preventing vanishing gradients and promoting feature reuse.
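The reported degradations can be read either as absolute percentage-point drops or as relative percentages; the paper does not say which. The short sketch below computes both for the attention-ablation figures quoted above.

```python
# Accuracy drop when an ablated variant is compared to the full model.
# Whether the reported degradations are absolute (percentage points) or
# relative (%) is not stated, so this sketch computes both.
def degradation(full_acc, ablated_acc):
    absolute = full_acc - ablated_acc                        # percentage points
    relative = 100.0 * (full_acc - ablated_acc) / full_acc   # percent of the full score
    return absolute, relative

for name, full, ablated in [
    ("no attention, Database 1", 98.14, 96.72),
    ("no attention, Database 3", 99.20, 98.14),
]:
    abs_drop, rel_drop = degradation(full, ablated)
    print(f"{name}: -{abs_drop:.2f} points ({rel_drop:.2f}% relative)")
```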
Cross-validation
The model’s ability to generalize is demonstrated by its strong results in the cross-validation analysis (K = 5) across all three databases (presented in Table 22), with accuracy ranging from 97.70 to 99.20%. Similarly high precision and recall (sensitivity) values show that the model reliably detects both positive and negative instances, which is important for tasks requiring precise classification. Its specificity and MCC, which consistently exceed 97%, demonstrate its ability to avoid false positives and to sustain balanced performance across all metrics. Its efficacy is further supported by low FPR (false-positive rate) and FNR (false-negative rate) values, especially on Database 3, where the FPR is as low as 0.70%. Overall, the high F-measure and steady performance demonstrate the model’s robustness.
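A minimal sketch of the K = 5 cross-validation protocol is given below. It uses synthetic data and a stand-in scikit-learn classifier rather than the proposed deep network, purely to illustrate how stratified folds and macro-averaged metrics would be aggregated.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.linear_model import LogisticRegression

# Stand-in data and classifier; the paper applies K = 5 folds to its deep model.
X, y = make_classification(n_samples=500, n_features=30, n_classes=3,
                           n_informative=10, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

scores = cross_validate(LogisticRegression(max_iter=1000), X, y, cv=cv,
                        scoring=["accuracy", "precision_macro", "recall_macro"])

for key in ["test_accuracy", "test_precision_macro", "test_recall_macro"]:
    print(f"{key}: {scores[key].mean():.4f} +/- {scores[key].std():.4f}")
```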
Robustness tests
The model performs well in the robustness tests (see Table 23), with log loss values between 0.012 and 0.013, indicating strong calibration and close alignment between the predicted probabilities and the actual labels. Its dependability is demonstrated by the near-perfect agreement between predicted and actual labels, as indicated by a Kappa score ranging from 0.972 to 0.993. Generalization scores between 97.1 and 99.0% indicate a low risk of overfitting, showing that the model performs well on unseen data and retains a high capacity for generalization without becoming overly focused on the training set.
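Log loss and the Kappa score can be computed directly from predicted probabilities and labels, as in the sketch below; the arrays shown are illustrative placeholders, not outputs of the proposed model.

```python
import numpy as np
from sklearn.metrics import log_loss, cohen_kappa_score

# Illustrative predictions only; replace with the model's test-set outputs.
y_true = np.array([0, 1, 2, 1, 0, 2])
y_prob = np.array([[0.90, 0.05, 0.05],
                   [0.10, 0.85, 0.05],
                   [0.05, 0.05, 0.90],
                   [0.20, 0.70, 0.10],
                   [0.80, 0.15, 0.05],
                   [0.10, 0.10, 0.80]])
y_pred = y_prob.argmax(axis=1)

print("log loss:", log_loss(y_true, y_prob))           # calibration of predicted probabilities
print("kappa:   ", cohen_kappa_score(y_true, y_pred))  # chance-corrected agreement
```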
Computational cost analysis
The model remains computationally efficient, as shown by the outcomes in Table 24. Training is fast enough for rapid prototyping and deployment, ranging from 75.0 to 85.0 s across the three databases. The low inference time of 2.1 to 3.9 ms/sample provides the real-time prediction capability required by time-sensitive applications. Furthermore, the model size remains manageable, between 18.3 MB and 27.5 MB, making it suitable for deployment on devices with constrained storage.
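The quantities in Table 24 (training time, per-sample inference time, and model size) can be measured with standard tooling. The sketch below does so for a stand-in scikit-learn model; the deployed deep network would be timed the same way, and the file name model.pkl is a hypothetical choice.

```python
import os, time, pickle
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# Stand-in model and data; the same measurements apply to the deployed network.
X, y = make_classification(n_samples=2000, n_features=64, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0)

t0 = time.perf_counter()
model.fit(X, y)
train_time = time.perf_counter() - t0

t0 = time.perf_counter()
model.predict(X)
infer_ms_per_sample = 1000.0 * (time.perf_counter() - t0) / len(X)

with open("model.pkl", "wb") as f:
    pickle.dump(model, f)
size_mb = os.path.getsize("model.pkl") / 1e6

print(f"training time: {train_time:.1f} s, "
      f"inference: {infer_ms_per_sample:.2f} ms/sample, size: {size_mb:.1f} MB")
```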
Performance evaluation and comparative analysis of the proposed model across ArSL, RGB Arabic alphabet, and KArSL datasets
As per Tables 25, 26 and 27, the proposed model consistently surpasses both traditional machine learning and deep learning models in accuracy across the three benchmark Arabic Sign Language datasets: ArSL, RGB Arabic Alphabet, and KArSL. On the ArSL dataset, it achieved an accuracy of 98.14%, clearly ahead of SVM (95.58%) and ArSL-CNN (95.18%), indicating stronger pattern recognition and generalization capabilities. On the RGB Arabic Alphabet dataset, it achieved 97.67%, higher than AArSLRS at 96.09% and CNN at 94.19%, reflecting its robustness in alphabet-level gesture recognition under the RGB visual modality. On the KArSL dataset, the model’s efficacy was further established, with an accuracy of 99.2%, an F1-score of 99.2%, a ROC-AUC of 99.45%, and a low log loss of 0.012. It also achieved highly reliable classification, with a Kappa score of 0.993 and an MCC of 0.99. Furthermore, the model remains resistant to overfitting and has a very fast inference time of 2.1 ms/sample, making it well suited for real-time deployment. Collectively, these results show that the proposed approach not only achieves the best accuracy but is also efficient, stable, and scalable across diverse Arabic Sign Language datasets.
Conclusion
In conclusion, the proposed G-TverskyUNet3+ model represents a significant improvement in Arabic Sign Language (ArSL) detection, providing a more accurate and efficient method for recognizing the gestures used by deaf and hard-of-hearing communities in the Arab world. By integrating the advanced GNeXt backbone with multiple skip connections and attention modules, the method enhances feature extraction and training efficiency compared with the previous 3D U-Net architecture. The use of image pre-processing and augmentation methods further contributes to the model’s robustness, enabling it to handle complex and varied input data effectively. The results from two datasets validate the model’s performance, with the best accuracy achieved on Database 2 and a clear improvement as the training rate increases from 70 to 80%. The G-TverskyUNet3+ model demonstrates high accuracy, reaching 0.97675 with 70% of the training data and 0.98376 with 80%, underscoring its potential for real-world applications. This method addresses the unique challenges of ArSL detection, such as the nuances of Arabic body language, hand gestures, and facial expressions, making it a valuable tool for improving communication accessibility in educational, social, and everyday settings. Ultimately, this research sets a new standard for ArSL detection technology and opens the door to future innovations aimed at further improving the accuracy, speed, and applicability of sign language recognition systems.
Data availability
The Arabic Sign Language datasets provide essential resources for developing recognition systems. Dataset 1 (available at https://www.kaggle.com/datasets/sabribelmadoui/arabic-sign-language-augmented-dataset ) contains 15,086 images (13,926 for training, 870 for validation, and 290 for testing), each sized 416 × 416 pixels and captured in diverse settings to enhance model robustness. Dataset 2 (available at https://www.kaggle.com/datasets/muhammadalbrham/rgb-arabic-alphabets-sign-language-dataset ) includes 7857 labeled RGB images of Arabic sign language alphabets, collected from over 200 participants under various conditions, ensuring quality for effective classification model training. Dataset 3 can be downloaded from https://www.kaggle.com/datasets/yousefdotpy/karsl-502. KArSL is one of the largest video databases for word-level Arabic sign language. The database comprises 502 isolated sign words collected with the Microsoft Kinect V2. Each sign is performed by three professional signers, and each signer repeated each sign 50 times, yielding 75,300 samples in total (502 × 3 × 50).
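A possible loading recipe for Dataset 1 is sketched below. The directory names arsl_dataset/train and arsl_dataset/valid are assumptions about how the extracted Kaggle archive might be organized (one sub-folder per sign class); the actual layout may differ.

```python
import tensorflow as tf

# Assumes the archive has been extracted to "arsl_dataset/" with one
# sub-folder per sign class; adjust the paths to the real folder layout.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "arsl_dataset/train",
    image_size=(416, 416),   # Dataset 1 images are 416 x 416 pixels
    batch_size=32,
    label_mode="categorical",
)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "arsl_dataset/valid",
    image_size=(416, 416),
    batch_size=32,
    label_mode="categorical",
)

print("classes:", train_ds.class_names)
```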
References
Awwad, S., Idwan, S. & Gharaibeh, H. Real-time sign languages character recognition. Int. J. Comput. Appl. Technol. 65(1), 36–44 (2021).
Baihan, A., Alutaibi, A. & Sharma, S. Sign language recognition using modified deep learning network and hybrid optimization: A hybrid optimizer (HO) based optimized CNNSa-LSTM approach (2024). https://doi.org/10.21203/rs.3.rs-4876563/v1.
Alshehri, M., Sharma, S., Gupta, P. & Shah, S. Empowering the visually impaired: Translating handwritten digits into spoken language with HRNN-GOA and Haralick features. J. Disability Res. 3, 1–23. https://doi.org/10.57197/JDR-2023-0051 (2024).
Moin, A. et al. A wearable biosensing system with in-sensor adaptive machine learning for hand gesture recognition. Nature Electronics 4(1), 54–63 (2021).
Luqman, H. & El-Alfy, E. S. M. Towards hybrid multimodal manual and non-manual Arabic sign language recognition: MArSL database and pilot study. Electronics 10(14), 1739 (2021).
Swain, S. K. Hearing Loss and its Impact in the Community. Matrix Sci. Medica 8(1), 1–5 (2024).
Power, J. M., Grimm, G. W. & List, J. M. Evolutionary dynamics in the dispersal of sign languages. R. Soc. Open Sci. 7(1), 191100 (2020).
Ahmed, A. M. et al. Arabic sign language intelligent translator. Imaging Sci. J. 68(1), 11–23 (2020).
Herbaz, N., El Idrissi, H. & Badri, A. Advanced sign language recognition using deep learning: A study on Arabic Sign Language (ArSL) with VGGNet and ResNet50 Models (2025).
Uddin, M. Z., Boletsis, C. & Rudshavn, P. Real-time Norwegian sign language recognition using MediaPipe and LSTM. Multimodal Technol. Interact. 9(3), 23 (2025).
Alsaadi, Z. et al. A real time Arabic sign language alphabets (ArSLA) recognition model using deep learning architecture. Computers 11(5), 78 (2022).
Abdul Ameer, R. S., Ahmed, M. A., Al-Qaysi, Z. T., Salih, M. M. & Shuwandy, M. L. Empowering communication: A deep learning framework for Arabic sign language recognition with an attention mechanism. Computers 13(6), 153 (2024).
Noor, T. H. et al. Real-time arabic sign language recognition using a hybrid deep learning model. Sensors 24(11), 3683 (2024).
Hasasneh, A. Arabic sign language characters recognition based on a deep learning approach and a simple linear classifier. Jordanian J. Comput. Inf. Technol. 6(3), 281–290 (2020).
Boulesnane, A., Bellil, L. & Ghiri, M. G. A hybrid CNN-random forest model with landmark angles for real-time arabic sign language recognition. Neural Comput. Appl. 37(4), 2641–2662 (2025).
Al Ahmadi, S., Muhammad, F. & Al Dawsari, H. Enhancing Arabic sign language interpretation: Leveraging convolutional neural networks and transfer learning. Mathematics 12(6), 823 (2024).
Hussein, L. A. & Mohammed, Z. S. ArSLR-ML: A Python-based machine learning application for arabic sign language recognition. Softw. Impacts 24, 100746 (2025).
Balat, M., Awaad, R., Zaky, A. B. & Aly, S. A. Revolutionizing communication with deep learning and XAI for enhanced Arabic sign language recognition (2025). arXiv preprint arXiv:2501.08169.
Jamil, T. Design and implementation of an intelligent system to translate Arabic text into arabic sign language. In 2020 IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), 1–4. (IEEE, 2020).
Dhanjal, A. S. & Singh, W. An automatic machine translation system for multi-lingual speech to Indian sign language. Multimedia Tools Appl. 81(3), 4283–4321 (2022).
Aly, S. & Aly, W. DeepArSLR: A novel signer-independent deep learning framework for isolated arabic sign language gestures recognition. IEEE Access 8, 83199–83212 (2020).
Balaha, M. M. et al. A vision-based deep learning approach for independent-users Arabic sign language interpretation. Multimedia Tools Appl. 82(5), 6807–6826 (2023).
Alamri, F. S., Rehman, A., Abdullahi, S. B. & Saba, T. Intelligent real-life key-pixel image detection system for early Arabic sign language learners. PeerJ Comput. Sci. 10, e2063 (2024).
Aldhahri, E. et al. Arabic sign language recognition using convolutional neural network and mobilenet. Arab. J. Sci. Eng. 48(2), 2147–2154 (2023).
Lahiani, H. & Frikha, M. Exploring CNN-based transfer learning approaches for Arabic alphabets sign language recognition using the ArSL2018 dataset. Int. J. Intell. Eng. Inform. 12(2), 236–260 (2024).
Alawwad, R. A., Bchir, O. & Ismail, M. M. B. Arabic sign language recognition using Faster R-CNN. Int. J. Adv. Comput. Sci. Appl. 12(3), 692–700 (2021).
Zakariah, M., Alotaibi, Y. A., Koundal, D., Guo, Y. & Mamun Elahi, M. Sign language recognition for Arabic alphabets using transfer learning technique. Comput. Intell. Neurosci. 2022(1), 4567989 (2022).
Luqman, H. ArabSign: A multi-modality dataset and benchmark for continuous Arabic Sign Language recognition. In 2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG), 1–8. (IEEE, 2023).
Bencherif, M. A. et al. Arabic sign language recognition system using 2D hands and body skeleton data. IEEE Access 9, 59612–59627 (2021).
Hisham, B. & Hamouda, A. Arabic sign language recognition using Ada-Boosting based on a leap motion controller. Int. J. Inf. Technol. 13(3), 1221–1234 (2021).
Tharwat, G., Ahmed, A. M. & Bouallegue, B. Arabic sign language recognition system for alphabets using machine learning techniques. J. Electr. Comput. Eng. 2021(1), 2995851 (2021).
Alani, A. A. & Cosma, G. ArSL-CNN: a convolutional neural network for Arabic sign language gesture recognition. Indones. J. Electr. Eng. Comput. Sci. 22, 1096–1107 (2021).
Bansal, S. R., Wadhawan, S. & Goel, R. mrmr-pso: A hybrid feature selection technique with a multiobjective approach for sign language recognition. Arab. J. Sci. Eng. 47(8), 10365–10380 (2022).
Miah, A. S. M., Shin, J., Hasan, M. A. M. & Rahim, M. A. Bensignnet: Bengali sign language alphabet recognition using concatenated segmentation and convolutional neural network. Appl. Sci. 12(8), 3933 (2022).
Sharma, S. & Singh, S. Recognition of Indian sign language (ISL) using deep learning model. Wirel. Pers. Commun. 123(1), 671–692 (2022).
Alyami, S., Luqman, H. & Hammoudeh, M. Isolated arabic sign language recognition using a transformer-based model and landmark keypoints. ACM Trans. Asian Low-Resource Lang. Inf. Process. 23(1), 1–19 (2024).
AlKhuraym, B. Y., Ismail, M. M. B., & Bchir, O. Arabic sign language recognition using lightweight CNN-based architecture. Int. J. Adv. Comput. Sci. Appl. 13(4) (2022).
Shanableh, T. Two-stage deep learning solution for continuous Arabic Sign Language recognition using word count prediction and motion images. IEEE Access 11, 126823–126833 (2023).
Rwelli, R. E., Shahin, O. R. & Taloba, A. I. Gesture based Arabic sign language recognition for impaired people based on convolution neural network (2022). arXiv preprint arXiv:2203.05602.
Meng, A., Li, Z., Yin, H., Chen, S. & Guo, Z. Accelerating particle swarm optimization using crisscross search. Inf. Sci. 329, 1–13 (2015).
Arulananth, T. S. et al. Classification of paediatric pneumonia using modified DenseNet-121 deep-learning model. IEEE Access 1–1 (2024).
Pamungkas, Y. et al. Implementation of EfficientNet-B0 architecture in malaria detection system based on patient red blood cell (RBC) images. In Proc. ICITRI 2024, 123–128 (2024).
Marzouk, R., Alrowais, F., Al-Wesabi, F. N. & Hilal, A. M. Atom search optimization with deep learning enabled arabic sign language recognition for speaking and hearing disability persons. Healthcare 10(9), 1606 (2022).
Al-onazi, B. B. et al. Arabic sign language gesture classification using deer hunting optimization with machine learning model. Comput. Mater. Continua 75(2), 3413–3429 (2023).
Acknowledgements
The author extends appreciation to the Deanship of Postgraduate Studies and Scientific Research at Majmaah University for funding this research work through project number (R-2025-1806).
Author information
Contributions
All authors have contributed equally.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Mohamed, A.A., Al-Saleh, A., Sharma, S.K. et al. Attention-based hybrid deep learning model with CSFOA optimization and G-TverskyUNet3+ for Arabic sign language recognition. Sci Rep 15, 20313 (2025). https://doi.org/10.1038/s41598-025-03560-0