Table 1 Analysis of state-of-the-art approaches to sign language recognition: advantages and drawbacks.

From: Attention-based hybrid deep learning model with CSFOA optimization and G-TverskyUNet3+ for Arabic sign language recognition

| References | Technique | Database | Advantages | Disadvantages | Outcomes |
| --- | --- | --- | --- | --- | --- |
| Alawwad et al. [26] | Faster R-CNN, VGG-16, ResNet-18 | ArSL images | High accuracy, robust to background changes | Complexity and Computational Cost, Limited Dataset | Accuracy = 93% |
| Bencherif et al. [29] | 2D point convolution network, 3D CNN skeleton network | ArSL video-based database | Effective for both signer-independent and dependent recognition, new ArSL video-based dataset | Environmental Factors may reduce accuracy | Accuracy = 88.89% |
| Hisham and Hamouda [30] | KNN, SVM, AdaBoost, DTW | Self-collected ISL, ASL datasets | High accuracy, enhanced with AdaBoost, prototype on Latte Panda for portability | Accuracy for single-hand gestures lower than double-hand gestures | Accuracy = 92% |
| Tharwat et al. [31] | Machine learning | Bare hands with dark/light backgrounds, gloves | High accuracy for Quranic sign language, robust across various backgrounds | User Variability, Lighting Conditions problem | Accuracy = 99% |
| Alani and Cosma [32] | CNN, SMOTE | ArSL2018 dataset | High accuracy improved with SMOTE | Data imbalance addressed post hoc | Accuracy = 97.29% |
| Rani et al. [33] | mRMR-PSO, Histogram of Oriented Gradient (HOG), multi-class SVM | Seven benchmark datasets | Improved classification accuracy with fewer features | Increased computational time | Accuracy = 96.5% |
| Miah et al. [34] | CNN, segmentation using YCbCr, HSV, watershed, data augmentation | '38 BdSL', 'KU-BdSL', 'Ishara-Lipi' | High accuracy, improved generalization across datasets | Data Dependency, Computational Complexity | Accuracy = 94% |
| Sharma and Singh [35] | Deep learning CNN, data augmentation | ISL, ASL datasets | High accuracy on both ISL and ASL datasets, robust performance | Data Variability, Computational Requirements | Accuracy = 88.01% |
| Alyami et al. [36] | MediaPipe pose estimator, LSTM, TCN, Transformer-based models | KArSL-100, LSA64 datasets | High accuracy, importance of combining hand and face keypoints demonstrated | Limited Generalization, Limited Gesture Context | Accuracy = 98.25% |
| Sharma and Singh [35] | Speech recognition, translation to ISL, 3D avatars | multi-lingual datasets | High accuracy, useful for educational and communication purposes | Translation Complexity, Avatar Limitations | Accuracy = 89% |
| Abdul Ameer et al. [12] | MediaPipe & Long Short-Term Memory (LSTM) with Attention Mechanism | DArSL50 dataset | Focuses on relevant data parts using attention; Temporal handling via LSTM | Requires a dataset from multiple volunteers; Limited to predefined gestures | Achieved accuracies of > 85% for individual volunteers and 83% for consolidated data |
| AlKhuraym et al. [37] | EfficientNet-Lite 0 architecture | collected real ArSL image | Reduces computing costs while maintaining performance; Effective with background variations | May need more data for real-time adaptation | Achieved 94% accuracy; Effective in real-world scenarios |
| Shanableh [38] | Two-stage solution with CNN transfer learning | Arabic sign language dataset | Higher accuracy with word and sentence segmentation; Outperformed previous solutions | Requires precise word count prediction and segmentation accuracy | Achieved 97.3% word recognition and 92.6% sentence recognition |
| Rwelli et al. [39] | Wearable sensor, Convolutional Neural Network (CNN) | DG5-V hand gloves with wearable sensors | Efficient in recognizing Arabic sign language with 30 letters; User accessibility | Limited to predefined set of gestures; Wearable sensor dependency | Achieved 90% accuracy in user recognition |