Scientific Reports

Table 1 Analysis of state-of-the-art approaches to sign language recognition: advantages and drawbacks.

From: Attention-based hybrid deep learning model with CSFOA optimization and G-TverskyUNet3+ for Arabic sign language recognition

References	Technique	Database	Advantages	Disadvantages	Outcomes
Alawwad et al.²⁶	Faster R-CNN, VGG-16, ResNet-18	ArSL images	High accuracy, robust to background changes	Complexity and Computational Cost, Limited Dataset	Accuracy = 93%
Bencherif et al.²⁹	2D point convolution network, 3D CNN skeleton network	ArSL video-based database	Effective for both signer-independent and signer-dependent recognition, new ArSL video-based dataset	Environmental factors may reduce accuracy	Accuracy = 88.89%
Hisham and Hamouda³⁰	KNN, SVM, AdaBoost, DTW	Self-collected ISL, ASL datasets	High accuracy, enhanced with AdaBoost, prototype on LattePanda for portability	Lower accuracy for single-hand gestures than for double-hand gestures	Accuracy = 92%
Tharwat et al.³¹	Machine learning	Bare hands with dark/light backgrounds, gloves	High accuracy for Quranic sign language, robust across various backgrounds	Sensitive to user variability and lighting conditions	Accuracy = 99%
Alani and Cosma³²	CNN, SMOTE	ArSL2018 dataset	High accuracy improved with SMOTE	Data imbalance addressed post hoc	Accuracy = 97.29%
Rani et al.³³	mRMR-PSO, Histogram of Oriented Gradient (HOG), multi-class SVM	Seven benchmark datasets	Improved classification accuracy with fewer features	Increased computational time	Accuracy = 96.5%
Miah et al.³⁴	CNN, segmentation using YCbCr, HSV, watershed, data augmentation	'38 BdSL', 'KU-BdSL', 'Ishara-Lipi'	High accuracy, improved generalization across datasets	Data dependency, computational complexity	Accuracy = 94%
Sharma and Singh³⁵	Deep learning CNN, data augmentation	ISL, ASL datasets	High accuracy on both ISL and ASL datasets, robust performance	Data Variability, Computational Requirements	Accuracy = 88.01%
Alyami et al.³⁶	MediaPipe pose estimator, LSTM, TCN, Transformer-based models	KArSL-100, LSA64 datasets	High accuracy, importance of combining hand and face keypoints demonstrated	Limited Generalization, Limited Gesture Context	Accuracy = 98.25%
Sharma and Singh³⁵	Speech recognition, translation to ISL, 3D avatars	multi-lingual datasets	High accuracy, useful for educational and communication purposes	Translation Complexity, Avatar Limitations	Accuracy = 89%
Abdul Ameer et al.¹²	MediaPipe and Long Short-Term Memory (LSTM) with attention mechanism	DArSL50 dataset	Attention focuses on the relevant parts of the data; LSTM handles temporal dependencies	Requires a dataset from multiple volunteers; limited to predefined gestures	Accuracy > 85% for individual volunteers and 83% for consolidated data
AlKhuraym et al.³⁷	EfficientNet-Lite 0 architecture	Collected real ArSL images	Reduces computing costs while maintaining performance; effective with background variations	May need more data for real-time adaptation	Accuracy = 94%; effective in real-world scenarios
Shanableh³⁸	Two-stage solution with CNN transfer learning	Arabic sign language dataset	Higher accuracy with word and sentence segmentation; Outperformed previous solutions	Requires precise word count prediction and segmentation accuracy	Achieved 97.3% word recognition and 92.6% sentence recognition
Rwelli et al.³⁹	Wearable sensor, Convolutional Neural Network (CNN)	DG5-V hand gloves with wearable sensors	Efficient in recognizing Arabic sign language with 30 letters; User accessibility	Limited to predefined set of gestures; Wearable sensor dependency	Achieved 90% accuracy in user recognition
