Table 1 Analysis of state-of-the-art approaches to sign language recognition: advantages and drawbacks.
References | Technique | Database | Advantages | Disadvantages | Outcomes |
---|---|---|---|---|---|
Alawwad et al.26 | Faster R-CNN, VGG-16, ResNet-18 | ArSL images | High accuracy, robust to background changes | Complexity and Computational Cost, Limited Dataset | Accuracy = 93% |
Bencherif et al.29 | 2D point convolution network, 3D CNN skeleton network | ArSL video-based database | Effective for both signer-independent and signer-dependent recognition, new ArSL video-based dataset | Environmental factors may reduce accuracy | Accuracy = 88.89% |
Hisham and Hamouda30 | KNN, SVM, AdaBoost, DTW | Self-collected ISL, ASL datasets | High accuracy, enhanced with AdaBoost, prototype on Latte Panda for portability | Accuracy for single-hand gestures lower than double-hand gestures | Accuracy = 92% |
Tharwat et al.31 | Machine learning | Bare hands with dark/light backgrounds, gloves | High accuracy for Quranic sign language, robust across various backgrounds | Sensitive to user variability and lighting conditions | Accuracy = 99% |
Alani and Cosma32 | CNN, SMOTE | ArSL2018 dataset | High accuracy improved with SMOTE | Data imbalance addressed post hoc | Accuracy = 97.29% |
Rani et al.33 | mRMR-PSO, Histogram of Oriented Gradient (HOG), multi-class SVM | Seven benchmark datasets | Improved classification accuracy with fewer features | Increased computational time | Accuracy = 96.5% |
Miah et al.34 | CNN, segmentation using YCbCr, HSV, watershed, data augmentation | '38 BdSL', 'KU-BdSL', 'Ishara-Lipi' | High accuracy, improved generalization across datasets | Data Dependency, Computational Complexity | Accuracy = 94% |
Sharma and Singh35 | Deep learning CNN, data augmentation | ISL, ASL datasets | High accuracy on both ISL and ASL datasets, robust performance | Data Variability, Computational Requirements | Accuracy = 88.01% |
Alyami et al.36 | MediaPipe pose estimator, LSTM, TCN, Transformer-based models | KArSL-100, LSA64 datasets | High accuracy, importance of combining hand and face keypoints demonstrated | Limited Generalization, Limited Gesture Context | Accuracy = 98.25% |
Sharma and Singh35 | Speech recognition, translation to ISL, 3D avatars | multi-lingual datasets | High accuracy, useful for educational and communication purposes | Translation Complexity, Avatar Limitations | Accuracy = 89% |
Abdul Ameer et al.12 | MediaPipe & Long Short-Term Memory (LSTM) with Attention Mechanism | DArSL50 dataset | Focuses on relevant data parts using attention; Temporal handling via LSTM | Requires a dataset from multiple volunteers; Limited to predefined gestures | Achieved accuracies of > 85% for individual volunteers and 83% for consolidated data |
AlKhuraym et al.37 | EfficientNet-Lite0 architecture | Self-collected real ArSL images | Reduces computing costs while maintaining performance; Effective with background variations | May need more data for real-time adaptation | Achieved 94% accuracy; Effective in real-world scenarios |
Shanableh38 | Two-stage solution with CNN transfer learning | Arabic sign language dataset | Higher accuracy with word and sentence segmentation; Outperformed previous solutions | Requires precise word count prediction and segmentation accuracy | Achieved 97.3% word recognition and 92.6% sentence recognition |
Rwelli et al.39 | Wearable sensor, Convolutional Neural Network (CNN) | DG5-V hand gloves with wearable sensors | Efficient in recognizing Arabic sign language with 30 letters; User accessibility | Limited to predefined set of gestures; Wearable sensor dependency | Achieved 90% accuracy in user recognition |