Table 4. Database size, effect on accuracy, and overfitting considerations.
Database | Size | Number of samples (training set) | Number of samples (test set) | Effect on accuracy | Overfitting risk |
---|---|---|---|---|---|
Arabic Sign Language (ArSL) Dataset (Database 1) | 15,086 images | 13,926 images | 290 images | Larger training set allows the model to learn more robust features, improving generalization and reducing bias | Overfitting could occur if the model is too complex relative to the dataset size, particularly with deep models like DenseNet and ResNet, which might memorize specific features if not properly regularized |
RGB Arabic Alphabets Sign Language Dataset (Database 2) | 7,857 images | 4,000 images | 3,000 images | A larger dataset contributes to better model performance as it enables better feature extraction and generalization | With larger datasets, overfitting risk is lower but still possible if the model is overtrained on the data without sufficient cross-validation or regularization |
KArSL (Database 3) | 75,300 samples | 60,240 samples (80%) | 15,060 samples (20%) | High accuracy potential owing to the large dataset; consistent signer data improves learning | Moderate: repeated samples from the same signers may limit generalization |
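
The split proportions implied by Table 4 can be checked with a few lines of arithmetic. The sketch below is illustrative only: it copies the counts from the table and computes each split's share of the stated database size. The function name `split_summary` and the label `unassigned` are assumptions introduced here. The table does not say what the samples outside the training and test sets were used for, so the remainder is reported without interpretation.

```python
# Sample counts copied from Table 4: (total size, training set, test set).
DATASETS = {
    "ArSL (Database 1)": (15_086, 13_926, 290),
    "RGB Arabic Alphabets (Database 2)": (7_857, 4_000, 3_000),
    "KArSL (Database 3)": (75_300, 60_240, 15_060),
}


def split_summary(total, train, test):
    """Return the train/test shares of the total and the count in neither split."""
    return {
        "train_share": train / total,
        "test_share": test / total,
        # Hypothetical label: samples listed in the total but in neither split.
        "unassigned": total - train - test,
    }


summaries = {name: split_summary(*counts) for name, counts in DATASETS.items()}

for name, s in summaries.items():
    print(
        f"{name}: train {s['train_share']:.1%}, "
        f"test {s['test_share']:.1%}, unassigned {s['unassigned']}"
    )
```

Only the KArSL row splits its full size between training and test sets (80% and 20%). For the other two databases the listed splits do not add up to the stated size, which is worth keeping in mind when comparing accuracy across the three.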