Table 4 Database size, effect on accuracy, and overfitting considerations.

From: Attention-based hybrid deep learning model with CSFOA optimization and G-TverskyUNet3+ for Arabic sign language recognition

| Database | Total size | Training samples | Test samples | Effect on accuracy | Overfitting risk |
|---|---|---|---|---|---|
| Arabic Sign Language (ArSL) Dataset (Database 1) | 15,086 images | 13,926 images | 290 images | A larger training set lets the model learn more robust features, improving generalization and reducing bias | Overfitting can occur if the model is too complex relative to the dataset size, particularly with deep models such as DenseNet and ResNet, which may memorize specific features if not properly regularized |
| RGB Arabic Alphabets Sign Language Dataset (Database 2) | 7,857 images | 4,000 images | 3,000 images | A larger dataset improves model performance because it supports better feature extraction and generalization | Overfitting risk is lower with a larger dataset but still possible if the model is overtrained without sufficient cross-validation or regularization |
| KArSL (Database 3) | 75,300 | 60,240 (80%) | 15,060 (20%) | High accuracy potential due to the large dataset; consistent signer data improves learning | Moderate: repeated samples from the same signers may limit generalization |
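The split sizes and overfitting notes above can be illustrated with a minimal Python sketch. It assumes a plain random shuffle for the 80/20 KArSL split (the table does not say how the split was drawn). The helper names (`random_split`, `signer_disjoint_split`, `k_fold_indices`) and the per-sample signer labels are hypothetical, not taken from the paper. The sketch shows three things: how 75,300 samples divide into 60,240/15,060; a signer-disjoint split, one common way to test whether repeated samples from the same signers inflate accuracy; and k-fold cross-validation, one of the safeguards the Database 2 row mentions.

```python
import random


def random_split(n_samples, train_frac=0.8, seed=0):
    """Shuffle sample indices and cut them at train_frac (plain random split).

    Assumption: the paper's split method is not stated; this is illustrative only.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    cut = round(n_samples * train_frac)
    return idx[:cut], idx[cut:]


def signer_disjoint_split(signer_ids, test_signers):
    """Send every sample from a held-out signer to the test set.

    The model is then evaluated only on signers it never saw in training.
    signer_ids[i] is a hypothetical signer label for sample i.
    """
    train, test = [], []
    for i, signer in enumerate(signer_ids):
        (test if signer in test_signers else train).append(i)
    return train, test


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, val) index lists for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # k near-equal, non-overlapping folds
    for f in range(k):
        val = folds[f]
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, val


if __name__ == "__main__":
    train, test = random_split(75_300)
    print(len(train), len(test))  # 60240 15060

    # Hypothetical labels: three signers assigned round-robin to the samples.
    signers = [i % 3 for i in range(75_300)]
    tr, te = signer_disjoint_split(signers, test_signers={2})
    print(len(tr), len(te))  # 50200 25100

    for fold_train, fold_val in k_fold_indices(4_000, k=5):
        print(len(fold_train), len(fold_val))  # 3200 800 for each fold
```

Holding out whole signers usually yields a lower but more honest accuracy estimate than a random split when the same signers recur across samples. This is the generalization concern noted for KArSL.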