Table 2. Dataset details.
Dataset | Full name & URL | Image type & resolution | Total samples used | Classes (count) | Train/test split | Class distribution & imbalance handling |
|---|---|---|---|---|---|---|
MNIST | MNIST Handwritten Digit Database | Grayscale, $28\times 28$ | 70,000 (60,000 train; 10,000 test) | 10 digits (0–9) | 80/20 | Balanced across 10 classes; no special handling required |
CIFAR-10 | CIFAR-10 Object Recognition Dataset | RGB, $32\times 32$ | 60,000 (50,000 train; 10,000 test) | 10 object categories (aeroplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck) | 80/20 | Balanced across 10 classes; no special handling required |
Skin Cancer (Binary) | Skin Cancer: Malignant vs. Benign (Kaggle), https://www.kaggle.com/datasets/fanconic/skin-cancer-malignant-vs-benign | RGB dermoscopy, up to $1024\times 1024$, resized to $150\times 150$ | 3,297 (1,800 benign; 1,497 malignant) | 2 classes: benign (1,800), malignant (1,497) | 80/20 | Imbalanced (malignant $\approx 45\%$); addressed via (i) weighted cross-entropy loss with a higher weight for the malignant class and (ii) augmentation (rotation, flips, zoom) to oversample malignant cases |
HAM10000 | HAM10000 (“Human Against Machine with 10,000 training images”), https://dataverse.harvard.edu/dataset.xhtml?persistentId=https://doi.org/10.7910/DVN/DBW86T | RGB dermatoscopic images, typically $600\times 450$ pixels | 10,015 images | 7 classes of pigmented lesions: melanocytic nevi (NV), melanoma (MEL), benign keratosis-like lesions (BKL), basal cell carcinoma (BCC), actinic keratoses (AKIEC), vascular lesions (VASC), dermatofibroma (DF) | No official split; commonly used train/test splits vary (e.g., 80/20 or 90/10) | Highly imbalanced: melanocytic nevi (NV) comprise $>60\%$ of samples; melanoma (MEL) is $\approx 10\%$. Handled in published studies using focal loss, oversampling, and class weighting |
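
The weighted cross-entropy handling listed for the binary skin cancer dataset can be sketched as follows. This is a minimal illustration, not the authors' implementation: the table states only that the malignant class receives a higher weight, so the inverse-frequency weighting scheme, the function names `class_weights` and `weighted_cross_entropy`, and the toy predictions are all assumptions made for the example. Only the class counts come from the table.

```python
import math

# Class counts taken from the table (binary skin cancer dataset).
COUNTS = {"benign": 1800, "malignant": 1497}


def class_weights(counts):
    """Inverse-frequency weights: total / (num_classes * class_count).

    Assumed scheme; the table does not say how the weights were chosen.
    The rarer class (malignant) ends up with the larger weight.
    """
    total = sum(counts.values())
    num_classes = len(counts)
    return {name: total / (num_classes * n) for name, n in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weight-normalised mean of per-sample negative log-likelihoods.

    probs:   list of dicts mapping class name -> predicted probability
    labels:  list of true class names, aligned with probs
    weights: dict mapping class name -> class weight
    """
    eps = 1e-12  # guards against log(0)
    weighted_sum = 0.0
    weight_total = 0.0
    for p, y in zip(probs, labels):
        w = weights[y]
        weighted_sum += -w * math.log(max(p[y], eps))
        weight_total += w
    return weighted_sum / weight_total


weights = class_weights(COUNTS)

# Hypothetical model outputs for two samples, for illustration only.
probs = [
    {"benign": 0.9, "malignant": 0.1},
    {"benign": 0.4, "malignant": 0.6},
]
labels = ["benign", "malignant"]
loss = weighted_cross_entropy(probs, labels, weights)
```

With these weights, an error on a malignant sample contributes more to the loss than the same error on a benign sample, which is the intended effect of the weighting. Dividing by the sum of weights rather than the sample count keeps the loss on the same scale as the unweighted version.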