Table 24 Summary of recent deep learning studies on oral squamous cell carcinoma (OSCC) detection.
From: Deep visual detection system for oral squamous cell carcinoma
Study | Year | Methodology | Dataset | Reported performance |
|---|---|---|---|---|
Ariji et al.46 | 2019 | Deep learning | CT images | 78.20% |
Welikala et al.77 | 2020 | ResNet101 | Histopathological images | 78.30% |
Aubreville et al.53 | 2017 | Deep learning | Not specified | 80.01% |
Begum et al.56 | 2023 | DenseNet201 | Histopathological images | 91.25% |
Ghosh et al.78 | 2022 | DRNN | Private | 83.33% |
Figueroa et al.58 | 2022 | VGG19 and ResNet50 | Private | 84.84% |
Warin et al.64 | 2021 | R-CNN | 700 clinical oral images | Precision 76%, Recall 82%, F1 score 79% |
Jubair et al.79 | 2020 | EfficientNet-B0 | 716 clinical images | 85.0% |
Bansal et al.69 | 2022 | ResNet50, VGG19, and DenseNet | 5072 images | DenseNet (best): 92% |
Hemalatha et al.37 | 2022 | FJWO-DCNN | BAHNO NMDS dataset | 93% |
Rahman et al.38 | 2022 | AlexNet | 4946 images, Kaggle dataset | 90.06% |
Lima et al.52 | 2023 | MetaBlock fusion & ResNetV2 | NDB-UFES dataset | 83.24% |
Maia et al.80 | 2024 | DenseNet-121 | NDB-UFES dataset | 91.91% |
Tafala et al.81 | 2025 | DeepPatchNet | NDB-UFES dataset | 86.71% |
Pham et al.82 | 2024 | InceptionResNet-v2 + SVM, ViT + SVM, Fusion strategy | NDB-UFES dataset | Accuracy: 91.7%, AUC: 0.985 |
Uliana et al.83 | 2025 | Diffusion models vs. CNNs and Transformers; lower BAC for 6-class PAD-UFES-20 | PAD-UFES-20, NDB-UFES dataset | PAD-UFES-20: 64.57% (6-class), 83.57% (binary); NDB-UFES: accuracy 90.50% |
Mandal et al.84 | 2025 | ResNet50 + SVM, KNN, DT, RF, LR, NB, Clinical Data Fusion | 237 OSCC + leukoplakia | KNN: 89%, SVM: 81% |
Liao et al.85 | 2025 | Momentum Contrastive Learning (MoCo), ViT, self-supervised pretraining; did not test on NDB-UFES | NDB-UFES histopathological images; OralHist dataset | Class-separation AUROC: 99.4% (NDB-UFES), 94.8% (OralHist) |
Hadilou et al.66 | 2025 | ViT vs. VGG16 | 2 online databases | ViT: 94% (3-class), 97% (4-class) |
Kaur et al.65 | 2025 | EfficientNetB3, DenseNet201 | 5192 images, Kaggle dataset | 92%, 91% |
Justaniah et al.68 | 2025 | ResNet50, EfficientNetB0, DenseNet121, VGG16 | Spectral oral images from Kaggle | ResNet50 (fine-tuned): 77% (acc), 64% (F1) |
Olivos et al.86 | 2024 | Deep CNN (MobileNet) + Data Augmentation | 131 images, Kaggle dataset | 90.9% accuracy, AUC = 0.91 |
Prado et al.67 | 2025 | MobileNet-V2, VGG16 | 5192 images, Kaggle dataset | MobileNet-V2: 97% |
Anitha et al.87 | 2024 | DenseNet121 (binary classification) | 5192 images, Kaggle dataset | Accuracy: 97% |
Proposed approach: Deep Visual Detection System (DVDS) | 2025 | EfficientNet-B3 with data augmentation and image preprocessing; callbacks: EarlyStopping and ReduceLROnPlateau; DenseNet121 and ResNet50 were also applied but showed significantly lower performance | Two datasets. Dataset 1: 5192 histopathological images (Kaggle); Dataset 2: 3763 NDB-UFES histopathological images | Dataset 1: EfficientNet-B3: test accuracy 0.9704, train accuracy 0.9901, validation accuracy 0.9769; DenseNet121: test accuracy 0.8691; ResNet50: test accuracy 0.5956. Dataset 2: EfficientNet-B3: test accuracy 0.9715, train accuracy 0.9969, validation accuracy 0.9749; DenseNet121: test accuracy 0.5477; ResNet50: test accuracy 0.5570 |
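
Several rows report precision, recall, and F1 score rather than accuracy, so they are not directly comparable with the accuracy-only rows. As a reading aid, the short sketch below recomputes F1 as the harmonic mean of precision and recall, using the values listed for Warin et al. The helper name `f1_score` is an illustrative assumption and is not taken from any of the cited studies.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as listed for Warin et al. in the table.
f1 = f1_score(0.76, 0.82)
print(f"F1 = {f1:.2%}")  # prints "F1 = 78.89%"
```

The result rounds to 79%, which agrees with the F1 score listed in that row.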