Table 24 Summary of recent deep learning studies on oral squamous cell carcinoma (OSCC) detection.

From: Deep visual detection system for oral squamous cell carcinoma

| Study | Year | Methodology | Dataset | Accuracy / reported metric |
|---|---|---|---|---|
| Ariji et al. [46] | 2019 | Deep learning | CT images | 78.20% |
| Welikala et al. [77] | 2020 | ResNet101 | Histopathological images | 78.30% |
| Aubreville et al. [53] | 2017 | Deep learning | Not specified | 80.01% |
| Begum et al. [56] | 2023 | DenseNet201 | Histopathological images | 91.25% |
| Ghosh et al. [78] | 2022 | DRNN | Private | 83.33% |
| Figueroa et al. [58] | 2022 | VGG19 & ResNet50 | Private | 84.84% |
| Warin et al. [64] | 2021 | R-CNN | 700 clinical oral images | Precision 76%, Recall 82%, F1 score 79% |
| Jubair et al. [79] | 2020 | EfficientNet-B0 | 716 clinical images | 85.0% |
| Bansal et al. [69] | 2022 | ResNet50, VGG19, and DenseNet | 5072 images | DenseNet achieved the best accuracy: 92% |
| Hemalatha et al. [37] | 2022 | FJWO-DCNN | BAHNO NMDS dataset | 93% |
| Rahman et al. [38] | 2022 | AlexNet | 4946 images, Kaggle dataset | 90.06% |
| Lima et al. [52] | 2023 | MetaBlock fusion & ResNetV2 | NDB-UFES dataset | 83.24% |
| Maia et al. [80] | 2024 | DenseNet-121 | NDB-UFES dataset | 91.91% |
| Tafala et al. [81] | 2025 | DeepPatchNet | NDB-UFES dataset | 86.71% |
| Pham et al. [82] | 2024 | InceptionResNet-v2 + SVM, ViT + SVM, fusion strategy | NDB-UFES dataset | Accuracy: 91.7%, AUC: 0.985 |
| Uliana et al. [83] | 2025 | Diffusion models vs CNNs and Transformers; lower BAC for 6-class PAD-UFES-20 | PAD-UFES-20, NDB-UFES dataset | PAD-UFES-20: 64.57% (6-class), 83.57% (binary); NDB-UFES: accuracy 90.50% |
| Mandal et al. [84] | 2025 | ResNet50 + SVM, KNN, DT, RF, LR, NB; clinical data fusion | 237 OSCC + leukoplakia | KNN: 89%, SVM: 81% |
| Liao et al. [85] | 2025 | Momentum Contrastive Learning (MoCo), ViT, self-supervised pretraining; did not test on NDB-UFES | NDB-UFES histopathological images, OralHist dataset | Model class separation; AUROC: 99.4% (NDB-UFES), AUROC: 94.8% (OralHist) |
| Hadilou et al. [66] | 2025 | ViT vs VGG16 | 2 online DBs | ViT: 94% (3-class), 97% (4-class) |
| Kaur et al. [65] | 2025 | EfficientNetB3, DenseNet201 | 5192 images, Kaggle dataset | 0.92, 0.91 |
| Justaniah et al. [68] | 2025 | ResNet50, EfficientNetB0, DenseNet121, VGG16 | Spectral oral images from Kaggle | ResNet50 (fine-tuned): 77% (accuracy), 64% (F1) |
| Olivos et al. [86] | 2024 | Deep CNN (MobileNet) + data augmentation | 131 images, Kaggle dataset | 90.9% accuracy, AUC = 0.91 |
| Prado et al. [67] | 2025 | MobileNet-V2, VGG16 | 5192 images, Kaggle dataset | MobileNet-V2: 97% |
| Anitha et al. [87] | 2024 | DenseNet121, binary | 5192 images, Kaggle dataset | Accuracy: 97% |
| Proposed approach: Deep Visual Detection System (DVDS) | 2025 | (Proposed) Deep Visual Detection System (DVDS); data augmentation and image preprocessing techniques; callbacks used: Early Stopping & ReduceLROnPlateau; EfficientNetB3 with augmentation and preprocessing techniques; DenseNet121 & ResNet50 were also applied but showed significantly lower performance | Two datasets. Dataset 1: 5192 histopathological images, Kaggle; Dataset 2: 3763 NDB-UFES histopathological images | Dataset 1: EfficientNet-B3 test accuracy 0.9704, train accuracy 0.9901, validation accuracy 0.9769; DenseNet121 test accuracy 0.8691; ResNet50 test accuracy 0.5956. Dataset 2: EfficientNet-B3 test accuracy 0.9715, train accuracy 0.9969, validation accuracy 0.9749; DenseNet121 test accuracy 0.5477; ResNet50 test accuracy 0.5570 |
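
Most entries in the table report accuracy, but the Warin et al. entry reports precision, recall, and F1 score instead. Since F1 is the harmonic mean of precision and recall, the three figures can be checked against each other. The following is a minimal sketch of that check; the function name `f1_from_precision_recall` is illustrative and does not come from any of the cited studies.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the F1 score, the harmonic mean of precision and recall.

    Both inputs are fractions in [0, 1]. The function name is hypothetical,
    chosen only for this illustration.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Values reported for Warin et al. in the table: precision 76%, recall 82%.
warin_f1 = f1_from_precision_recall(0.76, 0.82)
print(f"F1 = {warin_f1:.2%}")  # prints "F1 = 78.89%"
```

The computed value rounds to the 79% F1 score listed for that study, so the three reported figures are mutually consistent.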