Introduction

Oral cancer remains one of the most prevalent forms of cancer globally, ranking sixth among the most common types. Oral squamous cell carcinoma (OSCC) accounts for nearly 90% of these cases, posing significant health risks to affected individuals1. The World Health Organization (WHO) reports that approximately 657,000 new cases of oral cancer are diagnosed each year globally, leading to more than 330,000 fatalities annually. These figures underscore the severe impact of OSCC, which is especially prevalent in developing countries across South and Southeast Asia, where incidence rates are nearly double the global average. India alone accounts for one-third of global OSCC cases, representing a substantial healthcare burden in the region. Likewise, in Pakistan, oral cancer is the most frequent cancer among males and ranks second among females, indicating a critical public health challenge2.

Further epidemiological studies reveal a significant gender disparity in oral cancer incidence, with men being approximately 2.5 times more likely to develop the disease compared to women. This discrepancy largely results from lifestyle factors such as tobacco use and alcohol consumption3. Even in developed nations, the incidence is notably rising. For example, in the United States, the American Cancer Society’s 2023 survey projected around 54,540 new cases and approximately 7,400 deaths annually, demonstrating the escalating burden of OSCC despite advanced healthcare infrastructure4. Additionally, the GLOBOCAN 2022 database indicated oral cancer constituted 1.9% of all cancers in 2022, resulting in about 188,230 deaths worldwide5. These statistics strongly advocate for more effective detection methods and treatment options, especially in regions heavily affected by this malignancy1.

OSCC typically originates from the squamous cells lining the mouth’s internal surfaces, including the tongue, gums, lips, and cheeks. Approximately 350,000 to 400,000 new cases are recorded globally each year, primarily affecting men and closely associated with risk factors such as tobacco and alcohol use and human papillomavirus (HPV) infection6. Although OSCC ranks sixth in global cancer incidence, it is the eighth leading cause of cancer-related mortality among men. Due to a lack of distinct early symptoms, diagnosis relies on lesion characteristics such as size, color, and texture, together with patient habits such as smoking and alcohol intake7. Unfortunately, OSCC often presents at advanced stages, hindering treatment effectiveness and resulting in a poor five-year survival rate of approximately 50%8. This delay in detection also escalates healthcare costs, estimated at over $2 billion annually due to the tumor’s complex and diverse nature9.

The early detection of OSCC critically enhances patient outcomes, potentially elevating the five-year survival rate from approximately 20–30% at advanced stages to nearly 80% when detected early10. Rapid and precise diagnosis supports personalized treatment strategies, such as surgery, radiation, or chemotherapy11,12. However, conventional diagnostic methods face notable limitations, including substantial reliance on expert pathologists, time-consuming processes, and considerable susceptibility to errors arising from subjective evaluations influenced by factors such as staining quality and microscope type13,14. Such variability can result in delayed or incorrect diagnoses, significantly impacting the effectiveness of subsequent clinical interventions11.

In addressing these diagnostic challenges, advancements in artificial intelligence (AI) and computer-aided diagnostic systems are increasingly shaping modern healthcare, offering transformative potential for the early detection and management of cancers, including OSCC. In particular, deep learning (DL) has shown great promise15,16. Unlike traditional machine learning, DL techniques, including deep neural networks (DNNs) and convolutional neural networks (CNNs), enable automated feature extraction directly from raw data, significantly improving classification accuracy and diagnostic reliability17,18. Such DL models demonstrate superior performance, particularly in medical imaging applications19,20, as they inherently overcome limitations associated with manual feature selection and offer rapid inference critical for timely clinical decisions21.

CNN architectures, specifically, are extensively used for OSCC detection, leveraging multiple convolutional layers to perform complex feature extraction, followed by dense layers dedicated to the classification tasks22,23. The CNN structure optimizes accuracy through parameters adjusted during training, effectively recognizing distinctive image features indicative of malignancy24,25. Among various CNN architectures explored, EfficientNet, ResNet, and DenseNet stand out due to their robust performance. EfficientNet, introduced by Tan and Le26, incorporates compound scaling across width, depth, and resolution dimensions, delivering high classification accuracy while minimizing computational resources27,28. EfficientNetB3, specifically, is optimal for OSCC detection due to its balanced scaling approach, improved feature extraction capability, rapid inference speed, and lower computational costs compared to earlier versions29. Moreover, EfficientNetB3 was preferred over newer variants such as EfficientNetV2 and Vision Transformer (ViT) because it offers an ideal trade-off between performance and computational efficiency, making it more suitable for histopathological image analysis with limited data availability30.

Similarly, ResNet significantly enhances deep network performance through residual connections, effectively combating issues such as vanishing gradients common in deep neural architectures31. ResNet50, a popular variant within this family, particularly suits OSCC detection due to its sophisticated 50-layer architecture, exceptional feature extraction capabilities, computational efficiency, and strong validation across diverse medical imaging tasks32,33.

DenseNet, introduced by Huang et al.34, employs densely connected networks that maximize feature reuse, minimize parameter redundancy, and ensure stable gradient flow. DenseNet121, the most popular variant, demonstrates high accuracy with fewer parameters, improved computational efficiency, and excellent performance in medical imaging tasks. These attributes make DenseNet121 highly suitable for precise detection of OSCC35,36.

The primary contributions of this research are as follows:

  • We present Deep Visual Detection System (DVDS), a deep learning framework for automatic OSCC detection using histopathological images.

  • Built on EfficientNetB3, the model balances high accuracy with computational efficiency for medical image classification.

  • Evaluated on two public datasets: Kaggle (binary) and NDB-UFES (multi-class: OSCC, leukoplakia with/without dysplasia).

  • Applied image preprocessing (histogram equalization, normalization, augmentation) and training techniques (EarlyStopping, ReduceLROnPlateau) for improved performance.

  • Achieved 97.05% (binary) and 97.16% (multi-class) accuracy, outperforming DenseNet121 and ResNet50 across all metrics.

  • DVDS demonstrates strong clinical potential for accurate, consistent, and rapid OSCC diagnosis.

This paper is organized as follows: Section “Related work” reviews related work in deep learning for medical image analysis. Section “Employed datasets” details the employed datasets. Section “Methodology” explains the methodology, including preprocessing, model architecture, and training strategies. Section “Results” presents the results, while Section “Discussions” offers a detailed discussion and performance comparison. Section “Conclusion” concludes the study, and Section “Future work” outlines future research directions.

Related work

Oral cancer remains one of the most prevalent cancers worldwide, largely due to its late-stage diagnosis, emphasizing the critical need for enhanced early detection techniques37. Recent advancements in deep learning (DL), especially convolutional neural networks (CNNs), have substantially improved diagnostic capabilities. For instance, a Deep Convolutional Neural Network (FJWO-DCNN) trained on the BAHNO NMDS dataset achieved 93% accuracy, although generalization concerns persist due to limited dataset size. Another significant study leveraged transfer learning with AlexNet to classify oral squamous cell carcinoma (OSCC) biopsy images, reporting an accuracy of 90.06%, underscoring the efficacy of pretrained networks in medical imaging tasks38. Further developments in DL have incorporated multimodal imaging, significantly enhancing diagnostic accuracy. Liana et al. combined brightfield and fluorescence microscopy techniques, applying a Co-Attention Fusion Network (CAFNet), which attained an accuracy of 91.79%, surpassing human diagnostic performance39. Yang et al. utilized magnetic resonance imaging (MRI) in a three-stage DL model for detecting cervical lymph node metastasis (LNM) in OSCC patients, achieving a notable AUC of 0.97, thereby substantially reducing rates of occult metastasis40. Mario et al. integrated DL with Case-Based Reasoning (CBR), achieving 92% accuracy in lesion classification using a redesigned Faster R-CNN architecture validated through a publicly accessible dataset41.

Studies comparing CNNs to traditional machine learning methods like SVM and KNN indicate superior performance by CNNs, achieving up to 97.21% accuracy in segmentation and classification tasks42. Additionally, 3D CNN models have proven more effective than traditional 2D CNNs in analyzing spatial and temporal imaging data for early OSCC detection43. Navarun et al. noted performance issues when entirely replacing CNN layers during transfer learning, suggesting partial layer training as a more efficient alternative for smaller datasets44. Other investigations using fuzzy classifiers reached an accuracy of 95.70%45, CNN achieved 78.20% in cervical lymph node diagnosis46, and fluorescent confocal microscopy images provided accuracy up to 77.89%47. Hyperspectral image-based CNN approaches reported an accuracy of 94.5%48, and CNN successfully differentiated oral cancer types on MRI images with 96.50% accuracy49. Meanwhile, feature extraction methods using SVM classifiers yielded 91.64% accuracy in oral cancer diagnosis50.

Predictive modeling using Artificial Neural Networks (ANN) for oral cancer risk demonstrated promising results, reaching an accuracy of 78.95%, suggesting significant potential for clinical use51. Deep learning applied to histopathological data has notably improved classification accuracy, clearly distinguishing OSCC from leukoplakia with an accuracy rate of 83.24%52. Confocal Laser Endomicroscopy (CLE) combined with DL techniques also showed excellent diagnostic accuracy (88.3%), highlighting clinical applicability53. Prognostic predictions using machine learning have shown effectiveness in forecasting clinical outcomes, with decision tree and logistic regression methods achieving accuracies between 60 to 76% for disease progression and survival outcomes54,55. DenseNet201 consistently demonstrated high OSCC detection accuracy at 91.25%56.

Custom CNN architectures leveraging ResNet50 and DenseNet201 for feature extraction reported up to 92% classification accuracy57. Gradient-weighted class activation mapping and handheld imaging devices achieved significant diagnostic accuracies up to 86.38%, overcoming challenges posed by varied imaging conditions58,59. Moreover, 3D CNN-based algorithms demonstrated superior performance compared to traditional 2D CNN methods, offering better early detection accuracy60. Integration of gene-expression profiles and clinical data using machine learning algorithms like XGBoost predicted OSCC patient survival outcomes with an accuracy of 82%61.

Transfer learning consistently achieved high diagnostic accuracy above 90%, proving effective across diverse datasets62,63. Faster R-CNN models delivered reliable lesion detection with an F1 score of 79.31%64. Studies evaluating various CNN models such as EfficientNetB3, DenseNet201, and Vision Transformers (ViT) regularly demonstrated high diagnostic accuracy levels, generally exceeding 90%, despite persistent challenges such as interpretability, class imbalance, and generalizability65. ViT models, in particular, achieved remarkable classification accuracy, up to 97%, indicating robust performance even under dataset constraints66.

In conclusion, the application of deep learning and artificial intelligence methods has greatly advanced oral cancer detection and prognosis. Ongoing innovations in multimodal data integration and model optimization promise further improvements, ultimately enhancing clinical decision-making and patient outcomes.

Challenges noted in recent research

The previous research studies exhibit the following challenges:

  • Most studies used limited or single-source datasets, reducing model generalizability across diverse clinical settings37,67,68

  • Many approaches relied on time-consuming or computationally expensive methods, limiting real-time applicability58,60

  • Reported accuracies were often low with incomplete or inconsistent performance metrics61,64

  • External validation and class-wise performance evaluation were frequently missing, limiting insight into model reliability69

  • Focus remained on binary classification, ignoring multi-class or stage-specific detection of OSCC58,61.

The proposed work directly addresses these challenges by utilizing multi-source datasets, comprehensive evaluation metrics, and optimized deep learning architectures to enhance model generalizability and diagnostic reliability.

Employed datasets

Two histopathological image datasets were used in this study: the Kaggle OSCC Detection Dataset and the NDB-UFES Oral Cancer Dataset. A summary comparison of both datasets is provided below.

Kaggle OSCC dataset

The Kaggle dataset consists of 1,224 original histopathological images (100× and 400× magnification), later expanded through augmentation to a total of 5,192 images. These images include normal epithelium and OSCC tissue samples collected from 230 patients. The dataset is publicly accessible on Kaggle under the CC0: Public Domain license, allowing unrestricted use for research, including CNN-based histopathological image classification70. Table 1 presents the Kaggle OSCC dataset.

  a. Data Collection: Tissue samples were collected and prepared by medical experts, followed by image acquisition using a Leica ICC50 HD microscope at 100× and 400× magnifications. To enhance cellular and structural details, all slides were stained with Hematoxylin and Eosin (H&E), ensuring clear visualization of histopathological features. Figure 1 shows sample images from the OSCC dataset.

  b. Data Division: The dataset, comprising Normal and OSCC classes, was split into training (70%), validation (15%), and testing (15%) sets using stratified sampling to preserve class balance. This ensured fair training, tuning, and evaluation while reducing overfitting. Table 2 summarizes the class-wise division of the Kaggle Oral Squamous Cell Carcinoma dataset for training, validation, and testing.

    Figure 2 illustrates the class balance in the oral cancer dataset, with roughly 52% OSCC and 48% Normal cases.

  c. Data Preprocessing: Preprocessing ensured input consistency, improved model performance, and minimized overfitting. It involved image resizing, enhancement, and augmentation.

  d. Image Resizing: Images were resized to 300 × 300 for EfficientNetB3 and 224 × 224 for DenseNet121 and ResNet50 to meet model input requirements while preserving key histological features.

  e. Data Augmentation: Horizontal flipping was applied to the training set to enhance robustness and prevent overfitting. No augmentation was applied to validation or test sets, and test set shuffling was disabled for consistent evaluation. A minimal sketch of the split and augmentation pipeline is given below.
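The following minimal sketch (Python, TensorFlow/Keras) illustrates the stratified 70/15/15 split and the training-set-only horizontal flipping described above; the CSV index, file paths, and generator settings are illustrative assumptions rather than the exact implementation used in this study.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Hypothetical index of the Kaggle images: one row per image with its path and
# class label ("Normal" / "OSCC").
df = pd.read_csv("kaggle_oscc_labels.csv")

# Stratified 70/15/15 split preserving the Normal/OSCC balance.
train_df, rest_df = train_test_split(df, test_size=0.30, stratify=df["label"], random_state=42)
val_df, test_df = train_test_split(rest_df, test_size=0.50, stratify=rest_df["label"], random_state=42)

# Horizontal flipping is applied to the training set only; images are resized to
# 300 x 300 here (EfficientNetB3), whereas 224 x 224 would be used for DenseNet121/ResNet50.
train_gen = ImageDataGenerator(horizontal_flip=True).flow_from_dataframe(
    train_df, x_col="filepath", y_col="label", target_size=(300, 300),
    batch_size=32, class_mode="categorical", shuffle=True)

plain = ImageDataGenerator()  # no augmentation for validation/test
val_gen = plain.flow_from_dataframe(val_df, x_col="filepath", y_col="label",
                                    target_size=(300, 300), batch_size=32,
                                    class_mode="categorical", shuffle=True)
test_gen = plain.flow_from_dataframe(test_df, x_col="filepath", y_col="label",
                                     target_size=(300, 300), batch_size=32,
                                     class_mode="categorical", shuffle=False)  # shuffling disabled
```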

Table 1 Summary of original images from Kaggle OSCC dataset.
Fig. 1 Representative images from the Kaggle OSCC dataset.

Table 2 Class-wise division of the Kaggle OSCC dataset for training, validation, and testing.
Fig. 2 Visualization of class balance in the Kaggle OSCC dataset.

NDB-UFES OSCC dataset

This dataset includes 237 original images annotated with demographic and clinical data, collected between 2010 and 2021 at UFES, Brazil. It comprises three categories: OSCC, leukoplakia with dysplasia, and leukoplakia without dysplasia60. The dataset is openly accessible via the Mendeley Data Repository for research use71. Table 3 presents details of the NDB-UFES OSCC dataset.

Table 3 Summary of NDB-UFES OSCC dataset.

Table 4 shows leukoplakia (61.04%) was more common than OSCC (38.96%) among 77 lesions. Most patients were over 60, with missing data on key risk factors like skin color, tobacco, and alcohol use.

  a. Data Collection: Tissue samples were collected through patient biopsies at UFES and stained with Hematoxylin and Eosin (H&E) to enhance histopathological features. The slides were examined using Leica DM500 and ICC50 HD microscopes, and each diagnosis was confirmed by two or three oral pathologists. The study was approved by the UFES Research Ethics Committee (Approval No. 5,022,438), with strict adherence to patient confidentiality and ethical standards.

    Figure 3 shows the sample images from NDB-UFES OSCC dataset.

  b. Data Annotation and Labeling: From 237 expert-labeled images, 3,763 patches (512 × 512) were extracted and categorized as OSCC, With Dysplasia, or Without Dysplasia. Labels were stored in CSV files and used to organize data into class folders. Manual checks ensured labeling accuracy. Table 5 presents the class-wise distribution of image patches in the NDB-UFES OSCC dataset.

  c. Data Division: The dataset, divided into Without Dysplasia, With Dysplasia, and OSCC, was split into 70% training, 15% validation, and 15% testing using stratified sampling to maintain class balance. This ensured reliable training and evaluation. Table 6 shows the NDB-UFES OSCC dataset before augmentation.

    Figure 4 shows the class imbalance in the NDB-UFES OSCC dataset, with most samples labeled With Dysplasia, emphasizing the need for augmentation before training.

  d. Data Preprocessing: Preprocessing is essential for preparing histopathological images for model training. It ensures consistency in input size, improves performance, and helps prevent overfitting. The preprocessing pipeline in this study involved image resizing, enhancement, and augmentation.

  e. Image Resizing: To meet the input requirements of each CNN model, images were resized accordingly: 300 × 300 pixels for EfficientNetB3, and 224 × 224 pixels for both DenseNet121 and ResNet50. This resizing preserved key tissue structures while enabling efficient model training.

  f. Image Enhancement: Enhancement techniques, specifically hue and saturation adjustments, were applied to improve image quality and highlight diagnostic features. These adjustments increased contrast and supported better feature learning by the models.

  g. Data Augmentation: To address class imbalance, augmentation was applied only to the training set. Techniques included horizontal and vertical flipping, as well as random rotations within a ± 30° range. These methods introduced variation in orientation and structure, improving the model’s generalization. Validation and test sets were left unaltered to ensure fair evaluation. Additionally, shuffling was disabled for the test set to maintain consistency in performance measurement. A minimal code sketch of these enhancement and augmentation steps is given below. Figure 5 displays the image enhancement and data augmentation techniques applied to the NDB-UFES OSCC dataset. Table 7 presents the NDB-UFES OSCC dataset details after data augmentation.
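A minimal sketch of the enhancement and augmentation steps listed above is given below, assuming a tf.data pipeline of (image, label) pairs with RGB patches scaled to [0, 1]; the hue and saturation adjustment ranges are illustrative assumptions, since the text does not fix exact values.

```python
import tensorflow as tf

# Geometric augmentation: horizontal/vertical flips and random rotations within ±30 degrees.
geometric = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal_and_vertical"),
    tf.keras.layers.RandomRotation(30 / 360),  # factor expressed as a fraction of a full turn
])

def enhance_and_augment(image, label):
    """Hue/saturation enhancement plus geometric augmentation for training patches only."""
    image = tf.image.random_hue(image, max_delta=0.05)               # assumed adjustment range
    image = tf.image.random_saturation(image, lower=0.9, upper=1.1)  # assumed adjustment range
    image = geometric(image, training=True)
    return image, label

# Example usage: applied only to the training pipeline; validation and test sets
# are left unaltered, and the test set is not shuffled.
# train_ds = train_ds.map(enhance_and_augment, num_parallel_calls=tf.data.AUTOTUNE)
```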

Table 4 Demographic distribution and risk factors in the NDB-UFES OSCC dataset.
Fig. 3 Visualization of sample images from the NDB-UFES OSCC dataset.

Table 5 Class-wise distribution of image patches in the NDB-UFES OSCC dataset.
Table 6 NDB-UFES OSCC dataset before augmentation.
Fig. 4 Visualization of imbalanced class distribution in the NDB-UFES OSCC dataset.

Fig. 5 Image processing and data augmentation techniques applied to OSCC NDB-UFES dataset.

Table 7 OSCC NDB-UFES dataset after data augmentation.

Methodology

This study presents a deep visual detection system for OSCC classification using a fine-tuned EfficientNetB3 model, chosen for its balance of efficiency and performance. The architecture includes a pre-trained base, batch normalization, a regularized dense layer, dropout, and a softmax classifier. DenseNet121 and ResNet50 were also implemented with the same structure and training settings for fair comparison. The methodology covers the training strategy, model configurations, evaluation metrics, and experimental setup. Figure 6 illustrates the complete training workflow.

Training strategy

Figure 6 presents the proposed Deep Visual Detection System training strategy and workflow for OSCC image classification.

  a. Model Configuration: All three models EfficientNetB3, DenseNet121, and ResNet50 were initialized with pre-trained weights from the ImageNet dataset, excluding their top classification layers by setting include_top = False. This allowed for the replacement of the original fully connected layers with custom classification heads tailored to the specific oral cancer classification tasks. A global average pooling layer (pooling = ‘avg’) was then applied to reduce the spatial dimensions of the feature maps, effectively decreasing the number of parameters and minimizing the risk of overfitting. On top of each model, a customized set of dense and regularized layers was added to complete the classification pipeline based on the task at hand.

  b. Optimizer and Loss: For model optimization, different optimizers were selected based on the individual characteristics and learning behavior of each architecture. The categorical cross-entropy loss function was employed, as it is well-suited for both binary and multi-class classification tasks, allowing for effective learning from labeled image data.

  c. Evaluation Metric: Model performance during both training and validation phases was monitored primarily using accuracy as the evaluation metric. This provided a direct measure of the model’s ability to correctly classify input images across all classes.

  d. Early Stopping and Learning Rate Scheduling: To prevent overfitting and enhance convergence, early stopping and ReduceLROnPlateau were applied. Training halted if validation loss didn’t improve for 10 consecutive epochs, preserving the best weights. Additionally, the learning rate was reduced by a factor of 0.1 after five stagnant epochs to fine-tune learning in later stages.

  e. Training Phases: The training process was divided into four phases to analyze the impact of varying epochs and batch sizes. Phases 1 and 2 involved 10 epochs with batch sizes of 32 and 64, respectively, enabling comparison between small and large updates. Phases 3 and 4 extended training to 20 epochs with the same batch sizes, allowing deeper learning and evaluation under prolonged training conditions.

  f. Model Saving: At the end of each training phase, model weights were saved separately for each configuration to preserve progress. Additionally, complete models including both architecture and weights were stored in clearly labeled folders according to epoch and batch size settings (e.g., epochs10-batch32, epochs20-batch64). This structured approach enabled systematic comparison of performance across all training configurations. A minimal sketch of the callbacks, training phases, and model saving is given below.
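The following minimal sketch summarizes this training strategy; build_model and make_generators are hypothetical placeholders for the model-building and data-loading code described in the corresponding sections, and the saved file names are illustrative.

```python
import os
import tensorflow as tf

# Callbacks described above: stop after 10 stagnant epochs (keeping the best weights)
# and reduce the learning rate by a factor of 0.1 after five stagnant epochs.
callbacks = [
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                     restore_best_weights=True),
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.1, patience=5),
]

# Four training phases: 10/20 epochs x batch sizes 32/64, with weights and full
# models saved per configuration.
for epochs in (10, 20):
    for batch_size in (32, 64):
        train_gen, val_gen = make_generators(batch_size)  # hypothetical data-loading helper
        model = build_model()                             # hypothetical backbone + custom head
        model.fit(train_gen, validation_data=val_gen,
                  epochs=epochs, callbacks=callbacks)

        out_dir = f"epochs{epochs}-batch{batch_size}"
        os.makedirs(out_dir, exist_ok=True)
        model.save_weights(os.path.join(out_dir, "weights.h5"))  # weights only
        model.save(os.path.join(out_dir, "model.h5"))            # architecture + weights
```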

Fig. 6 Proposed deep visual detection system training strategy and workflow for OSCC image classification.

Convolutional feature extractor (base model)

1. The initial component of the architecture is a pre-trained base model loaded with ImageNet weights and without the final classification layers. These models are used as feature extractors to learn spatial hierarchies of features from input images. The mathematical representation of the convolution operation in the feature extractor is:

$${A}_{j}^{l}=g\left(\sum_{i\in {F}_{j}} {A}_{i}^{\left(l-1\right)}*{W}_{ij}^{\left(l\right)}+{\beta }_{j}^{\left(l\right)}\right)$$
(1)

where:

  • \({A}_{i}^{\left(l-1\right)}\) represents the feature map from the \(\left(l-1\right)\) th layer,

  • \({W}_{ij}^{\left(l\right)}\) is the convolution kernel between the \(i\) th and \(j\) th map,

  • \({\beta }_{j}^{\left(l\right)}\) is the bias term,

  • \(g\left(\cdot \right)\) is the non-linear activation function,

  • \(*\) represents the convolution operation.

This process facilitates learning of both low-level (edges, textures) and high-level (shapes, objects) features72.

2. Batch Normalization Layer: To accelerate convergence and stabilize the training process, a Batch Normalization layer is used after the base model. It normalizes the activations using the mean and variance of the current batch:

$$BN\left(x\right)=\gamma \cdot \frac{x-{\mu }_{B}}{\sqrt{{\sigma }_{B}^{2}+\epsilon }}+\beta$$
(2)

where:

  • \({\mu }_{B}\) and \({\sigma }_{B}^{2}\) are the mean and variance of the batch,

  • \(\gamma ,\beta\) are learnable parameters,

  • \(\epsilon\) is a small constant to avoid division by zero.

Batch Normalization helps mitigate internal covariate shift and improves gradient flow during backpropagation73.

3. Fully Connected Layer with Regularization: After normalization, a Dense (fully connected) layer is added with ReLU activation and multiple forms of regularization such as L2 weight decay, L1 activation sparsity, and L1 bias constraint. The ReLU activation function is mathematically expressed as:

$$f\left(x\right)=\text{max}\left(0,x\right)$$
(3)

The output of the dense layer can be defined as:

$${z}^{\left(l\right)}=f\left({W}^{\left(l\right)}{a}^{\left(l-1\right)}+{b}^{\left(l\right)}\right)$$
(4)

where \({W}^{\left(l\right)}\) and \({b}^{\left(l\right)}\) are the weight matrix and bias vector for the \(l\) th layer, and \({a}^{\left(l-1\right)}\) is the input from the previous layer74.

4. Dropout Regularization: To reduce the likelihood of overfitting, a Dropout layer is added after the fully connected layer. Dropout randomly disables a portion of neurons during training. This can be represented as:

$${\tilde{a}}^{\left(l\right)}={a}^{\left(l\right)}\cdot r, \text{where } r\sim \text{Bernoulli}\left(p\right)$$
(5)

where \(r\) is a binary mask generated from a Bernoulli distribution with dropout probability \(p\), and \({a}^{\left(l\right)}\) is the activation before dropout75.

5. Output Layer: Softmax Classifier: For multi-class classification, the final layer uses a Softmax activation function to convert the network output into probability values. The softmax function is defined as:

$$P\left(y=j\mid x\right)=\frac{e^{\theta_{j}^{\top }x}}{\sum_{k=1}^{K}e^{\theta_{k}^{\top }x}}$$
(6)

where:

  • \({\theta }_{j}\) is the weight vector for class \(j\),

  • \(x\) is the input to the softmax layer,

  • \(K\) is the total number of classes.

This function ensures that all predicted class probabilities sum to one76.
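The following minimal Keras sketch maps Eqs. (1)–(6) onto concrete layers, using EfficientNetB3 as the example backbone; the regularizer strengths are assumed values, while the per-model dense widths, dropout rates, and optimizers are detailed in the next section.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

NUM_CLASSES = 3  # e.g., Without Dysplasia, With Dysplasia, OSCC

# Eq. (1): pre-trained convolutional feature extractor (top layers removed,
# global average pooling applied to the final feature maps).
base = tf.keras.applications.EfficientNetB3(
    include_top=False, weights="imagenet", pooling="avg", input_shape=(300, 300, 3))

inputs = tf.keras.Input(shape=(300, 300, 3))
x = base(inputs)
x = layers.BatchNormalization()(x)                              # Eq. (2)
x = layers.Dense(256, activation="relu",                        # Eqs. (3)-(4): ReLU dense layer
                 kernel_regularizer=regularizers.l2(1e-2),      # L2 weight decay (assumed strength)
                 activity_regularizer=regularizers.l1(1e-3),    # L1 activation sparsity (assumed strength)
                 bias_regularizer=regularizers.l1(1e-3))(x)     # L1 bias regularization (assumed strength)
x = layers.Dropout(0.45)(x)                                     # Eq. (5)
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)    # Eq. (6)

model = tf.keras.Model(inputs, outputs, name="dvds_head_sketch")
```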

Employed deep learning models

The three deep learning models used in this study are as follows: EfficientNetB3, DenseNet121, and ResNet50.

  a. Model 1: EfficientNetB3 Architecture

EfficientNetB3 was selected for its strong balance between accuracy and computational efficiency. The model was initialized with ImageNet pre-trained weights and adapted for both binary (Normal vs. OSCC) and multi-class (Without Dysplasia, With Dysplasia, OSCC) classification tasks through transfer learning.

Figures 7 and 8 illustrate the classification pipelines for the Kaggle and NDB-UFES datasets. Before training, images were resized, normalized, and augmented to enhance feature learning and generalization.

Fig. 7 Proposed deep visual detection system using EfficientNetB3 for binary classification on the Kaggle OSCC dataset.

Fig. 8 Proposed deep visual detection system using EfficientNetB3 for multiclass classification on the NDB-UFES OSCC dataset.

The model’s original top layers were replaced with a custom classification head tailored for oral cancer detection, including batch normalization for stability, a dense layer with 256 units and L1/L2 regularization, a dropout layer (rate 0.45), and a final dense layer with softmax activation. It was compiled using the Adamax optimizer (learning rate: 0.0001) and categorical cross-entropy loss, with accuracy as the evaluation metric. To avoid overfitting, early stopping (patience = 10) and ReduceLROnPlateau (factor = 0.1) were applied. The model was trained under four configurations (10 and 20 epochs, with batch sizes of 32 and 64) to allow effective tuning across classification tasks.

  b. Model 2: DenseNet121 Architecture

DenseNet121 was selected for its ability to promote feature reuse and maintain strong gradient flow across layers, making it highly efficient for medical image classification. The model was initialized with pre-trained ImageNet weights and modified with a custom classification head to handle both binary (Normal vs. OSCC) and multi-class (Without Dysplasia, With Dysplasia, OSCC) classification tasks.

Figures 9 and 10 depict the DenseNet121-based pipeline used for the Kaggle and NDB-UFES datasets. Prior to training, all images underwent standard preprocessing procedures including resizing, normalization, and augmentation to optimize model learning and generalization.

Fig. 9 Proposed deep visual detection system using DenseNet121 for binary classification on the Kaggle OSCC dataset.

Fig. 10 Proposed deep visual detection system using DenseNet121 for multiclass classification on the NDB-UFES OSCC dataset.

The model’s default top layers were removed and replaced with a tailored classification head optimized for OSCC detection, comprising batch normalization to enhance training stability, a dense layer with 512 units incorporating L1 and L2 regularization, a dropout layer with a 0.45 rate for regularization, and a final dense layer with softmax activation for outputting class probabilities. The model was compiled using the Adam optimizer with a learning rate of 0.001, categorical cross-entropy as the loss function, and accuracy as the evaluation metric. To improve generalization and prevent overfitting, Early Stopping was applied to terminate training based on validation loss stagnation, and ReduceLROnPlateau was employed to dynamically lower the learning rate when necessary. Training was conducted under four different configurations (10 and 20 epochs, each with batch sizes of 32 and 64) across both binary and multi-class classification setups, allowing comprehensive evaluation of model performance under varying conditions.

  c. Model 3: ResNet50 Architecture

ResNet50 was chosen for its deep residual learning framework, which utilizes shortcut (skip) connections to effectively combat the vanishing gradient problem, making it well-suited for complex image classification tasks such as oral cancer detection. The model was initialized with pre-trained ImageNet weights, and the top classification layers were removed to allow the integration of a custom head tailored for both binary (Normal vs. OSCC) and multi-class (Without Dysplasia, With Dysplasia, and OSCC) classification tasks.

Figures 11 and 12 demonstrate the ResNet50-based pipeline applied to the Kaggle and NDB-UFES datasets, respectively. All images were preprocessed through resizing, normalization, and data augmentation to enhance model learning and generalization capabilities.

Fig. 11 Proposed deep visual detection system using ResNet50 for binary classification on the Kaggle OSCC dataset.

Fig. 12 Proposed deep visual detection system using ResNet50 for multiclass classification on the NDB-UFES OSCC dataset.

To adapt ResNet50 for OSCC detection, its original output layers were replaced with a custom classification head comprising batch normalization for training stability, a dense layer with 256 units using L1/L2 regularization, a dropout layer with a rate of 0.3 to reduce overfitting, and a final dense layer with softmax activation to generate probability outputs. The model was compiled using the AdamW optimizer with a learning rate of 0.001, employing categorical cross-entropy as the loss function and accuracy as the evaluation metric. To improve generalization and prevent overfitting, Early Stopping was used to halt training upon validation loss stagnation, while ReduceLROnPlateau dynamically lowered the learning rate during periods of no improvement. Training was carried out under four configurations (10 and 20 epochs with batch sizes of 32 and 64) across both binary and multi-class classification tasks, enabling a thorough assessment of model performance under varying training conditions. A consolidated sketch of the three classification heads is given below.
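The sketch below consolidates the three configurations described above into a single builder; the backbone choices, head widths, dropout rates, optimizers, and learning rates follow the text, whereas the regularizer strengths and the exact head composition are assumptions rather than the original code.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers, optimizers, applications

CONFIGS = {
    "EfficientNetB3": dict(backbone=applications.EfficientNetB3, input_size=300,
                           units=256, dropout=0.45,
                           optimizer=lambda: optimizers.Adamax(learning_rate=1e-4)),
    "DenseNet121":    dict(backbone=applications.DenseNet121, input_size=224,
                           units=512, dropout=0.45,
                           optimizer=lambda: optimizers.Adam(learning_rate=1e-3)),
    "ResNet50":       dict(backbone=applications.ResNet50, input_size=224,
                           units=256, dropout=0.30,
                           optimizer=lambda: optimizers.AdamW(learning_rate=1e-3)),  # AdamW requires TF >= 2.11
}

def build_classifier(name, num_classes):
    cfg = CONFIGS[name]
    size = cfg["input_size"]
    base = cfg["backbone"](include_top=False, weights="imagenet",
                           pooling="avg", input_shape=(size, size, 3))
    inputs = tf.keras.Input(shape=(size, size, 3))
    x = layers.BatchNormalization()(base(inputs))
    x = layers.Dense(cfg["units"], activation="relu",
                     kernel_regularizer=regularizers.l2(1e-2),   # assumed strength
                     bias_regularizer=regularizers.l1(1e-3))(x)  # assumed strength
    x = layers.Dropout(cfg["dropout"])(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs, name=name)
    model.compile(optimizer=cfg["optimizer"](),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

# Example: the binary Kaggle task uses num_classes=2, the NDB-UFES task num_classes=3.
dvds = build_classifier("EfficientNetB3", num_classes=2)
```

Keeping the head definition shared while swapping only the backbone, head width, dropout, and optimizer mirrors the uniform training settings described above and keeps the comparison between the three models fair.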

Evaluation metrics

To assess the performance of the models in classifying histopathological images as either “Normal” or "OSCC," several evaluation metrics were employed. These metrics provide a comprehensive understanding of the model’s predictive capabilities, especially in the context of medical diagnosis where both false positives and false negatives carry critical implications.

In these evaluations, β represents correctly predicted positive samples, Δ denotes correctly predicted negative samples, ψ indicates incorrectly predicted positive samples, and κ corresponds to incorrectly predicted negative samples.

Accuracy (Eq. 7) indicates the ratio of total correct predictions, where β corresponds to true positives (accurately detected OSCC) and Δ to true negatives (correctly identified Normal cases).

$$Accuracy =\frac{\beta +\Delta }{\beta +\Delta +\psi +\kappa }$$
(7)

Specificity (Eq. 8) evaluates how effectively the model identifies negative cases. Here, Δ stands for true negatives, while ψ represents false positives (Normal samples misclassified as OSCC).

$$Specificity =\frac{\Delta }{\Delta +\psi }$$
(8)

Recall or Sensitivity (Eq. 9) assesses the model’s ability to detect actual positive cases, with β indicating true positives and κ denoting false negatives (OSCC instances the model failed to detect).

$$Recall \;or\; Sensitivity=\frac{\beta }{\beta +\kappa }$$
(9)

Precision (Eq. 10) defines the accuracy of positive predictions, where β represents true positives and ψ accounts for false positives.

$$Precision =\frac{\beta }{\beta +\psi }$$
(10)

F1 Score (Eq. 11) is the harmonic mean of precision and recall, balancing both metrics:

$$F1\; score =\frac{2* Precision * Recall }{ Precision + Recall}$$
(11)

Misclassification Rate (Eq. 12) reflects the proportion of incorrect predictions, considering ψ (false positives) and κ (false negatives).

$$Misclassification\; rate=\frac{\uppsi +\upkappa }{\upbeta +\Delta +\uppsi +\upkappa }$$
(12)

Together, these metrics offer a robust evaluation framework for measuring model accuracy, error tendencies, and diagnostic reliability in both balanced and imbalanced clinical datasets.
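As a concrete illustration of Eqs. (7)–(12), the short sketch below computes these metrics from a toy binary confusion matrix, mapping β, Δ, ψ, and κ to the usual TP, TN, FP, and FN counts; for the multi-class NDB-UFES task the same quantities are obtained per class in a one-vs-rest manner.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy binary example: 0 = Normal, 1 = OSCC. beta = TP, Delta = TN, psi = FP, kappa = FN.
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 1])
y_pred = np.array([0, 0, 1, 1, 1, 1, 0, 1])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

accuracy    = (tp + tn) / (tp + tn + fp + fn)                 # Eq. (7)
specificity = tn / (tn + fp)                                  # Eq. (8)
recall      = tp / (tp + fn)                                  # Eq. (9), sensitivity
precision   = tp / (tp + fp)                                  # Eq. (10)
f1          = 2 * precision * recall / (precision + recall)   # Eq. (11)
misclassification_rate = (fp + fn) / (tp + tn + fp + fn)      # Eq. (12), i.e. 1 - accuracy

print(f"acc={accuracy:.3f} spec={specificity:.3f} rec={recall:.3f} "
      f"prec={precision:.3f} f1={f1:.3f} err={misclassification_rate:.3f}")
```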

Experimental setup

All model training, validation, and evaluation procedures were performed on a local system to maintain full control over the experimental workflow and ensure consistency throughout the process.

  a. Hardware Specifications

    The experiments were executed on a MacBook Pro (2021) running macOS version 15.4. The system was equipped with an Apple M1 Max chip featuring a 10-core CPU and a 32-core integrated GPU. It included 32 GB of unified memory and 1 TB of SSD storage, providing sufficient computational power and storage capacity for deep learning workloads. The device featured a 16.2-inch Retina XDR display with a resolution of 3456 × 2234, and a variety of connectivity options including three Thunderbolt/USB 4 ports, an HDMI port, an SDXC card slot, MagSafe 3 charging, and Touch ID for secure access.

  b. Software and Environment Setup

    The software environment was set up on macOS using Python 3.9.6, with libraries installed via pip, including Jupyter Lab and deep learning frameworks. All installations and updates were managed through the terminal. Jupyter Lab operated offline, and all scripts covering preprocessing, training, and evaluation were executed locally with datasets stored on the machine, ensuring a stable and reproducible workflow.
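A small environment-check snippet consistent with this setup is shown below; whether TensorFlow actually uses the M1 Max GPU depends on the optional tensorflow-metal plugin, which is an assumption not specified in the text.

```python
import sys
import tensorflow as tf

print("Python:", sys.version.split()[0])                        # 3.9.6 in this study
print("TensorFlow:", tf.__version__)
print("GPUs visible:", tf.config.list_physical_devices("GPU"))
# On Apple silicon the GPU is exposed to TensorFlow only if the optional
# tensorflow-metal plugin is installed; otherwise training runs on the CPU.
```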

Results

This section presents the performance of the three deep learning models (EfficientNetB3, DenseNet121, and ResNet50) for oral cancer classification. The Kaggle dataset of 5,192 images and the NDB-UFES dataset were each split into training, validation, and test sets as described in the dataset section. Models were evaluated using accuracy, precision, recall, F1-score, specificity, sensitivity, and confusion matrix analysis to assess classification capabilities and misclassification trends. All experiments were conducted locally on a MacBook Pro 2021 (Apple M1 Max, 32 GB RAM, 1 TB SSD) using Python 3.9.6 within Jupyter Lab. The Kaggle dataset comprised two classes: Normal and Oral Squamous Cell Carcinoma (OSCC).

EfficientNetB3

EfficientNetB3 was applied to both the Kaggle dataset, involving binary classification (OSCC vs. Normal), and the NDB-UFES dataset, which posed a multiclass classification problem. To ensure the robustness and generalizability of the model, multiple configurations of batch sizes and training epochs were explored.

For the Kaggle dataset, the EfficientNetB3 model was trained under various settings to determine optimal performance. The best results were achieved with a batch size of 64 and 10 training epochs, where the model demonstrated strong capability in differentiating between normal and cancerous tissue. The performance was consistently high across training and validation phases.

In the case of the NDB-UFES dataset, which contained multiple class labels, the model underwent similar experimentation with hyperparameters. The most effective configuration was found with a batch size of 32 and 20 epochs, yielding the highest classification accuracy across all classes.

Figure 13 illustrates the configuration applied for binary classification using the Kaggle dataset, highlighting layers such as batch normalization, dense layers, and dropout for overfitting prevention. Figure 14 displays the architecture used for multiclass classification on the NDB-UFES dataset, emphasizing similar core components optimized for improved generalization across multiple classes.

Fig. 13 Key hyperparameters of the EfficientNetB3 model (Kaggle binary class OSCC dataset).

Fig. 14 Key hyperparameters of the EfficientNetB3 model (NDB-UFES multiclass OSCC dataset).

Figure 15 shows the loss and accuracy curves for the Kaggle dataset, with losses decreasing to 4.25 (training) and 4.27 (validation), indicating strong alignment between the two sets and minimal overfitting. Accuracy improves steadily, with training accuracy reaching 99.01% and validation accuracy peaking at 97.69%, highlighting stable and efficient learning.

Fig. 15 Training and validation loss & accuracy curve—EfficientNetB3 (Epochs = 10, Batch Size = 64) Kaggle Binary Class OSCC Dataset.

In Fig. 16, both losses steadily decline to around 1.5 (training) and 1.6 (validation), suggesting strong convergence. Accuracy improves sharply during the early epochs and stabilizes at ~ 0.88 (training) and ~ 0.80 (validation), reflecting robust model performance with low signs of overfitting. Table 8 presents EfficientNetB3’s performance on Kaggle and NDB-UFES datasets. The model achieved high accuracy on both, with slightly better optimization on NDB-UFES due to longer training. Performance remained consistent across all phases, indicating good generalization.

Fig. 16 Training and validation loss & accuracy curve—EfficientNetB3 (Epochs = 20, Batch Size = 32) NDB-UFES Multiclass OSCC Dataset.

Table 8 Comparative training, validation, and test accuracy & loss—EfficientNetB3 (Kaggle Binary Class & NDB-UFES Multiclass Dataset).

Figure 17 shows the confusion matrix for the same model trained with 10 epochs and a batch size of 64. The model exhibited a well-balanced performance, with high true positives and true negatives, and low false predictions. This configuration demonstrated strong sensitivity and specificity, indicating reliable classification of both cancerous and non-cancerous cases.

Fig. 17 Confusion matrix of EfficientNetB3 model (Epochs = 10, Batch Size = 64)—Kaggle Binary Class OSCC dataset.

Figure 18 demonstrates slightly better performance than the 10-epoch setup. OSCC TPs improved to 163, and false positives were reduced. The “With Dysplasia” and “Without Dysplasia” classes maintained high classification precision, indicating better overall learning and generalization. Tables 9 and 10 summarize confusion matrix results for the Kaggle and NDB-UFES datasets. The Kaggle model (binary classification) showed balanced TP and TN with minimal FP and FN. On the NDB-UFES dataset (multi-class), all classes achieved high TP with low misclassifications, indicating strong class-wise performance and generalization.

Fig. 18 Confusion matrix of EfficientNetB3 model (Epochs = 20, Batch Size = 32)—NDB-UFES multiclass OSCC dataset.

Table 9 Confusion matrix results EfficientNetB3—Kaggle dataset.
Table 10 Confusion matrix results EfficientNetB3—NDB-UFES dataset.

Table 11 compares classification performance across the Kaggle and NDB-UFES datasets. Both datasets show high precision, recall, and F1-scores (~ 0.97) for all classes, indicating consistent and balanced classification. NDB-UFES also maintained strong class-wise performance in multi-class setup. Table 12 presents overall performance metrics, where both configurations achieved nearly identical accuracy (~ 0.97), with NDB-UFES showing slightly better specificity, suggesting improved true negative recognition in multi-class classification.

Table 11 Comparative classification report of oral squamous cell carcinoma (OSCC)—EfficientNetB3 (Kaggle vs. NDB-UFES Dataset).
Table 12 Comparative model performance of oral squamous cell carcinoma (OSCC)—EfficientNetB3 (Kaggle vs. NDB-UFES Dataset).

DenseNet121

DenseNet121 was applied to both the Kaggle dataset (binary classification: OSCC vs. Normal) and the NDB-UFES dataset (multi-class classification). To assess the model’s learning behavior and optimize its performance, training was conducted under multiple configurations, varying both epochs and batch sizes.

On the Kaggle dataset, DenseNet121 showed its best results with 20 epochs and a batch size of 64, achieving a test accuracy of 86.9% with balanced training and validation performance. However, on the NDB-UFES dataset, the model struggled with generalization across multiple classes, with training and validation accuracy stabilizing around 54%. Despite consistent training parameters, DenseNet121 underperformed in the multi-class scenario, highlighting its limitations in handling complex class distributions compared to binary classification.

Figures 19 and 20 represent the model summaries of the DenseNet121 architecture used for oral cancer classification. Figure 19 illustrates the configuration applied for binary classification using the Kaggle dataset, showcasing key components such as batch normalization, dense layers, and dropout to mitigate overfitting. Figure 20 presents the architecture used for multiclass classification on the NDB-UFES dataset, incorporating the same structural elements, but adjusted to enhance generalization across multiple class labels.

Fig. 19 Key hyperparameters of the DenseNet121 model (Kaggle Binary Class OSCC Dataset).

Fig. 20 Key hyperparameters of the DenseNet121 model (NDB-UFES Multiclass OSCC Dataset).

Figure 21 (20 epochs, batch size 64) shows a consistent decline in both training (to ~ 3.07) and validation loss (to ~ 2.97), indicating stable training. Accuracy curves steadily rise, with training accuracy reaching 89.14% and validation accuracy at 86.51%, confirming effective learning and strong generalization performance. Figure 22 demonstrates erratic early training loss with batch size 64, followed by gradual decline. Validation loss remains more stable. Training accuracy steadily rises, exceeding 0.65, while validation accuracy is less consistent but reaches a similar level, suggesting late-stage stabilization in learning. Table 13 presents DenseNet121’s performance on Kaggle and NDB-UFES datasets.

Fig. 21 Training and validation loss & accuracy curve—DenseNet121 (Epochs = 20, Batch Size = 64) Kaggle Binary Class OSCC Dataset.

Fig. 22 Training and validation loss & accuracy curve—DenseNet121 (Epochs = 20, Batch Size = 64) NDB-UFES Multiclass OSCC Dataset.

Table 13 Comparative training, validation, and test accuracy & loss—DenseNet121 (Kaggle Binary Class & NDB-UFES Multiclass OSCC Dataset).

Figure 23 displays the confusion matrix for 20 epochs and batch size 64. The model achieved a balanced performance with high true positives and true negatives. While some false negatives and false positives were still present, this setup showed a strong trade-off between sensitivity and specificity, making it the best among all configurations.

Fig. 23 Confusion matrix of DenseNet121 model (Epochs = 20, Batch Size = 64)—Kaggle Binary Class OSCC dataset.

As shown in Fig. 24, classification improves notably for all three categories. The model identified 46 OSCC cases correctly, with 120 misclassified as “With Dysplasia” and only 1 as “Without Dysplasia.” “With Dysplasia” was predicted with high accuracy (256 correct), although 23 were still confused with OSCC and 22 with “Without Dysplasia.” The “Without Dysplasia” class achieved 15 correct predictions, while 73 were misclassified as “With Dysplasia.” Despite some overlap, this configuration shows the strongest performance for OSCC among all versions. Tables 14 and 15 present confusion matrix results for the Kaggle and NDB-UFES datasets. On the Kaggle dataset, DenseNet121 showed moderate TP/TN but high FP/FN, indicating misclassification. On the NDB-UFES dataset, low TP and high FN across classes reflected poor class-wise performance and limited generalization.

Fig. 24 Confusion matrix of DenseNet121 model (Epochs = 20, Batch Size = 64)—NDB-UFES Multiclass OSCC dataset.

Table 14 Confusion matrix results DenseNet121–Kaggle OSCC dataset.
Table 15 Confusion matrix results DenseNet121–NDB-UFES OSCC dataset.

Table 16 shows that DenseNet121 performed reasonably on the Kaggle dataset with ~ 0.87 precision and recall, but struggled on the NDB-UFES dataset, especially with OSCC and Without Dysplasia classes, indicating weak multi-class generalization. Table 17 highlights that while the Kaggle setup achieved good overall metrics (accuracy: 0.87), the NDB-UFES configuration saw a major drop in accuracy (0.55), F1-score, and sensitivity, reflecting poor performance in the multi-class scenario.

Table 16 Comparative classification report of oral squamous cell carcinoma (OSCC) – DenseNet121 (Kaggle vs. NDB-UFES Dataset).
Table 17 Comparative model performance of oral squamous cell carcinoma (OSCC)—DenseNet121 (Kaggle vs. NDB-UFES Dataset).

ResNet50

ResNet50 was evaluated on both the Kaggle (binary) and NDB-UFES (multi-class) datasets using varied training configurations. On Kaggle, the best setup (10 epochs, batch size 64) yielded moderate accuracy (~ 59%) with high loss, showing limited binary classification performance. Similarly, the NDB-UFES setup (20 epochs, batch size 64) resulted in low performance across all metrics. Overall, ResNet50 showed weak generalization for OSCC detection in both tasks.

Figure 25 shows the configuration designed for binary classification using the Kaggle dataset, and Fig. 26 illustrates the architecture adapted for multi-class classification on the NDB-UFES dataset, incorporating components such as batch normalization, dense layers, and dropout to prevent overfitting.

Fig. 25 Key hyperparameters of the ResNet50 model (Kaggle Binary Class OSCC Dataset).

Fig. 26 Key hyperparameters of the ResNet50 model (NDB-UFES Multiclass OSCC Dataset).

Figure 27 (10 epochs, batch size 64) shows dramatic spikes in validation loss (~ 9.47), despite relatively stable training loss (~ 8.29). Training accuracy rises to 72.22%, and validation to 71.05%, indicating slight improvement but persistent instability in validation.

Fig. 27 Training and validation loss & accuracy curve—ResNet50 (Epochs = 10, Batch Size = 64) Kaggle binary class OSCC Dataset.

Figure 28 shows the most stable performance with 20 epochs, batch size 64. Both loss and accuracy curves are smooth, with minimal spikes. Validation loss dips at epoch 4, while best validation accuracy is observed at epoch 7, indicating strong and consistent model performance. Table 18 presents ResNet50’s performance on Kaggle and NDB-UFES datasets.

Fig. 28 Training and validation loss & accuracy curve—ResNet50 (Epochs = 20, Batch Size = 64) NDB-UFES multiclass OSCC dataset.

Table 18 Comparative training, validation, and test accuracy & loss—ResNet50 (Kaggle Binary Class & NDB-UFES Multiclass OSCC Dataset).

Figure 29 shows results for 10 epochs and batch size 64. The model improved slightly, detecting some normal cases correctly. However, a significant number of OSCC cases were still misclassified as normal, showing moderate sensitivity and limited precision.

Fig. 29 Confusion matrix of ResNet50 model (Epochs = 10, Batch Size = 64)—Kaggle Binary Class OSCC dataset.

Figure 30 exhibits the best overall balance among the ResNet50 runs. OSCC had 161 TPs, and the two dysplasia classes achieved 292 and 90 TPs, respectively. False positives and negatives were comparatively low, highlighting this setup as the most reliable and effective among the ResNet50 configurations tested.

Fig. 30 Confusion matrix of ResNet50 model (Epochs = 20, Batch Size = 64)—NDB-UFES Multiclass OSCC dataset.

Tables 19 and 20 summarize confusion matrix results for the Kaggle and NDB-UFES datasets using ResNet50. On the Kaggle dataset (binary classification), the model showed low TN and high FP, indicating poor distinction between classes. For the NDB-UFES dataset (multi-class), misclassifications were high across all classes—especially for “Without Dysplasia,” where no true positives were detected, reflecting weak class-wise performance and limited generalization.

Table 19 Confusion matrix results ResNet50—Kaggle OSCC dataset.
Table 20 Confusion matrix results ResNet50–NDB-UFES OSCC dataset.

Table 21 compares classification performance for ResNet50 across the Kaggle and NDB-UFES datasets. On the Kaggle dataset, the model achieved moderate recall for OSCC but struggled with Normal class precision, resulting in overall low accuracy (60%). For the NDB-UFES dataset, performance dropped further, especially for the “Without Dysplasia” class, which had zero precision and recall, indicating poor class-wise balance and weak generalization in multi-class classification.

Table 21 Comparative classification report of oral squamous cell carcinoma (OSCC)—ResNet50 (Kaggle vs. NDB-UFES Dataset).

Table 22 presents overall performance metrics for ResNet50 on both datasets. The Kaggle model showed low accuracy (59%) with high sensitivity but very low specificity, indicating frequent false positives. On the NDB-UFES dataset, accuracy remained low (56%) with a slight improvement in specificity, but overall performance reflects weak class discrimination and poor generalization.

Table 22 Comparative model performance of oral squamous cell carcinoma (OSCC)—ResNet50 (Kaggle vs. NDB-UFES Dataset).

Comparative analysis of EfficientNetB3, DenseNet121, and ResNet50 on the Kaggle and NDB-UFES OSCC datasets

This section provides a detailed comparative evaluation of EfficientNetB3, DenseNet121, and ResNet50, analyzing their performance on binary and multi-class OSCC classification using the Kaggle and NDB-UFES datasets presented in Table 23.

Table 23 Performance comparison of EfficientNetB3, DenseNet121, and ResNet50 on Kaggle binary-class and NDB-UFES Multiclass OSCC datasets.

Table 23 and Fig. 31 present the best recorded performance of EfficientNetB3, DenseNet121, and ResNet50 across both the Kaggle Binary-Class and NDB-UFES Multiclass datasets. Among all configurations and datasets, EfficientNetB3 consistently outperformed the other models, achieving the highest accuracy, precision, recall, F1-score, specificity, and sensitivity.

Fig. 31 Performance comparison of EfficientNetB3, DenseNet121, and ResNet50 on Kaggle Binary-Class and NDB-UFES multiclass OSCC datasets.

On the Kaggle dataset, it reached an exceptional 97.05% accuracy, while on the NDB-UFES dataset, it attained 97.16% accuracy, confirming its robustness across binary and multiclass classification tasks. DenseNet121 showed moderate performance, especially on the Kaggle dataset with 86.91% accuracy, but failed to generalize effectively on the more complex NDB-UFES dataset. ResNet50, in contrast, remained the least effective model overall, with limited accuracy and poor metric scores, particularly in multiclass settings. These results strongly support the selection of EfficientNetB3 as the most suitable model for oral cancer detection due to its consistent, reliable, and superior performance across varied data complexities.

Comparative analysis with existing methods

As shown in Table 24, the proposed Deep Visual Detection System (DVDS), leveraging EfficientNetB3 with augmentation, preprocessing, and training callbacks, outperformed all recent methods in oral cancer detection. It achieved the highest accuracy on both the Kaggle (97.05%) and NDB-UFES (97.16%) datasets. In comparison, DenseNet121 and ResNet50 delivered notably lower results, confirming DVDS as the most accurate and reliable solution for histopathological OSCC classification.

Table 24 Summary of recent deep learning studies on oral squamous cell carcinoma (OSCC) detection.

Discussions

This study developed a Deep Visual Detection System for Oral Squamous Cell Carcinoma (OSCC) using three CNN architectures: EfficientNetB3, DenseNet121, and ResNet50. Evaluations were conducted on the Kaggle OSCC dataset (binary classification) and the NDB-UFES dataset (multi-class classification). Among the models, EfficientNetB3 consistently outperformed the others, achieving 97.05% accuracy on the binary dataset and 97.16% on the multi-class dataset. EfficientNetB3’s superior performance is attributed to its compound scaling strategy, enabling effective feature extraction crucial for medical imaging. DenseNet121 showed potential in specific configurations but suffered from generalization issues, while ResNet50 underperformed overall. Training dynamics such as batch size and epochs notably influenced results for EfficientNetB3 but had limited effect on the other models. By extending the classification task beyond binary labels using a clinically annotated dataset, this research adds to the development of real-world diagnostic tools. However, limitations include the absence of clinical validation, potential class imbalance, and lack of explainability mechanisms. Ethical considerations around fairness and misdiagnosis also remain important concerns.

In particular, the proposed Deep Visual Detection System (DVDS), built on EfficientNetB3 with preprocessing, augmentation, and optimization strategies such as EarlyStopping and ReduceLROnPlateau, demonstrated remarkable robustness across both datasets. The consistent performance of DVDS highlights its potential for integration into clinical workflows, where rapid and accurate diagnosis is critical. Furthermore, the use of two distinct datasets strengthens the generalizability of findings, though validation on larger and more diverse clinical datasets is necessary. Future work could focus on incorporating explainable AI modules and prospective clinical trials to bridge the gap between experimental accuracy and practical deployment.

To translate the proposed Deep Visual Detection System into clinical practice, several steps are necessary. First, clinical validation through large-scale, multi-center studies involving diverse patient cohorts would be required to establish robustness and generalizability. Second, integration into existing pathology workflows demands close collaboration with clinicians to ensure usability, interoperability with hospital information systems, and minimal disruption of established diagnostic routines. Adoption barriers such as data heterogeneity, differences in imaging protocols, regulatory approval processes, and the requirement for interpretability mechanisms also need to be systematically addressed. Additionally, careful attention to ethical, legal, and infrastructural considerations will be vital to avoid bias and ensure equitable deployment across healthcare settings. Future research will therefore focus not only on improving model performance but also on developing strategies for seamless and responsible deployment in real-world healthcare environments.

Conclusion

In pursuit of enhancing automated diagnosis in medical imaging, this work presented a Deep Visual Detection System for Oral Squamous Cell Carcinoma (OSCC) using three state-of-the-art CNN architectures (EfficientNetB3, DenseNet121, and ResNet50) evaluated on two diverse datasets. By addressing both binary and multi-class classification tasks, the research advanced current methodologies in automated OSCC diagnosis. EfficientNetB3 consistently outperformed the other models, achieving 97.05% accuracy on the Kaggle dataset and 97.16% on the NDB-UFES dataset. Its robust performance across evaluation metrics underscores its potential for clinical integration. These findings highlight the critical role of architecture selection, preprocessing, and hyperparameter tuning in medical image analysis. While the results are promising, limitations such as the absence of clinical trials, constrained dataset diversity, and lack of explainability must be addressed. Nonetheless, this work establishes a strong foundation for intelligent, scalable, and reliable OSCC detection. With further validation and deployment, the proposed system, particularly the EfficientNetB3 model, could become a valuable asset in early cancer diagnostics and patient care.

Future work

To address the identified limitations, future work may incorporate explainability frameworks such as Grad-CAM to enhance interpretability for clinicians. Strategies like advanced augmentation, class-weighted loss functions, or synthetic sampling can help mitigate class imbalance and improve robustness. Moreover, since the present study employed publicly available datasets that may not fully capture the heterogeneity of clinical practice, comprehensive validation across diverse populations, varying imaging protocols, and staining methodologies will be essential to strengthen contextual validity and ensure broader generalizability of the proposed system.

Future directions include:

  • Clinical Validation: Deploying the system in pilot clinical studies to assess real-world performance.

  • Application Development: Creating lightweight, deployable apps using frameworks like TensorFlow Lite for broader accessibility.

  • Explainable AI: Integrating methods like Grad-CAM to enhance clinician trust.

  • Multimodal Integration: Combining image data with patient history for comprehensive diagnostics.

  • Model Generalization: Applying transfer learning and domain adaptation to improve performance across diverse datasets and settings.