Introduction

Oral squamous cell carcinoma (OSCC) accounts for over 90% of head and neck cancers1 and 94% of oral cavity cancers worldwide2. It is the 16th most common cancer worldwide across both sexes and all ages, ranking 16th for incidence and 15th for mortality3. The burden is considerably higher in India, where it is among the three leading cancers, ranking first among males and fourth among females. It has an incidence rate of 9.9 and mortality rate of 5.6%4. India has one third of the oral cancer cases in the world5, and oral cancer accounts for around 30% of all cancers in India6.

Oral cancer affects various sites like the tongue, cheeks, palate, lips, and gingiva, significantly impacting swallowing, chewing, breathing, and speech, and posing a life-threatening risk7. Tobacco use is a key factor in OSCC occurrence and progression8. Despite advancements in treatment and screening, OSCC incidence is rising, with high mortality and morbidity9,10,11. Early detection improves survival, yet most cases are diagnosed at advanced stages, reducing prognosis12. Treatments like surgery, chemotherapy, and radiotherapy exist, but the five-year survival rate remains below 50%13,14. Locoregional recurrence affects 60% of advanced and 30% of early OSCC cases, with recurrence rates higher in patients with histologic positive tumor margins, indicating the need for better predictive markers15.

Margin status is an important prognostic factor for OSCC, and suitable surgical resection is vital for local control and prognosis16. During oral cancer surgery, achieving adequate resection margins is important for improving patient prognosis. Surgeons must attain adequate resection while conserving reasonable remaining function and a satisfactory physical appearance, relying only on preoperative imaging, visual inspection, and palpation17. However, the definition of a “clear margin” remains debated16. To improve patient outcomes, the attainment of clear margins should be considered an important surgical goal, and a solution should therefore be sought that can rapidly evaluate the entire resection surface18. Thus, it is important to identify effective biomarkers that can demarcate tumor from non-tumor tissue for efficient margin clearance. This has potential implications for approaches tailored to the individual patient.

In this study we identified differentially expressed genes (upregulated and downregulated) between the tumor and adjacent normal oral tissue samples which are significant in OSCC. These significant genes have the potential to be biomarkers for efficient margin clearance during surgery which can help manage the concerns of high recurrence rate of OSCC.

Materials and methods

Collection of clinical samples

Oral squamous cell carcinoma (OSCC) samples and their adjacent normal tissue samples were collected with patient consent and ethical approval from Zydus Cancer Hospital and Gujarat Cancer and Research Institute (GCRI), Ahmedabad, Gujarat, India in the years 2021–2024. The study was approved by the GCRI/GCS Ethics Committee-BHR (approval number EC-BHR-O-10-2024, dated 12-07-2024) and the Zydus Ethics Committee (dated 9th Jan 2021), working in accordance with ICH-GCP, the New Drugs and Clinical Trial Rules, 2019, ICMR guidelines, and other applicable regulations. The samples were kept in RNAlater and stored at -80 °C. Sample details such as gender, age, tobacco consumption habits and cancer stage were recorded during sample collection (Supplementary Table 1) with consent from all the patients.

RNA isolation

RNA isolation from all tumor and adjacent normal oral tissue samples was performed using the Qiagen RNeasy Plus Mini Kit, following the manufacturer’s protocol. The isolated RNA was quantified using Qubit™ (Qubit™ RNA HS Assay Kit) and QIAxpert™, and its integrity was assessed using the Bioanalyzer™ (Agilent RNA 6000 Nano Kit) and QIAxcel™. Samples with low RNA concentration (< 10 ng/µL) or compromised integrity were excluded, while the remaining samples proceeded to library preparation.

Library Preparation and RNA sequencing

After RNA isolation, libraries for RNA sequencing were prepared using the Illumina TruSeq Stranded Total RNA kit (Illumina, CA, USA), following the manufacturer’s protocol. Ribosomal RNA (rRNA) was removed, followed by RNA fragmentation and cDNA synthesis. The 3’ ends were adenylated for adapter ligation, and enriched DNA fragments were purified to create the final cDNA library. Library concentration was measured using Qubit™ (Qubit™ DNA HS Assay Kit), and integrity was assessed with the Bioanalyzer™ (Agilent High Sensitivity DNA Kit) and QIAxcel™. Samples with low library concentration (< 5 ng/µL) or compromised integrity were excluded, while the remaining samples proceeded to next-generation RNA sequencing. Sequencing was performed on the Illumina NovaSeq platform, which yielded an average of ~11 million reads per sample. Processed reads were used for transcriptomic analysis, and the raw data for all oral tumor and adjacent normal samples are available on NCBI under the BioProject PRJNA1127288. In total, data for 32 tumor and 27 adjacent normal samples were obtained and further processed for transcriptomic analysis.

Transcriptomic analysis

After RNA sequencing, transcriptomic analysis began with quality assessment using FastQC. Reads were mapped to the human genome (GRCh38.p13) in CLC Genomics Workbench (version 12.0.3, Qiagen Bioinformatics Licensed to GBRC). A PCA plot was generated to detect and remove outliers. Analysis was conducted in stages, including the full dataset, cancer stage-based evaluation, and individual sample comparisons.

Comprehensive transcriptomic analysis of the complete dataset

After mapping and PCA plot analysis, differential expression analysis was performed on the 30 tumor and 26 adjacent normal oral tissue samples using the CLC Genomics Workbench to identify the differentially expressed genes between the two groups. A volcano plot representing this analysis was generated using RStudio 1.4.1106 (R version 4.0.5 (2021-03-31)). Additionally, a heat map was created to visualize the gene expression levels.

Subsequently, the data obtained from the differential expression analysis were filtered based on log2 fold change (> 2 or < -2) and FDR p-value (≤ 0.05) to identify significantly upregulated and downregulated genes.
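The filtering rule above can be sketched in a few lines of Python. This is a minimal illustration only, not the CLC Genomics Workbench procedure used in the study; the `GeneResult` type, the `classify` helper and the gene names and values are hypothetical.

```python
from typing import NamedTuple


class GeneResult(NamedTuple):
    """One row of a differential expression table (hypothetical layout)."""
    gene: str
    log2_fc: float  # tumor versus adjacent normal
    fdr: float      # FDR-adjusted p-value


def classify(result: GeneResult, lfc_cutoff: float = 2.0, fdr_cutoff: float = 0.05) -> str:
    """Return 'up', 'down' or 'ns' (not significant) for a single gene."""
    if result.fdr > fdr_cutoff:
        return "ns"
    if result.log2_fc > lfc_cutoff:
        return "up"
    if result.log2_fc < -lfc_cutoff:
        return "down"
    return "ns"


# Illustrative values only; these are not results from the study.
results = [
    GeneResult("GENE_A", 7.9, 1e-12),   # strong, significant increase
    GeneResult("GENE_B", -5.3, 3e-8),   # strong, significant decrease
    GeneResult("GENE_C", 1.4, 1e-4),    # significant but below the fold-change cutoff
    GeneResult("GENE_D", 3.2, 0.20),    # large change but fails the FDR cutoff
]

labels = {r.gene: classify(r) for r in results}
print(labels)
```

A gene must pass both cutoffs to be retained, so a large fold change with a weak FDR value (GENE_D) is excluded just as a significant but small change (GENE_C) is.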

Gene ontology (GO) analysis was performed for the upregulated and downregulated genes using the FunRich (3.1.3) tool19 to identify the molecular function, biological process, biological pathway, cellular component and site of expression with which these genes are associated.

A protein-protein interaction (PPI) network was constructed for the upregulated and downregulated genes using the STRING database, and k-means clustering was performed to obtain three clusters. For each of these three clusters, the top 20 nodes were identified using the cytoHubba plugin in Cytoscape (3.10.0)20.

To identify the genes that are significant in HNSC, the top 100 upregulated and downregulated genes were validated using the TCGA (The Cancer Genome Atlas)21 database through the GEPIA2 tool22 (cutoffs: log2 fold change > 2 or < -2 and p-value ≤ 0.05).

Box plots based on TPM values were generated to represent the differential expression between the tumor and adjacent normal sample groups for the upregulated and downregulated genes significant in HNSC, and significance values were calculated using the t-test.

Additionally, ROC (Receiver Operating Characteristic) curve analysis was performed for the significant upregulated and downregulated genes using MetaboAnalyst (6.0) (last accessed on 5 December 2024). These genes were then sorted from highest to lowest AUC (area under the curve) value obtained from the ROC curves23.
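For a single gene, the AUC has a simple interpretation: it is the probability that a randomly chosen tumor sample has a higher expression value than a randomly chosen adjacent normal sample. The sketch below computes it directly from that definition. It is an illustrative calculation, not a description of how MetaboAnalyst derives its AUC values; the `roc_auc` helper and all expression values are hypothetical.

```python
from typing import Sequence


def roc_auc(tumor_values: Sequence[float], normal_values: Sequence[float]) -> float:
    """AUC for one gene, using expression level as the classifier score.

    Counts the fraction of (tumor, normal) pairs in which the tumor value
    is higher; ties contribute half a pair.
    """
    wins = 0.0
    for t in tumor_values:
        for n in normal_values:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(tumor_values) * len(normal_values))


# Illustrative values only; these are not data from the study.
separated = roc_auc([50.0, 80.0, 120.0], [1.0, 2.0, 3.0])   # no overlap between groups
overlapping = roc_auc([5.0, 8.0, 12.0], [1.0, 6.0, 9.0])    # partial overlap

print(separated)               # every tumor value exceeds every normal value
print(round(overlapping, 3))   # 6 of the 9 pairs favour the tumor sample
```

With this orientation an upregulated gene that separates the groups completely gives an AUC of 1. For a downregulated gene the same calculation gives a value near 0, so the groups would be swapped (or 1 minus the value taken) before ranking.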

To further study the trend in the expression levels of the significant genes, the significant upregulated and downregulated genes were also sorted, separately, from highest to lowest log2 fold change and by average tumor and adjacent normal RPKM (Reads Per Kilobase of transcript per Million mapped reads).
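As a reminder of what the RPKM unit expresses, the standard definition normalizes a gene's read count by both transcript length (in kilobases) and sequencing depth (in millions of mapped reads). The following sketch applies that definition; the `rpkm` helper and the numbers are hypothetical and are not taken from the study's data.

```python
def rpkm(read_count: float, gene_length_bp: float, total_mapped_reads: float) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads.

    RPKM = reads / (length in kb * mapped reads in millions)
         = reads * 1e9 / (length in bp * total mapped reads)
    """
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Illustrative numbers only: 500 reads on a 2,000 bp transcript
# in a library of 10 million mapped reads.
value = rpkm(read_count=500, gene_length_bp=2_000, total_mapped_reads=10_000_000)
print(value)  # 500 / (2 kb * 10 M) = 25 RPKM
```

Because of the two normalizations, a long gene and a short gene with the same RPKM have different raw read counts, which is why RPKM rather than raw counts is used when comparing expression levels across genes.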

Cancer stage-specific transcriptomic analysis

The dataset included 5 normal and 8 tumor samples (stage 1), 10 normal and 9 tumor (stage 2), 1 normal and 3 tumor (stage 3), and 8 normal and 7 tumor (stage 4) (Supplementary Fig. 5). Subsequent data analysis was done based on the stage of cancer, following steps similar to those outlined in section “Comprehensive transcriptomic analysis of the complete dataset”. Gene expression analysis was conducted by determining the number of upregulated and downregulated genes at each stage and examining their log2 fold changes.

Pairwise transcriptomic analysis of individual samples

The dataset included 20 true tumor and adjacent normal sample pairs. Data analysis was done for each individual sample pair, following steps similar to those outlined in section “Comprehensive transcriptomic analysis of the complete dataset”. Gene expression analysis was performed for each pair by identifying upregulated and downregulated genes and assessing their log2 fold changes.
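The per-pair comparison can be illustrated with a short sketch that computes a log2 fold change for one gene in each matched tumor and adjacent normal pair and counts the pairs exceeding the cutoff. This is an assumption-laden illustration rather than the CLC Genomics Workbench calculation: the `paired_log2_fc` helper, the pseudocount of 1 (added to avoid division by zero) and the expression values are all hypothetical.

```python
import math


def paired_log2_fc(tumor: float, normal: float, pseudocount: float = 1.0) -> float:
    """log2 fold change for one gene in one tumor/adjacent-normal pair.

    The pseudocount is an illustrative choice that keeps the ratio
    defined when the normal expression value is zero.
    """
    return math.log2((tumor + pseudocount) / (normal + pseudocount))


# Hypothetical expression values (tumor, adjacent normal) for one gene.
pairs = {
    "pair_01": (63.0, 0.0),
    "pair_02": (15.0, 3.0),
    "pair_03": (2.0, 5.0),
}

pair_lfc = {name: paired_log2_fc(t, n) for name, (t, n) in pairs.items()}
n_up = sum(1 for lfc in pair_lfc.values() if lfc > 2)

print(pair_lfc)
print(n_up)  # number of pairs in which the gene exceeds the log2 fold change cutoff of 2
```

Counting how many pairs pass the cutoff gives a simple measure of how consistently a candidate gene is upregulated across individual patients.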

Selection of significant differentially expressed genes

To identify key biomarker candidates for HNSC, genes were selected based on log2 fold change, average tumor/adjacent normal RPKM, and AUC values from ROC curve analysis. Additionally, differential expression was assessed using box plots, cancer stage comparisons, and individual sample analysis. This ensured the selection of genes with strong and consistent expression patterns for reliable biomarker potential.

Experimental validation of selected differentially expressed genes via real time quantitative reverse transcription PCR (RT-qPCR)

Based on the gene expression analysis results, three genes (MMP13, MMP10, and ADAM12) were selected for RT-qPCR validation. Reverse transcription was performed using the PrimeScript™ RT reagent Kit with gDNA Eraser (Takara Biotechnology Co., Ltd.). TaqMan assays were conducted with specifically designed primers and probes using the Probe qPCR with UNG kit (Takara®), following the manufacturer’s protocol and at an annealing temperature of 60 °C. Reactions were performed on the Applied Biosystems 7500 Fast RT-PCR system, and data were analyzed using the comparative Ct (ΔΔCt) method with GAPDH as the control.
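The comparative Ct calculation can be sketched as follows. The code assumes roughly 100% amplification efficiency, so that each cycle corresponds to a doubling, which is the usual assumption behind the 2^(-ΔΔCt) formula. The `delta_delta_ct` helper and the Ct values are hypothetical and are not measurements from this study.

```python
def delta_delta_ct(
    ct_target_tumor: float,
    ct_ref_tumor: float,
    ct_target_normal: float,
    ct_ref_normal: float,
) -> float:
    """ΔΔCt = (Ct_target - Ct_ref) in tumor minus (Ct_target - Ct_ref) in normal."""
    delta_ct_tumor = ct_target_tumor - ct_ref_tumor
    delta_ct_normal = ct_target_normal - ct_ref_normal
    return delta_ct_tumor - delta_ct_normal


# Hypothetical Ct values: target gene versus the reference gene (e.g. GAPDH).
ddct = delta_delta_ct(
    ct_target_tumor=24.0, ct_ref_tumor=18.0,    # ΔCt (tumor)  = 6
    ct_target_normal=29.0, ct_ref_normal=18.0,  # ΔCt (normal) = 11
)

log2_fold_change = -ddct        # a lower Ct means more template
fold_change = 2.0 ** (-ddct)    # relative expression, tumor versus normal

print(ddct, log2_fold_change, fold_change)
```

Because a lower Ct indicates more starting template, a negative ΔΔCt corresponds to upregulation in the tumor, and -ΔΔCt can be compared directly with the log2 fold change from RNA sequencing.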

Results

Transcriptomic analysis

Transcriptomic analysis was done to identify significant differentially expressed genes that have the potential to serve as biomarkers to demarcate between tumor and non-tumor tissue.

The sequenced reads for all the samples passed the quality check and were successfully mapped to the reference human genome (GRCh38.p13). Subsequently, the PCA plot generated for the 32 tumor and 27 adjacent normal samples in the CLC Genomics Workbench identified three outliers (Supplementary Fig. 1a), which were then excluded from the dataset (Supplementary Fig. 1b). All further data analysis was performed on the remaining 30 tumor and 26 adjacent normal samples. Outliers can cause discrepancies in data analysis because their extreme values influence the results; eliminating them can therefore increase the accuracy of the analysis.

Comprehensive transcriptomic analysis of the complete dataset

Following mapping and PCA plot analysis, the differential expression analysis conducted on the 30 tumor and 26 adjacent normal samples identified 704 upregulated and 1540 downregulated genes using the criteria of log2 fold change (> 2 or < -2) and FDR p-value (≤ 0.05). These results are graphically represented by a volcano plot (Fig. 1). Additionally, gene expression levels across all tumor and adjacent normal samples were visualized using a heat map, which shows the relative intensity of the expression values (Supplementary Fig. 2).

Fig. 1

Volcano plot created in RStudio with cutoffs of log2 fold change > 2 or < -2 and p-value ≤ 0.05, showing the differentially expressed (upregulated and downregulated) genes as red dots.

Gene ontology analysis of these upregulated and downregulated genes showed that the greatest percentage of upregulated genes were associated with biological pathways such as the Integrin family of cell surface interactions (Fig. 2a). Additionally, the highest percentage of downregulated genes were associated with the catalytic activity molecular function, while the greatest percentage of upregulated genes were associated with the transcription factor molecular function (Fig. 2b). With respect to biological processes, the highest percentage of both upregulated and downregulated genes were associated with signal transduction (Fig. 2c). Moreover, a high percentage of upregulated and downregulated genes were extracellular or present in the cytoplasm, nucleus and plasma membrane cellular components (Fig. 2d). Finally, when the upregulated and downregulated genes were analyzed for site of expression, a high percentage of expression was observed for head and neck cancer (Fig. 2e).

Fig. 2

Gene ontology (GO) analysis of upregulated and downregulated genes in tumor and adjacent normal tissue identified using cutoff thresholds of log2 fold change > 2 or < -2 and p-value ≤ 0.05. The graphs represent the percentage of genes across various GO categories: (a) Biological pathways, (b) Molecular functions, (c) Biological processes, (d) Cellular components, and (e) Expression sites.

The PPI networks constructed for the upregulated and downregulated genes, followed by clustering, each yielded 3 clusters (Supplementary Fig. 3a, 4a). For the upregulated genes, cluster 1 contained 9 of the 15 identified genes significant in HNSC, while clusters 2 and 3 contained 5 and 1 genes, respectively. For the downregulated genes, cluster 1 contained 7 of the 9 identified genes significant in HNSC and cluster 2 contained 2 genes. PPI networks represent the physical interactions between proteins, while k-means clustering groups data into distinct clusters based on similarity. For each identified cluster, the top 20 nodes were analyzed using cytoHubba, which revealed that cluster 1 of upregulated genes and cluster 3 of downregulated genes contained a significant number of genes associated with HNSC (Supplementary Fig. 3b, 4b). The cytoHubba plugin aids in predicting and exploring significant nodes in a network and is used to identify hub genes.

Furthermore, validation of the top 100 upregulated and downregulated genes, performed using the TCGA database through the GEPIA2 tool (Supplementary Table 2a, 2b), showed that 15 upregulated and 9 downregulated genes were significant in HNSC (Table 1). These findings are illustrated by box plots (Figs. 3 and 4). Notably, HNSC was chosen for validation because the GEPIA and TCGA databases categorize data under HNSC rather than OSCC, which accounts for 90% of HNSC cases. The TCGA database, a significant cancer genomics initiative, contains more than 20,000 cancer and matched normal samples from 33 different cancer types. Similarly, the GEPIA2 tool, used for analysing sequencing expression data, contains data for 9736 tumor and 8537 normal samples from the TCGA and GTEx projects.

The box plots for the 15 upregulated (Fig. 3) and 9 downregulated (Fig. 4) genes that were significant in HNSC showed substantial differential expression between the tumor and adjacent normal groups. The t-test showed that all these genes were significant, with a p-value ≤ 0.05 (Figs. 3 and 4).

Fig. 3

Representative box plot showing differential upregulated gene expression between tumor and adjacent normal tissue with p-value ≤ 0.05. For each of the significant upregulated genes, box plots labelled ‘a’ are of TCGA database and box plots labelled ‘b’ are of the data from the current study.

Fig. 4

Representative box plot showing differential downregulated gene expression between tumor and adjacent normal tissue with p-value ≤ 0.05. For each of the significant downregulated genes, box plots labelled ‘a’ are of TCGA database and box plots labelled ‘b’ are of the data from the current study.

Subsequently, the ROC curves generated for the significant upregulated and downregulated genes revealed that most had an AUC close to 1, while the genes MMP1, IL24 and ADAM12 achieved a perfect AUC of 1 (Table 1). Representative ROC curves for the significant upregulated and downregulated genes are shown in Fig. 5. ROC curve analysis is generally considered the standard method for performance assessment in biomarker studies, and the closer the AUC is to 1, the greater the utility of the biomarker. An AUC value of 1, or close to it, indicates a high true positive rate and a low false positive rate.

Fig. 5

Representative ROC curves of the significant upregulated and downregulated genes in tumor and adjacent normal tissues and their AUC values.

Table 1 The table presents a list of 15 upregulated and 9 downregulated genes, identified as significant in HNSC and ranked within the top 100 upregulated and downregulated genes. The upregulated genes MMP1, MMP10, IL24, MMP13 and downregulated genes PIP, TFF3, CRISP3 had the highest log2 fold change, the upregulated genes MMP1, IL24, ADAM12, MMP10 and downregulated genes ADH1B, FAM3B, KRT4 had the highest AUC values, and the upregulated genes MMP1, ISG15, MMP3, MMP13 and downregulated genes KRT4, PIP, PIGR showed the highest RPKM levels (highlighted in bold).

To further examine the expression trends of the significant genes, the upregulated and downregulated genes were ranked by their log2 fold change, average tumor/adjacent normal RPKM, and AUC values. The results revealed that the upregulated genes MMP1, MMP10, IL24, MMP3, ISG15, ADAM12 and MMP13 and the downregulated genes PIP, TFF3, CRISP3, ADH1B, PIGR, KRT4 and FAM3B had the highest log2 fold change, AUC value and average tumor/adjacent normal RPKM (Table 1).

Cancer stage-specific transcriptomic analysis

The total number of upregulated and downregulated genes for each of the four stages is given in Supplementary Table 3a. Additionally, when the log2 fold change of the significant upregulated and downregulated genes was examined for each of the four stages, all of the upregulated genes had a significant log2 fold change in all four cancer stages, and the downregulated genes exhibited a significant log2 fold change across most cancer stages (Supplementary Table 3b).

Pairwise transcriptomic analysis of individual samples

The number of upregulated and downregulated genes for each of the 20 sample pairs is given in Supplementary Table 4a. Furthermore, when the log2 fold change of the significant upregulated and downregulated genes was examined for each of the 20 sample pairs, many of these genes showed significant log2 fold changes across most individual sample pairs (Supplementary Table 4b).

Selection of significant differentially expressed genes

The upregulated genes were selected as potential biomarkers on the basis of the key parameters: log2 fold change, average tumor RPKM, and AUC value. MMP1 emerged as a standout candidate with the highest log2 fold change, average tumor RPKM, and AUC value. Similarly, MMP10 and IL24 each showed among the highest log2 fold changes and AUC values. MMP13 was notable for its high log2 fold change and average tumor RPKM. ADAM12 displayed the highest AUC value, while MMP3 and ISG15 had among the highest average tumor RPKM values. Additionally, as observed from the box plots (Figs. 3 and 4), these genes demonstrated significant differential expression between the tumor and adjacent normal groups. They also showed a significant log2 fold change across all four cancer stages and most individual sample pairs, as detailed in the analyses based on cancer stage and sample pairs (Supplementary Table 3b, 4b). Collectively, these genes represent strong candidates for use as tumor-specific potential biomarkers.

Experimental validation of selected differentially expressed genes via real time quantitative reverse transcription PCR (RT-qPCR)

Significant differences in the expression levels of MMP13, MMP10, and ADAM12 were observed between tumor and adjacent normal tissue samples, as illustrated in the box plots (Fig. 6). These differences were statistically significant (p ≤ 0.05, t-test). Furthermore, a comparison between the log₂ fold changes obtained from transcriptome analysis and RT-qPCR data demonstrated consistent and significant expression trends for all three genes. Specifically, RT-qPCR revealed log₂ fold changes of 5.08 for MMP13, 5.04 for ADAM12, and 3.99 for MMP10, while transcriptomic analysis showed corresponding values of 6.68, 4.90, and 7.25, respectively.
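The agreement between the two methods can be checked with a short sketch that uses only the log₂ fold changes reported above. The dictionary layout and variable names are illustrative; the cutoff of 2 is the differential expression threshold used throughout the transcriptomic analysis.

```python
# Reported log2 fold changes: gene -> (RT-qPCR, RNA sequencing).
reported = {
    "MMP13": (5.08, 6.68),
    "ADAM12": (5.04, 4.90),
    "MMP10": (3.99, 7.25),
}

LFC_CUTOFF = 2.0  # log2 fold change threshold used for upregulated genes

concordant = {}
for gene, (qpcr_lfc, rnaseq_lfc) in reported.items():
    same_direction = (qpcr_lfc > 0) == (rnaseq_lfc > 0)
    both_pass = qpcr_lfc > LFC_CUTOFF and rnaseq_lfc > LFC_CUTOFF
    concordant[gene] = same_direction and both_pass
    # Convert log2 values to linear fold changes for readability.
    print(f"{gene}: RT-qPCR {2 ** qpcr_lfc:.1f}-fold, RNA-seq {2 ** rnaseq_lfc:.1f}-fold")

print(concordant)
```

All three genes are upregulated by both methods and exceed the cutoff in each, although the magnitudes differ, most noticeably for MMP10.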

Fig. 6

Box plots representing differential expression of MMP13, MMP10 and ADAM12 genes based on RT-qPCR data with the p-value of ≤ 0.05.

Discussion

Based on the above findings, we further explored the study’s relevance and the biological implications of the identified significant genes as potential biomarkers. Numerous studies have focused on transcriptional dysregulation and disease-specific signatures in oral squamous cell carcinoma (OSCC) to explore carcinogenesis, prognostic indicators, and potential molecular targets24. Incomplete surgical excision of the primary tumor is a key factor affecting patient survival25. When recurrences are treated surgically, challenges arise due to the potential spread of microscopic tumor cells beneath normal mucosa or at a distance from the initial tumor site, complicating the determination of clear surgical margins26.

In this study we identified significant differentially expressed genes that can serve as potential molecular biomarkers for demarcating tumor from non-tumor tissues. This is critical because tissue beyond clinically determined margins may appear morphologically normal yet possess molecular alterations characteristic of OSCC, contributing to recurrence, which remains a major challenge. The biomarkers identified in this study by tumor versus adjacent normal comparison could support more effective margin determination by enabling molecular assessment of tissues beyond the planned surgical boundary. If these markers show elevated expression in “normal appearing” tissue, the surgical margin could be adjusted accordingly.

Through differential expression analysis, we identified 704 upregulated and 1540 downregulated genes. Gene ontology analysis revealed that the greatest percentage of upregulated genes were associated with biological pathways such as the Integrin family of cell surface interactions (Fig. 2a). Integrins are transmembrane cell surface receptors composed of α- and β-chains that activate focal adhesion kinase (FAK) upon binding extracellular matrix (ECM) proteins. One study evaluated the expression of the αV, β1, β3, β5 and β6 integrins, FAK and phosphorylated FAK (pFAK) as prognostic predictors in OSCC patients; pFAK-positive patients had lower overall survival than the pFAK-negative group, highlighting integrin β8 and pFAK as potential diagnostic and therapeutic targets for oral cancer27. In OSCC, loss or reduced expression of α6β4 integrins relates to basement membrane protein loss, while de novo expression of αvβ6 integrins may be essential for tumor cell migration28.

With respect to biological processes, the highest percentage of both upregulated and downregulated genes were associated with signal transduction (Fig. 2c). Cell signal transduction is a fundamental process in the development and progression of cancer. Several key signal transduction pathways are dysregulated in cancer cells, including the nuclear factor kappa B (NF-κB), p53, protein kinase B (AKT), mammalian target of rapamycin (mTOR), β-catenin, c-Myc29, and JAK/STAT pathways30. STAT3 is involved in key aspects of cancer metastasis, such as invasion, migration, and angiogenesis31. A critical step in the metastatic process is the ability of cancer cells to invade the extracellular matrix, a process facilitated by the regulation of matrix metalloproteinases (MMPs)32. In addition, the IL-6/STAT3 signalling pathway enhances the expression of MMPs such as MMP1, MMP2, MMP7, and MMP9 by directly binding to their promoter regions, a mechanism observed in various aggressive cancer types33.

Additionally, the highest percentage of downregulated genes were associated with the catalytic activity molecular function, while the greatest percentage of upregulated genes were associated with the transcription factor molecular function (Fig. 2b). Transcription factors (TFs) are proteins that control gene expression and are frequently altered in oral cancer. Several TFs have been linked to oral cancer: overexpression of c-myc34, c-Jun and Fra-135 is associated with poor prognosis and poorly differentiated tumors, abnormal activation and expression of TEAD is correlated with the development of OSCC36, and low expression of FOXO1 and HBP1 is linked with oral tumor invasiveness37. With respect to cellular component, the highest percentage of upregulated and downregulated genes were extracellular or present in the cytoplasm, nucleus and plasma membrane (Fig. 2d). Such localization, especially for extracellular and membrane proteins, enhances detectability and diagnostic relevance and facilitates the identification of druggable targets38. Finally, when the upregulated and downregulated genes were analyzed for site of expression, a high percentage of expression was observed for head and neck cancer (Fig. 2e).

We further identified genes significant in HNSC from the top 100 upregulated and downregulated genes and selected those with the highest potential as effective biomarkers for demarcating tumor tissues from non-tumor tissues to facilitate efficient margin clearance. This analysis highlighted MMP1, MMP10, MMP13, MMP3, ADAM12, IL24, and ISG15 as key candidates. These genes, exhibiting significantly elevated expression in oral tumor samples, are promising detectable biomarkers indicative of tumor presence, particularly those localized to the cell’s outer regions.

Furthermore, insight into the functions of these significantly upregulated genes is essential for understanding the implications of their deregulation. Upregulated genes such as IL24, ISG15 and ADAM12 have been found to play a significant role in oral cancer. ADAM12 (ADAM Metallopeptidase Domain 12), a member of the disintegrin and metalloproteinase family of membrane-related metalloproteinases, had one of the highest AUC values (1.0), reflecting its marked upregulation in oral tumor samples and its potential as an effective biomarker in HNSC studies. As supported by findings from other studies, ADAM12 is associated with numerous malignant tumors. Its overexpression in OSCC has been shown to enhance cellular proliferation, metastasis, and invasion, suggesting a potential role as an oncogene39, and its function may also be related to TGF-β signalling40. IL24 (Interleukin-24), which had one of the highest log2 fold changes (6.69) and AUC values (1.0) in this study (Table 1), has been found in previous studies to induce apoptosis in cancer cells with minimal effect on normal cells. It has also been suggested that, by inhibiting tumor invasion, growth and angiogenesis, higher expression of IL24 correlates with better prognosis in OSCC, and that it may be a promising target for cancer therapy41. Moreover, studies have indicated that ISG15 (Interferon-stimulated gene 15) is often upregulated in oral cancer and plays a pro-tumorigenic role by contributing to cancer cell invasion, migration and metastasis. This correlates with poor prognosis in OSCC, and ISG15 can be considered a possible biomarker for disease progression42. In this study, too, ISG15 had one of the highest tumor RPKM values (Table 1). Furthermore, matrix metallopeptidase genes such as MMP1, MMP10, MMP13 and MMP3, which encode members of the peptidase M10 family of matrix metalloproteinases (MMPs), showed significant upregulation in the oral tumor samples (Table 1).

MMP1 in particular emerged as a standout candidate, with the highest log2 fold change (7.90), average tumor RPKM (520.55), and AUC value (1.0) among all the identified significantly upregulated genes (Table 1). Other studies have shown that MMPs are overexpressed in HNSC and that their expression is associated with tumorigenic hallmarks such as cell proliferation, metastasis, angiogenesis, and invasion43. MMP10 plays a crucial role in metastasis and invasion of HNSC, is partly associated with the p38 MAPK inhibition44, and can be used as a marker for metastasis prediction. It is expressed in OSCC tissues and can serve as a prognostic indicator45. Moreover, the overexpression of MMP1 and MMP13 has been associated with the aggressive behaviour of OSCC, facilitating the invasion of nearby bone in patients with OSCC46. The presence of MMP1 and MMP3 in saliva could also serve as a non-invasive biomarker for the early detection of oral cancer47. Additionally, GO analysis showed that MMP1, along with other MMPs, was mainly associated with the metallopeptidase activity molecular function, the Beta1 integrin cell surface interactions biological pathway, and the protein metabolism biological process (Fig. 2). This study highlights the significantly upregulated genes MMP1, MMP10, MMP13, MMP3, ADAM12, IL24, and ISG15 as potential molecular biomarkers for demarcating tumor from non-tumor tissues, thereby improving the efficiency of margin clearance. These genes demonstrated the highest log2 fold change, tumor RPKM and AUC values, and MMP13, MMP10 and ADAM12 showed significant differences in expression levels when validated through RT-qPCR, underlining their relevance for assessing surgical margins at the molecular level.

However, validation in actual surgical margin specimens or clinically relevant borderline tissues has not yet been performed, primarily due to their limited access. Additionally, the study does not yet provide direct evidence supporting translation of these findings into actual clinical application, including the development of rapid intraoperative molecular assays.

Future studies in the field could therefore address these gaps by validating these biomarkers at the protein level, assessing their performance directly in surgical margin and borderline tissues, and developing rapid, clinically applicable detection methods that can advance their translation toward practical intraoperative utility.

Conclusion

OSCC is a prevalent and challenging malignancy, especially in India, with high recurrence despite treatment advances. This study highlights the need for reliable biomarkers to demarcate tumor from non-tumor tissue. Through transcriptomic analysis, key differentially expressed genes (DEGs) were identified, including MMP1, MMP10, MMP13, MMP3, ADAM12, IL24, and ISG15 that showed the strongest potential for accurate tissue demarcation. These findings with further exploration pave the way for clinically applicable tools to potentially improve surgical margin evaluation, reduce recurrence rates, and enhance patient outcomes, supporting precision surgery in OSCC treatment.