Screening of common genomic biomarkers to explore common drugs for the treatment of pancreatic and kidney cancers with type-2 diabetes through bioinformatics analysis

Ajadee, Alvira; Mahmud, Sabkat; Sarkar, Arnob; Noor, Tasfia; Ahmmed, Reaz; Haque Mollah, Md. Nurul

doi:10.1038/s41598-025-91875-3


Article
Open access
Published: 02 March 2025


Alvira Ajadee¹,
Sabkat Mahmud¹,
Arnob Sarkar^1,2,
Tasfia Noor³,
Reaz Ahmmed^1,2 &
…
Md. Nurul Haque Mollah¹

Scientific Reports volume 15, Article number: 7363 (2025)



Abstract

Type 2 diabetes (T2D) is a crucial risk factor for both pancreatic cancer (PC) and kidney cancer (KC). However, effective common drugs for treating PC and/or KC patients who also suffer from T2D are currently lacking, despite the likelihood of their co-occurrence. Taking multiple disease-specific drugs when several diseases co-exist may cause adverse side effects or toxicity through drug-drug interactions. This study aimed to identify T2D-, PC- and KC-causing common genomic biomarkers (cGBs), highlighting their pathogenetic mechanisms, in order to explore effective drugs as their common treatment. We analyzed transcriptomic profile datasets, applying weighted gene co-expression network analysis (WGCNA) and protein-protein interaction (PPI) network analysis to identify T2D-, PC-, and KC-causing cGBs. We then disclosed common pathogenetic mechanisms through gene ontology (GO) terms, KEGG pathways, regulatory networks, and DNA methylation of these cGBs. Initially, we identified 78 common differentially expressed genes (cDEGs) that could distinguish T2D, PC, and KC samples from controls based on their transcriptomic profiles. From these, six top-ranked cDEGs (TOP2A, BIRC5, RRM2, ALB, MUC1, and E2F7) were selected as cGBs and considered targets for exploring common drug molecules for all three diseases. Functional enrichment analyses, including GO terms, KEGG pathways, and regulatory network analyses involving transcription factors (TFs) and microRNAs, along with DNA methylation and immune infiltration studies, revealed critical common molecular mechanisms linked to PC, KC, and T2D. Finally, we identified six top-ranked drug molecules (NVP.BHG712, Irinotecan, Olaparib, Imatinib, RG-4733, and Linsitinib) as potential common treatments for PC, KC and T2D during their co-existence, supported by literature review.
Thus, this bioinformatics study provides valuable insights and resources for developing a genome-guided common treatment strategy for PC and/or KC patients who are also suffering from T2D.


Introduction

Type-2 diabetes (T2D) is a chronic metabolic disorder whose prevalence is gradually increasing worldwide¹. The International Diabetes Federation (IDF) estimated that there would be around 629 million adults with diabetes worldwide by 2045². A population-based study reported that the prevalence of diabetes across all age groups was 2.8% in 2000 and projected an increase to 4.4% by 2030³. T2D is characterized by β-cell dysfunction, excessive hepatic glucose production and insulin resistance, which impairs the ability of insulin-sensitive tissues to take up glucose from the blood in response to insulin⁴. T2D leads to serious health complications due to insulin resistance and hyperinsulinemia. It is also associated with obesity, cardiovascular disease, and cancers, including kidney cancer (KC)^5,6 and pancreatic cancer (PC)^7,8. Some studies have linked T2D to about 80% of PC patients⁹ and 40% of KC patients¹⁰. PC remains one of the most difficult cancers to diagnose and treat. In 2018, it was the 7th leading cause of cancer-related deaths worldwide, with approximately 466,000 deaths and a 5-year survival rate of just 10%¹¹. KC includes various types, of which renal cell carcinoma (RCC) is the most common. Clear-cell renal cell carcinoma (ccRCC), a subtype of RCC and one of the most prevalent cancers worldwide, accounts for approximately 70–80% of all kidney cancers¹². KC ranked 17th in cancer-related mortality in 2018, with 175,098 deaths worldwide¹³. In 2020, the death rate of KC patients was around 42%¹⁴. By 2030, PC is projected to become the second leading cause of cancer-related deaths¹⁵, while KC is expected to be the 10th most common cancer¹⁶. In PC, hyperinsulinemia associated with T2D elevates IGF-1 levels, which upregulates IGF-1R expression on pancreatic cells. This activation triggers cellular proliferation, suppresses apoptosis, and promotes genomic instability, fostering genetic mutations that may lead to cancer development^17,18,19. Similarly, in KC, hyperinsulinemia stimulates IGF-1 production, which promotes renal cell proliferation and inhibits apoptosis¹⁹. Additionally, chronic inflammation and oxidative stress associated with T2D contribute to DNA damage, further increasing the risk of cancer development^18,20. A schematic diagram of the link between PC and/or KC and T2D is shown in Fig. 1.

In brief, PC can disrupt insulin regulation, leading to insulin resistance and hyperinsulinemia, which contribute to the development of T2D. Once T2D is established, chronic hyperinsulinemia and elevated IGF-1 levels promote cellular proliferation and inhibit apoptosis in various tissues, including the kidneys. T2D is also associated with chronic inflammation and oxidative stress, which cause DNA damage and genomic instability in renal cells. This environment fosters the development of KC, creating a cascade in which PC leads to T2D, which in turn increases the risk of KC²¹. Doctors often prescribe disease-specific medications for patients with multiple conditions²², which can lead to polypharmacy and to drug-drug interactions (DDIs) that cause adverse effects or toxicity^23,24,25. To mitigate this risk, it is preferable to prescribe a smaller number of common drugs that effectively address the multiple conditions. However, no study has yet proposed a common drug for patients with pancreatic cancer (PC) and/or kidney cancer (KC) who also suffer from T2D. This bioinformatics-based study aims to: (i) identify common genomic biomarkers (cGBs) associated with T2D, PC, and KC, highlighting their shared pathogenetic mechanisms, and (ii) find cGBs-guided repurposable drugs for treating PC and/or KC in T2D patients.

Methodology

To explore repurposable drugs for treating PC and KC in patients with T2D, it is essential to identify common genomic biomarkers (cGBs) that can serve as drug targets. However, selecting top-ranked cGBs and potential therapeutic agents from numerous alternatives solely through wet-lab experiments is challenging because of the time, effort, and cost involved. Bioinformatics analysis helps address these challenges by streamlining the drug discovery process. Transcriptomic profile analysis with bioinformatics tools is a popular approach for detecting disease-causing genomic biomarkers (GBs) as targets of drug molecules^26,27,28,29,30,31,32,33,34. The detailed methodology of this study is given in the following subsections 2.1–2.8.

Data source and descriptions

We used gene-expression profiles to explore T2D-, KC- and PC-causing cGBs, and meta-drugs to identify common drug molecules for these three diseases. Here, meta-drugs are drugs that individual studies have already recommended for T2D, KC, or PC.

Transcriptomics profiles collection (case/control)

Several studies in the literature have identified cGBs shared by multiple diseases from independent transcriptomic datasets^35,36,37,38. In this study, we used three independent transcriptomic datasets from the GEO database (https://www.ncbi.nlm.nih.gov/geo/): GSE36895³⁹ for KC, GSE16515⁴⁰ for PC, and GSE76896⁴¹ for T2D. GSE36895 includes 29 KC and 23 control samples, GSE16515 includes 36 PC and 16 control samples, and GSE76896 includes 55 T2D and 116 control samples. The datasets were selected for their relatively large case and control groups, which improve statistical robustness and the reliability of the results. Priority was given to datasets generated on the same platform, the Affymetrix Human Genome U133 Plus 2.0 Array (GPL570), to keep data quality and experimental design uniform. Larger samples also reduce variability and increase the power to detect true differences in gene expression between case and control groups, supporting the identification of meaningful biomarkers and molecular mechanisms.

Collection of meta-drugs

A total of 110 KC-associated (Table S1), 103 PC-associated (Table S2), and 224 T2D-associated (Table S3) meta-drug agents were collected to identify potential common drug agents for T2D, KC, and PC. The drug data were obtained from peer-reviewed published articles (Tables S1, S2 & S3) and reliable online databases, including DrugBank⁴², the National Cancer Institute⁴³, and Drugs.com⁴⁴. In addition, through literature review we selected the top 10 publicly available T2D-causing key genes (KGs) (Table S4), PC-causing KGs (Table S5), and KC-causing KGs (Table S6), which served as independent receptors in molecular docking analysis to verify the performance of the proposed candidate drug agents.

Identification of DEGs by weighted gene co-expression network analysis (WGCNA)

First, differentially expressed genes (DEGs) between cases (T2D/PC/KC) and controls were identified with "CEMiTool"⁴⁵ in each of the three datasets: GSE36895 for KC, GSE16515 for PC and GSE76896 for T2D. Details of how CEMiTool yields DEGs between disease and control groups are given in our previous study⁴⁶. Weighted gene co-expression network analysis (WGCNA)⁴⁷ was then used to filter these DEGs further by removing clusters (modules) of weakly correlated DEGs. Module-trait relationships were determined by computing the Pearson correlation coefficient between module eigengenes (MEs) and traits. Disease status was treated as a clinical trait, with each disease (T2D, PC and KC) encoded as a binary variable: "1" for individuals in the disease group and "0" for the control (non-disease) group. For example, "T2D = 1" represented patients diagnosed with T2D and "T2D = 0" represented healthy controls; the same encoding was applied to PC and KC. Modules with significant correlations (|r| ≥ 0.6, p-value < 0.001) were selected for further analysis, and the DEG set for each of T2D, PC and KC was obtained by combining all genes from its significant modules. We then separated up- and down-regulated DEGs using the criteria aLog₂FC_g > 1 and aLog₂FC_g < −1, respectively, where aLog₂FC_g, the average log₂ fold change of the gth gene, is computed as

$$\mathrm{aLog_2FC}_g=\begin{cases}\dfrac{1}{n_1}\displaystyle\sum_{i=1}^{n_1}\log_2\left(z_{gi}^{D}\right)-\dfrac{1}{n_2}\displaystyle\sum_{j=1}^{n_2}\log_2\left(z_{gj}^{C}\right), & \text{if } n_1\neq n_2,\\[2ex]\dfrac{1}{n}\displaystyle\sum_{i=1}^{n}\log_2\left(\dfrac{z_{gi}^{D}}{z_{gi}^{C}}\right), & \text{if } n_1=n_2=n.\end{cases}\tag{1}$$

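Here, $z_{gi}^{D}$ and $z_{gj}^{C}$ are the expression levels of the $g$th gene in the $i$th disease sample and the $j$th control sample, respectively, and $n_1$ and $n_2$ are the numbers of disease and control samples. When $n_1=n_2=n$, the $i$th disease sample is paired with the $i$th control sample.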
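A minimal Python sketch of the two numeric steps above, the module-trait filter and Eq. 1, may make them more concrete. It assumes module eigengene values are already available (in WGCNA each eigengene is the first principal component of a module's expression matrix) and that expression values are on a linear, non-log scale, since Eq. 1 applies log₂ itself. The module names and values are hypothetical, and the p-value part of the module filter is omitted. When n₁ = n₂ the two branches of Eq. 1 give the same number, because the mean of per-pair log ratios equals the difference of the mean logs.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def select_modules(eigengenes, trait, r_cut=0.6):
    """Keep modules whose eigengene correlates with the 0/1 disease trait
    at |r| >= r_cut (the p-value filter is omitted in this sketch)."""
    scores = {m: pearson(me, trait) for m, me in eigengenes.items()}
    return {m: r for m, r in scores.items() if abs(r) >= r_cut}

def alog2fc(disease, control):
    """Average log2 fold change of one gene, following Eq. (1)."""
    n1, n2 = len(disease), len(control)
    if n1 != n2:
        # Difference of mean log2 expressions (unpaired groups).
        return (sum(math.log2(z) for z in disease) / n1
                - sum(math.log2(z) for z in control) / n2)
    # Mean of per-pair log2 ratios (sample i paired with sample i).
    return sum(math.log2(d / c) for d, c in zip(disease, control)) / n1

def direction(fc, cut=1.0):
    """'up' if aLog2FC > 1, 'down' if aLog2FC < -1, otherwise 'none'."""
    return "up" if fc > cut else "down" if fc < -cut else "none"

# Hypothetical data: trait 1 = case, 0 = control (three of each).
trait = [1, 1, 1, 0, 0, 0]
eigengenes = {
    "ME_blue": [0.40, 0.35, 0.30, -0.30, -0.35, -0.40],   # tracks status
    "ME_green": [0.10, -0.20, 0.05, 0.15, -0.10, 0.00],   # unrelated
}
print(select_modules(eigengenes, trait))

fc = alog2fc([8.0, 16.0, 8.0], [2.0, 2.0])  # unequal group sizes
print(round(fc, 3), direction(fc))
```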

Identification of common differentially expressed genes (cDEGs)

We separately identified the up- and down-regulated DEGs shared across the three datasets (GSE36895 for KC, GSE16515 for PC, and GSE76896 for T2D). We then combined the shared up- and down-regulated DEG sets into a unified set of common differentially expressed genes (cDEGs) among T2D, KC, and PC.
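The set logic above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code, and the gene symbols G1 to G8 are hypothetical placeholders. Because the up- and down-regulated sets are intersected separately, a gene that is upregulated in one disease but downregulated in another does not become a cDEG.

```python
def common_degs(up_sets, down_sets):
    """Intersect per-disease up- and down-regulated DEG sets separately,
    then merge the two intersections into one set of cDEGs."""
    shared_up = set.intersection(*up_sets.values())
    shared_down = set.intersection(*down_sets.values())
    return shared_up, shared_down, shared_up | shared_down

# Hypothetical gene symbols, for illustration only.
up = {
    "T2D": {"G1", "G2", "G3"},
    "PC": {"G1", "G2", "G4"},
    "KC": {"G1", "G2", "G5"},
}
down = {
    "T2D": {"G6", "G7"},
    "PC": {"G6", "G8"},
    "KC": {"G6", "G7"},
}
shared_up, shared_down, cdegs = common_degs(up, down)
print(sorted(shared_up), sorted(shared_down), sorted(cdegs))
```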

Local genetic association among T2D, PC and KC through cDEGs

Although the average log₂ fold change (aLog₂FC) values for T2D, PC and KC were calculated from independent datasets using Eq. 1, they refer to the same set of cDEGs in all three diseases. A cDEG is considered upregulated in a disease if its aLog₂FC > 0 and downregulated if its aLog₂FC < 0. Assuming that a gene functions similarly across individuals, the genetic association between any two diseases A and B can be assessed by Pearson's correlation coefficient between their aLog₂FC values over the cDEGs, defined as:

$$r_{ab}=\frac{\sum_{i}\left(a_i-\bar{a}\right)\left(b_i-\bar{b}\right)}{\sqrt{\sum_{i}\left(a_i-\bar{a}\right)^2\,\sum_{i}\left(b_i-\bar{b}\right)^2}}\tag{2}$$

where $a_i=\mathrm{aLog_2FC}_i(A)$ and $b_i=\mathrm{aLog_2FC}_i(B)$ are the aLog₂FC values of the $i$th cDEG for diseases A and B, respectively, and $\bar{a}$ and $\bar{b}$ are the means of the $a_i$'s and $b_i$'s.
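Eq. 2 can be checked numerically with a short Python sketch. The aLog₂FC vectors below are hypothetical; in the study, each entry would be one cDEG's aLog₂FC in a given disease, with the same gene order in both vectors.

```python
import math

def local_association(a, b):
    """Pearson correlation (Eq. 2) between the aLog2FC values of the same
    cDEGs in two diseases A and B (a[i] and b[i] refer to the same gene)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need paired aLog2FC vectors of length >= 2")
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    # Denominator: square root of the PRODUCT of the two sums of squares.
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den

# Hypothetical aLog2FC values of four cDEGs in two diseases.
a = [2.1, 1.5, -1.2, -2.0]
b = [1.8, 1.6, -1.1, -2.4]
print(round(local_association(a, b), 3))
```

Genes that move in the same direction in both diseases push r towards 1, which is how a high pairwise r is read as a local association between diseases.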

Identification of three disease-causing common genomic biomarkers (cGBs)

To explore common genomic biomarkers (cGBs), the protein-protein interaction (PPI) network of cDEGs was constructed with the online database and analysis tool STRING v11.5 (https://string-db.org/) and visualized with Cytoscape 3.10.2 (https://manual.cytoscape.org/en/3.10.2/)⁴⁸. The CytoHubba plugin of Cytoscape⁴⁹ was used to identify cGBs with six topological criteria: Closeness, Degree, EPC, MCC, MNC and DMNC. Only genes that ranked highly on all six measures were retained as key candidates, since such nodes are critical to the network's structure and function. This integrative approach makes the selection more robust, because it limits the bias of any single metric and highlights genes that are central to the network under every criterion.
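The "high on every measure" rule can be illustrated with a simplified Python sketch. It implements only two of the six CytoHubba measures: Degree, and a closeness score taken here as the sum of reciprocal shortest-path distances (our assumption about the form CytoHubba uses), computed by breadth-first search on an unweighted graph. The network, gene names and thresholds are hypothetical; the remaining measures would be added to the `measures` dictionary in the same way.

```python
from collections import deque

def degree(graph):
    """Number of neighbours of each node in an undirected graph."""
    return {v: len(nbrs) for v, nbrs in graph.items()}

def closeness(graph):
    """Sum of reciprocal shortest-path distances from each node (BFS)."""
    scores = {}
    for src in graph:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            v = queue.popleft()
            for u in graph[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        scores[src] = sum(1.0 / d for node, d in dist.items() if node != src)
    return scores

def consensus_hubs(graph, thresholds):
    """Keep nodes whose score meets the threshold on every measure."""
    measures = {"Degree": degree(graph), "Closeness": closeness(graph)}
    return sorted(v for v in graph
                  if all(measures[m][v] >= t for m, t in thresholds.items()))

# Hypothetical undirected PPI network stored as adjacency sets.
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G1", "G5"), ("G2", "G3")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)
print(consensus_hubs(graph, {"Degree": 3, "Closeness": 3.5}))
```

Requiring every threshold at once (an intersection of per-measure rankings) is what keeps a node that scores well on one metric only, such as a bridge node with low degree, out of the candidate list.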

Verification of the association of cGBs with T2D, PC, and KC using independent datasets and databases

To verify the association of cGBs with T2D, PC and KC using independent datasets and databases, we performed disease-cGB interaction analysis and expression analysis of cGBs in T2D, PC and KC, as described in the following subsections 2.6.1–2.6.2.

Disease-cGBs interaction analysis

We used the GeneCodis 4⁵⁰ web tool with the DisGeNET database⁵¹ to perform disease-cGB enrichment analysis, exploring the association of cGBs with various diseases, including T2D, KC and PC.

Expression analysis of cGBs with T2D, PC and KC based on independent datasets

The differential expression patterns of cGBs in T2D, PC, and KC were verified by box plot analysis using independent datasets. We used The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx) databases through the GEPIA2 web tool (http://gepia2.cancer-pku.cn/)⁵² to confirm the differential expression of cGBs between KC/PC and control samples, with cutoff thresholds of p-value = 0.01 and |log₂FC| = 1. To validate the differential expression of cGBs between T2D and control samples, we used the independent dataset GSE15932 from NCBI GEO. Box plots were constructed to compare cGB expression between the T2D/KC/PC and control groups. Additionally, we developed a random forest (RF) prediction model using three independent expression profiles (GSE36895 for KC, GSE16515 for PC, and GSE76896 for T2D) from the NCBI database and evaluated its predictive performance with ROC curves generated by the R package "ROCR" (https://www.rdocumentation.org/packages/ROCR/versions/1.0-11)⁵³.
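The study computed ROC curves with the R package ROCR. As an illustration only, the Python sketch below computes just the area under the ROC curve (AUC) from predicted case scores, using the identity between AUC and the normalized Mann-Whitney U statistic. The scores and labels are hypothetical stand-ins for random forest outputs; no model training is shown, and this is not ROCR's implementation.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a randomly chosen case (label 1) scores
    higher than a randomly chosen control (label 0); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

# Hypothetical predicted case probabilities (e.g., from a random forest).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
print(roc_auc(scores, labels))
```

Here one case (0.4) is outranked by one control (0.5), so 8 of the 9 case-control pairs are ordered correctly and the AUC is 8/9.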

Disclosing common pathogenetic mechanisms of PC and KC with T2D

To disclose the common pathogenetic mechanisms of PC and KC with T2D, we performed functional enrichment analysis with gene ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, regulatory network analysis with transcription factors (TFs) and microRNAs, and DNA methylation analysis, as described in the following subsections 2.7.1–2.7.3.

Regulatory network analysis of cGBs

We conducted regulatory network analysis with transcription factors (TFs) and microRNAs (miRNAs) to investigate the regulators of cGBs. To determine the key TFs connected with cGBs, we analyzed the TF-cGB interaction network using the JASPAR database⁵⁴. Significant miRNAs affecting cGBs at the post-transcriptional level were identified from miRNA-cGB links in the TarBase⁵⁵ database. NetworkAnalyst⁴⁷ was used to construct these interaction networks, and the top-ranked miRNAs were selected as post-transcriptional regulators of cGBs. The interaction networks were visualized with Cytoscape⁵⁶.
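The text does not state how regulators are ranked. One plausible criterion is the number of distinct cGBs each TF or miRNA targets, i.e. its degree in the regulator-gene network; the Python sketch below assumes that criterion. The regulator names and interaction pairs are hypothetical and are not records from JASPAR or TarBase.

```python
from collections import defaultdict

def rank_regulators(interactions, top=2):
    """Rank regulators (TFs or miRNAs) by how many distinct target genes
    they connect to; ties are broken alphabetically."""
    targets = defaultdict(set)
    for regulator, gene in interactions:
        targets[regulator].add(gene)  # a set ignores duplicate records
    ranked = sorted(targets, key=lambda r: (-len(targets[r]), r))
    return [(r, len(targets[r])) for r in ranked[:top]]

# Hypothetical regulator-target pairs (not real database records).
pairs = [
    ("TF_A", "G1"), ("TF_A", "G2"), ("TF_A", "G3"),
    ("TF_B", "G1"), ("TF_B", "G4"),
    ("miR_X", "G2"),
    ("TF_A", "G1"),  # duplicate record, counted once
]
print(rank_regulators(pairs))
```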

The cGBs -set enrichment analysis with GO-terms and KEGG-pathways

The Gene Ontology (GO) project is a bioinformatics initiative that uses domain-specific ontologies to provide a comprehensive source of functional information on gene products and their activities⁵⁷. To investigate the GO terms and KEGG pathways enriched in cGBs, we used GeneCodis 4⁵⁸, with P-value < 0.03 as the significance threshold.
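Over-representation tests of this kind are commonly based on the hypergeometric distribution. The Python sketch below computes the upper-tail hypergeometric p-value for a single GO term or KEGG pathway and compares it with the P < 0.03 threshold. It is not claimed to reproduce GeneCodis 4 exactly and omits any multiple-testing correction; the counts in the example are hypothetical.

```python
from math import comb

def enrichment_p(k, n, K, N):
    """Upper-tail hypergeometric p-value P(X >= k).

    N: genes in the background, K: background genes annotated to the term,
    n: genes in the query list (e.g., the cGBs), k: query genes annotated.
    """
    upper = min(n, K)
    total = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return total / comb(N, n)

# Hypothetical: 3 of 6 cGBs share a term annotated to 100 of 20,000 genes.
p = enrichment_p(k=3, n=6, K=100, N=20000)
print(p, p < 0.03)
```

With only 6 × 100 / 20,000 = 0.03 annotated genes expected by chance, observing three gives a very small p-value, which is why a short gene list can still yield significant terms.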

DNA methylation analysis of cGBs in PC and KC

DNA methylation is the addition of methyl groups to DNA, and methylation of cGBs can influence their expression and regulation. Statistical analysis of methylation patterns helps identify significant disease-associated changes in gene activity, providing insights into gene regulation and disease processes. The MethSurv web tool (https://biit.cs.ut.ee/methsurv/) with TCGA-KIRC methylation data was used to investigate DNA methylation, a complex epigenetic process that controls gene expression in both normal and malignant cells⁵⁹. DNA methylation levels were represented by β values (ranging from 0 to 1), computed as M/(M + U + 100) for every CpG site, where M and U are the methylated and unmethylated signal intensities, respectively. To assess the impact on patient survival, patients were classified into two groups according to a cut-off point: lower methylation (β value at or below the cut-off) and higher methylation (β value above the cut-off). The cut-off point can be obtained from data quantiles or the mean.
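The β value formula and the two-group split can be sketched in Python as follows. The offset of 100 follows the formula stated above, and the cut-off is taken as the median (a quantile) or the mean, as the text allows. The intensities and patient IDs are hypothetical, and the subsequent survival analysis that MethSurv performs is not shown.

```python
from statistics import mean, median

def beta_value(m, u, offset=100.0):
    """Methylation beta value for one CpG site: M / (M + U + offset)."""
    return m / (m + u + offset)

def split_by_cutoff(betas, method="median"):
    """Split patients into lower- and higher-methylation groups.
    Values above the cut-off form the 'high' group."""
    cut = median(betas.values()) if method == "median" else mean(betas.values())
    high = sorted(p for p, b in betas.items() if b > cut)
    low = sorted(p for p, b in betas.items() if b <= cut)
    return cut, low, high

# Hypothetical methylated (M) and unmethylated (U) intensities per patient.
intensities = {"P1": (900, 100), "P2": (200, 700), "P3": (600, 300), "P4": (100, 900)}
betas = {p: beta_value(m, u) for p, (m, u) in intensities.items()}
print(split_by_cutoff(betas))
```

The offset keeps β defined and close to zero when both intensities are very low, instead of dividing by a near-zero total.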

The cGBs -guided drug repurposing

To explore cGBs-guided repurposable common drug molecules for T2D, PC and KC, we performed molecular docking and ADME/T analysis, as described in the subsections 2.8.1–2.8.2.

Exploring candidate drugs by molecular docking

To explore potential repurposable drug molecules, we performed molecular docking of candidate drug molecules against target receptors, namely the six cGBs and their top two associated transcription factors (TFs), using AutoDock Vina⁶⁰. To identify cGBs-guided drug molecules, we gathered 434 candidate molecules related to T2D, PC, and KC from published articles and online databases, as detailed in Tables S1, S2 & S3. The 3D structures of the receptor proteins were obtained from the Protein Data Bank⁶¹ and AlphaFold⁶² databases, and those of all 434 meta-drug candidates from the PubChem database⁶³. The binding affinity scores (kcal/mol) between receptors (proteins) and ligands (drug molecules) were then determined by molecular docking. In the resulting score matrix, the receptors (rows) were arranged in descending order of their average binding affinity scores, and the drug candidates (columns) were ranked by their average scores. This allowed the top-ranked drug molecules to be selected for further analysis.
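The row and column ranking of the docking score matrix can be sketched in Python. AutoDock Vina reports binding affinities as negative values in kcal/mol, with more negative values indicating stronger predicted binding. This sketch therefore assumes "top-ranked" means the most negative average and sorts ascending. The receptor names, drug names and scores are hypothetical.

```python
def rank_by_average(matrix, receptors, drugs):
    """Rank receptors (rows) and drugs (columns) of a docking score matrix
    by mean binding affinity (kcal/mol), strongest (most negative) first."""
    row_avg = {r: sum(row) / len(row) for r, row in zip(receptors, matrix)}
    col_avg = {d: sum(row[j] for row in matrix) / len(matrix)
               for j, d in enumerate(drugs)}
    rank_r = sorted(receptors, key=row_avg.get)
    rank_d = sorted(drugs, key=col_avg.get)
    return rank_r, rank_d, col_avg

# Hypothetical scores: 3 receptors (rows) x 3 drugs (columns).
receptors = ["R1", "R2", "R3"]
drugs = ["D1", "D2", "D3"]
matrix = [
    [-9.0, -7.0, -6.0],
    [-8.5, -7.5, -5.5],
    [-10.0, -6.5, -6.0],
]
rank_r, rank_d, col_avg = rank_by_average(matrix, receptors, drugs)
print(rank_r, rank_d)
```

Averaging over all receptors favours drugs that bind consistently across the cGB targets, rather than drugs that bind one receptor very strongly and the rest weakly.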

ADME/T analysis

ADME/T analysis evaluates a drug candidate's absorption, distribution, metabolism, excretion, and toxicity to predict its safety and efficacy. In drug repurposing, a compound may bind strongly to a new target in docking studies, indicating potential for a new therapeutic application. However, if it fails ADME/T criteria, for example through poor absorption, rapid metabolism, or high toxicity, it is unlikely to succeed in that role. By filtering out unsuitable candidates, ADME/T analysis ensures that only compounds with favorable pharmacokinetic and safety profiles progress further in the repurposing process. We therefore analyzed the drug-like properties and ADME/T profiles of the top six drug compounds to better understand their structural features and chemical descriptors. The SCFBio web application⁶⁴ was used to evaluate compliance with Lipinski's rule of five, and ADME/T parameters were predicted with the online tools SwissADME⁶⁵ and pkCSM⁶⁶, using the optimal structures of the drug compounds in SMILES format.
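Lipinski's rule of five flags compounds with molecular weight above 500 Da, logP above 5, more than 5 hydrogen-bond donors, or more than 10 hydrogen-bond acceptors, and a compound is usually considered drug-like with at most one violation. The Python sketch below applies that check to descriptors that are assumed to be precomputed (for example, exported from a tool such as SwissADME). It is not the SCFBio implementation, and the descriptor values are hypothetical, not properties of any drug named in this study.

```python
def lipinski_violations(mw, logp, hbd, hba):
    """Count rule-of-five violations from precomputed descriptors:
    MW <= 500 Da, logP <= 5, H-bond donors <= 5, H-bond acceptors <= 10."""
    rules = [mw > 500, logp > 5, hbd > 5, hba > 10]
    return sum(rules)  # True counts as 1

def passes_lipinski(descriptors, max_violations=1):
    """Drug-like if the number of violations does not exceed max_violations."""
    return lipinski_violations(**descriptors) <= max_violations

# Hypothetical descriptor values for one candidate compound.
candidate = {"mw": 540.0, "logp": 3.2, "hbd": 2, "hba": 8}
print(lipinski_violations(**candidate), passes_lipinski(candidate))
```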

Results

Identification of DEGs by weighted gene co-expression network analysis (WGCNA)

Differentially expressed genes (DEGs) for T2D, PC, and KC were identified using the "CEMiTool" and "WGCNA" (weighted correlation network analysis) approaches across three datasets: GSE76896 (T2D), GSE16515 (PC), and GSE36895 (KC). Initially, the "CEMiTool" analysis identified 4,589 DEGs from GSE16515, 4,233 DEGs from GSE16515, and 7,548 DEGs from GSE76896. These DEGs were further refined with WGCNA. For each filtered gene expression matrix, a soft-thresholding power β was chosen (14 for GSE76896, 11 for GSE16515 and 15 for GSE36895) based on a scale-free topology fit cutoff of R² = 0.85 (Figure S1). Modules were then identified by hierarchical clustering with a minimum module size of 30. To merge similar modules, the module eigengene cut height was set to 0.15 for GSE76896, 0.09 for GSE16515 and 0.13 for GSE36895 (Figure S2). Based on the relationships between modules and clinical traits (KC/PC/T2D versus control samples), with correlation greater than 0.6 or less than −0.6 and p-value < 4e-08, we selected six modules from GSE76896, four from GSE16515 and six from GSE36895 (Figure S3).

Identification of common differentially expressed genes (cDEGs)

We identified a total of 85 common differentially expressed genes (cDEGs), including 55 upregulated and 30 downregulated genes, across three comparisons: control vs. T2D, control vs. PC, and control vs. KC. These cDEGs are visualized in the Venn diagram in Fig. 2 (see also Table S7). The Venn diagrams were generated with the Venn web tool (https://bioinformatics.psb.ugent.be/webtools/Venn/) from the DEG sets of each disease, then customized and downloaded.

Local genetic association among T2D, PC and KC through cDEGs

To understand the link of T2D with PC and KC, we computed pairwise local correlation coefficients using Eq. 2 for T2D, PC and KC based on the aLog₂FC values of cDEGs (see Fig. 2C; Table 1 and Table S8). The correlation coefficient between each pair of the three diseases was ≥ 0.83 (see Table 1), indicating that T2D, PC and KC are locally associated with each other through the expressions of cDEGs.

Table 1 Local association among T2D, PC and KC through cDEGs.


Identification of three disease-causing common genomic biomarkers (cGBs)

The protein-protein interaction (PPI) network of cDEGs, consisting of 76 nodes and 459 edges, was constructed. We selected the top-ranked six cGBs (ALB, MUC1, TOP2A, BIRC5, RRM2 and E2F7) based on six topological measures in the PPI network, using the thresholds Degree = 20, Closeness = 46.16, EPC = 14.35, MNC = 20, Betweenness = 245.0877 and Radiality = 3.37. Genes that ranked highly on all six measures were identified as key candidates, since they are central to the network's structure and function. Here, six cGBs were upregulated and one cGB was downregulated. The scores of the six cGBs on the six topological measures are given in Table S9, and the PPI network is shown in Fig. 3.

Verification of the association of cGBs with T2D, PC, and KC using independent datasets and databases

To verify the association of cGBs with T2D, PC and KC using independent datasets and databases, we performed disease-cGB interaction analysis and expression analysis of cGBs in T2D, PC and KC, as described in the following subsections 3.5.1–3.5.2.