Introduction

Diabetic nephropathy (DN) is a challenging chronic microvascular complication of diabetes, especially type 2 diabetes, and is mainly manifested by persistent proteinuria and progressive deterioration of renal function1. With a reported prevalence of about 30–40% in diabetic patients, DN has emerged as a leading cause of end-stage renal disease (ESRD)2, significantly escalating the global healthcare burden. At present, it is generally recognized that the pathogenesis of DN is closely related to hyperglycemia, inflammation, oxidative stress, and lipid deposition3. In clinical practice, angiotensin-converting enzyme (ACE) inhibitors and other renin-angiotensin-aldosterone system (RAAS) blockers are commonly employed to manage blood pressure, sodium-glucose cotransporter 2 (SGLT2) inhibitors to regulate blood glucose, and statins to reduce lipid levels, collectively aiming to delay DN progression4,5. However, these treatment approaches are primarily geared towards delaying the progression of DN and can cause adverse reactions such as gastrointestinal discomfort, rhabdomyolysis, and liver damage, particularly with prolonged and high-dose use, significantly impacting patients' quality of life. Therefore, it is of great significance to further elucidate the pathogenesis of DN and to develop novel, effective treatments.

Anoikis is a special type of programmed cell death triggered when normal cells lose their connection to the extracellular matrix (ECM); it resembles apoptosis in that it uses the conventional apoptotic pathway to induce cell death6. Anoikis effectively clears detached cells and prevents them from reattaching to a new matrix and proliferating, which not only maintains the dynamic balance of normal cell proliferation, differentiation, and apoptosis, but also plays a key regulatory role in conditions such as metastatic cancer, cardiovascular disease, and DN7,8,9. The mechanism of anoikis in DN is closely linked to the integrin signaling pathway, the PI3K-Akt signaling pathway, Fas signaling, TNF-α, TRAIL, receptor-interacting protein (RIP), TNFR1/TNFR2, DR4/DR5, and reactive oxygen species (ROS)10. Studies have identified podocyte anoikis as the primary manifestation of anoikis in DN, characterized by the complete disappearance of podocyte foot processes along the basement membrane of surrounding capillaries11. Moreover, reduced α3β1 integrin in DN patients and streptozotocin-induced diabetic rats results in focal detachment of podocytes from the glomerular basement membrane12,13. Additionally, overactivation of mTORC1 stimulates endoplasmic reticulum stress and epithelial-mesenchymal transition phenotypes in podocytes, ultimately leading to podocyte anoikis8,14. Despite accumulating evidence highlighting the pivotal role of anoikis in DN development, exploration of the role of anoikis-related genes (ARGs) in the pathogenesis of DN remains limited. Therefore, a more in-depth and detailed analysis is urgently needed to uncover the mechanisms of action of ARGs in DN, which is of great practical significance for predicting the prognosis and progression of DN and for providing individualized treatment.

In this work, microarray datasets of DN patients were first downloaded from the Gene Expression Omnibus (GEO) database, merged, and subjected to differential expression analysis. The resulting differentially expressed genes (DEGs) were then intersected with ARGs to obtain differentially expressed ARGs (DEARGs). Subsequently, GO, KEGG, and GSEA enrichment analyses were performed, and three different machine learning algorithms were used to identify DEARGs associated with DN. Following this, CIBERSORT was used to characterize immune cell infiltration in the kidneys of DN patients compared with normal kidney tissue. Finally, we evaluated the relationship between the identified candidate diagnostic markers and immune infiltration to further elucidate the molecular mechanisms of DEARGs in DN. The analysis in this work provides novel insights that may guide future studies on anoikis and inform potential immunotherapy approaches for DN treatment.

Materials and methods

Collection and pre-processing of DN dataset

The DN-related microarray expression datasets were retrieved from the GEO database (http://www.ncbi.nlm.nih.gov/geo/). Datasets had to meet the following criteria: (1) the species was human; (2) the study enrolled both DN patients and healthy controls; (3) the study enrolled at least 10 DN patients and healthy controls; (4) kidney tissue or blood was used as the sample. Specifically, the microarray datasets included GSE30528 (GPL571), GSE47184 (GPL11670), and GSE96804 (GPL17586). These datasets consisted of 13 healthy control samples and 15 DN samples, 4 normal samples and 11 DN samples, and 20 normal samples and 41 DN samples, respectively. To enhance statistical power and minimize potential batch effects, the datasets were integrated using the “sva” package. Additionally, GSE30122 was adopted as the verification cohort of DN (Table 1). The potential batch effects were visualized by principal component analysis (PCA) using the PCAtools software package.

Table 1 Information of validation set and training set.

Identification of differentially expressed ARGs between DN and normal samples

Differential gene expression analysis was performed using the limma package in R, with p.adjust < 0.05 and |log2 fold change (log2FC)| ≥ 1 as the screening criteria15. This approach yielded the DEGs between DN and normal samples. In addition, ARGs were sourced from the GeneCards database (https://www.genecards.org/), the Harmonizome database (https://maayanlab.cloud/Harmonizome), and the GSEA database (https://www.gsea-msigdb.org/gsea/). The DEGs were then intersected with the ARGs to obtain DEARGs16,17.
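
The screening and intersection steps can be illustrated with a minimal Python sketch. It is not the authors' R/limma code: it assumes the log2FC cutoff is 1, and the statistics and the small ARG set below are invented purely for illustration (only the gene symbols appear in this paper).

```python
from typing import Dict, Iterable, NamedTuple, Set


class GeneStat(NamedTuple):
    """One row of a differential-expression table (illustrative layout)."""
    symbol: str
    log2fc: float
    adj_p: float


def select_degs(stats: Iterable[GeneStat],
                lfc_cutoff: float = 1.0,
                p_cutoff: float = 0.05) -> Dict[str, str]:
    """Keep genes with adjusted p < p_cutoff and |log2FC| >= lfc_cutoff.

    Returns a mapping from gene symbol to its direction ("up" or "down").
    """
    return {
        g.symbol: ("up" if g.log2fc > 0 else "down")
        for g in stats
        if g.adj_p < p_cutoff and abs(g.log2fc) >= lfc_cutoff
    }


def intersect_with_args(degs: Dict[str, str], arg_symbols: Set[str]) -> Dict[str, str]:
    """Keep only the DEGs that also appear in the anoikis-related gene list."""
    return {symbol: direction for symbol, direction in degs.items()
            if symbol in arg_symbols}


# Invented toy values, NOT results from the study.
toy_stats = [
    GeneStat("FN1", 1.8, 0.001),    # passes both cutoffs, up
    GeneStat("PDK4", -2.1, 0.0004), # passes both cutoffs, down
    GeneStat("ACTB", 0.1, 0.9),     # fails both cutoffs
    GeneStat("ALB", -1.5, 0.2),     # large change but not significant
]
toy_args = {"FN1", "PDK4", "CD44"}

deargs = intersect_with_args(select_degs(toy_stats), toy_args)
print(deargs)
```

Keeping the direction alongside each symbol makes it straightforward to split the resulting DEARGs into up- and down-regulated groups afterwards.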

The GO and KEGG enrichment analysis on ARGs

Using the R packages “clusterProfiler”, “enrichplot”, “ggplot2”, “org.Hs.eg.db”, “GOplot”, and “DOSE”, GO and KEGG analyses were conducted on the DEARGs to determine their functions15,18,19. P < 0.05 was considered statistically significant.

GSEA

To further investigate the potential functions of all genes in the GSE30528, GSE47184, and GSE96804 datasets, the clusterProfiler package (version 4.4.4) was employed for GSEA. The species was Homo sapiens, and the c2.cp.kegg.v2023.1.Hs.entrez.gmt file from the MSigDB Collections gene set database was used as the reference gene set. An absolute normalized enrichment score (|NES|) > 1, a false discovery rate (FDR) < 0.25, and p.adjust < 0.05 were used as the screening criteria. Meanwhile, the core genes that play key roles in the enriched pathways were obtained16,20,21.
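
As a minimal sketch of how the three GSEA screening criteria combine, the Python snippet below applies them to a few result rows. The `GseaResult` record and the toy values are hypothetical and do not reproduce clusterProfiler's output format; they only show that a gene set must pass all three thresholds at once.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GseaResult:
    """Hypothetical container for one GSEA result row."""
    gene_set: str
    nes: float       # normalized enrichment score (may be negative)
    fdr: float       # false discovery rate
    p_adjust: float  # adjusted p value


def is_enriched(r: GseaResult,
                nes_min: float = 1.0,
                fdr_max: float = 0.25,
                p_max: float = 0.05) -> bool:
    """A gene set passes only if |NES| > 1, FDR < 0.25 and p.adjust < 0.05."""
    return abs(r.nes) > nes_min and r.fdr < fdr_max and r.p_adjust < p_max


def filter_enriched(results: List[GseaResult]) -> List[str]:
    """Return the names of the gene sets that pass all three thresholds."""
    return [r.gene_set for r in results if is_enriched(r)]


# Invented toy rows, NOT results from the study.
toy_results = [
    GseaResult("SET_A", nes=1.9, fdr=0.01, p_adjust=0.004),   # passes
    GseaResult("SET_B", nes=-2.2, fdr=0.03, p_adjust=0.02),   # passes (negative NES)
    GseaResult("SET_C", nes=0.8, fdr=0.02, p_adjust=0.01),    # |NES| too small
    GseaResult("SET_D", nes=1.6, fdr=0.30, p_adjust=0.04),    # FDR too high
]

enriched = filter_enriched(toy_results)
print(enriched)
```

Taking the absolute value of the NES keeps gene sets enriched in either direction, that is, in DN samples or in normal samples.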

Analysis of immune cell infiltration

CIBERSORT is a deconvolution algorithm that uses marker gene signatures to quantify the relative abundance of 22 human immune cell types from gene expression data22. Here, the gene expression matrix was uploaded to CIBERSORT, and a difference with P < 0.05 was considered statistically significant. Bar plots and heat maps illustrating the distribution of immune cell infiltration in normal and DN samples were generated using the “pheatmap” package in R. Furthermore, heat maps displaying the correlations among infiltrating immune cells were plotted. Violin plots depicting the differences in immune cell infiltration between DN patients and control samples were generated using the “vioplot” package in R. In addition, to further analyze the difference in immune cell infiltration between the DN group and the control group, ssGSEA was used to evaluate the immune cell types in the microenvironment; it can identify 29 immune cell phenotypes, including T cells, B cells, and NK cells23. Finally, xCell, an algorithm that evaluates the infiltration levels of 64 immune and stromal cell types (including extracellular matrix cells, epithelial cells, hematopoietic progenitors, and both innate and adaptive immune cells), was used to assess immune cell infiltration in the two groups.

Machine learning to screen for key genes

Three machine learning algorithms, namely the least absolute shrinkage and selection operator (LASSO), support vector machine recursive feature elimination (SVM-RFE), and random forest (RF), were employed to identify key genes associated with DN24,25,26. The LASSO algorithm was implemented using the “glmnet” package in R, with 10-fold cross-validation, to identify the genes that best distinguished DN from normal samples. For the RF algorithm, the “randomForest” package in R was used to determine the optimal number of variables based on the average error rate of the DEARGs, and genes with feature importance scores greater than 1 were selected as candidate genes. In addition, the “e1071” package in R was used to run the SVM-RFE algorithm, and the gene set with the smallest cross-validation error was taken as the candidate set. The genes common to all three algorithms were selected as the diagnostic genes for DN. The Venn diagram of the three algorithms was constructed using an online Venn tool.
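
The final selection step, keeping RF genes with importance above 1 and then intersecting the three candidate lists, can be sketched in Python as follows. The gene names (`G1` to `G6`) and importance scores are placeholders, not the study's Table 2, and the model fitting itself (glmnet, e1071, randomForest) is outside the scope of this sketch.

```python
from typing import Dict, Iterable, List, Set


def rf_keep(importance: Dict[str, float], cutoff: float = 1.0) -> Set[str]:
    """Keep random-forest features whose importance score exceeds the cutoff."""
    return {gene for gene, score in importance.items() if score > cutoff}


def common_genes(*gene_sets: Iterable[str]) -> List[str]:
    """Return, in sorted order, the genes present in every candidate set."""
    return sorted(set.intersection(*(set(s) for s in gene_sets)))


# Placeholder candidates, NOT the genes reported in the study.
lasso_genes = {"G1", "G2", "G3", "G5"}
svm_rfe_genes = {"G1", "G2", "G4", "G5"}
rf_importance = {"G1": 2.4, "G2": 1.3, "G3": 0.6, "G5": 1.0, "G6": 3.1}

rf_genes = rf_keep(rf_importance)  # G5 is dropped: 1.0 is not greater than 1
diagnostic = common_genes(lasso_genes, svm_rfe_genes, rf_genes)
print(diagnostic)
```

Requiring agreement across all three methods is a conservative choice: a gene favoured by only one or two algorithms is discarded, which trades sensitivity for robustness of the final panel.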

Construction and assessment of nomogram

The construction of a nomogram holds significant value for the clinical diagnosis of DN. In this work, the “rms” package in R was employed to develop a nomogram for the candidate genes. In the nomogram, “Points” corresponds to the score assigned to each candidate hub gene, and “Total Points” represents the sum of the scores of all these genes27.

Correlation between candidate feature genes and immune cell infiltration and validation of candidate feature genes

Spearman correlation analysis in R was adopted to further examine the correlation between key feature genes and immune cell infiltration. The results were visualized using the “ggstatsplot” and “ggplot2” packages in R. To evaluate the diagnostic value of the DEARGs in DN, receiver operating characteristic (ROC) curves were constructed using the “pROC” package in R. The area under the ROC curve (AUC) was calculated, and AUC > 0.75 was considered to indicate ideal diagnostic value28. Additionally, the predictive power of these signature genes was validated in the validation set (GSE30122).
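
For readers unfamiliar with the two statistics used here, the following Python sketch computes a Spearman correlation (Pearson correlation of tie-averaged ranks) and an AUC (the probability that a randomly chosen DN sample scores higher than a randomly chosen control). It is a plain-Python illustration on invented numbers, not the R code used in the study.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """AUC as the fraction of case/control pairs ranked correctly (ties count half)."""
    wins = 0.0
    for c in case_scores:
        for h in control_scores:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Invented toy values, NOT data from the study.
rho = spearman([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0])
toy_auc = auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
print(round(rho, 3), round(toy_auc, 3))
```

Note that a marker that is lower in DN than in controls would give an AUC below 0.5 with this definition; in that case the comparison direction is reversed (equivalently, 1 − AUC is reported).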

Construction of ceRNA networks for candidate trait genes

Furthermore, the following steps were implemented to construct a ceRNA network associated with the candidate trait genes, involving long non-coding RNA (lncRNA), microRNA (miRNA), and the candidate trait genes. (1) The miRDB, miRanda, and TargetScan databases were used to predict the relationships between miRNAs and the candidate diagnostic genes. (2) The spongeScan database was used to predict the interactions between lncRNAs and miRNAs. (3) Finally, Cytoscape (version 3.7.2) was used to identify the shared lncRNAs and miRNAs, thereby constructing a ceRNA network with the candidate feature genes.
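
The three steps can be sketched as simple set operations in Python. Two points are assumptions rather than statements from the text: that a miRNA-gene pair is kept only when all three target databases predict it, and that an lncRNA enters the network only if it is linked to a miRNA that also targets a candidate gene. All identifiers below are hypothetical placeholders.

```python
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def consensus(*predictions: Iterable[Pair]) -> Set[Pair]:
    """Keep (miRNA, gene) pairs predicted by every database (assumed rule)."""
    return set.intersection(*(set(p) for p in predictions))


def build_cerna_edges(mirna_mrna: Set[Pair], lncrna_mirna: Set[Pair]) -> List[Pair]:
    """Join lncRNA-miRNA and miRNA-mRNA pairs through their shared miRNAs."""
    shared = {m for m, _ in mirna_mrna} & {m for _, m in lncrna_mirna}
    edges = [(lnc, m) for lnc, m in sorted(lncrna_mirna) if m in shared]
    edges += [(m, gene) for m, gene in sorted(mirna_mrna) if m in shared]
    return edges


# Hypothetical predictions, NOT output from miRDB, miRanda, TargetScan or spongeScan.
mirdb = {("miR-a", "GENE1"), ("miR-b", "GENE1"), ("miR-c", "GENE2")}
miranda = {("miR-a", "GENE1"), ("miR-c", "GENE2"), ("miR-d", "GENE2")}
targetscan = {("miR-a", "GENE1"), ("miR-b", "GENE1"), ("miR-c", "GENE2")}
spongescan = {("lnc-1", "miR-a"), ("lnc-2", "miR-a"), ("lnc-3", "miR-x")}

mirna_targets = consensus(mirdb, miranda, targetscan)
edges = build_cerna_edges(mirna_targets, spongescan)
print(edges)
```

In this toy example GENE2 drops out of the network because its only consensus miRNA has no predicted lncRNA partner, which shows how a candidate gene can end up absent from a ceRNA network.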

Results

Analysis and identification of ARGs in DN

Three publicly available DN microarray datasets (GSE30528, GSE47184, and GSE96804) were acquired from the GEO database for further analysis. Following the removal of batch effects, the data were merged and used for subsequent analyses (Fig. 1A and B). Figure 1D illustrates the presence of significant differences among the three datasets. A total of 10,907 genes were processed using the limma package, and 482 DN-related DEGs were obtained, which were visualized with a heat map and a volcano plot (Fig. 1C and F). To delve deeper into the role of ARGs in DN, the 482 DEGs were intersected with 924 ARGs, yielding 59 DN-related ARGs (Fig. 1E). Among them, 27 ARGs (FN1, MMP2, TPM1, JAG1, LPAR1, IGFBP3, TNC, MCAM, CD44, etc.) were up-regulated, while 32 (PDK4, JUN, TGFBR3, NR4A1, S100A8, HTRA1, CHI3L1, BMP7, HSPA1A, MAOA, etc.) were down-regulated. Figure 1G depicts the correlations among the 22 ARGs with |LogFC| > 0.7. The chromosomal locations of these 22 ARGs were also visualized (Fig. 2C). Gene co-expression analysis of the 22 ARGs demonstrated a close relationship between these genes (Fig. 1G). These results suggest that ARGs are differentially expressed in DN patients and may be related to the pathogenesis of DN.

Fig. 1

Expression profiles of differential genes in DN patients. (A and B): PCA diagrams; (C): Heat map of DEGs; (D): Data clustering analysis diagram; (E): Venn diagram of the intersection of DEGs and ARGs; (F): Volcano plot of DEGs; and (G): Heat map of correlation coefficients of 22 DEGs.

GO and KEGG enrichment analyses on DEARGs

GO and KEGG enrichment analyses were performed on ARGs in DN to further clarify their biological functions. In terms of biological processes (BPs), these ARGs are primarily linked to kidney development, glomerular epithelial cell development, mesenchymal differentiation, positive regulation of the cellular chemotactic MAPK cascade, positive regulation of epithelial cell proliferation, and positive regulation of endothelial cell proliferation. Regarding cellular components (CCs), these ARGs exhibited significant association with the collagen-containing ECM, secretory granule lumen, cytoplasmic vesicle lumen, and vesicle lumen. Finally, for molecular function (MF), these ARGs are mainly associated with G protein-coupled receptor binding, endopeptidase activity, growth factor activity, collagen binding, receptor ligand activity, RAGE receptor-binding acid binding, and ribocorticosteroid receptor binding (Fig. 2A and C). In addition, KEGG pathway analysis revealed that these ARGs are predominantly enriched in the MAPK, PI3K-Akt, IL-17, TNF, and HIF-1 signaling pathways (Fig. 2B and D). Consequently, these findings suggest that ARGs may play a role in the onset and progression of DN through differential expression in the aforementioned signaling pathways.

Fig. 2

GO function and KEGG enrichment analyses of ARGs. (A): Bubble map for GO enrichment analysis of up-regulated ARGs; (B): Bar chart for KEGG enrichment analysis of up-regulated ARGs; (C): Bubble chart for GO enrichment analysis of down-regulated ARGs; and (D): Bar chart for KEGG enrichment analysis of down-regulated ARGs.

Screening of ARGs related to DN

RF, LASSO, and SVM-RFE were adopted to further refine the 59 DEARGs and identify the key DEARGs for predicting the occurrence of DN. Firstly, the cross-validation error curve of the SVM-RFE algorithm pinpointed 23 potential core genes (Fig. 3A, B, and Table 2). Secondly, based on the lowest point in the ten-fold cross-validation error curve of the LASSO regression model, 19 candidate genes emerged as potential contributors to the development of DN (Fig. 3C, D, and Table 2). Finally, using the random forest algorithm, 14 feature genes with relative importance scores greater than 1 were identified (Fig. 3E, F; Table 2). The genes derived from these three machine learning models were then intersected, leading to the identification of seven DEARGs associated with DN (PDK4, S100A8, HTRA1, CHI3L1, WT1, CDKN1B, and EGF) for subsequent analysis (Fig. 3G).

Fig. 3

Construction of LASSO, SVM-RFE, and RF models. (A and B): The best biomarkers screened by the SVM-RFE algorithm (n = 23); (C and D): Variables selected in the LASSO model (n = 19); (E and F): Important feature genes selected by the RF algorithm (n = 14); and (G): Venn diagram of the common feature genes of the three algorithms.

Table 2 The genes derived from three machine learning models.

GSEA result analysis

GSEA was performed to explore the regulatory pathways and biological functions related to target gene expression. Unlike traditional enrichment analysis, GSEA does not require a fixed differential expression threshold; it evaluates the overall expression trend of each gene set, so key information is not lost to unreasonable screening parameters. GSEA can therefore identify functional gene sets whose genes show insignificant individual differences but highly consistent trends. In total, 1405 gene sets were significantly enriched under the thresholds of FDR < 0.25 and p.adjust < 0.05. These were mainly enriched in Oxidative phosphorylation, Cytokine-cytokine receptor interaction, PI3K-Akt signaling pathway, ECM-receptor interaction, TGF-beta signaling pathway, Notch signaling pathway, NF-kappa B signaling pathway, etc. (Fig. 4). In addition, 83 targets, including GPX3, PTGS2, JUN, EGF, CDKN1B, VEGFA, VEGFC, and CAV1, were shared between the DEGs and the GSEA enrichment analysis, and CDKN1B and EGF also overlapped with our core targets. Single-gene GSEA was therefore conducted to explore their potential functions. The results showed that CDKN1B is correlated with Cytokine-cytokine receptor interaction, Complement and coagulation cascades, and ECM-receptor interaction, while EGF is closely related to signaling pathways such as ECM-receptor interaction, Chemokine signaling pathway, and Cytokine-cytokine receptor interaction (Fig. 5).

Fig. 4

GSEA results.

Fig. 5

GSEA results of CDKN1B and EGF. (A): GSEA results of CDKN1B; (B): GSEA results of EGF.

Validation and expression patterns of characteristic genes

To verify the diagnostic reliability of these characteristic genes, ROC curves were used to assess the diagnostic performance of the seven central genes in both the training set (GSE30528) and the validation set (GSE30122). As depicted in Fig. 6A and C, the diagnostic model demonstrated excellent discrimination between normal and DN samples in both the training set and the validation set (AUC = 0.987 and AUC = 1, respectively). In addition, as illustrated in Fig. 6B and D, the AUCs of all seven feature genes surpassed 0.7 in both the training set and the validation set, indicating robust diagnostic efficacy for each of the seven feature genes.

Fig. 6

The diagnostic values of characteristic genes in DN. (A and C): ROC curves of the logistic regression diagnostic models constructed in the training set (GSE30528) and the validation set (GSE30122); (B and D): ROC curves of the seven characteristic genes in the two datasets.

Additionally, the expression patterns of the seven characteristic genes were investigated in the training set (GSE30528) and the validation set (GSE30122). It was observed that the expressions of PDK4, S100A8, HTRA1, CHI3L1, WT1, CDKN1B, and EGF in DN samples were significantly reduced in the training set GSE30528 (Fig. 7A–G) in comparison to those in normal samples. This observation was consistently confirmed in the validation set GSE30122 (Fig. 7H–N). The RT-qPCR results showed that, compared with the control group, the mRNA expressions of EGF, S100A8, and CHI3L1 were elevated in the DN cell model, while those of CDKN1B, WT1, PDK4, and HTRA1 were down-regulated (Fig. 8), aligning with the trend of the preceding bioinformatics analysis and further supporting the credibility of its results.

Fig. 7

Expression patterns of seven characteristic genes in GSE30528 and GSE30122. (A–G): Expression patterns of PDK4, S100A8, HTRA1, CHI3L1, WT1, CDKN1B, and EGF in the training set (GSE30528), respectively; (H–N): Expression patterns of PDK4, S100A8, HTRA1, CHI3L1, WT1, CDKN1B, and EGF in the validation set (GSE30122), respectively.

Fig. 8

RT-qPCR results of seven key genes in normal and DN samples (*p < 0.05, **p < 0.01).

Construction of nomogram model for DN prediction

In this work, a DN-related nomogram model was developed based on the seven DEARGs (PDK4, S100A8, HTRA1, CHI3L1, WT1, CDKN1B, and EGF), as shown in Fig. 9D. In Figs. 9A and 10A, the calibration curves in the training set and the verification set closely coincided with the standard curves, indicating the high accuracy of the nomogram in predicting DN. The results of decision curve analysis and clinical impact curve analysis showed that the nomogram model built on the DEARGs can provide high-quality support for DN identification (Figs. 9B and C and 10B and C). These findings suggest that the risk score model exhibits good predictive capability, with the seven diagnostic biomarkers playing key roles in DN progression (Fig. 10).

Fig. 9

Construction of diagnosis nomogram model and assessment of its effect in distinguishing DN. (A): Calibration curve of the model; (B): decision curve of the model; (C): clinical impact curve of the model; and (D): nomograms for the seven characteristic genes in diagnosing DN.

Fig. 10

Construction of diagnosis nomogram model and assessment of its effect in distinguishing DN. (A): Calibration curve of the model; (B): decision curve of the model; (C): clinical impact curve of the model; and (D): nomograms for the seven characteristic genes in diagnosing DN.

Immunoinfiltration analysis

Previous studies have indicated that ARGs play a crucial role in the pathogenesis of DN by regulating the immune system. To gain a deeper understanding of the immunomodulatory aspect of DN, the CIBERSORT method was employed to analyze the immunological characteristics of the dataset. Specifically, profiles of 21 different types of immune cells in DN patients were generated. The heat map depicted the proportions of, and differences in, the infiltration of 22 different types of immune cells between DN and normal samples (Fig. 11B). The correlation heat map of immune cells revealed several pairs of immune cells with positive or negative correlations. Notably, as illustrated in Fig. 11A, there was a significant positive correlation between activated dendritic cells (DCs) and memory CD4 T cells (r = 0.87). In contrast, Macrophages M2 were negatively correlated with Plasma cells (r = −0.6). In addition, compared with normal samples, B cells naive, B cells memory, Macrophages M2, and activated DCs were significantly increased in DN samples, while neutrophils exhibited a significant decrease (Fig. 11C). Similarly, ssGSEA revealed that Immature B cell, Neutrophil, T follicular helper cell, Central memory CD4 T cell, Macrophage, Immature dendritic cell, Natural killer cell, Activated B cell, Effector memory CD4 T cell, Regulatory T cell, and Type 1 T helper cell differed significantly between the DN and normal groups (Fig. 11D).

Fig. 11

Analysis of immune cell infiltration. (A): Correlation heat map of immune cells; (B): Distribution of 22 different immune cells in DN and normal samples; (C): Violin plot showing immune cell infiltration in DN versus normal samples; and (D): ssGSEA results.

The xCell analysis revealed statistically significant differences between the two groups in aDC, CD4+ memory T cells, CD4+ T cells, CD8+ T cells, Epithelial cells, Macrophages, Macrophages M1, MEP, Myocytes, and Th2 cells (Fig. 12).

Fig. 12

The identification of differentially infiltrating immune cells by the xCell method.

Immunoinfiltration correlation of characteristic genes

Furthermore, we analyzed the correlation between the characteristic genes and 22 types of immune cells. The results showed that CDKN1B was positively correlated with Neutrophils and Monocytes, and negatively correlated with Mast cells resting, T cells CD4 memory activated, and Macrophages M2 (Fig. 13A). CHI3L1 was positively correlated with Neutrophils, NK cells resting, and Monocytes, and negatively correlated with Mast cells resting and Macrophages M2 (Fig. 13B). EGF was negatively correlated with B cells memory and Macrophages M2 (Fig. 13C). HTRA1 was positively correlated with Neutrophils and Monocytes, and negatively correlated with T cells CD4 memory activated, Mast cells resting, and Macrophages M2 (Fig. 13D). PDK4 was positively correlated with Neutrophils and negatively correlated with Mast cells resting (Fig. 13E). S100A8 was positively correlated with Neutrophils (Fig. 13F). WT1 was negatively correlated with Neutrophils, Mast cells activated, T cells CD4 memory activated, Macrophages M2, and Mast cells resting (Fig. 13G). Consistent with the CIBERSORT analysis, the ssGSEA results showed that, in addition to PDK4, WT1, HTRA1, EGF, CHI3L1, and CDKN1B were negatively correlated with Activated CD8 T cell, Activated dendritic cell, Central memory CD4 T cell, Macrophage, and MDSC, whereas S100A8 was positively correlated with these cells (Fig. 14). Overall, the expression of these genes appears to be linked to the infiltration levels of multiple immune cells, suggesting that these key diagnostic genes are likely involved in immune regulation during the pathogenesis of DN.

Fig. 13

Correlation between expressions of characteristic genes and immune cell infiltration. (A–G): Correlations of CDKN1B, CHI3L1, EGF, HTRA1, PDK4, S100A8, and WT1 expression with immune cell infiltration, respectively.

Fig. 14

Correlation between expressions of characteristic genes and immune cell infiltration.

Construction of ceRNA network for characteristic genes

Generally, lncRNA is regarded as a ceRNA that competes with mRNA for binding to miRNA. ceRNA analysis examines the entire ceRNA regulatory network, with the lncRNA-miRNA-mRNA axis at its core. To further elucidate the potential regulatory mechanisms of the seven characteristic genes in DN, a ceRNA regulatory network was built by identifying lncRNAs and mRNAs regulated by the same miRNA. Unfortunately, no lncRNAs or miRNAs regulating CHI3L1 and S100A8 were predicted. The network consisted of 47 miRNAs, 5 mRNAs, 134 lncRNAs, and 74 interacting pairs (Fig. 15).

Fig. 15

The ceRNA network constructed by lncRNA (pink), miRNA (blue), and mRNA (green).

Discussion

The global incidence of diabetes is steadily increasing, especially in developing countries, resulting in a rise in the prevalence of DN, a common and severe complication of diabetes29. Moreover, DN can progress to ESRD, which imposes a huge economic burden on healthcare systems worldwide30. At present, clinical treatments primarily focus on delaying the progression of DN, and specific drugs are lacking because of its intricate pathogenesis. Therefore, continued exploration of the mechanisms of DN is of great clinical significance. In recent years, anoikis, a novel form of programmed cell death akin to apoptosis, has become a focus of scientific research. Anoikis can be triggered by pathways mediated by mitochondria and cell surface death receptors, thus acting as a protective physiological barrier against the metastasis of cancer cells in various tumors31,32. However, research on the potential role of ARGs in DN is still in its infancy, and their mechanism of action in DN requires further exploration. Against this background, this work analyzed the differential expression of ARGs between normal and DN samples for the first time and constructed a prognostic model related to anoikis. The results indicate that the model accurately predicted the prognosis of DN and provides valuable support for the exploration of DN-related mechanisms.

In this work, GO function and KEGG pathway enrichment analyses were performed on the DEARGs to further investigate their biological functions. The GO enrichment results revealed their involvement in various BPs (such as kidney development, glomerular epithelial cell development, positive regulation of epithelial cell proliferation, and endothelial cell proliferation), CCs (such as ECM, cytoplasmic vesicle lumen, and vesicle lumen), and MFs (e.g., G protein-coupled receptor binding, endopeptidase activity, growth factor activity, collagen binding, receptor ligand activity, and RAGE receptor-binding acid binding). Furthermore, the KEGG pathway analysis highlighted the enrichment of these ARGs in the MAPK, PI3K/Akt, IL-17, and TNF signaling pathways. The MAPK signaling pathway is intricately involved in various BPs such as cell proliferation, aging, and apoptosis, and its dysfunction is implicated in the occurrence of various diseases. Previous studies have suggested that high glucose (HG) induces endothelial damage by activating the MAPK signaling pathway, thereby accelerating the onset of diabetic vascular complications. Conversely, inhibiting this pathway has shown promise in mitigating renal fibrosis in DN patients33,34. In addition, the activities of p38 MAPK (p-P38), ERK (p-ERK), and JNK (p-JNK) are significantly enhanced in renal tissues affected by DN. Inhibition of the MAPK signaling pathway reduces the production of renal fibrosis-related cytokines and collagen in renal tissues, indicating a pivotal role of this pathway in the progression of DN35,36,37. The PI3K/Akt signaling pathway serves as a key mediator of cellular growth and apoptosis. Meanwhile, p-AKT blocks the binding of Bad and Bcl-2, thus inhibiting the occurrence and progression of apoptosis38. Decreased phosphorylation of PI3K-Akt has been associated with podocyte anoikis, underscoring the importance of this signaling pathway in renal function.
Numerous studies have demonstrated the activation of the PI3K-Akt signaling pathway39,40. Therefore, dysregulation of the PI3K/AKT signaling pathway appears to be a plausible contributor to the progression of DN. In recent years, the relationship between IL-17 and DN has attracted increasing attention among researchers. IL-17 has been reported to exhibit increased expression in the peripheral blood of DN patients41. Ma et al. confirmed that mice deficient in IL-17 showed reduced proteinuria and improved glomerular injury and fibrosis in a diabetic environment, suggesting a pivotal role for the IL-17 signaling pathway in the pathogenesis of DN42. These discrepant conclusions may reflect variations in the expression of the IL-17 signaling pathway across different samples and stages of DN. The TNF signaling pathway, centered on a well-known inflammatory cytokine closely related to kidney injury, can regulate various intracellular signaling pathways involved in inflammation and immunity43. Chen et al. observed up-regulated levels of p-P65, p-JNK, and p-P38, which are proteins in the TNF signaling pathway, in the kidney tissues of streptozotocin (STZ)-treated mice. This outcome was consistent with the trend observed for the apoptosis-related protein Bax44. In addition, targeted inhibition of TNF-α can effectively alleviate STZ-induced kidney injury in DN rats45. These studies on DN signaling pathways lend support to our results to a certain extent.

In this work, the differential expressions of ARGs between DN and normal kidney samples obtained from the GEO database were comprehensively analyzed. A total of 59 DEGs were identified as being associated with anoikis. The difference in ARG expression between DN and normal kidney samples suggests that these genes may be involved in the occurrence and progression of DN. Besides, a model based on seven ARGs, namely PDK4, S100A8, HTRA1, CHI3L1, WT1, CDKN1B, and EGF, was constructed using multiple algorithms, and its performance was validated. The results indicated that PDK4, S100A8, HTRA1, CHI3L1, WT1, CDKN1B, and EGF may be potential therapeutic targets for DN. PDK4, a member of the serine kinase family, exhibits the strongest correlation with metabolic diseases, especially type 2 diabetes46. Han et al. experimentally verified that, under HG stimulation, the expression of PDK4 continued to decline in the DN cell model, and that the diminished expression of PDK4 adversely affected the apoptosis process in DN cells47. Therefore, it is speculated that PDK4 might participate in cell apoptosis in DN patients. Moreover, inhibition of PDK4 can protect mouse podocytes from HG-induced apoptosis and inflammation48. S100A8, also known as MRP8, belongs to the S100 protein family and is crucial in the immune system and inflammatory processes49. A clinical study reported significantly lower levels of S100A8 in serum samples from DN patients compared to normal participants50. In contrast, another study found elevated expression of S100A8 in the tubular epithelial region of diabetic mice and diabetic patients, and showed that inhibiting the expression of S100A8 could reduce the endothelial interstitial transformation of tubular epithelial cells, thereby improving renal interstitial fibrosis in DN51. These differing observations may be attributed to variations in the expression of S100A8 in different kidney regions.
Numerous studies have confirmed the involvement of WT1 in the development of the kidney, neurons, and genitourinary tissue, and it is considered a biomarker for mature podocytes. In DN cases, the number of WT1-labeled cells in the glomeruli is significantly lower than that in the control group, indicating reduced expression of WT152,53, which aligns with the results of this work. In addition, activation of the p53 signaling pathway can induce apoptosis of DN podocytes by regulating WT154. Studies have observed up-regulated expression of CDKN1B in the glomeruli and HG-induced mesangial cells (MCs) of diabetic animals. Its up-regulation is closely related to MC hypertrophy and ECM deposition, while inhibiting the expression of CDKN1B can delay the progression of DN55,56. By contrast, Dong et al. identified low expression of CDKN1B in the glomeruli of DN patients based on the Nephroseq dataset, and confirmed that the mRNA expression of CDKN1B in cells gradually decreased as the glucose concentration increased, thereby supporting the reliability of this study57. These results suggest a crucial role for CDKN1B in the progression of DN. EGF receptor (EGFR), a member of the ErbB family of receptor tyrosine kinases, is widely expressed in mammalian kidneys. Previous studies have indicated that activation of EGFR signaling in renal epithelial cells exacerbates diabetic kidney injury, while inhibiting the EGFR signaling pathway can preserve podocyte numbers and reduce proteinuria58,59. Furthermore, EGF, the most intensively studied EGFR ligand, is highly expressed in the serum of DN patients and partially excreted through urine, making EGF a potential prognostic biomarker for DN60,61. A literature search identified no reported studies on the specific mechanisms and pathways of HTRA1 and CHI3L1 in the occurrence and development of DN.
Therefore, future research efforts should focus on elucidating the mechanisms of action of HTRA1 and CHI3L1 in DN, which would enhance our understanding of the pathogenesis of DN and contribute to the development of new treatment strategies.

The etiology of DN is complex and involves multiple factors, and recent evidence suggests that immunity plays a key role in its pathogenesis62,63. This work revealed a significant increase in naive B cells, memory B cells, M2 macrophages, and activated DCs in DN samples, while neutrophils decreased significantly. Several studies have implicated B cell activation in DN, suggesting that it may contribute to the development of DN through its role in antigen presentation, antibody generation, immune complex generation, or cytokine generation64,65. Macrophages are the most common infiltrating immune cells in the kidneys of DN patients, and their accumulation in kidney tissue is closely linked to declining renal function in DN66. M1 macrophages are generally considered pro-inflammatory, whereas M2 macrophages are mainly anti-inflammatory67. However, a previous study demonstrated that both M1 and M2 macrophage infiltration were higher in DN patients than in the control group and increased gradually with the duration of diabetes, which aligns with our findings68. DCs are the major antigen-presenting cells derived from bone marrow. HG has been reported to promote the activation and maturation of DCs, and the accumulation of these cells has been observed in DN animal models and human kidney biopsies69,70. Together, this evidence suggests that DCs are involved in kidney injury under DN conditions, and that their activation may be a key step in the progression of DN. The increase in neutrophils observed in STZ-induced diabetic mouse models, in contrast to the decrease in neutrophils in DN samples in this work, highlights potential differences in neutrophil behavior among DN models71. This discrepancy suggests the need for further research into the mechanisms involving neutrophils in DN72.
Finally, the analysis of associations between the seven genes (PDK4, S100A8, HTRA1, CHI3L1, WT1, CDKN1B, and EGF) and immune cells revealed that the genes were mainly correlated, either positively or negatively, with neutrophils, M2 macrophages, resting mast cells, and CD4+ T cells. These results suggest that these seven genes might play critical roles in shaping the immune microenvironment of DN patients.
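As an illustration of how such gene and immune cell associations can be quantified, the following minimal sketch computes a rank-based correlation between one gene's expression and one immune cell fraction across samples. It assumes a Spearman coefficient, which the text does not specify, and the numeric values are made-up placeholders rather than data from this study; the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when one variable is constant
    return sxy / math.sqrt(sxx * syy)


# Placeholder values (NOT study data): expression of one gene and the
# estimated neutrophil fraction in the same five samples.
gene_expression = [2.1, 3.4, 1.8, 4.0, 2.9]
neutrophil_fraction = [0.05, 0.09, 0.03, 0.12, 0.07]
rho = spearman(gene_expression, neutrophil_fraction)
```

A coefficient near 1 or -1 indicates a strong monotonic association, and the sign gives its direction. A rank-based measure is a common choice here because immune cell fractions are bounded and often skewed.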

The construction of a ceRNA network can help identify and investigate potential regulatory pathways and targets, providing new insights into the pathogenesis and biological processes of disease. In recent years, a growing number of studies have shown that ceRNA networks are widely involved in various diseases, such as tumors and nervous system diseases73. However, there are few studies on ARGs in DN. Therefore, a ceRNA network associated with CDKN1B, EGF, HTRA1, PDK4, and WT1 was constructed in this study, comprising 47 miRNAs, 5 mRNAs, and 134 lncRNAs. Studies have found that the expression of CDKN1B is upregulated in the glomeruli and HG-induced mesangial cells (MCs) of diabetic animals, and that its upregulation is closely related to MC hypertrophy and ECM deposition, while inhibiting its expression can delay the progression of DN55,56. In addition, Dong et al. found low expression of CDKN1B in the glomeruli of DN patients based on the Nephroseq dataset, and confirmed that the expression of CDKN1B mRNA in cells gradually decreased as the glucose concentration increased, which further supported the reliability of this study57. These results suggest that CDKN1B may play an important role in the progression of DN. The expression of L13Rik was significantly increased in the peripheral blood of DN patients, the kidney tissue of diabetic mice, and HG-treated MCs, while the expression of miR-2861 was decreased. L13Rik acted as a ceRNA to sponge miR-2861, thereby relieving the inhibition of CDKN1B and regulating the cell cycle and MC hypertrophy74. It has been demonstrated that NEAT1 promotes HG-induced hypertrophy of human MCs, and the NEAT1 sequence has been found to be partially complementary to the miR-222-3p sequence, while the expression of miR-222-3p is decreased in HG-stimulated cells and in diabetic rats.
Furthermore, the miR-222-3p sequence was found to be complementary to the 3′ UTR of CDKN1B, and knockdown of CDKN1B reversed the effects of NEAT1 knockdown and miR-222-3p inhibitors on cell cycle profiles, protein synthesis, and protein content. These results indicate a role for the NEAT1-miR-222-3p-CDKN1B axis in regulating cell hypertrophy75. As a member of the serine kinase family, PDK4 has the strongest association with metabolic diseases, especially type 2 diabetes46. Han et al. verified in a DN cell model that the expression of PDK4 continued to decrease under HG stimulation, and that its low expression affected the apoptosis of DN cells. Consequently, we speculate that PDK4 might also participate in the apoptosis of DN cells47. In addition, inhibition of PDK4 can protect mouse podocytes from HG-induced apoptosis and inflammation48. LncRNA H19 acts as a miRNA sponge to regulate EndMT, proliferation, inflammation, EMT, fibrosis, oxidative stress, ferroptosis, and pyroptosis. Studies have found that H19 is highly expressed in TGF-β2-induced HMVEC fibrosis and in the fibrotic kidneys of STZ-induced diabetic CD-1 mice. H19 knockout can significantly reduce renal fibrosis in vivo and in vitro, and the mechanism is related to the inhibition of miR-29a in diabetic kidneys and in HMVECs with TGF-β2-induced fibrosis76. Another study reported that miR-15b-5p can down-regulate the expression of PDK4 and thereby lower the expression of downstream VEGFA, ultimately helping to inhibit HG-induced apoptosis and inflammation of mouse podocytes48. These results suggest a role for the H19-miR-15b-5p-PDK4/VEGFA axis in regulating HG-induced podocyte apoptosis and inflammation. Jie et al. confirmed that the expression of KCNQ1OT1 was elevated in HG-induced human MCs and human glomerular endothelial cells, and that knockdown of KCNQ1OT1 could inhibit cell proliferation and fibrosis and induce cell apoptosis, suggesting that KCNQ1OT1 could promote the progression of DN77.
On the other hand, miR-1207-5p is a predicted target miRNA of KCNQ1OT1, and its overexpression can significantly inhibit cell proliferation and migration and induce cell apoptosis78. miR-1207-5p has been reported to be highly expressed in kidney cells and under HG stimulation, so inhibiting its expression may be a potential therapeutic approach for DN79. A number of studies have confirmed that WT1 is involved in the development of the kidney, neurons, and genitourinary tissue, so it is considered a biomarker of mature podocytes. It has also been reported that the number of WT1-labeled cells in the glomeruli of DN cases is significantly lower than that of the control group, indicating low expression of WT1 in DN patients, which is consistent with the results of this study52,53. Moreover, activation of the p53 pathway can induce apoptosis of podocytes in DN patients by regulating WT154. Therefore, it is speculated that H19/miR-15b-5p/PDK4 and KCNQ1OT1/miR-1207-5p/WT1 may be potential ceRNA regulatory pathways regulating early DN progression.
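The ceRNA logic discussed above, in which a lncRNA and an mRNA are linked through a shared miRNA, can be sketched as a simple join of two interaction lists. This is a minimal illustration under stated assumptions, not the pipeline used in this study: the function name is hypothetical, the pairs are limited to the axes named in the text, and the entry "miR-placeholder" is an invented placeholder added only to show that a miRNA without a lncRNA partner yields no axis.

```python
from collections import defaultdict


def build_cerna_axes(mirna_mrna_pairs, mirna_lncrna_pairs):
    """Return sorted (lncRNA, miRNA, mRNA) triplets sharing the same miRNA."""
    lncrnas_by_mirna = defaultdict(set)
    for mirna, lncrna in mirna_lncrna_pairs:
        lncrnas_by_mirna[mirna].add(lncrna)

    axes = set()
    for mirna, mrna in mirna_mrna_pairs:
        # A miRNA with no lncRNA partner contributes no ceRNA axis.
        for lncrna in lncrnas_by_mirna.get(mirna, ()):
            axes.add((lncrna, mirna, mrna))
    return sorted(axes)


# miRNA-mRNA pairs taken from the axes discussed in the text, plus one
# invented placeholder miRNA (hypothetical, for illustration only).
mirna_mrna = [
    ("miR-15b-5p", "PDK4"),
    ("miR-1207-5p", "WT1"),
    ("miR-222-3p", "CDKN1B"),
    ("miR-2861", "CDKN1B"),
    ("miR-placeholder", "EGF"),
]

# miRNA-lncRNA pairs taken from the axes discussed in the text.
mirna_lncrna = [
    ("miR-15b-5p", "H19"),
    ("miR-1207-5p", "KCNQ1OT1"),
    ("miR-222-3p", "NEAT1"),
    ("miR-2861", "L13Rik"),
]

axes = build_cerna_axes(mirna_mrna, mirna_lncrna)
for lncrna, mirna, mrna in axes:
    print(f"{lncrna}/{mirna}/{mrna}")
```

In practice, the two input lists would come from miRNA target and lncRNA interaction predictions, and each resulting axis remains a hypothesis until the sponge relationship is confirmed experimentally.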

In conclusion, to our knowledge, this work represents the first comprehensive analysis of the mechanism of ARGs in DN. Bioinformatics analysis revealed the expression profile of ARGs in DN and their correlation with immune cell infiltration. Seven DN-related ARGs (PDK4, S100A8, HTRA1, CHI3L1, WT1, CDKN1B, and EGF) were obtained using machine learning algorithms. This finding provides new insights into the mechanisms underlying the development of DN at the transcriptome level. Additionally, the correlation of these feature genes with immune cell infiltration was analyzed, and a ceRNA network based on the feature genes was constructed to explore their molecular regulatory mechanisms. However, this work is subject to several limitations. Firstly, the sample size for analysis and verification is relatively small. Secondly, no in vitro or in vivo experiments were performed, so the findings require further experimental validation. Therefore, future research should include more extensive investigations at multiple levels, especially at the molecular level, to provide additional insights into the diagnosis and progression of DN. Such endeavors may unveil new diagnostic and therapeutic targets for patients with DN.