Introduction

Kidney stone disease is a globally prevalent condition, with a worldwide prevalence ranging from 1 to 13% that is rising annually. Kidney stones are characterized by a high risk of recurrence. The spontaneous recurrence rate within five years after the first episode of renal colic is 35–50%. If patients do not receive follow-up treatment, the 10-year recurrence rate for secondary stone formation can reach as high as 50%. This high prevalence and recurrence rate also result in a substantial economic burden1,2. The primary treatment methods for this disease currently include pain management, pharmacological therapy, and surgical removal of stones, but each has its limitations. Early identification of kidney stones and targeted preventive treatment may address these limitations, but both remain a significant clinical challenge.

Approximately 80% of kidney stones are primarily composed of calcium oxalate (CaOx) crystals, and most CaOx stones form on subepithelial plaques of calcium phosphate (CaP) on the renal papillae, known as Randall’s plaques (RP). The pathophysiology of RP formation remains unclear, with several hypotheses proposed, including inflammation, oxidative stress, purine metabolism, urea cycle, and angiogenesis disorders. Pro-inflammatory factors, disruptions in the urea cycle, and purine metabolism dysregulation exacerbate the production of reactive oxygen species (ROS). Excessive ROS generation may activate pro-inflammatory and pro-angiogenic factors. Oxidative stress (OS) plays a role in maintaining the inflammatory microenvironment and extracellular matrix (ECM) remodeling, contributing to the aggregation of deposits3. Liu et al.4 proposed a molecular mechanism involving ROS to explain the transition from RP to kidney stones. However, no further in-depth studies have been conducted on this topic.

With the rapid advancement of bioinformatics, sophisticated database analysis techniques have been employed to investigate the underlying mechanisms of various diseases. Although significant progress has been made in identifying therapeutic targets for kidney stone disease in recent years, there are currently no specific interventions targeting the OS pathway. We aim to systematically identify the key genes involved in the oxidative stress mechanism of RP formation using various bioinformatics approaches. Our goal is to construct a diagnostic model for RP, uncover critical molecular mechanisms, and explore potential therapeutic targets to provide a novel approach for the early detection and treatment of kidney stones.

Results

The methodology employed in this experiment is illustrated in Fig. 1.

Fig. 1

Flow chart.

WGCNA

We integrated plaque expression data from GEO and used the ComBat method from the R package “sva” (version 3.42.0) to correct for batch effects caused by non-biological technical biases5. We applied Weighted Gene Co-expression Network Analysis (WGCNA) to study gene sets related to oxidative stress. Scale independence and average connectivity analyses indicated that when the soft-thresholding power was set to 7 (Fig. 2A), the average connectivity approached 0, and the scale independence was greater than 0.85. Thirteen co-expression modules were identified, with unrelated genes assigned to the gray module, which was excluded from further analysis (Fig. 2B). To understand the physiological significance of the genes within the modules, we associated the 13 module eigengenes (MEs) with oxidative stress and identified the most significant associations. According to the module-trait correlation heatmap (Fig. 2C), genes clustered in the greenyellow module (152 genes, Table S2) had the strongest correlation with oxidative stress (r = 0.6352, P < 0.05). Therefore, we focused on the greenyellow module in subsequent analyses, as it may more accurately reflect oxidative stress. To examine the relationships between modules, we correlated the MEs with one another; the results are displayed in a heatmap (Fig. 2D). We also plotted gene significance (GS) for oxidative stress against module membership (MM) in the greenyellow module (Fig. 2E). MM and GS exhibited a significant positive correlation (cor = 0.59, p < 0.05), indicating that the most important (central) elements of the greenyellow module also tend to be highly correlated with oxidative stress features.

Fig. 2

(A) Soft threshold β = 7 and scale-free topology fitting index (R2). (B) Analysis of the gene expression network identified different modules in the co-expression data. (C) Relationship between module eigengenes and oxidative stress. Each row in the table corresponds to a module eigengene, and each column corresponds to a trait. The numbers in the table represent the correlation between the module eigengenes and traits, with P-values shown in parentheses below the correlation values. Correlations are color-coded according to the color legend. (D) Correlation heatmap of the eigengene network. Each row and column in the heatmap corresponds to the eigengenes of a module (color-coded). In the heatmap, red indicates high adjacency, while blue indicates low adjacency. The red rectangles on the diagonal represent meta-modules. (E) Correlation between module membership (MM) and oxidative stress (OS) gene significance (GS) for all genes in the greenyellow module. Cor represents the absolute correlation coefficient between GS and MM.

Selection and enrichment of oxidative stress-related DEGs

A comparison of the integrated data from plaque samples and control samples identified 911 differentially expressed genes (DEGs), with statistically significant differences between the two groups (p < 0.05, |log2 fold change| > 0.5). In the plaque samples, 489 genes were upregulated, and 422 genes were downregulated (see Table S3). All DEGs were visualized using a volcano plot (Fig. 3A). Additionally, a heatmap showing the expression levels of the top 5 upregulated genes and the top 5 downregulated genes in the samples is presented (Fig. 3B). By intersecting DEGs with oxidative stress-related module genes, we identified 3 oxidative stress-related differential genes: BFSP1, LONRF1, and TAF1D (Fig. 3C), which were considered hub genes. Their correlations (Fig. 3D) and expression levels (Fig. 3E) are also shown. To investigate the biological functions of the oxidative stress-related differential genes, we created a protein-protein interaction (PPI) network for the hub genes using the GeneMANIA database (Fig. 3F). We performed Gene Ontology (GO) enrichment analysis on 23 co-expressed genes (including 3 hub genes and 20 genes related to the hub genes, Table S4); the enriched terms are listed in Table S5. The GO results indicated that these genes were enriched in the biological processes (BP) of transcription by RNA polymerase I, transcription initiation from RNA polymerase I promoter, and transcription preinitiation complex assembly; in the cellular components (CC) of NuRD complex, CHD-type complex, and transcription regulator complex; and in the molecular functions (MF) of core promoter sequence-specific DNA binding, nucleosomal DNA binding, and histone deacetylase binding (Fig. 3G).

Fig. 3

Selection and enrichment analysis of oxidative stress-related differentially expressed genes (DEGs). (A) The volcano plot illustrates the distribution of DEGs between plaque and control samples. Orange, blue, and gray dots represent genes with upregulated, downregulated, and no significant expression, respectively. (B) The heatmap shows the top 10 DEGs ranked by differential expression. (C) The Venn diagram displays the identification of oxidative stress-related DEGs. (D) Correlation heatmap of hub genes. (E) Boxplot of hub gene expression, with p < 0.05. (F) Co-expression protein interaction network of hub genes. (G) GO enrichment analysis results of co-expressed genes of hub genes.

Single-gene GSEA and GSVA

To further explore the potential mechanisms of hub genes, we performed single-gene Gene Set Enrichment Analysis (GSEA) and Gene Set Variation Analysis (GSVA). Using pathway information from the MSigDB database as a reference, we selected the most significant pathways based on normalized enrichment scores (NES). GSEA results revealed that genes with expression similar to different hub genes were significantly enriched in the following pathways: genes similar to BFSP1 expression were mainly enriched in KEGG_MATURITY_ONSET_DIABETES_OF_THE_YOUNG, KEGG_OLFACTORY_TRANSDUCTION, KEGG_TASTE_TRANSDUCTION, KEGG_NEUROACTIVE_LIGAND_RECEPTOR_INTERACTION, and KEGG_ALDOSTERONE_REGULATED_SODIUM_REABSORPTION (Fig. 4A); genes similar to LONRF1 expression were primarily enriched in KEGG_RIBOSOME, KEGG_OXIDATIVE_PHOSPHORYLATION, KEGG_SPLICEOSOME, KEGG_PARKINSONS_DISEASE, and KEGG_STARCH_AND_SUCROSE_METABOLISM (Fig. 4B); and genes similar to TAF1D expression were mainly enriched in KEGG_RIBOSOME, KEGG_PARKINSONS_DISEASE, KEGG_SPLICEOSOME, KEGG_OXIDATIVE_PHOSPHORYLATION, and KEGG_HUNTINGTONS_DISEASE (Fig. 4C). GSVA results showed that the pathways KEGG_REGULATION_OF_AUTOPHAGY, KEGG_FOCAL_ADHESION, KEGG_RENAL_CELL_CARCINOMA, KEGG_INSULIN_SIGNALING_PATHWAY, and KEGG_VASCULAR_SMOOTH_MUSCLE_CONTRACTION were downregulated in the RP group, while the KEGG_STEROID_HORMONE_BIOSYNTHESIS, KEGG_TERPENOID_BACKBONE_BIOSYNTHESIS, and KEGG_GLYCOSAMINOGLYCAN_BIOSYNTHESIS_KERATAN_SULFATE pathways were upregulated in the RP group (Fig. 4D, Table S6).

Fig. 4

Single-gene GSEA and GSVA of hub genes. (A) Single-gene GSEA of BFSP1. (B) Single-gene GSEA of LONRF1. (C) Single-gene GSEA of TAF1D. (D) GSVA of RP.

Signaling pathways of key genes

We used GSVA to investigate the differences between RP patients and control groups across 50 Hallmark signaling pathways. In RP patients, the Hallmark signaling pathways HALLMARK_SPERMATOGENESIS and HALLMARK_KRAS_SIGNALING_DN were significantly upregulated (Fig. 5A). We analyzed the correlation between the three most significantly differentially expressed hub genes and the 50 Hallmark signaling pathways (Fig. 5B). The results showed that BFSP1 was negatively correlated with most of the Hallmark signaling pathways.

Fig. 5

Correlation between hub genes and 50 Hallmark signaling pathways. (A) Comparison of Hallmark signaling pathways between the RP group and the control group. (B) Correlation between hub genes and 50 Hallmark signaling pathways.

Construction and validation of the diagnostic model

We constructed a diagnostic nomogram model for RP using hub genes (Fig. 6A) and evaluated its predictive ability using ROC curves. We interpreted an AUC between 0.5 and 0.7 as fair, 0.7 to 0.85 as good, and 0.85 to 0.9 as very good, with an AUC of 1 indicating perfect discrimination. The results indicate that the diagnostic model discriminates well between RP and control samples (Fig. 6B, AUC > 0.7). We also assessed the diagnostic performance of the three hub genes individually using ROC curves (Fig. 6C-E). All hub genes showed good diagnostic value (AUC > 0.7), consistent with the performance of the model.

Fig. 6

Construction and Validation of the Diagnostic Model. (A) Nomogram for predicting RP. (B) ROC curve evaluating the predictive ability of the nomogram model. (C) ROC curve evaluating the predictive ability of BFSP1. (D) ROC curve evaluating the predictive ability of LONRF1. (E) ROC curve evaluating the predictive ability of TAF1D.

Immune infiltration analysis

We used the ssGSEA method to further investigate the infiltration of 28 immune cell types in the RP and control groups. First, we presented the proportion of infiltrating immune cells using a stacked bar plot (Fig. 7A) and displayed the infiltration levels of different types of immune cells between the RP and control groups using box plots (P < 0.05, Fig. 7B). The results showed significant differences in CD56dim natural killer cells and Memory B cells between the RP and control groups. Additionally, we assessed the significant correlations between hub genes and the corresponding immune cells, and these correlations are presented in a scatter plot (Fig. 7C).

Fig. 7

Immune Infiltration Levels between RP and Control Groups. (A) Stacked bar plot showing the relative proportions of infiltrating immune cells. (B) Box plot of estimated proportions of immune cells between the RP and control groups. Asterisks indicate p-values: ****p < 0.0001, ***p < 0.001, **p < 0.01, *p < 0.05. (C) Scatter plot showing the correlation between hub genes and immune cells.

RBP-mRNA

Because RNA binding proteins (RBPs) interact with mRNA, we used the StarBase online database to search for and download the mRNA/RBP pairs corresponding to hub genes. Based on the relationships provided by the online dataset, we constructed an RBP-mRNA network consisting of 177 nodes (174 RBPs and 3 mRNAs) and 246 edges (Fig. 8, Table S7).

Fig. 8

RBP-mRNA Network. RBP-mRNA regulatory network of hub genes. Light blue represents RBPs, and orange represents mRNAs.

ceRNA

To elucidate the potential molecular mechanisms of hub genes, we constructed an mRNA–miRNA–lncRNA interaction network. Target miRNAs were identified through LONRF1, and a network diagram was constructed using Cytoscape to visualize the lncRNAs corresponding to these target miRNAs (Fig. 9, Table S8).

Fig. 9

ceRNA Network. Network diagram of lncRNA–miRNA–mRNA interactions for hub genes. Orange-yellow represents lncRNAs, orange-red represents mRNAs, and light blue represents miRNAs.

Potential drugs and molecular docking

Using key genes as potential drug targets for RP treatment, we identified possible effective drugs for the key gene BFSP1 from DGIdb. We obtained the AlphaFold-predicted protein structure of BFSP1 from the PDB database and performed molecular docking. The molecular docking results showed a binding energy of less than −4 (Table 1), suggesting a favorable interaction between the predicted protein and the drug molecules, which may have therapeutic potential in RP. The results of the molecular docking were visualized (Fig. 10).

Table 1 Basic information on molecular docking of the predicted drug COMPOUND 111 with the predicted target protein.

Fig. 10

Potential Drugs and Molecular Docking. Molecular docking of the drug compound COMPOUND 111 with the predicted target protein BFSP1.

Discussion

Kidney stones represent a globally prevalent condition with an escalating incidence annually, resulting in considerable pain and economic burden for patients. Current treatment modalities exhibit certain limitations. Investigating RP may yield novel strategies for the early prevention and personalized management of kidney stones.

Through WGCNA, we established a connection between oxidative stress and kidney stones and identified 152 oxidative stress-related genes associated with kidney stones. These genes are considered the primary targets driven by oxidative stress in kidney stones. Furthermore, we intersected these genes with the genes differentially expressed between plaque and control samples (DEGs). The resulting targets, BFSP1, LONRF1, and TAF1D, are considered the most critical factors by which OS contributes to RP formation. These genes may participate in cellular responses to oxidative stress through multiple key signaling pathways, thereby influencing the formation of RP.

RP are composed of CaP deposits mixed with collagen fibers, membrane-bound vesicles, some unidentified fibrous materials, and other cellular degradation products6. These calcium deposits are thought to be the origin of calcium oxalate stone formation. When urine is supersaturated with minerals, crystals in the kidneys begin to form, grow, aggregate, and persist, and the area where RP are located is considered an ideal site for calcium oxalate stone growth7,8,9. RP is a precursor to kidney stone formation, and its formation is related to various factors, one of which is oxidative stress. ROS play a crucial role not only in regulating intercellular and intracellular signaling but also in immune modulation and the regulation of phagocytic cell (neutrophils, plasma cells, macrophages) activity, and they can lead to cellular senescence and apoptosis3. Under conditions of urinary supersaturation, exposure of the renal epithelium to crystals may lead to the generation of ROS, which in turn activate inflammasomes and trigger a cascade of inflammatory responses. Inhibiting the production of ROS and the activation of inflammasomes can reduce crystal deposition in animal models10. Overall, oxidative stress increases the risk of kidney stones through mechanisms such as promoting ROS production, causing cellular damage and inflammation, and facilitating crystal nucleation and growth11.

Umekawa et al.12 demonstrated that increases in calcium or oxalate in the tubular fluid may activate the renin-angiotensin system (RAS). In particular, angiotensin II can activate NAD(P)H oxidase, inducing epithelial cells to increase ROS production, which in turn impairs endothelial function13. In our study, BFSP1 expression was upregulated in the RP group. GSEA revealed that genes with expression patterns similar to BFSP1 were also enriched in pathways related to the RAS signaling system. We hypothesize that these genes play roles within this pathway. Exposure to CaOx-induced oxidative stress is also characterized by a loss of mitochondrial membrane potential and a decrease in glutathione, leading to the release of cytochrome c into the cytoplasm and resulting in apoptosis, which causes epithelial cell damage14. We found that TAF1D and LONRF1 were downregulated in the RP group and that genes with expression patterns similar to TAF1D and LONRF1 were enriched in the oxidative phosphorylation pathway, which may accelerate or induce oxidative stress by affecting the antioxidant defense system, leading to cell damage and death. TAF1D is a component of the transcription factor SL1/TIF-IB complex, which stabilizes the rDNA promoter-binding factor during RNA polymerase I-dependent transcription and is involved in cell cycle regulation15. As a component of a transcription factor complex, TAF1D may be involved in regulating the transcription of antioxidant enzyme genes. These antioxidant enzymes play a crucial role in scavenging free radicals and mitigating oxidative stress damage in the body. Under oxidative stress conditions, cell cycle regulation is abnormal16, which can result in irreparable damage to the renal epithelium17. Li et al.18 found that in cells with high expression of LONRF1 under pathological conditions, the NF-κB and p53 signaling pathways were activated, and inhibition of IFN-α, IFN-γ, and proteasome signaling was also observed. They also found that high expression of LONRF1 inhibited the protective effect of HMOX1 on cells. HMOX1 encodes a major antioxidant enzyme that is involved in heme degradation and is regulated by the KEAP1-NRF2 system under oxidative stress. These results suggest that LONRF1 may play a beneficial role in defending against oxidative stress. At the same time, LONRF1 may play a crucial role in linking oxidative damage responses and tissue remodeling during wound healing, with different patterns in aging and non-aging cells. Autophagy regulation is a key mechanism by which renal cells respond to oxidative stress19,20. Both LONRF1 and TAF1D are enriched in the autophagy signaling pathway, and their downregulation may disrupt autophagy regulation, reducing the clearance of damaged organelles and ROS, thereby exacerbating oxidative stress.

Osteogenic changes in renal epithelial cells are also one of the key factors in RP formation. Jia et al.21,22 observed osteogenic changes in renal epithelium exposed to a high urinary calcium environment. Priante et al.23 induced CaP deposition and cell osteogenesis by exposing renal tubular epithelial cells to osteogenic culture medium. During the formation of RP, crystals attached to the renal tubular lumen are transferred to the interstitium by renal tubular epithelial cells through endocytosis. This process brings them into contact with macrophages, leading to macrophage activation and the generation of a pro-inflammatory phenotype (M1). During chronic inflammation, calcifying vesicles produced by pro-inflammatory macrophages (M1) and epithelial cells promote collagen deposition, contributing to RP formation11. Yu et al.24 found that DCN, LUM, and P4HA2 are downregulated in RP tissue. DCN and LUM are associated with hydroxyapatite interaction proteins in bone, where DCN inhibits hydroxyapatite-induced crystal growth. The reduction in P4HA2 levels may relieve the restriction on HIF-1α expression, promoting tubular interstitial fibrosis, thereby accelerating crystal deposition and epithelial ossification changes. TAF1D influences the PI3K/AKT/mTOR signaling pathway, thereby affecting the cell cycle and epithelial-to-mesenchymal transition (EMT) processes25, which are also among the manifestations of epithelial cell ossification. These processes are closely related to the intracellular transport and metabolism of minerals. BFSP1 plays a role in regulating ciliary length and is closely linked to cellular processes such as primary cilium biogenesis, cytoskeletal organization, vesicle transport, and cell-cell adhesion26. BFSP1 may be involved in RP formation through these mechanisms. Wang et al.27 also reported that BFSP1 plays a crucial role in maintaining cell structure and function, including cytoskeleton formation, cell morphology, migration, and differentiation28. Tapodi et al.29 reported that BFSP1 can regulate the permeability of aquaporin 0 (AQP0) through Ca2+ signaling. We hypothesize that it may regulate renal aquaporins (AQP1, AQP2, etc.) through the same pathway to influence water and sodium reabsorption, thereby promoting stone formation. This hypothesis requires further experimental validation.

Additionally, kidney stones are believed to be related to immunity and inflammation. Taguchi et al.30 found that LCN2, IL11, PTGS1, GPX3, and MMD are upregulated in RP tissue. These genes are associated with the mitogen-activated protein kinase and Akt-phosphoinositide 3-kinase pathways, as well as pro-inflammatory factors. Hong et al.31 found that SPP1 is significantly upregulated in kidney stone patients. SPP1 is an important chemotactic factor for macrophages, dendritic cells, and T cells. Renal tubular epithelial cells (RTECs) stimulated by crystal deposition can secrete SPP1 to induce macrophage migration and phagocytosis. Suen et al.32 found that IL-6, IL-8/CXCL8, RANTES/CCL5, MCP-1/CCL2, Mig/CXCL9, and IP-10/CXCL10 are significantly elevated in the urine of stone patients compared to healthy individuals. These chemokines/cytokines can be secreted by renal tubular epithelial cells, renal tubular cells, and specific immune cell subsets (macrophages, NK cells, monocytes, T cells, B cells, and neutrophils). In the context of oxidative stress, classic pro-inflammatory macrophages (M1) and alternatively activated anti-inflammatory macrophages (M2) are downregulated in the renal tissue surrounding RP. Taguchi et al.33 performed microarray analysis of gene expression in the renal papillary tissue of kidney stone patients, which showed upregulation of genes associated with the inflammatory macrophage phenotype (M1) and downregulation of genes associated with the anti-inflammatory macrophage phenotype (M2). Through immune infiltration analysis, we found significant differences in the immune infiltration levels of CD56dim natural killer cells and Memory B cells between the RP group and the control group. At the same time, we found that the downregulation of LONRF1 and TAF1D and the upregulation of BFSP1 in RP correlate with the abnormal distribution of these immune cells. ROS can activate transcription factors, including nuclear factor kappa B (NF-κB) and activator protein 1 (AP-1), as well as growth factors, via the p38 mitogen-activated protein kinase (MAPK)/JNK pathway, thereby regulating the inflammatory response. Under CaOx stimulation, NK cells can induce tissue cells to produce IP-10/CXCL10 and Mig/CXCL9, which then guide activated T cells back to the inflamed tissues34. MCP-1/CCL2 can recruit monocytes, memory T cells, and NK cells, contributing to the immune response35. Through our analysis of the correlation between hub genes and immune cells, we found that the upregulation of BFSP1 in RP is positively correlated with macrophages, while the downregulation of LONRF1 shows a negative correlation with macrophages. Although this finding is consistent with the study by Taguchi et al.33, we did not observe a significant difference in macrophages between the two groups, which may be due to the small sample size.

In recent years, the intersection of kidney stones and other metabolic diseases has been an area of ongoing research. Shen et al.36 identified shared biomarkers and pathways in kidney stones and diabetes. They found that hub genes were significantly enriched in KEGG signaling pathways, particularly those related to fructose and mannose metabolism and cell adhesion molecules. In our study, we also found overlapping results in certain pathways (such as insulin signaling, vascular smooth muscle contraction, and terpenoid backbone biosynthesis). This indirectly supports a link between shared genetic factors and disease manifestations in different organs.

Early identification and intervention of kidney stones have a significant impact on the patient’s prognosis. In this study, we developed diagnostic models for RP utilizing three hub genes, which exhibited significant diagnostic value and hold promise for the early detection of RP. Nevertheless, their efficacy requires validation with a larger cohort of clinical samples.

In the treatment of kidney stones, the main clinical approaches currently include fluid intake management, dietary management, surgical removal, or medication to reduce urinary crystal saturation. However, the high prevalence and recurrence rates of kidney stones remain a major challenge. In recent years, there has been increasing research on new targets for kidney stones. Shen et al.36 proposed the feasibility of resveratrol therapy by studying the shared pathways between diabetes and kidney stones. McCormick et al.37 reported the effectiveness of sodium-glucose cotransporter-2 (SGLT-2) inhibitors in patients with recurrent stones through target trial emulation studies. In our study, we examined BFSP1 as a potential therapeutic target for kidney stones, and molecular docking predicted favorable binding of COMPOUND 111 to it. COMPOUND 111 may therefore possess therapeutic potential in RP; however, further investigations are necessary to substantiate this finding.

Our study has inherent limitations due to the limited datasets available for kidney stones and the lack of external validation. Additionally, in vivo and in vitro studies are needed to investigate the potential mechanisms of hub genes and immune cells in kidney stones for future clinical translation.

Conclusion

Our investigation revealed that three pivotal genes, BFSP1, TAF1D, and LONRF1, may play crucial roles in the formation of RP induced by oxidative stress. Furthermore, we observed significant disparities in CD56dim natural killer cells and Memory B cells between the two cohorts, which could contribute to disease progression. Additionally, we developed an efficient diagnostic model and analyzed its potential therapeutic targets. These findings provide novel perspectives and insights into the molecular mechanisms underlying RP as well as its early diagnosis. Further experimental and clinical validation is necessary to confirm the reliability and applicability of these results.

Materials and methods

Data collection

The renal tissue expression profile datasets of patients with pre-stone plaques (Randall’s plaques, RP) were downloaded from the GEO database (Gene Expression Omnibus, https://www.ncbi.nlm.nih.gov/geo/): GSE117518, with the microarray platform GPL21827 Agilent-079487 Arraystar Human LncRNA microarray V4 (Probe Name version); and GSE73680, with the microarray platform GPL17077 Agilent-039494 SurePrint G3 Human GE v2 8 × 60 K Microarray 039381 (Probe Name version). For GSE73680, 29 plaque samples from diseased tissues and 6 non-diseased control tissue samples were included in this study, while 27 normal tissue samples from stone patients were excluded. In GSE117518, 3 plaque samples from diseased tissues and 3 non-diseased control tissue samples were included. We combined the human GEO dataset samples and used the R package sva (version 3.42.0)5 to remove batch effects for subsequent analysis. Oxidative stress (OS)-related genes were downloaded from the MSigDB database (http://www.gseamsigdb.org/gsea/msigdb/cards) (Table S1).

WGCNA analysis

To construct the co-expression network, we performed Weighted Gene Co-Expression Network Analysis (WGCNA) using the WGCNA R package (version 1.70-3)38. We assessed the similarity of gene expression profiles by calculating Pearson correlation coefficients and then weighted the correlation coefficients between genes using a power function to obtain a scale-free network. The function “pickSoftThreshold” was used to select the soft-thresholding power (β = 7), and the co-expression similarity was raised to this power to establish a weighted adjacency matrix. We then conducted hierarchical clustering analysis based on weighted correlations, and the clustering results were segmented according to predefined criteria to obtain different gene modules, which were represented by branches and different colors on the clustering tree. Next, we calculated the correlation between gene modules and phenotypes and identified trait-related modules.
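The power-function weighting described above can be illustrated with a minimal Python sketch. It is not the WGCNA R implementation: it assumes an unsigned network (adjacency = |Pearson correlation|^β), which the text does not state, and the gene names and expression values are hypothetical toy data.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def soft_threshold_adjacency(expr, beta=7):
    """Return {(gene1, gene2): |cor|^beta} for every gene pair.

    expr maps a gene name to its expression values across samples.
    Assumption: unsigned network, so the sign of the correlation is dropped.
    """
    genes = sorted(expr)
    adjacency = {}
    for g1 in genes:
        for g2 in genes:
            if g1 == g2:
                adjacency[(g1, g2)] = 1.0
            else:
                r = pearson(expr[g1], expr[g2])
                adjacency[(g1, g2)] = abs(r) ** beta
    return adjacency


# Hypothetical toy data: GENE_A and GENE_B are perfectly correlated,
# GENE_C is only weakly correlated with both.
toy_expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [1.0, -1.0, 1.0, -1.0],
}
toy_adjacency = soft_threshold_adjacency(toy_expr, beta=7)
```

Raising the correlation to a power of 7 leaves strong correlations nearly intact while pushing weak ones toward zero, which is why the mean connectivity falls as β increases.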

Oxidative stress-related differentially expressed genes

To identify genes with dysregulated expression between plaque samples and control samples, we conducted differential analysis using the R package limma (version 3.50.0)39. Genes meeting the criteria |logFC| > 0.5 and P value < 0.05 were taken as differentially expressed genes (DEGs) for further investigation. To identify oxidative stress-related differentially expressed genes significantly associated with plaques, we intersected the DEGs with the module genes most associated with oxidative stress identified in the WGCNA analysis. The intersected genes were considered oxidative stress-related differentially expressed genes (hub genes).
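The filtering and intersection steps can be sketched in a few lines of Python. The thresholds come from the text; the gene names, statistics, and module membership below are hypothetical placeholders rather than values from the study, and the sketch does not reproduce the limma model fit.

```python
def select_degs(stats, lfc_cutoff=0.5, p_cutoff=0.05):
    """Split genes into up- and downregulated DEG sets.

    stats maps a gene name to a (log2 fold change, P value) tuple,
    as would be exported from a differential expression table.
    """
    up = {g for g, (lfc, p) in stats.items() if p < p_cutoff and lfc > lfc_cutoff}
    down = {g for g, (lfc, p) in stats.items() if p < p_cutoff and lfc < -lfc_cutoff}
    return up, down


# Hypothetical differential expression results.
toy_stats = {
    "GENE_A": (1.2, 0.001),   # upregulated and significant
    "GENE_B": (-0.8, 0.010),  # downregulated and significant
    "GENE_C": (0.3, 0.001),   # significant, but fold change too small
    "GENE_D": (0.9, 0.200),   # large fold change, but not significant
}
# Hypothetical genes from the module most correlated with the trait.
toy_module_genes = {"GENE_A", "GENE_C", "GENE_E"}

toy_up, toy_down = select_degs(toy_stats)
toy_hub_genes = (toy_up | toy_down) & toy_module_genes
```

In this toy case only GENE_A passes both thresholds and belongs to the module, so it is the single hub gene.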

GO enrichment analysis

Gene Ontology (GO) analysis is a commonly used method for large-scale functional enrichment studies, encompassing biological process (BP), molecular function (MF), and cellular component (CC)40. We conducted GO enrichment analysis on target genes using the R package clusterProfiler (version 4.2.2)41. The criteria for entry selection were a P value < 0.05 and an FDR (q value) < 0.2, with P value correction performed using the Benjamini-Hochberg (BH) method.
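For readers unfamiliar with the Benjamini-Hochberg adjustment, a minimal Python sketch follows. It shows only the BH step on hypothetical P values; it is not the clusterProfiler code, and it does not compute the separate q value that the package also reports.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted P values in the same order as the input."""
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that monotonicity can be enforced.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P values for four GO terms.
toy_pvalues = [0.01, 0.04, 0.03, 0.005]
toy_adjusted = benjamini_hochberg(toy_pvalues)
```

Each raw P value is multiplied by the number of tests divided by its rank, and the running minimum keeps the adjusted values non-decreasing in the raw P value.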

Co-expression gene network

The GeneMANIA website (http://genemania.org) predicts relationships between functionally similar genes and target genes, including protein-protein interactions, protein-DNA interactions, pathways, physiological and biochemical responses, co-expression, and co-localization42. We constructed a protein-protein interaction (PPI) network for the hub genes using the GeneMANIA website. We then performed GO analysis on the hub genes and their co-expressed genes using the R package “clusterProfiler.”

GSVA analysis

Gene Set Variation Analysis (GSVA) is a non-parametric, unsupervised algorithm. We downloaded the c2.cp.kegg.v7.5.1.symbols.gmt gene set from the MSigDB (Molecular Signatures Database) and used the “gsva” method from the R package GSVA (version 1.42.0) to analyze plaque expression data. GSVA transforms gene expression data from a matrix of single genes as features into a matrix of specific gene sets as features. For each gene set corresponding to a pathway43,44,45, we performed rank statistics similar to the K-S (Kolmogorov-Smirnov) test to convert the expression matrix into an enrichment score (ES) matrix for the pathways, obtaining GSVA enrichment scores for each pathway in each sample. We then used the R package limma (version 3.50.0) to analyze pathways with significant differences (p value < 0.05), evaluated pathway activity scores for the RP and control groups, and conducted differential analysis. Finally, we generated heatmaps for the top pathways in each group based on p values.

Single-gene GSEA analysis

Gene Set Enrichment Analysis (GSEA) is used to evaluate the distribution trend of genes within a predefined gene set in a gene list ranked by phenotype correlation, thereby assessing their contribution to the phenotype46. Based on differences in hub gene expression, we grouped the samples and obtained differential analysis results related to the gene. In this study, we used the R package clusterProfiler to perform enrichment analysis on all genes ranked by logFC. The parameters for the GSEA enrichment analysis were as follows: 1000 permutations, a minimum of 10 genes and a maximum of 500 genes per gene set, with P value correction using the Benjamini-Hochberg (BH) method. We obtained the c2.cp.kegg.v7.2.symbols gene set from the Molecular Signatures Database (MSigDB, https://www.gsea-msigdb.org/gsea/msigdb/)46,47,48, with a selection criterion for significant enrichment set at P value < 0.05.
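
The enrichment score at the core of GSEA can be sketched as a weighted running sum over the ranked gene list. The following is a minimal Python illustration of that statistic only, not the clusterProfiler routine used in the study. The function name enrichment_score and the weight parameter are assumptions made for illustration, and the permutation step that yields P values is omitted.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """Weighted Kolmogorov-Smirnov-like running-sum enrichment score.

    ranked: list of (gene, score) pairs sorted by score, descending
            (for example, genes ranked by logFC).
    gene_set: set of gene names forming the predefined gene set.
    Returns the running-sum value with the largest absolute deviation from 0.
    """
    in_set = [gene in gene_set for gene, _ in ranked]
    n_hit = sum(in_set)
    n_miss = len(ranked) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")
    hit_total = sum(abs(s) ** weight for (_, s), hit in zip(ranked, in_set) if hit)
    if hit_total == 0:
        raise ValueError("all gene-set members have a score of zero")

    running, best = 0.0, 0.0
    for (_, score), hit in zip(ranked, in_set):
        if hit:
            running += abs(score) ** weight / hit_total  # step up at a member
        else:
            running -= 1.0 / n_miss                      # step down otherwise
        if abs(running) > abs(best):
            best = running
    return best
```

A positive score indicates that the gene set is concentrated at the top of the ranked list, and a negative score that it is concentrated at the bottom.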

Construction and validation of diagnostic nomograms

We constructed a nomogram for RP diagnosis using the R package rms (version 6.5-0). We calculated the subtype prediction score based on the expression of key genes, with the total score defined as the sum of the individual gene scores. We assessed the diagnostic value of the nomogram for RP using ROC curves.

Receiver operating characteristic (ROC) curve

The receiver operating characteristic (ROC) curve is an effective method for evaluating the performance of diagnostic tests. It plots sensitivity against 1 − specificity across all possible decision thresholds, summarizing the trade-off between the two in a single graph. The most common summary metric is the area under the curve (AUC). We used the R package “pROC” (version 1.18.0) to create ROC curves and calculate the AUC for evaluating the diagnostic value of key genes49. The AUC typically ranges from 0.5 to 1, with values closer to 1 indicating better diagnostic performance.
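
As a minimal sketch of how these quantities relate, the Python code below computes ROC points and the AUC from a list of scores and binary labels. It is an illustration only and does not reproduce pROC. The function names roc_points and roc_auc are hypothetical, and labels are assumed to be coded 1 for RP and 0 for control.

```python
def roc_points(scores, labels):
    """Return (1 - specificity, sensitivity) pairs over all thresholds.

    A sample is called positive when its score is >= the threshold.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        points.append((fp / n_neg, tp / n_pos))
    return points

def roc_auc(scores, labels):
    """AUC as the probability that a random case outscores a random control.

    Tied case/control pairs count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise definition used in roc_auc gives the same value as integrating the curve from roc_points with the trapezoidal rule. An AUC of 0.5 corresponds to a marker that ranks cases and controls at random.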

Immune infiltration analysis

Single-sample Gene Set Enrichment Analysis (ssGSEA) is an extension of Gene Set Enrichment Analysis (GSEA) that calculates a separate enrichment score for each sample and gene set pair, rather than for sample groups (such as control and disease groups)50. Each ssGSEA enrichment score reflects the coordinated upregulation or downregulation of genes within a specific gene set in a sample. We downloaded marker genes for 28 immune cell types from the Tumor and Immune System Interaction Database (TISIDB, http://cis.hku.hk/TISIDB/index.php)51: Activated CD8 T cell, Central memory CD8 T cell, Effector memory CD8 T cell, Activated CD4 T cell, Central memory CD4 T cell, Effector memory CD4 T cell, T follicular helper cell, Gamma delta T cell, Type 1 T helper cell, Type 17 T helper cell, Type 2 T helper cell, Regulatory T cell, Activated B cell, Immature B cell, Memory B cell, Natural killer cell, CD56bright natural killer cell, CD56dim natural killer cell, Myeloid-derived suppressor cell, Natural killer T cell, Activated dendritic cell, Plasmacytoid dendritic cell, Immature dendritic cell, Macrophage, Eosinophil, Mast cell, Monocyte, and Neutrophil. Based on these marker genes, we calculated the relative enrichment score of each immune cell type from the gene expression profile of each sample. We used the R package “ggplot2” (version 3.3.6) to plot differences in immune cell infiltration levels between disease and control samples52.

ceRNA network construction

Since the mechanism of competing endogenous RNA (ceRNA) in RP remains unclear, we used the miRTarBase (https://mirtarbase.cuhk.edu.cn/~miRTarBase/miRTarBase_2022/php/index.php)53, StarBase 2.0 (https://starbase.sysu.edu.cn/starbase2/index.php)54, and miRDB (https://mirdb.org/index.html) databases to predict the microRNAs (miRNAs) targeting the key genes and the long non-coding RNAs (lncRNAs) that share these miRNAs with the key genes, and used these predictions to construct the ceRNA network.
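
The structure of such a network can be sketched as a join on shared miRNAs. The Python code below is a minimal illustration under stated assumptions: the database predictions are assumed to have already been exported as (mRNA, miRNA) and (lncRNA, miRNA) pairs, the function name build_cerna is hypothetical, and how predictions from the three databases are combined is left outside the sketch.

```python
from collections import defaultdict

def build_cerna(mrna_mirna, lncrna_mirna):
    """Return sorted (lncRNA, miRNA, mRNA) triples linked by a shared miRNA.

    mrna_mirna:   iterable of (mRNA, miRNA) predicted target pairs.
    lncrna_mirna: iterable of (lncRNA, miRNA) predicted interaction pairs.
    """
    # Index lncRNAs by the miRNA they are predicted to bind.
    lnc_by_mirna = defaultdict(set)
    for lnc, mir in lncrna_mirna:
        lnc_by_mirna[mir].add(lnc)

    # An mRNA and a lncRNA form a ceRNA triple when they share a miRNA.
    triples = set()
    for mrna, mir in mrna_mirna:
        for lnc in lnc_by_mirna.get(mir, ()):
            triples.add((lnc, mir, mrna))
    return sorted(triples)
```

Each triple contributes two edges (lncRNA to miRNA, miRNA to mRNA) to a network that can then be drawn in a tool such as Cytoscape.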

RBP-mRNA network construction

We used the open-source platform StarBase (https://starbase.sysu.edu.cn/tutorialAPI.php#RBPTarget), which catalogs ncRNA interactions derived from CLIP-seq, degradome-seq, and RNA-RNA interaction data, to investigate the associations between mRNAs and RNA-binding proteins (RBPs). A P value < 0.05 and a cluster number ≥ 5 were defined as the cutoff criteria for identifying key mRNA-RBP pairs. Subsequently, the RBP-mRNA network was constructed using Cytoscape.

Drug prediction

We used the Drug-Gene Interaction Database (DGIdb, http://www.dgidb.org/) to explore and collect drug-gene interaction information for specific genes. DGIdb integrates various publicly available and licensed drug-gene interaction data and plays a crucial role in drug target research and personalized medicine. We downloaded the primary components of drugs from PubChem and the PDB files of the target proteins from the Protein Data Bank (PDB). For molecular docking, we used OpenBabel (version 2.4.1) for file format conversion, AutoDockTools (version 1.5.7) for docking, and PyMOL (version 2.5.0) for visualization.

Statistical analysis

Statistical analysis in this study was performed using R software version 4.1.2. Spearman correlation tests were used to infer the relationship between two parameters. The Wilcoxon test was used to compare differences between two groups, while the Kruskal-Wallis test was used to compare differences among three or more groups. A two-sided P value of less than 0.05 was considered statistically significant.
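
As an illustration of the rank-based correlation named above, the following Python sketch computes Spearman's coefficient as the Pearson correlation of average ranks. It is not the R routine used in the study, the function names average_ranks and spearman_rho are hypothetical, and the associated P value is not computed here.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)
```

Because only ranks enter the calculation, the coefficient is unchanged by any monotone transformation of either variable, which makes it suitable for expression data that are not normally distributed.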