Abstract
Clear cell renal cell carcinoma (ccRCC) is a prevalent malignant kidney tumor characterized by high metastatic potential and recurrence rates. Investigations into the molecular mechanisms and therapeutic targets of ccRCC have provided novel directions for early diagnosis and targeted therapy. The CXCR4 gene plays a pivotal role in tumor progression and metastasis. In this study, we first conducted differential expression analysis and WGCNA on the dataset and performed GO and KEGG enrichment analysis on the differentially expressed genes, revealing key biological processes and signaling pathways, particularly in cell adhesion and immune regulation. From the genes shared between the WGCNA modules and the differentially expressed genes, 50 hub genes were identified through PPI analysis. LASSO, SVM-RFE, and RF algorithms were then used to identify the key gene CXCR4 associated with ccRCC. Survival analysis revealed that high CXCR4 expression correlates with poor patient prognosis. Immune analysis highlighted differences in immune cell distribution across ccRCC subtypes, emphasizing the importance of gene-immune interactions in disease pathogenesis. Finally, validation was conducted in vitro using techniques such as PCR and Western blot. Our findings position CXCR4 as a potential preventive gene, offering new theoretical insights for early diagnosis and targeted treatment of ccRCC.
Introduction
Renal cell carcinoma (RCC) mainly comprises three types: clear cell, papillary, and chromophobe carcinoma. Among them, clear cell renal cell carcinoma (ccRCC) is the main histopathological subtype, accounting for 70–80% of all renal cell carcinoma cases1. According to global epidemiological data, more than 400,000 new cases are diagnosed each year, making it one of the top ten most prevalent malignancies worldwide2. Surgery is the most common treatment for early-stage kidney cancer, and surgeons aim to preserve as much of the patient’s kidney function as possible while removing the tumor. However, not all patients meet the surgical indications or benefit from surgery. In clinical practice, the disease is often silent in its early stages, which delays diagnosis until a late stage. Notably, the tumor microenvironment (TME) plays a pivotal role in ccRCC progression and therapy resistance. The immunosuppressive TME, characterized by infiltrating immune cells (e.g., Tregs, MDSCs)3, cytokine networks, and metabolic reprogramming, fosters tumor immune evasion and limits treatment efficacy. In recent years, immunotherapy and targeted therapy have gradually become important adjuncts in the treatment of kidney cancer. Despite these therapeutic advances, significant clinical challenges remain: ccRCC has a high metastasis rate and is not sensitive to conventional chemoradiotherapy, which leads to a poor prognosis4. This is particularly true for patients with advanced disease, whose 5-year survival rate is markedly reduced5. Recent work highlights that TME-derived factors (e.g., hypoxia, extracellular matrix remodeling) further drive ccRCC aggressiveness and metastatic niche formation6.
Therefore, there is a great need to develop early and accurate detection methods and reliable molecular biomarkers to support the development of diagnostic and therapeutic targets3,6,9.
The pathogenesis of ccRCC is not fully understood. However, previous studies have shown that modifiable risk factors (including smoking, obesity, and high blood pressure) and genetic factors both contribute to the development of ccRCC, and there may be complex interactions between them. Notably, mutations in the von Hippel-Lindau (VHL) gene play an important role in disease pathogenesis7. VHL loss drives tumor progression through HIF stabilization and subsequent activation of pro-angiogenic and pro-survival pathways8. Critically, VHL-HIF axis dysfunction also reshapes the TME by promoting angiogenesis and immunosuppressive cell recruitment, creating a permissive niche for ccRCC progression9. Recent mechanistic studies have clarified the key role of CXCR4 in the progression of ccRCC10. As a G protein-coupled receptor, CXCR4 interacts with its cognate ligand SDF-1 to coordinate a variety of oncogenic processes, ranging from tumor cell migration to immune escape11,12. Emerging evidence suggests that the CXCR4/SDF-1 signaling axis modulates TME dynamics by recruiting immunosuppressive cells and facilitating metastatic dissemination, linking it directly to poor clinical outcomes3. Clinical studies have shown that increased CXCR4 expression not only predicts reduced survival in various malignant tumors (including ccRCC) but also reflects a higher risk of metastasis13. Given these findings, CXCR4 has emerged as a promising target with important implications for preventing tumor metastasis and optimizing treatment strategies.
In recent years, integrated bioinformatics methods based on multiple cohorts, multiple data dimensions, and multiple analysis methods have greatly facilitated ccRCC research. Studies have confirmed that methods such as machine learning algorithms, weighted gene co-expression network analysis (WGCNA), and protein interaction mapping play an important role in deciphering the molecular structure of the disease. For example, WGCNA can reveal key biological pathways in tumor evolution by identifying phenotypically associated gene clusters12. Notably, these approaches have been instrumental in dissecting TME-specific gene signatures and their crosstalk with tumor cells14. In addition, machine learning models can synthesize multidimensional data to predict treatment responses and identify new therapeutic targets. At present, studies of the TME in relation to prognosis are emerging rapidly. These representative TME studies help position our results within the broader field of tumor biology and highlight the relevance of gene-immune interactions, such as the interaction between CXCR4 and immune infiltration15,16,17.
In our study, we systematically explored the molecular landscape of ccRCC and the results showed that CXCR4 is the central regulatory hub. Subsequently, by integrating bioinformatics strategies, including differential expression analysis, functional pathway analysis, and machine learning algorithms, we identified CXCR4’s dual potential as a diagnostic biomarker and therapeutic target. At the same time, this study, combined with the characteristics of the immune microenvironment, provides new insights into the pathogenesis of ccRCC and provides a theoretical basis for the development of targeted therapies.
Materials and methods
Data collection and pre-processing
Transcriptome data and the corresponding clinical data for 539 ccRCC specimens and 72 adjacent normal tissue specimens were obtained from The Cancer Genome Atlas (TCGA, https://portal.gdc.cancer.gov/) database. All sequencing data were converted into Transcripts Per Million (TPM) format for subsequent analysis, and samples with missing information were eliminated. If a gene appeared in multiple rows, those rows were averaged so that each gene had a unique record. If the data contained large outliers, a log2 transformation was applied to the whole data matrix to improve its distribution. These steps were carried out with R packages such as “limma” and “edgeR”. The analysis complies with TCGA requirements and therefore does not require additional ethical approval.
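The two matrix-cleaning steps described above (averaging duplicate gene rows and log2-transforming the matrix) can be sketched in a few lines. This is only an illustration in plain Python, not the authors' R workflow; the function names, the toy values, and the pseudocount of 1 added before taking log2 are assumptions that the text does not specify.

```python
import math
from collections import defaultdict


def average_duplicate_genes(rows):
    """Collapse repeated gene rows by averaging them sample by sample.

    rows: list of (gene_symbol, [expression value per sample]).
    Returns a dict mapping each gene symbol to one averaged row.
    """
    grouped = defaultdict(list)
    for gene, values in rows:
        grouped[gene].append(values)
    averaged = {}
    for gene, value_lists in grouped.items():
        n = len(value_lists)
        averaged[gene] = [sum(column) / n for column in zip(*value_lists)]
    return averaged


def log2_transform(matrix, pseudocount=1.0):
    """Apply log2(x + pseudocount) to every value (pseudocount is an assumption)."""
    return {
        gene: [math.log2(value + pseudocount) for value in values]
        for gene, values in matrix.items()
    }


# Toy TPM-like values for two samples; CXCR4 appears twice on purpose.
rows = [("CXCR4", [3.0, 7.0]), ("CXCR4", [5.0, 9.0]), ("VHL", [1.0, 0.0])]
averaged = average_duplicate_genes(rows)
logged = log2_transform(averaged)
```

Averaging before the log transform keeps the mean on the original expression scale; doing it the other way round would give a geometric rather than an arithmetic mean.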
Differential expression and enrichment analysis
Differentially expressed genes (DEGs) between cancer tissue and healthy kidney tissue were screened with the “limma” R package, using thresholds of |logFC| > 1 and adjusted p value < 0.05. The differential expression patterns were visualized with volcano plots. Additionally, hierarchical clustering analysis was performed on the 50 most strongly upregulated and the 50 most strongly downregulated transcripts, and the resulting expression profiles were displayed as a color-coded heatmap. Functional annotation was conducted through Gene Ontology (GO) and KEGG enrichment analysis18,19. GO enrichment classified the DEGs into three ontological domains: biological process (BP), molecular function (MF), and cellular component (CC). KEGG analysis identified signal transduction cascades and metabolic networks dysregulated in tumor pathogenesis, with pathway significance defined as a Benjamini-Hochberg corrected p value below 0.05.
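The screening rule above combines a fold-change cutoff with a multiple-testing correction. The sketch below shows the Benjamini-Hochberg adjustment and the |logFC| > 1, adjusted p < 0.05 filter in plain Python. It does not reproduce limma's moderated statistics; the gene labels and numbers are made-up placeholders, and the function names are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(genes, log_fc, p_values, fc_cutoff=1.0, alpha=0.05):
    """Keep genes with |logFC| > fc_cutoff and adjusted p < alpha."""
    adjusted = benjamini_hochberg(p_values)
    return [
        (gene, "up" if fc > 0 else "down")
        for gene, fc, q in zip(genes, log_fc, adjusted)
        if abs(fc) > fc_cutoff and q < alpha
    ]


# Placeholder inputs (not values from the study).
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
log_fc = [2.5, -1.8, 0.4, 1.2]
p_values = [0.001, 0.01, 0.02, 0.8]
adjusted = benjamini_hochberg(p_values)
degs = select_degs(genes, log_fc, p_values)
```

In this toy input GENE_C passes the significance cutoff but not the fold-change cutoff, and GENE_D the reverse, so only GENE_A and GENE_B are retained.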
Weighted gene co-expression network analysis (WGCNA)
WGCNA was used to identify key modules of co-expressed genes associated with ccRCC. The R package “WGCNA” was employed, and the soft-thresholding power for building the co-expression network was chosen according to the fit of the scale-free topology model (R² ≈ 0.9). A clustering tree of the topological overlap matrix was then constructed by hierarchical clustering. The branches of the clustering tree represent different gene modules, each labeled with a different color. The module with the strongest correlation to the disease phenotype was chosen for subsequent investigation.
Protein–protein interaction (PPI) network construction and analysis
The identified DEGs were analyzed to uncover potential protein-protein interactions (PPIs). The STRING database, with a confidence threshold of 0.9, and Cytoscape software were employed for PPI network construction and visualization. The connectivity of each gene was calculated to identify the hub genes within the network. Based on the centrality and connectivity of these genes in the PPI network, the top 50 hub genes were selected.
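As a rough illustration of ranking genes by connectivity, the sketch below filters an edge list by confidence score and ranks nodes by degree. Degree is only one of several centrality measures and is used here as an assumption; the text does not state which measure was applied. The node labels and scores are placeholders, not STRING output.

```python
from collections import Counter


def top_hub_genes(edges, min_score=0.9, top_n=50):
    """Rank genes by degree among edges that pass the confidence threshold.

    edges: iterable of (gene_a, gene_b, combined_score in [0, 1]).
    Returns a list of (gene, degree), highest degree first; ties are
    broken alphabetically so that the result is deterministic.
    """
    degree = Counter()
    for gene_a, gene_b, score in edges:
        if score >= min_score and gene_a != gene_b:
            degree[gene_a] += 1
            degree[gene_b] += 1
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Placeholder network: the last edge falls below the 0.9 threshold.
edges = [
    ("G1", "G2", 0.99),
    ("G1", "G3", 0.95),
    ("G1", "G4", 0.92),
    ("G4", "G3", 0.91),
    ("G4", "G5", 0.50),
]
hubs = top_hub_genes(edges, min_score=0.9, top_n=2)
```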
Machine learning models for feature selection and prognostic gene identification
Three machine learning methods, LASSO regression, support vector machine (SVM), and Random Forest (RF), were used to analyze the 50 hub genes from the PPI network. LASSO regression was chosen for its dual properties of dimensionality reduction and coefficient shrinkage in identifying candidate prognostic biomarkers. To verify feature importance, the SVM and RF algorithms were each used to rank the genes for comparison. Prediction performance was then evaluated with 10-fold cross-validation and accuracy metrics to ensure the robustness of the method.
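The 10-fold cross-validation used for evaluation can be sketched independently of any particular model. The code below splits sample indices into ten folds and averages accuracy over held-out samples. The midpoint-threshold classifier and the toy data are stand-ins chosen only to keep the example self-contained; they are not the LASSO, SVM, or RF models of the study.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validated_accuracy(features, labels, fit, predict, k=10, seed=0):
    """Fraction of samples predicted correctly when each is held out once."""
    correct = 0
    for train, test in k_fold_indices(len(labels), k, seed):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(model, features[i]) == labels[i] for i in test)
    return correct / len(labels)


def fit_midpoint(xs, ys):
    """Toy model: threshold halfway between the two class means."""
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return (mean0 + mean1) / 2


def predict_midpoint(threshold, x):
    return 1 if x > threshold else 0


# Well-separated toy data: 10 "normal" and 10 "tumor" expression values.
features = [float(v) for v in range(10)] + [float(v) for v in range(100, 110)]
labels = [0] * 10 + [1] * 10
accuracy = cross_validated_accuracy(features, labels, fit_midpoint, predict_midpoint)
```

Because every sample is held out exactly once, the accuracy is computed over the whole cohort while no prediction is made by a model that saw that sample during fitting.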
Gene expression validation and survival analysis
The expression patterns of candidate genes were validated in the TCGA cohort, which served as the primary validation platform. Survival analysis incorporated Kaplan-Meier curves with log-rank testing for preliminary assessment, followed by multivariable Cox proportional hazards models adjusting for clinical covariates. To quantify diagnostic efficacy, we conducted time-dependent ROC curve analysis, calculating area under the curve (AUC) values at clinically relevant follow-up intervals. This multi-modal validation framework was used to assess both the prognostic stratification capacity and the diagnostic discriminative power of the identified genomic signatures.
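For readers unfamiliar with the Kaplan-Meier estimator mentioned above, a minimal sketch follows. At each time with at least one event, the survival probability is multiplied by (1 - deaths / number at risk); censored patients leave the risk set without changing the estimate. The follow-up times and event flags are toy values, and the function name is hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up time per patient.
    events: 1 if the event (death) was observed, 0 if censored.
    Returns [(time, survival_probability)] at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve


# Toy cohort of five patients (times in years); two are censored.
times = [1, 2, 2, 3, 4]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
```

In this toy cohort the estimate steps down at years 1, 2, and 3, to roughly 0.8, 0.6, and 0.3. Comparing two such curves, for example high versus low CXCR4 expression, is what the log-rank test formalizes.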
Nomogram construction and clinical correlation analysis
A nomogram was developed using the “rms” R package, and calibration curves were generated to assess the accuracy of the model at 1, 2, and 3 years. Independent prognostic factors were identified by univariate and multivariate regression within the Cox proportional hazards framework, and the nomogram was used to predict ccRCC prognosis from these factors. To evaluate the clinical relevance of the nomogram, decision curve analysis (DCA) was utilized.
Gene set enrichment analysis (GSEA)
GSEA was used to identify significantly enriched biological processes and pathways related to CXCR4 expression. GSEA software was used, with a False Discovery Rate (FDR) < 0.05 considered significant.
Immune cell infiltration and subtype analysis
Immune cell infiltration was assessed using Single Sample Gene Set Enrichment Analysis (ssGSEA). The relative abundance of various immune cell types in different ccRCC subtypes was calculated, and differences in immune cell distribution between subtypes and normal controls were examined. Subtype classification was performed based on the expression profiles of selected hub genes, with clinical relevance evaluated through the integration of clinical data.
Characteristic verification of CXCR4 in ccRCC
Total RNA was isolated from HK-2, ACHN, and Caki-1 cells using the TRIzol method. The extracted RNA (2 µg) was converted to cDNA using the TransScript One-Step gDNA Removal and cDNA Synthesis SuperMix Kit. The expression levels of CXCR4 and actin (internal control) were quantified using the ABI Prism 7500 Rapid Real-Time PCR Device and the TransStart Green qPCR SuperMix kit. Primer sequences are detailed in Table S1.
Cell lysates were prepared with RIPA buffer, and protein samples were collected, separated by electrophoresis, and transferred to a PVDF membrane. After incubation with primary antibodies against CXCR4 (1:2000, Proteintech, 60042-1-Ig) and GAPDH (1:20000, Proteintech, 10494-1-AP), followed by secondary antibodies, protein bands were visualized using an ECL kit.
Biomics Biotech (Jiangsu, China) synthesized nine siRNA sequences targeting prognostic genes, along with negative controls (NC). Sequences are listed in Supplementary Table S2. ACHN and Caki-1 cells were transfected using Lipofectamine 3000 reagent, following the manufacturer’s protocol.
A wound healing assay was used to assess the migration ability of ACHN and Caki-1 cells. Cells were seeded into a 6-well plate, and at 95% confluence, a sterile pipette tip was used to scratch the cell monolayer. After PBS washes, cells were cultured in serum-free medium for 24 h. Results were analyzed using ImageJ software, comparing the migration area of each group.
The invasion ability was evaluated using a Transwell assay. A cell suspension of 1 × 10^5 cells/mL was added to the upper chamber of the Transwell (8-µm pore size; Corning, New York, USA). DMEM with 10% fetal bovine serum was placed in the lower chamber, and the plate was incubated for 24 h. After fixation with 4% paraformaldehyde for 30 min, cells were stained with 0.5% crystal violet for 20–30 min. Excess dye was removed with a cotton swab, and the cells that had passed through the membrane were photographed under an inverted microscope (×200).
Statistical analysis
All statistical analyses were performed using R 4.2.1. Continuous variables are presented as mean ± standard deviation (n = 3). Differences among groups were assessed with a parametric omnibus test (α = 0.05, 95% CI) followed by Bonferroni-corrected post hoc comparisons. Experiments were repeated at least three times. A two-sided p value < 0.05 was considered significant (*p < 0.05, **p < 0.01, ***p < 0.001).
Results
Transcriptomic profiling and pathway annotation of differentially regulated genes
A heatmap of the DEGs was generated to reveal expression trends and clusters of gene activity that could help explain the dysregulation observed in ccRCC (Fig. 1A). The volcano plot identified 2,739 DEGs (1,427 upregulated and 1,312 downregulated) showing significant expression changes between cancerous and healthy tissues (Fig. 1C). Figure 1B presents the 50 most strongly upregulated and the 50 most strongly downregulated genes, depicting their contrasting expression patterns in ccRCC. To uncover the potential molecular mechanisms underlying ccRCC, a detailed analysis of the DEGs was performed, including GO and KEGG pathway analyses. This analysis revealed hierarchical regulatory circuits in ccRCC pathogenesis, spanning organelle-specific dysfunction, epigenetic interactome rewiring, and tyrosine kinase pathway derailment. The findings were visualized using a circular plot (Fig. 1D).
Differential expression analysis and functional enrichment analysis based on DEGs. (A) Heatmap displaying the top fifty upregulated and downregulated DEGs. (B) Heatmap of the 50 upregulated and 50 downregulated genes with the largest differential changes. (C) Volcano plot of the DEG distribution in the TCGA dataset. (D) Functional enrichment analysis visualized as a circle plot. In this plot, the outer circle shows the IDs of significantly enriched pathways, and different colors represent the different GO (Gene Ontology) and KEGG (Kyoto Encyclopedia of Genes and Genomes) categories. The second circle represents the proportion of target pathway genes among the total pathway genes, with color depth representing the significance p value: the darker the color, the smaller the p value and the more significant the enrichment. The third circle represents the proportion of genes in the differential gene list that are enriched in the target pathway, which reflects the coverage of differential genes in that pathway. The inner circle represents the rich factor, the ratio of the number of enriched genes to the total number of genes, which reflects the degree of enrichment of the target pathway. (E) KEGG pathway and GO term enrichment results for the upregulated DEGs and for the downregulated DEGs.
The top BPs enriched among the DEGs included control of T cell stimulation, enhancement of cellular binding, management of leukocyte interactions, and leukocyte cell-cell adhesion. These results offer insight into the biochemical pathways and biological processes that initiate and advance the development of ccRCC. Enrichment analysis of the CCs delineated critical cellular components implicated in ccRCC pathogenesis, including lipid raft microdomains, specialized membrane compartments (outer surface, basolateral domain, and apical regions), and collagen-rich extracellular matrices. These spatially organized structures point to a close relationship between polarized cellular architecture and aberrant transcriptional activity in ccRCC. Particularly noteworthy is the coordinated dysregulation across membrane microdomains and extracellular matrix interactions, which provides mechanistic insight into disease-specific molecular circuitry.
Analysis of the molecular functions enriched among the DEGs further clarifies their biological relevance in ccRCC pathogenesis. The circular visualization identifies six core functional categories: immune receptor activity, glycosaminoglycan binding, extracellular matrix structural constituent, heparin binding, cytokine receptor activity, and cytokine binding. These findings characterize the molecular interplay governing aberrant gene regulation in ccRCC and provide a framework for subsequent mechanistic exploration. To establish pathway associations, we conducted KEGG pathway analysis, which revealed significantly enriched pathways in ccRCC. For a deeper understanding, enrichment analysis was performed separately for upregulated and downregulated DEGs. The key pathways enriched in the two groups were completely different: upregulated DEGs were mainly enriched in cell adhesion, cytokines, and immune cell differentiation, whereas downregulated DEGs were mainly enriched in amino acid metabolism (Fig. 1E). These findings demonstrate the intricate interaction of signaling pathways and molecular processes that drive the initiation and development of ccRCC.
Network-driven prioritization of central regulatory nodes through weighted co-expression topology analysis
The Weighted Gene Co-expression Network Analysis (WGCNA) algorithm was employed to construct a weighted gene co-expression network (Fig. 2A). To balance biological relevance and statistical robustness, we selected a soft-thresholding power (β) at which the network approximates a scale-free topology, a hallmark of real-world biological networks in which a few highly connected ‘hub genes’ coordinate functional modules. Specifically, we tested a range of β values (1–20) and chose β = 15 because it achieved a scale-free fit index (R²) of ~ 0.9 (Fig. 2B), indicating that the network’s connectivity distribution is well described by the power law expected of a scale-free network. This threshold suppresses spurious weak correlations (e.g., noise or transient interactions) while preserving strong co-expression relationships likely reflecting shared regulatory mechanisms (e.g., transcription factor binding or pathway co-regulation). The adjacency matrix was then converted into a topological overlap matrix (TOM) to further emphasize genes with shared neighbors, enhancing module detection accuracy. Based on the TOM, 17 gene modules were identified, among which the magenta (1,193 genes) and green (1,898 genes) modules exhibited the highest correlation with disease traits (Fig. 2C, D). A Venn diagram revealed 1,701 genes shared between the tumor-versus-normal DEGs and the key WGCNA modules (Fig. 2E). The STRING database was used to create a protein-protein interaction (PPI) network from these 1,701 intersecting genes; the network consists of 1,699 nodes and 1,722 edges. Subsequently, the network was imported into Cytoscape for further analysis (Fig. 2F). Using the CytoHubba plugin of Cytoscape (Fig. 2G), 50 hub genes related to ccRCC were identified. Figure 2 illustrates the construction of the co-expression and PPI networks and the identification of hub genes.
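The effect of the soft-thresholding power and of the topological overlap transformation can be seen on a tiny example. The sketch below assumes an unsigned network, with adjacency a_ij = |cor_ij|^β, and the standard topological overlap definition TOM_ij = (Σ_u a_iu·a_uj + a_ij) / (min(k_i, k_j) + 1 − a_ij). The text does not state which network type was used, and the 3 × 3 correlation matrix is a made-up illustration.

```python
def soft_threshold_adjacency(correlation, beta):
    """Unsigned adjacency: a_ij = |cor_ij| ** beta, with a zero diagonal."""
    n = len(correlation)
    return [
        [0.0 if i == j else abs(correlation[i][j]) ** beta for j in range(n)]
        for i in range(n)
    ]


def topological_overlap(adjacency):
    """Topological overlap matrix (TOM) from a symmetric adjacency matrix."""
    n = len(adjacency)
    connectivity = [sum(row) for row in adjacency]  # diagonal is zero
    tom = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Connection strength routed through shared neighbours u.
            shared = sum(
                adjacency[i][u] * adjacency[u][j]
                for u in range(n)
                if u != i and u != j
            )
            denominator = min(connectivity[i], connectivity[j]) + 1 - adjacency[i][j]
            tom[i][j] = tom[j][i] = (shared + adjacency[i][j]) / denominator
    return tom


# Made-up correlations among three genes.
correlation = [
    [1.0, 0.9, 0.8],
    [0.9, 1.0, 0.5],
    [0.8, 0.5, 1.0],
]
adjacency_beta2 = soft_threshold_adjacency(correlation, beta=2)
adjacency_beta15 = soft_threshold_adjacency(correlation, beta=15)
tom = topological_overlap(adjacency_beta2)
```

Raising β from 2 to 15 shrinks the weak 0.5 correlation far more than the strong 0.9 one, which is the noise-suppressing behavior described above.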
Construction of the weighted co-expression network and identification of hub genes. (A) Clustering dendrogram of the TCGA dataset and heatmap of clinical traits. (B) Analysis of the scale-free fit index (left) and the mean connectivity (right) for various soft-thresholding power values. (C) The branches of the dendrogram clustered into 17 modules, each labeled with a unique color. (D) Heatmap showing the correlation between modules and feature gene sets. (E) Overlapping genes among the DEGs and key module genes associated with ccRCC progression were selected. (F) Overlapping genes. (G) The genes that serve as nodes in a PPI network.
Construction and validation of lasso, SVM, and RF models
Three predictive algorithms were constructed to prioritize genes for ccRCC prognosis from the 50 PPI-associated DEGs. Lasso regression analysis revealed 25 transcriptomic signatures significantly correlated with disease status (Fig. 3A). In the Random Forest (RF) algorithm, 25 feature genes were identified with relative importance scores greater than 1 (Fig. 3B).
Construction as well as the validation of the lasso model, the SVM model, and the RF model. (A) The correlation between the total number of trees in the random forest and the error rates. An order based on the relative significance of the genes. (B) A LASSO model’s cross-validation process for adjusting parameter selection. Each curve represents a single gene. LASSO analysis of the coefficients. Plotted at the best lambda are vertical dashed lines. (C) An SVM-RFE approach for the selection of feature genes.
Additionally, a powerful machine learning technique known as Support Vector Machine (SVM) was employed in our research to generate feature vectors representing the genetic characteristics and patterns of ccRCC. A thorough selection procedure was implemented to pinpoint key variables and genes strongly linked to ccRCC. To streamline the investigation and concentrate on highly impactful genes, the computational capabilities of SVM were utilized to remove less significant feature vectors. This approach included assessing and ranking the most essential variables related to ccRCC. Through complex data filtering and analysis, as well as ranking by AvgRank, genes closely associated with ccRCC were identified. The genes, discovered using SVM, have considerable potential for further study and analysis, as depicted in Fig. 3C. Figure 3 displays the development and validation of the Lasso, SVM, and RF models.
Transcriptomic feature identification
The top ten genes identified by three different machine learning models were selected, followed by cross-validation analysis. Consequently, the research was focused on a single ccRCC-related gene, CXCR4 (Fig. 4A). CXCR4 levels were markedly higher in the tumor samples relative to the control samples (Fig. 4B).
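The Venn-style selection described above amounts to intersecting the top-ranked genes from each method. A minimal sketch follows; apart from CXCR4, the gene labels are placeholders, the rankings are invented for illustration, and `top_n` is set to 3 rather than 10 only to keep the example short.

```python
def consensus_genes(*ranked_lists, top_n=10):
    """Genes that appear in the top_n of every ranked list, sorted by name."""
    top_sets = [set(ranking[:top_n]) for ranking in ranked_lists]
    return sorted(set.intersection(*top_sets))


# Invented rankings (most important gene first) for three methods.
lasso_ranking = ["CXCR4", "G2", "G3", "G4"]
svm_ranking = ["G5", "CXCR4", "G2", "G6"]
rf_ranking = ["G7", "G8", "CXCR4", "G2"]

shared = consensus_genes(lasso_ranking, svm_ranking, rf_ranking, top_n=3)
```

Note that G2 is in all three lists but drops out because it is ranked fourth by the third method, which shows how sensitive the intersection is to the chosen cutoff.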
Identifying the genes responsible for characteristics. (A) A Venn diagram illustrating the feature genes that are common to LASSO, SVM-RFE, and RF. (B) CXCR4 expression. (C) Relationship between CXCR4 expression and survival time and survival status in TCGA data. (D) KM survival curve of CXCR4 in TCGA data in ccRCC. (E) ROC curves demonstrating the diagnostic utility of the gene as it pertains to the TCGA dataset.
The complex landscape of ccRCC prognosis was explored in our study to uncover the potential prognostic significance of CXCR4 expression. Figure 4C illustrates the relationship between CXCR4 expression, survival time, and survival status. An in-depth examination of survival information obtained from the TCGA dataset uncovered a notable association linking elevated CXCR4 levels in ccRCC cases with unfavorable clinical outcomes (Fig. 4D). This observation underscores CXCR4’s potential as a biomarker for assessing patient prognosis and forecasting disease advancement. Following this discovery, ROC analysis was employed to evaluate CXCR4’s diagnostic accuracy.
This analysis allowed quantitative evaluation of CXCR4’s predictive performance and its ability to distinguish ccRCC patients from healthy individuals. The primary evaluation criterion was the area under the ROC curve (AUC), which summarizes overall discriminative performance. CXCR4 showed predictive value for medium-term survival, with an AUC of 0.707 at 2 years and 0.602 at 3 years (Fig. 4E). These values suggest that CXCR4 has the potential to act as a diagnostic indicator, offering information about the presence and development of ccRCC. Figure 4 illustrates the identification of characteristic genes responsible for ccRCC.
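To make the AUC metric concrete, the sketch below computes it for a binary outcome as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, counting ties as one half. This is the plain, non-time-dependent AUC; the time-dependent ROC analysis in the study additionally has to account for censoring, which is not modeled here. Scores and labels are toy values.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    scores: predictor value per sample (e.g. an expression level).
    labels: 1 for positive cases, 0 for negative cases.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy data: one positive case (0.4) scores below one negative case (0.6).
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
```

Eight of the nine positive-negative pairs are ordered correctly here, so the AUC is 8/9. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.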
Nomogram model for ccRCC prognosis: DCA and ROC analysis
Univariate and multivariate analyses were performed to determine independent predictors of prognosis. The results indicated that CXCR4 and TNM staging are independent prognostic factors for ccRCC (Fig. 5A, B). The “rms” R package was used to construct a nomogram model for predicting 1-, 2-, and 3-year overall survival (OS) of ccRCC based on the characteristic gene CXCR4 (Fig. 5C). The calibration plot revealed only a slight discrepancy between observed and estimated ccRCC risks, suggesting that the nomogram model is well calibrated (Fig. 5D). Furthermore, the DCA indicated a potential net benefit of the nomogram for ccRCC patients (Fig. 5E).
The independent prognostic analysis and development of the nomogram. (A) Univariate cox analysis of clinical characteristics in the TCGA cohort. (B) Multivariate cox analysis of clinical characteristics in the TCGA cohort. (C) Nomogram in the TCGA cohort. (D) ROC curves for predicting the overall survival in the TCGA cohort. (E) The decision curve for the 2-year overall survival rate.
Metabolic analysis of CXCR4 and clinical factors
To explore the biological roles and molecular mechanisms linked to CXCR4, an extensive GSEA was carried out to identify the cellular activities and signaling networks affected by this gene. Figure 6 displays the enrichment outcomes related to CXCR4. The GSEA results provided a detailed view of the multiple pathways influenced by CXCR4 in ccRCC. Significant enrichment was observed in gene sets such as the mitochondrial matrix, organic acid breakdown pathways, lipid degradation processes, single-carbon acid metabolism, and branched-chain amino acid processing (Fig. 6A). These results emphasize CXCR4’s involvement in these biochemical routes and provide a reasonable interpretation of its biological relevance in ccRCC.
Functional enrichment results for the CXCR4 protein. (A) The top 5 pathways identified by GSEA. (B) Metabolic reprogramming enrichment based on C4 (computational gene sets). (C) Metabolic reprogramming enrichment based on C5 (ontology gene sets). (D) Metabolic reprogramming enrichment based on the Hallmark gene set. (E) The expression of CXCR4 and clinical features associated with prognosis.
Gene expression and clinical and immune analysis in ccRCC subtype analysis. (A) Consensus clustering of ccRCC samples from TCGA cohorts based on the hub genes. Consensus matrix for optimal k = 2. (B) Box plot demonstrates the expression differences of the 50 hub genes between the two subgroups. (C) Box plot demonstrates differences in immune infiltration between the two subgroups. (D) Distribution of clinical features in two cluster groups.
To investigate the relationship between CXCR4 and metabolic reprogramming, we analyzed different collections of human gene sets. Using the C4 gene collection, we observed a significant enrichment of lipid metabolism-related fibroblast programs and MYC-related B cell metabolic pathways in non-malignant tissues. In contrast, the metabolic characteristics of renal epithelium were significantly activated in tumor samples (Fig. 6B). In the C5 ontology analysis, tumor tissue showed dysregulation in vitamin-related signaling pathways, particularly those involving vitamins B and D (Fig. 6C). The Hallmark gene set evaluation further demonstrated tumor-specific activation of fatty acid and bile acid processing, heme catabolism, and exogenous detoxification mechanisms (Fig. 6D). These multi-level findings suggest that CXCR4 may coordinate the progression of ccRCC by regulating interconnected metabolic networks related to nutrient utilization and detoxification.
The disruption of these pathways, driven by CXCR4, adds further intricacy to the molecular mechanisms linked to ccRCC and opens opportunities for future research. Additionally, the relationship between CXCR4 expression and clinical pathological factors was depicted using a Sankey diagram, which demonstrated that high CXCR4 expression significantly affects survival outcomes (Fig. 6E).
Gene expression and clinical, immune analysis in ccRCC subtypes
Based on the key indicators of hub gene expression levels in the characteristic gene set, subtype analysis of ccRCC patients was performed. The results revealed that ccRCC patients were divided into two distinct subtypes, with significant differences in gene expression levels (Fig. 7A). Additional analysis revealed that all core genes displayed notably distinct expression levels when comparing the two subtypes with the normal samples (Fig. 7B). Notably, all genes were highly expressed in the C1 subtype, followed by the C2 subtype, with the lowest expression observed in the normal group. These significant differences in gene expression may constitute an important molecular basis for clinical features, disease progression, and treatment responses between the two subtypes.
Differential expression of risk genes and cell function experiments in ccRCC cell lines. (A, B) Compared with HK-2 cells, the mRNA and protein levels of CXCR4 in ccRCC cells increased significantly. (C, D) Relative mRNA and protein expression of CXCR4 in the NC and CXCR4 knockdown groups. (E) Knockdown of CXCR4 reduced the proliferation of ccRCC cells. (F, G) Knockdown of CXCR4 reduced the migration and invasion ability of ccRCC cells. (*p < 0.05; **p < 0.01; ***p < 0.001).
Upon expanding the focus to immune cell analysis, the ssGSEA results showed notable variations in immune cell distributions when comparing ccRCC subtypes with healthy controls (Fig. 7C). For instance, B cells, monocytes, and mast cells were significantly enriched in the normal group, whereas macrophages were notably enriched in both subtypes. Furthermore, T cell enrichment varied depending on the subtype of T cells. These observations reveal the fundamental significance of cellular immune factors in the classification of ccRCC subtypes, suggesting that the distribution and functional status of immune cell subsets may interact with gene expression differences to shape the distinct disease subtype characteristics of ccRCC patients.
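The idea behind a single-sample enrichment score can be sketched as follows. This is a simplified, unnormalized version of the rank-weighted running-sum statistic that ssGSEA is based on, written under the assumption of a weighting exponent of 0.25; it is not the implementation used in the study, and the function name `ssgsea_score` and the gene labels `G1` to `G5` are hypothetical.

```python
def ssgsea_score(expression, gene_set, alpha=0.25):
    """Simplified single-sample enrichment score for one sample.

    expression: dict mapping gene name -> expression value in this sample.
    gene_set:   iterable of gene names (e.g. markers of one immune cell type).
    """
    # Order genes from highest to lowest expression.
    ordered = sorted(expression, key=expression.get, reverse=True)
    n = len(ordered)
    members = set(gene_set) & set(ordered)
    if not members or len(members) == n:
        raise ValueError("gene set must overlap the sample without covering it")
    # Rank n for the most highly expressed gene, 1 for the lowest.
    weights = {g: (n - i) ** alpha for i, g in enumerate(ordered)}
    hit_total = sum(weights[g] for g in members)
    miss_step = 1.0 / (n - len(members))
    hit_cdf = miss_cdf = score = 0.0
    for g in ordered:
        if g in members:
            hit_cdf += weights[g] / hit_total
        else:
            miss_cdf += miss_step
        # Accumulate the gap between the two running distributions.
        score += hit_cdf - miss_cdf
    return score

# Hypothetical sample with five genes.
sample = {"G1": 9.0, "G2": 8.0, "G3": 3.0, "G4": 2.0, "G5": 1.0}
top = ssgsea_score(sample, {"G1", "G2"})
bottom = ssgsea_score(sample, {"G4", "G5"})
```

A gene set concentrated among the most highly expressed genes yields a positive score and one concentrated among the least expressed genes yields a negative score, which is how per-sample scores for immune cell marker sets can then be compared between subtypes and normal tissue.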
Subtype analysis combined with clinical factors indicated that the C1 subtype is associated with adverse clinicopathological characteristics, including advanced tumor grade and stage (Fig. 7D). Together, these findings provide insight into the gene-immune interaction network in the pathogenesis of ccRCC and point toward subtype-specific immunotherapies that target tumor-intrinsic signaling pathways and microenvironmental vulnerabilities.
Differential expression of risk genes and cell function experiments
We used qRT-PCR and western blotting (WB) to assess the expression of CXCR4 in ACHN and Caki-1 cells. Compared with HK-2 cells, both tumor cell lines showed increased CXCR4 mRNA and protein expression (Fig. 8A, B). qRT-PCR and WB then confirmed the efficiency of CXCR4 knockdown: the reduction at both the mRNA and protein levels was sufficient for subsequent experimental work (Fig. 8C, D). After CXCR4 knockdown, the CCK-8 assay showed that the proliferation of ACHN and Caki-1 cells was significantly decreased (Fig. 8E). We used the Transwell assay to assess the invasive capacity of ACHN and Caki-1 cells. Compared with the NC group, significantly fewer tumor cells passed through the chamber in the si-CXCR4 group, suggesting that downregulation of CXCR4 may inhibit the invasiveness of ccRCC cells (Fig. 8F). In the wound-healing assay, the healed area at 12 and 24 h was smaller in the si-CXCR4 groups than in the NC group, indicating that CXCR4 knockdown inhibited migration (Fig. 8G).
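Relative mRNA expression from qRT-PCR is commonly reported as a fold change computed with the 2^-ΔΔCt method. The passage above does not state which quantification was used, so the following is only a minimal sketch under that assumption; the function name `relative_expression` and all Ct values are hypothetical and are not measurements from this study.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^-ddCt method (assumed here)."""
    # Normalize the target gene to the reference gene within each condition.
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_control = ct_target_control - ct_ref_control
    # Compare the sample with the control condition.
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)

# Hypothetical Ct values: target gene in a tumor line versus a control line,
# each normalized to a reference (housekeeping) gene.
fold = relative_expression(24.0, 18.0, 26.0, 18.0)
```

With these illustrative numbers the target gene crosses the threshold two cycles earlier in the sample than in the control after normalization, which corresponds to a fourfold higher relative expression.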
Discussion
CXCR4, a G-protein-coupled receptor, is known to regulate a variety of physiological processes, including immune response, hematopoiesis, and tissue homeostasis20. In the context of ccRCC, CXCR4 plays an essential role in tumor progression by modulating cell migration, survival, and proliferation21. Our analysis revealed that CXCR4 expression in ccRCC cells promotes metastatic and proliferative behavior, which aligns with previous studies suggesting a role for CXCR4 in promoting tumor aggressiveness in other malignancies22,23.
The understanding of CXCR4’s role in tumor biology, especially in ccRCC, remains incomplete. Previous studies have highlighted CXCR4 as a critical player in tumor metastasis, angiogenesis, and prognosis in various malignancies24,25,26,27, and high CXCR4 expression in ccRCC is closely linked to enhanced angiogenesis, which supplies nutrients to growing tumors and further supports metastatic spread28,29. However, the specific molecular mechanisms and their relevance in ccRCC require deeper exploration. This study investigated the screening process and the functional role of CXCR4 in ccRCC using transcriptome sequencing data, protein interaction network construction, signaling pathway enrichment analysis, and artificial intelligence algorithms30,31,32. Specifically, we combined PPI network analysis, functional annotation analysis, machine learning algorithms, and experimental data to uncover the molecular pathways through which CXCR4 influences disease progression. Through machine learning algorithms, we identified key genes and pathways associated with CXCR4 in ccRCC, shedding light on its potential as a prognostic biomarker. Our predictive models, trained on gene expression profiles, revealed that CXCR4 expression correlates with adverse clinical outcomes, including increased metastatic potential and reduced overall survival. Our findings suggest that CXCR4 significantly impacts key biological processes in ccRCC, including cell migration, metabolism, and immune infiltration, and is associated with poor prognosis. Notably, CXCR4 expression in ccRCC was found to be a key factor in metabolism and amino acid degradation processes, which play a crucial role in cancer cell survival and metastasis.
CXCR4’s role in ccRCC cannot be viewed in isolation but must be understood within the context of the tumor microenvironment (TME). The TME is composed of immune cells, blood vessels, and extracellular matrix components, all of which influence tumor behavior and progression. The metabolic characteristics of the TME can provide a favorable environment for tumor survival and support resistance to anti-tumor therapy33,34. Combined immune and metabolic analysis has become a promising research direction, and metabolic inhibitors and immune checkpoint blockade have emerged as new approaches to cancer treatment35. CXCR4, through its interaction with CXCL12 and other signaling molecules, modulates the recruitment of immune cells and endothelial cells to the tumor site, promoting angiogenesis and immune evasion. This complex interplay underscores the importance of considering CXCR4 not only as a receptor but as part of a broader molecular network that governs ccRCC progression.
Statistics show that there are about 431,000 new cases of kidney cancer worldwide every year, of which ccRCC accounts for 75%36,37. In the past few decades, the detection rate of ccRCC has continued to rise with the widespread use of imaging techniques38. Although surgical resection is the standard treatment for localized lesions, most ccRCC patients are already at an advanced stage when diagnosed, and the prognosis of metastatic cases remains unsatisfactory39. Given the crucial role of CXCR4 in the progression of ccRCC, targeting this receptor may be a promising therapeutic strategy. In this study, we identified CXCR4 as a potential oncogene in ccRCC. We therefore speculate that CXCR4 inhibitors may effectively block the proliferation, migration, and invasion of tumor cells, which are key processes in metastasis and tumor growth. Future treatments targeting CXCR4 should consider not only inhibiting its signaling pathway but also the tumor microenvironment, as CXCR4 plays a role in the complex interaction network that supports tumor survival and progression. At present, research on CXCR4 in tumors focuses largely on the CXCR4/CXCL12 cascade, which mainly acts on the tumor microenvironment and plays a role in tumor immunotherapy, for example in prostate and gastrointestinal tumors40,41,42.
This study systematically analyzed the molecular characteristics of ccRCC and elucidated the central regulatory role of CXCR4 in the pathogenesis and development of the disease. Integrating machine learning and bioinformatics techniques, we uncovered a complex network of associations between gene expression patterns, immune response mechanisms, and clinical outcomes in ccRCC. These findings not only deepen the understanding of the molecular mechanism of ccRCC, but also provide a theoretical basis for the development of molecular targeted therapy strategies, while promoting precision medicine solutions based on individual genetic characteristics and immune status. Subsequent experimental validation and clinical translational studies are needed to further evaluate the clinical application value of CXCR4 as a therapeutic target and prognostic marker for ccRCC.
Conclusion
Our findings indicate that CXCR4, as an oncogene, may play a crucial role in ccRCC by regulating metabolic pathways and the immune microenvironment. The disruption of these pathways adds to the complexity of the molecular mechanism of ccRCC and provides new avenues for future research on treatment strategies targeting CXCR4. However, this study has limitations. We explored only the potential function of CXCR4 in ccRCC without delving into its specific mechanism of action. In addition, experimental verification was limited to in vitro experiments and has not been confirmed in tissue samples or in vivo.
Data availability
The mRNA data of ccRCC samples were downloaded from the TCGA database. The datasets used and/or analyzed in the current study are available from the corresponding authors on reasonable request. Data are also provided within the manuscript or supplementary information files.
References
Escudier, B. & Kataja, V. Renal cell carcinoma: ESMO clinical practice guidelines for diagnosis, treatment and follow-up. Ann. Oncol.: Off. J. Eur. Soc. Med. Oncol. 21 (suppl 5), v137–9. https://doi.org/10.1093/annonc/mdq206. (2010). PubMed PMID: 20555064.
Ferlay, J. et al. Estimating the global cancer incidence and mortality in 2018: GLOBOCAN sources and methods. Int. J. Cancer. 144 (8), 1941–1953. https://doi.org/10.1002/ijc.31937 (2019). PubMed PMID: 30350310.
Ye, B. et al. iMLGAM: integrated machine learning and genetic Algorithm-driven multiomics analysis for pan-cancer immunotherapy response prediction. Imeta 4 (2), e70011. https://doi.org/10.1002/imt2.70011 (2025). PMID: 40236779; PMCID: PMC11995183.
Motzer, R. J. et al. Nivolumab versus everolimus in advanced Renal-Cell carcinoma. N. Engl. J. Med. 373 (19), 1803–1813. https://doi.org/10.1056/NEJMoa1510665 (2015).
Kashima, S. & Braun, D. A. The changing landscape of immunotherapy for advanced renal cancer. Urol. Clin. North Am. 50 (2), 335–349. https://doi.org/10.1016/j.ucl.2023.01.012 (2023). PMID: 36948676.
Ye, B. et al. Navigating the immune landscape with plasma cells: A pan-cancer signature for precision immunotherapy. Biofactors 51 (1), e2142. https://doi.org/10.1002/biof.2142 (2025). PMID: 39495620.
Hsieh, J. J. et al. Renal cell carcinoma. Nat. Rev. Dis. Prim. 3, 17009. https://doi.org/10.1038/nrdp.2017.9. (2017). PMID: 28276433; PubMed Central PMCID: PMCPMC5936048.
Tarade, D. & Ohh, M. The HIF and other quandaries in VHL disease. Oncogene 37 (2), 139–47. https://doi.org/10.1038/onc.2017.338. (2018). PubMed PMID: 28925400.
Sun, W. et al. Systemic immune-inflammation index predicts survival in patients with resected lung invasive mucinous adenocarcinoma. Transl Oncol. 40, 101865. https://doi.org/10.1016/j.tranon.2023.101865 (2024). PMID: 38101174; PMCID: PMC10727949.
Nengroo, M. A., Khan, M. A., Verma, A. & Datta, D. Demystifying the CXCR4 conundrum in cancer biology: beyond the surface signaling paradigm. Biochim. Biophys. Acta Rev. Cancer 1877 (5), 188790. https://doi.org/10.1016/j.bbcan.2022.188790 (2022). PubMed PMID: 36058380.
Eckert, F. et al. Potential role of CXCR4 targeting in the context of radiotherapy and immunotherapy of Cancer. Front. Immunol. 9, 3018. https://doi.org/10.3389/fimmu.2018.03018 (2018). PubMed PMID: 30622535; PubMed Central PMCID: PMCPMC6308162.
Yang, Y. et al. CXCL12-CXCR4/CXCR7 Axis in cancer: from mechanisms to clinical applications. Int. J. Biol. Sci. 19 (11), 3341–3359. https://doi.org/10.7150/ijbs.82317 (2023). PubMed PMID: 37497001; PubMed Central PMCID: PMCPMC10367567.
Quan, J. et al. Bioinformatics analysis of C3 and CXCR4 demonstrates their potential as prognostic biomarkers in clear cell renal cell carcinoma (ccRCC). BMC cancer. 21 (1), 814. https://doi.org/10.1186/s12885-021-08525-w (2021). PubMed PMID: 34266404; PubMed Central PMCID: PMCPMC8283915.
Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinform. 9, 559. https://doi.org/10.1186/1471-2105-9-559 (2008). PubMed PMID: 19114008; PubMed Central PMCID: PMCPMC2631488.
Zhang, P. et al. Novel post-translational modification learning signature reveals B4GALT2 as an immune exclusion regulator in lung adenocarcinoma. J. Immunother Cancer. 13 (2), e010787. https://doi.org/10.1136/jitc-2024-010787 (2025). PMID: 40010763; PMCID: PMC11865799.
Wang, S. et al. Machine learning reveals diverse cell death patterns in lung adenocarcinoma prognosis and therapy. NPJ Precis Oncol. 8 (1), 49. https://doi.org/10.1038/s41698-024-00538-5 (2024). PMID: 38409471; PMCID: PMC10897292.
Lahiri, A. et al. Lung cancer immunotherapy: progress, pitfalls, and promises. Mol. Cancer. 22 (1), 40. https://doi.org/10.1186/s12943-023-01740-y (2023). PMID: 36810079; PMCID: PMC9942077.
Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M. & Tanabe, M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44 (D1), D457–D462. https://doi.org/10.1093/nar/gkv1070 (2016). PMID: 26476454; PMCID: PMC4702792.
Kanehisa, M. & Goto, S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28 (1), 27–30. https://doi.org/10.1093/nar/28.1.27 (2000). PMID: 10592173; PMCID: PMC102409.
Busillo, J. M. & Benovic, J. L. Regulation of CXCR4 signaling. Biochimica et biophysica acta 1768(4), 952–963. https://doi.org/10.1016/j.bbamem.2006.11.002 (2007). PubMed PMID: 17169327; PubMed Central PMCID: PMCPMC1952230.
Zhao, H. et al. CXCR4 over-expression and survival in cancer: a system review and meta-analysis. Oncotarget 6 (7), 5022–5040. https://doi.org/10.18632/oncotarget.3217 (2015). PubMed PMID: 25669980; PubMed Central PMCID: PMCPMC4467131.
Ahmad, S. et al. Epigenetic underpinnings of inflammation: connecting the dots between pulmonary diseases, lung cancer and COVID-19. Semin. Cancer Biol. 83, 384–398. https://doi.org/10.1016/j.semcancer.2021.01.003 (2022). PubMed PMID: 33484868; PubMed Central PMCID: PMCPMC8046427.
Patil, K. et al. Molecular pathogenesis of cutaneous T cell lymphoma: role of chemokines, cytokines, and dysregulated signaling pathways. Sem. Cancer Biol. 86 (Pt 3), 382–399 (2022). PubMed PMID: 34906723.
Chung, B. et al. Human brain metastatic stroma attracts breast cancer cells via chemokines CXCL16 and CXCL12. NPJ Breast Cancer 3, 6. https://doi.org/10.1038/s41523-017-0008-8. (2017). PubMed PMID: 28649646; PubMed Central PMCID: PMCPMC5460196.
Cong, Z. et al. High expression of C-X-C chemokine receptor 4 and Notch1 is predictive of lymphovascular invasion and poor prognosis in lung adenocarcinoma. Tumour Biology: J. Int. Soc. Oncodevelopmental Biology Med. 39 (6), 1010428317708698. https://doi.org/10.1177/1010428317708698. (2017). PubMed PMID: 28618922.
Ding, Y. & Du, Y. Clinicopathological significance and prognostic role of chemokine receptor CXCR4 expression in pancreatic ductal adenocarcinoma, a meta-analysis and literature review. Int. J. Surg. (Lond., Engl.) 65, 32–38. https://doi.org/10.1016/j.ijsu.2019.03.009 (2019). PubMed PMID: 30902754.
Alsayed, R. et al. Epigenetic regulation of CXCR4 signaling in cancer pathogenesis and progression. Sem. Cancer Biol. 86 (Pt 2), 697–708 (2022). PubMed PMID: 35346802.
Saka, B., Ekinci, O., Dursun, A. & Akyurek, N. Clinicopathologic and prognostic significance of immunohistochemical expression of HIF-1α, CXCR4 and CA9 in colorectal carcinoma. Pathol. Res. Pract. 213 (7), 783–792. https://doi.org/10.1016/j.prp.2017.04.001 (2017). PubMed PMID: 28554753.
Durant, A. M. et al. The current application and future potential of artificial intelligence in renal Cancer. Urology 193, 157–163 (2024).
Athanasios, A., Charalampos, V., Vasileios, T. & Ashraf, G. M. Protein-Protein interaction (PPI) network: recent advances in drug discovery. Curr. Drug Metab. 18 (1), 5–10. https://doi.org/10.2174/138920021801170119204832 (2017). PubMed PMID: 28889796.
Innis, S. E., Reinaltt, K., Civelek, M. & Anderson, W. D. GSEAplot: a package for customizing gene set enrichment analysis in R. J. Comput. Biol. 28 (6), 629–631. https://doi.org/10.1089/cmb.2020.0426 (2021). PubMed PMID: 33861629; PubMed Central PMCID: PMCPMC8219183.
Greener, J. G., Kandathil, S. M., Moffat, L. & Jones, D. T. A guide to machine learning for biologists. Nat. Rev. Mol. Cell Biol. 23 (1), 40–55. https://doi.org/10.1038/s41580-021-00407-0 (2022). PubMed PMID: 34518686.
Xiao, Y. & Yu, D. Tumor microenvironment as a therapeutic target in cancer. Pharmacol. Ther. 221, 107753. https://doi.org/10.1016/j.pharmthera.2020.107753 (2021). PMID: 33259885; PMCID: PMC8084948.
Vitale, I., Manic, G., Coussens, L. M., Kroemer, G. & Galluzzi, L. Macrophages and metabolism in the tumor microenvironment. Cell Metab. 30(1), 36–50. https://doi.org/10.1016/j.cmet.2019.06.001. (2019). PMID: 31269428.
Bader, J. E., Voss, K. & Rathmell, J. C. Targeting metabolism to improve the tumor microenvironment for cancer immunotherapy. Mol. Cell. 78 (6), 1019–1033. https://doi.org/10.1016/j.molcel.2020.05.034 (2020). PMID: 32559423; PMCID: PMC7339967.
Kocarnik, J. M. et al. Cancer incidence, mortality, years of life lost, years lived with disability, and disability-adjusted life years for 29 cancer groups from 2010 to 2019: a systematic analysis for the global burden of disease study 2019. JAMA Oncol. 8(3), 420–44. https://doi.org/10.1001/jamaoncol.2021.6987. (2022). PubMed PMID: 34967848.
Motzer, R. J. et al. NCCN guidelines insights: kidney cancer, version 1.2021. J. Natl. Compr. Cancer Network: JNCCN. 18 (9), 1160–1170. https://doi.org/10.6004/jnccn.2020.0043 (2020). PubMed PMID: 32886895; PubMed Central PMCID: PMCPMC10191771.
Feldman, D. R. et al. Phase I trial of bevacizumab plus escalated doses of Sunitinib in patients with metastatic renal cell carcinoma. J. Clin. Oncol.: Official J. Am. Soc. Clin. Oncol. 27 (9), 1432–1439. https://doi.org/10.1200/jco.2008.19.0108 (2009). PubMed PMID: 19224847; PubMed Central PMCID: PMCPMC3655420.
Siegel, R. L., Miller, K. D., Wagle, N. S. & Jemal, A. Cancer statistics, 2023. CA: Cancer J. Clin. 73 (1), 17–48. https://doi.org/10.3322/caac.21763 (2023). PubMed PMID: 36633525.
Heidegger, I. et al. Comprehensive characterization of the prostate tumor microenvironment identifies CXCR4/CXCL12 crosstalk as a novel antiangiogenic therapeutic target in prostate cancer. Mol. Cancer. 21 (1), 132. https://doi.org/10.1186/s12943-022-01597-7 (2022). PMID: 35717322; PMCID: PMC9206324.
Mezzapelle, R. et al. CXCR4/CXCL12 activities in the tumor microenvironment and implications for tumor immunotherapy. Cancers (Basel). 14 (9), 2314. https://doi.org/10.3390/cancers14092314 (2022). PMID: 35565443; PMCID: PMC9105267.
Daniel, S. K., Seo, Y. D. & Pillarisetty, V. G. The CXCL12-CXCR4/CXCR7 axis as a mechanism of immune resistance in gastrointestinal malignancies. Semin Cancer Biol. 65, 176–188. https://doi.org/10.1016/j.semcancer.2019.12.007 (2020). PMID: 31874281.
Funding
None.
Author information
Contributions
Study conception and design: Fengming Hu, Xiao Wang, Hongli Liu and Ranran Dai; data collection: Fengming Hu, Xiao Wang, Hongli Liu and Ranran Dai; analysis and interpretation of results: Fengming Hu, Xiao Wang, Hongli Liu, Runsheng Wu and Ranran Dai; draft manuscript preparation: Fengming Hu, Xiao Wang, Hongli Liu, Runsheng Wu, Sihong Liu and Ranran Dai. All authors reviewed the results and approved the final version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Hu, F., Wang, X., Wu, R. et al. Identification of CXCR4 as a potential preventive gene in clear cell renal cell carcinoma from machine learning and immune analysis. Sci Rep 15, 21321 (2025). https://doi.org/10.1038/s41598-025-08199-5
DOI: https://doi.org/10.1038/s41598-025-08199-5