Introduction

Bladder cancer (BLCA), the most prevalent urologic malignancy with over 573,000 new cases annually worldwide, has a prognosis that depends largely on invasiveness and disease stage1. Although the majority (75%) of patients are diagnosed with non-muscle-invasive bladder cancer (NMIBC), early detection remains challenging. Approximately 25% of BLCA cases, along with the majority of non-urothelial carcinoma subtypes, are muscle-invasive and require systemic chemotherapy, immunotherapy, or radical interventions2. These patients have a significantly higher risk of mortality.

Emerging evidence implicates mitochondrial-nuclear crosstalk in BLCA progression. This process is thought to sustain the tumor phenotype by providing key metabolites for macromolecular synthesis and generating tumor-specific metabolites3,4. Massari et al. reported that in BLCA, aerobic glycolysis serves as the main energy source driving unchecked cell growth and proliferation. Moreover, the elevated expression of genes linked to glycolysis, the pentose phosphate pathway, and fatty acid synthesis in BLCA cells indicates that mitochondria play a crucial role in the initiation and progression of bladder cancer5,6.

Compounding this metabolic complexity, neutrophil extracellular traps (NETs), initially characterized as antimicrobial chromatin webs, are now recognized as metastasis accelerants7,8. Owing to its high local concentrations of histones and granule-derived material, this extracellular network can effectively capture and kill bacteria. The formation of NETs has also been implicated in a wide range of non-infectious diseases with inflammatory components, including cancer8,9,10. Beyond the abnormal immune responses they induce, NETs can trap circulating tumor cells and aid their implantation at distant sites, thereby acting as a cancer-promoting factor11,12. Evidence shows that extensive neutrophil infiltration in the BLCA tumor microenvironment (TME) correlates with poor clinical outcomes13,14. NETs, a functional output of activated neutrophils, further contribute to tumor progression by recruiting immunosuppressive cells such as myeloid-derived suppressor cells (MDSCs) and M2 macrophages, thereby suppressing T cell function15. Additionally, NET-released enzymes such as elastase and matrix metalloproteinases (MMPs) degrade the extracellular matrix (ECM) and promote epithelial-mesenchymal transition (EMT), enhancing cell migration. Their DNA-histone complexes may also activate Toll-like receptor (TLR) signaling in tumor cells, promoting proliferation and survival16. Critically, the intersection between mitochondrial metabolic rewiring and NET-mediated microenvironment remodeling remains unexplored, a knowledge gap with direct therapeutic implications.

The advent of single-cell RNA sequencing (scRNA-seq) offers a novel perspective for exploring tumor heterogeneity and the tumor microenvironment, leading to significant advancements in tumor research17. Single-cell sequencing data have enabled researchers to model cell differentiation trajectories and identify genes associated with various stages of differentiation18,19. Given the multifocal growth and high recurrence rates of BLCA, previous studies have conducted single-cell sequencing analyses to investigate its molecular characteristics19,20,21.

In this research, transcriptome data for BLCA were retrieved from public databases, while mitochondria-related genes (MRGs) and NETs-related genes (NETs-RGs) were sourced from relevant literature. Bioinformatics approaches were applied to identify prognostic genes associated with mitochondria and NETs in BLCA, assess their prognostic value, and analyze their expression at the single-cell level. Furthermore, key cell populations were identified, and the distribution of these prognostic genes among various cell clusters was explored, providing new insights into BLCA diagnosis and treatment.

Materials and methods

Data collection

The TCGA-BLCA dataset, which included RNA sequencing data and survival information for 404 BLCA and 18 control tissue samples, was downloaded from UCSC-Xena (https://xenabrowser.net/datapages/). Somatic mutation data and clinical data were also downloaded (accessed 15 November 2024).

The GSE13507 and GSE222315 datasets were downloaded from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov). GSE13507 was generated on the GPL6102 platform and comprised 165 BLCA tissue samples along with their survival information. GSE222315, a single-cell RNA sequencing (scRNA-seq) dataset generated on the GPL24676 platform, contained nine BLCA tumor tissue samples and four control tissue samples.

In total, 1,136 mitochondria-related genes (MRGs) and 136 neutrophil extracellular traps-related genes (NETs-RGs) were obtained from published literature (Supplementary Tables 1 and 2)22,23.

Analysis of differential gene expression

The “DESeq2” (v 1.38.0)24 package was utilized to identify differentially expressed genes (DEGs) by comparing BLCA samples with controls within the TCGA-BLCA dataset. The criteria for significance were set as p < 0.05 and |log2 fold change (FC)| > 2. A volcano plot was generated using the “ggplot2” (v 3.4.4)25 package to visualize the DEGs, and the top 5 up- and down-regulated genes in the BLCA group, sorted by log2FC, were labeled on the plot. The “ComplexHeatmap” (v 2.14.0)26 package was used to generate a heat map of the ten most significantly upregulated and the ten most significantly downregulated genes in the BLCA group, also sorted by log2FC.
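The filtering step described above can be illustrated with a minimal Python sketch. It is not the DESeq2 workflow itself; it only applies the stated thresholds (p < 0.05, |log2FC| > 2) to an already computed results table and picks the genes to label. The gene names and values are hypothetical placeholders, and the function names `select_degs` and `top_by_log2fc` are introduced here for illustration only.

```python
# Hypothetical differential-expression results: (gene, log2FC, p value).
# Names and numbers are made up for illustration; they are not study data.
results = [
    ("GENE_A", 3.1, 0.001),
    ("GENE_B", -2.6, 0.020),
    ("GENE_C", 1.4, 0.004),    # fails |log2FC| > 2
    ("GENE_D", 2.8, 0.300),    # fails p < 0.05
    ("GENE_E", -4.0, 0.0005),
]


def select_degs(rows, p_cutoff=0.05, lfc_cutoff=2.0):
    """Keep genes with p < p_cutoff and |log2FC| > lfc_cutoff."""
    return [r for r in rows if r[2] < p_cutoff and abs(r[1]) > lfc_cutoff]


def top_by_log2fc(degs, n=5):
    """Return names of the n most up- and n most down-regulated genes."""
    up = sorted((r for r in degs if r[1] > 0), key=lambda r: r[1], reverse=True)
    down = sorted((r for r in degs if r[1] < 0), key=lambda r: r[1])
    return [r[0] for r in up[:n]], [r[0] for r in down[:n]]


degs = select_degs(results)
top_up, top_down = top_by_log2fc(degs, n=5)
print("DEGs:", [r[0] for r in degs])
print("Labelled up:", top_up, "| labelled down:", top_down)
```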

Weighted gene co-expression network analysis (WGCNA)

The DEGs were intersected with MRGs and NETs-RGs, respectively, using the “VennDiagram” (v 1.7.3)27 package. The overlapping genes were recorded as DE-MRGs (differentially expressed mitochondria-related genes) and DE-NETs (differentially expressed NETs-related genes). Based on the DE-MRGs and DE-NETs, the MRGs score and NETs-RGs score for BLCA samples and control samples in TCGA-BLCA were calculated by the ssGSEA (single-sample gene set enrichment analysis) algorithm in the “GSVA” (v 1.46.0)28 package. Based on the MRGs score, patients were classified into high- and low-score groups, and Kaplan-Meier (K-M) survival analysis was performed using the “survival” (v 3.5.3)29 package (p < 0.05). The same analysis was performed using the NETs-RGs score. Subsequently, WGCNA was performed with the “WGCNA” (v 1.72.5)30 package. Outlier samples were first identified and excluded. A scale-free network was then established after calculating the soft threshold (power) and scale-free fit index (R²) through the pickSoftThreshold function. The selection of the soft threshold also took into account the point at which the mean connectivity tended to 0. Then, the minimum number of genes per module was set to 50 and mergeCutHeight was set to 0.2 to obtain the gene modules. Using the MRGs and NETs-RGs scores as phenotypic traits, Pearson correlation analysis between gene modules and traits was performed using the “psych” (v 2.2.9)31 package (|correlation coefficient (cor)| > 0.3, p < 0.05). Genes in the modules with the strongest correlations were defined as key module genes.
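As a rough illustration of the first step in this subsection, the following Python sketch intersects a DEG list with two reference gene sets. It stands in for the Venn-diagram overlap only and does not reproduce ssGSEA or WGCNA. All gene identifiers are placeholders, and `intersect_gene_sets` is a hypothetical helper name.

```python
def intersect_gene_sets(degs, reference_genes):
    """Return the sorted overlap between a DEG list and a reference gene set."""
    return sorted(set(degs) & set(reference_genes))


# Placeholder identifiers only; the real lists come from the DEG analysis
# and from the literature-derived MRG and NETs-RG collections.
degs = ["g1", "g2", "g3", "g4", "g5"]
mrgs = ["g2", "g5", "g9"]
nets_rgs = ["g3", "g7"]

de_mrgs = intersect_gene_sets(degs, mrgs)
de_nets = intersect_gene_sets(degs, nets_rgs)
print("DE-MRGs:", de_mrgs)
print("DE-NETs:", de_nets)
```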

Identification and functional analyses of candidate genes

DEGs and key module genes were intersected using the “VennDiagram” (v 1.7.3) package, and the overlapping genes were recorded as candidate genes. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses32,33,34 were then undertaken (p < 0.05) utilizing the “clusterProfiler” (v 4.7.1.003)35 package. Subsequently, the candidate genes were submitted to the Search Tool for Recurring Instances of Neighboring Genes (STRING) (https://string-db.org/) to construct a protein-protein interaction (PPI) network and explore the interactions of the candidate genes at the protein level (interaction score > 0.4). The network was displayed with Cytoscape (v 3.10.3)36.
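To make the PPI filtering step concrete, here is a small Python sketch that keeps interactions with a score above 0.4, counts the degree of each node, and lists genes left without any interaction. It assumes an edge list has already been exported from STRING; the genes, scores, and the helper name `build_network` are illustrative assumptions, not study data.

```python
from collections import Counter

# Hypothetical candidate genes and STRING-style scored interactions.
genes = ["G1", "G2", "G3", "G4", "G5"]
edges = [
    ("G1", "G2", 0.90),
    ("G1", "G3", 0.45),
    ("G2", "G3", 0.30),  # dropped: score not above 0.4
    ("G4", "G1", 0.70),
]


def build_network(edges, min_score=0.4):
    """Keep edges with score > min_score and count each node's degree."""
    kept = [(a, b) for a, b, score in edges if score > min_score]
    degree = Counter()
    for a, b in kept:
        degree[a] += 1
        degree[b] += 1
    return kept, degree


kept_edges, degree = build_network(edges)
isolated = [g for g in genes if degree[g] == 0]  # genes without interaction pairs
hub = degree.most_common(1)[0][0]                # node with the most interactions
print("Edges kept:", kept_edges)
print("Isolated genes:", isolated, "| highest-degree node:", hub)
```

Genes in `isolated` correspond to candidates that would be excluded from a network-based downstream analysis.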

Acquisition of prognostic genes

Candidate genes with interaction pairs were included in subsequent analyses. To ascertain whether a relationship existed between the prognosis of patients with BLCA and the candidate genes, we integrated the expression data with survival time and status information from TCGA-BLCA. The proportional hazards (PH) assumption test (p > 0.05) and univariate Cox regression analysis (hazard ratio (HR) ≠ 1, p < 0.001) were performed using the “survival” (v 3.5.3) package. The results were visualized using the “forestplot” (v 3.1.1)37 package. Then, the “glmnet” (v 4.1.4)38 package was utilized to perform least absolute shrinkage and selection operator (LASSO) regression analysis. The LASSO model at the λ value that minimized the error in 10-fold cross-validation was taken as the optimal model, and the genes with non-zero coefficients in this model were used as prognostic genes.

Prognostic model construction and validation

To estimate the predictive power of the risk model, risk scores were computed for BLCA patients. The calculation formula was \(\text{risk score}=\sum_{i=1}^{n}\left(\text{coef}(\text{gene}_{i})\times\text{expr}(\text{gene}_{i})\right)\), where coef denotes the coefficient obtained from LASSO and expr denotes the expression of the gene. BLCA patients were divided into a high-risk group (HRG) and a low-risk group (LRG) based on the optimal risk score cutoff value. The “survminer” (v 0.4.9)39 package was utilized to plot the K-M curve, and a log-rank test was performed to assess the difference in survival between HRG and LRG (p < 0.05). The “survivalROC” (v 1.0.3.1)40 package was used to draw receiver operating characteristic (ROC) curves at 1, 3, and 5 years to evaluate the predictive accuracy of the prognostic model. Model effectiveness was assessed through the area under the curve (AUC) (AUC ≥ 0.6), where a higher AUC signifies a more precise predictive ability (AUC ≠ 1). In addition, the “pheatmap” (v 1.0.12)41 package was utilized to display the expression heatmap of prognostic genes in HRG and LRG in TCGA-BLCA. The same analysis was performed in the 165 BLCA samples from the GSE13507 dataset. Finally, the expression levels of prognostic genes were compared between the high- and low-risk groups by Wilcoxon test in the TCGA-BLCA and GSE13507 datasets (p < 0.05).
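The risk score formula above is a weighted sum, which the following Python sketch works through for a single sample. The coefficients, expression values, and cutoff are invented placeholders rather than the study's fitted values, and treating a score strictly greater than the cutoff as high-risk is an assumption, since the text does not state how ties at the cutoff are handled.

```python
def risk_score(coefs, expr):
    """Sum of coef(gene_i) * expr(gene_i) over all genes in the model."""
    return sum(coefs[gene] * expr[gene] for gene in coefs)


def assign_group(score, cutoff):
    """Assumed convention: scores above the cutoff are high-risk (HRG)."""
    return "HRG" if score > cutoff else "LRG"


# Placeholder coefficients and expression values for two hypothetical genes.
coefs = {"gene1": 0.10, "gene2": 0.25}
sample_expr = {"gene1": 2.0, "gene2": 4.0}

score = risk_score(coefs, sample_expr)   # 0.10 * 2.0 + 0.25 * 4.0
group = assign_group(score, cutoff=1.0)  # hypothetical cutoff
print(f"risk score = {score:.2f}, group = {group}")
```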

Independent prognostic analysis and nomogram construction

First, based on the samples with patient clinical information and risk scores from TCGA-BLCA, univariate Cox regression analysis was executed to identify candidate independent prognostic factors (p < 0.05, HR ≠ 1). The PH assumption test was also conducted, and a p-value exceeding 0.05 was considered to fulfill the PH assumption. Subsequently, the candidate factors were subjected to multivariate Cox regression analysis to confirm independent prognostic factors (HR ≠ 1, p < 0.05). The “forestplot” (v 3.1.1) package was applied to visualize the results. Next, a prediction nomogram containing the independent prognostic factors for BLCA patients was developed with the “rms” (v 6.5.0)42 package. Calibration curves and ROC curves at 1, 3, and 5 years were used to assess its precision with the “rms” (v 6.5.0) package and the “timeROC” (v 0.4)43 package, respectively.

Gene set enrichment analysis (GSEA)

To gain insight into the primary pathways enriched within HRG and LRG, GSEA was performed. First, the “DESeq2” (v 1.38.0) package was used for differential expression analysis between the two risk groups in TCGA-BLCA. The “h.all.v2023.2.Hs.symbols.gmt” and “c2.cp.kegg.v2023.1.Hs.symbols.gmt” files were employed as the reference gene sets, and the “clusterProfiler” (v 4.7.1.003) package was applied to conduct GSEA. The thresholds were set as |normalized enrichment score (NES)| > 1 and p < 0.05.

Then, “c2.cp.kegg.v7.4.symbols.gmt” was set as the reference gene set, and Spearman correlation analysis was conducted between each prognostic gene and all the remaining genes in TCGA-BLCA using the “psych” (v 2.2.9) package. The remaining genes were ranked from largest to smallest by correlation coefficient. Finally, the “clusterProfiler” (v 4.7.1.003) package was applied with thresholds of |NES| > 1, FDR < 0.25 and p < 0.05. For all GSEA results, the top 5 pathways according to p value were visualized using the “ggplot2” (v 3.4.4) package.
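The correlation-based ranking that feeds the single-gene GSEA can be sketched in plain Python as follows. This is only an illustration of ranking genes by Spearman coefficient against one target gene; the enrichment scoring itself is not shown. The expression vectors and gene names are hypothetical, and the helper functions are introduced here for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical expression of one prognostic gene and three other genes
# across five samples.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
others = {
    "geneX": [2.0, 4.0, 6.0, 8.0, 10.0],
    "geneY": [5.0, 4.0, 3.0, 2.0, 1.0],
    "geneZ": [1.0, 3.0, 2.0, 5.0, 4.0],
}

# Rank genes from largest to smallest correlation coefficient.
ranked = sorted(
    ((gene, spearman(target, values)) for gene, values in others.items()),
    key=lambda item: item[1],
    reverse=True,
)
print(ranked)
```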

Tumor immune microenvironment analysis

To investigate the variations in the immune microenvironment between HRG and LRG, the ssGSEA algorithm was used to estimate the infiltration of 28 immune cell types44, and the Wilcoxon test was used to compare immune cell infiltration between the two groups (p < 0.05). The correlation between risk score and differential immune cells (DICs) was also analyzed (|cor| > 0.3, p < 0.05).
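For readers unfamiliar with the group comparison used here, the sketch below implements a two-sample Wilcoxon rank-sum (Mann-Whitney) test in plain Python. It assumes the rank-sum variant is the one intended for two independent groups, and it uses a normal approximation without tie or continuity correction, so p values will differ slightly from those of statistical packages. The infiltration scores are invented for illustration.

```python
from math import erfc, sqrt


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(group_a, group_b):
    """Return (U statistic of group_a, two-sided p by normal approximation)."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = sqrt(n_a * n_b * (n_a + n_b + 1) / 12)  # no tie correction
    z = (u_a - mean_u) / sd_u
    return u_a, erfc(abs(z) / sqrt(2))


# Hypothetical ssGSEA infiltration scores for one immune cell type.
hrg_scores = [0.61, 0.72, 0.80, 0.93]
lrg_scores = [0.20, 0.35, 0.41, 0.55]

u_stat, p_value = rank_sum_test(hrg_scores, lrg_scores)
print(f"U = {u_stat}, p = {p_value:.4f}, differential = {p_value < 0.05}")
```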

Analysis of response to immunotherapy

Immune dysfunction was assessed to determine whether the T cells present in the tumor microenvironment were dysfunctional. Dysfunctional T cells are unable to effectively attack and kill cancer cells, even when they are present inside the tumor. Immune exclusion was used to assess whether immune cells were excluded and unable to enter the tumor. Elevated tumor immune dysfunction and exclusion (TIDE) scores indicate a worse treatment effect. Therefore, for the BLCA tumor samples from TCGA-BLCA, these scores were calculated for samples in the two risk groups using the TIDE web tool (http://tide.dfci.harvard.edu/login/). The Wilcoxon test was conducted to assess differences in scores between HRG and LRG (p < 0.05). Correlation analyses of TIDE scores and risk score were also performed (|cor| > 0.3, p < 0.05). Next, the Wilcoxon test was applied to compare the expression of immune checkpoint genes45 between the two risk groups (p < 0.05). Spearman analysis was utilized to examine the correlation between differentially expressed immune checkpoint genes and prognostic genes (|cor| > 0.3, p < 0.05).

Analysis of genomic variation

To explore genomic mutations, the somatic mutation data for each BLCA sample were acquired from TCGA-BLCA. Waterfall plots for HRG, LRG and each prognostic gene were drawn with the “maftools” (v 2.14.0)46 package. The top 20 genes with the highest mutation frequencies in HRG and LRG were shown.

Drug sensitivity analysis

To evaluate the response to chemotherapy in HRG and LRG, the “pRRophetic” (v 0.5)47 package was utilized to estimate half-maximal inhibitory concentration (IC50) values of drugs from the Genomics of Drug Sensitivity in Cancer (GDSC) database (https://www.cancerrxgene.org/) for samples in TCGA-BLCA. Spearman analysis was performed using the “psych” (v 2.2.9) package to analyze the correlation between the IC50 values of the drugs and the risk scores (|cor| > 0.3, p < 0.05). The Wilcoxon test was used to compare the IC50 of relevant drugs between the two risk groups (p < 0.05). The “ggplot2” (v 3.5.1) package was used to display the top 10 differential drugs ranked by p value. Pearson correlation analysis was performed between the top 10 drugs and the prognostic genes, and the associations between drugs and genes were displayed using correlation heatmaps (|cor| > 0.3, p < 0.05).

Construction of molecular regulatory networks

Upstream miRNAs of the mRNAs (prognostic genes) were predicted with the miRDB (https://mirdb.org/) and TargetScan (https://www.targetscan.org/) databases through the “multiMiR” (v 1.20.0)48 package. The lncRNAs targeting these miRNAs were subsequently predicted in starBase (http://starbase.sysu.edu.cn), with a screening criterion of clipExpNum > 15. Then, a Sankey diagram of the regulatory network was drawn using the “ggsankey” (v 0.0.9999)49 package. Finally, to understand the post-translational modifications of the prognostic genes, we used the PhosphoSitePlus website (https://www.phosphosite.org/homeAction).

scRNA-seq data analysis

The “Seurat” (v 5.0.1)50 package was used for scRNA-seq data analysis of the GSE222315 dataset. Cells with fewer than 200 or more than 6,000 detected genes, cells with a total expression count greater than 20,000, and cells with a mitochondrial gene proportion greater than 10% were removed before subsequent analyses. After quality control, the data were normalized with the “Seurat” (v 5.0.1) package. Subsequently, the top 2,000 most variable genes were identified with the FindVariableFeatures function. Next, the dimensionality of GSE222315 was reduced through principal component analysis (PCA). The “Seurat” (v 5.0.1) package was applied to determine the optimal number of dimensions for cell clustering. Based on the selected principal components (PCs), unsupervised cluster analysis (resolution = 0.5) was conducted for all cells and visualized via uniform manifold approximation and projection (UMAP). Cell clusters were annotated to specific cell types based on marker genes51 obtained from the literature. The percentages of the various cell types within the BLCA and control samples were shown. The Wilcoxon test was applied to identify differences in cell types between BLCA and control samples (p < 0.05), and the results were presented using the “ggplot2” (v 3.5.1) package. Functional enrichment analysis of differential cell types was performed using the “ReactomeGSA” package (v 1.12.0)52, and the top 15 pathways with the largest differences were visualized. Next, the expression of prognostic genes in each cell type was shown. Prognostic gene expression was then compared between BLCA and control cells, and all cell types with differential expression of prognostic genes were defined as key cells (p < 0.05).

To explore the developmental trajectories of key cells, the “Seurat” (v 5.0.1) package was first used to perform secondary clustering of key cells in GSE222315. Then, the Monocle algorithm was used to perform pseudo-time analysis of key cells to investigate their differentiation and the changes in prognostic gene expression during the transition process.
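The quality-control thresholds stated in this subsection can be expressed as a simple per-cell filter. The Python sketch below is not Seurat code; it only mirrors the cutoffs in the text (200 to 6,000 detected genes, total count of at most 20,000, mitochondrial proportion of at most 10%). The per-cell metrics and the field names `n_genes`, `total_count`, and `pct_mito` are hypothetical.

```python
# Hypothetical per-cell QC metrics; values are made up for illustration.
cells = [
    {"id": "cell1", "n_genes": 1500, "total_count": 5000, "pct_mito": 3.2},
    {"id": "cell2", "n_genes": 150, "total_count": 400, "pct_mito": 2.0},     # too few genes
    {"id": "cell3", "n_genes": 6500, "total_count": 30000, "pct_mito": 4.0},  # too many genes/counts
    {"id": "cell4", "n_genes": 3000, "total_count": 12000, "pct_mito": 15.5}, # high mito fraction
    {"id": "cell5", "n_genes": 200, "total_count": 900, "pct_mito": 10.0},    # on the boundaries
]


def passes_qc(cell, min_genes=200, max_genes=6000, max_total=20000, max_pct_mito=10.0):
    """Keep a cell unless it has fewer than min_genes or more than max_genes
    detected genes, a total count above max_total, or a mitochondrial
    percentage above max_pct_mito."""
    return (
        min_genes <= cell["n_genes"] <= max_genes
        and cell["total_count"] <= max_total
        and cell["pct_mito"] <= max_pct_mito
    )


kept_ids = [cell["id"] for cell in cells if passes_qc(cell)]
print("Cells retained after QC:", kept_ids)
```

Values lying exactly on a threshold are retained here because the text removes only cells that are strictly below or above the cutoffs.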

Reverse transcription-quantitative polymerase chain reaction (RT-qPCR)

To assess the expression levels of the prognostic genes, we obtained validation samples from five patients diagnosed with muscle-invasive BLCA who were recruited at Zhuzhou Hospital Affiliated to Xiangya School of Medicine, Central South University in 2024. Ethics approval for the study was obtained from that institution. First, total RNA was isolated with TRIzol reagent (Invitrogen, USA) following the manufacturer’s protocol. RNA concentration and purity were determined with a NanoDrop 2000 spectrophotometer (Thermo Fisher, USA), and samples with an A260/A280 ratio between 1.8 and 2.0 were used for further analysis. Next, RT-qPCR was conducted on a QuantStudio™ 6 Flex Real-Time PCR System (Applied Biosystems, USA) with TB Green™ Premix Ex Taq™ II (Takara, Japan). The reaction system (20 µL total volume) contained 10 µL of SYBR Green Mix, 0.4 µL each of forward and reverse primers (10 µM), 2 µL of cDNA, and 7.2 µL of RNase-free water. The PCR thermal profile included an initial step at 95 °C for 30 s, followed by 40 cycles of 95 °C for 5 s and 60 °C for 30 s. The primer sequences used are summarized in Table 2, and the gene expression results were plotted with GraphPad Prism (v 10.4.0).
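As a quick arithmetic check of the reaction setup described above, the following Python sketch sums the per-reaction volumes, scales them for several reactions, and totals the programmed hold times of the thermal profile. The `overage` parameter and the function name `scale_mix` are illustrative assumptions that do not appear in the protocol, and ramp times between temperatures are ignored.

```python
# Per-reaction volumes in microlitres, as listed in the text.
PER_REACTION_UL = {
    "SYBR Green Mix": 10.0,
    "forward primer": 0.4,
    "reverse primer": 0.4,
    "cDNA": 2.0,
    "RNase-free water": 7.2,
}


def scale_mix(per_reaction, n_reactions, overage=0.0):
    """Scale each component for n reactions; overage (e.g. 0.1 for 10%)
    is a hypothetical pipetting allowance, not part of the stated protocol."""
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2) for name, vol in per_reaction.items()}


total_volume = sum(PER_REACTION_UL.values())  # expected to equal 20 µL
mix_for_three = scale_mix(PER_REACTION_UL, n_reactions=3)

# Programmed hold time: 30 s initial step plus 40 cycles of (5 s + 30 s).
hold_seconds = 30 + 40 * (5 + 30)

print(f"Total per reaction: {total_volume:.1f} uL")
print("Volumes for 3 reactions:", mix_for_three)
print(f"Programmed hold time: {hold_seconds} s")
```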

Table 1 List of LASSO regression coefficients.

Statistical analysis

All analyses were performed in the R programming language (v 4.2.2). The Wilcoxon test was used to compare differences between groups, with a statistical threshold of p < 0.05.

Table 2 Primer sequences used in the RT-qPCR reactions.

Results

DEGs and key module genes were identified

A total of 2,152 DEGs were identified between BLCA and control samples, among which 1,296 were upregulated and 856 were downregulated in BLCA samples (|log2FC| > 2, p < 0.05) (Fig. 1a-b). Intersecting the 2,152 DEGs with the 1,136 MRGs yielded 42 DE-MRGs. Similarly, intersecting the DEGs with the 136 NETs-RGs yielded 17 DE-NETs (Fig. 1c). Then, for the BLCA samples in TCGA-BLCA, ssGSEA scores for MRGs and NETs-RGs were calculated based on the DE-MRGs and DE-NETs, and the optimal cut-off values of the MRGs score (−0.3716) and the NETs-RGs score (−0.0729) were used to divide BLCA samples into high- and low-scoring groups. Survival differed significantly between the high- and low-scoring groups for both the MRGs (p = 0.0029) and NETs-RGs (p = 0.0009) scores, with the high-scoring group exhibiting significantly poorer survival outcomes (Fig. 1d). Module genes were then obtained by WGCNA. Cluster analysis revealed no outlier samples in TCGA-BLCA (Fig. 1e). With the optimal soft threshold set to 5 (R² = 0.85), the constructed network approximated a scale-free distribution and the mean connectivity tended to 0 (Fig. 1f), and 22 modules were identified (Fig. 1g). The red module, which had the highest positive correlation with the MRGs (cor = 0.58, p = 4 × 10⁻³⁷) and NETs-RGs (cor = 0.74, p = 2 × 10⁻⁷²) scores, was selected, yielding a total of 1,006 key module genes (Fig. 1h).

Fig. 1

A. Volcano Plot: Each point in the figure represents a gene, and the top 5 up-regulated genes and top 5 down-regulated genes are marked according to log2FC; B. Distribution as Heatmap: Each column represents a sample, and the top 10 up-regulated and top 10 down-regulated genes ranked by log2FC are plotted in this figure. C1. Venn Diagram: DEGs vs. MRGs: The numbers indicate the count of genes in each category and their intersections. C2. Venn Diagram: DEGs vs. NETs-RGs. D1. Kaplan-Meier survival analysis between high and low score groups of intersection genes in panel C1. D2. Kaplan-Meier survival analysis between high and low score groups of intersection genes in panel C2. E. Sample level clustering: Hierarchical clustering was performed using the Euclidean distance of the expression to check for outliers in the samples. F. Soft threshold: In the left panel, when the scale-free fitting index is 0.85, the minimum soft threshold conforming to the construction of the scale-free network is 5; the right panel shows the network connectivity under different soft thresholds. G. Cluster Dendrogram: Dendrogram resulting from hierarchical clustering of genes. The branches indicate groups of genes sharing comparable expression profiles; 22 modules were obtained. H. Correlation heat map: The color bar on the left indicates different gene modules, while the right color bar shows the correlation scale. In the central heat map, darker colors correspond to stronger correlations, with red denoting positive and blue indicating negative correlations. Numbers within each cell represent correlation coefficients and statistical significance.

Candidate genes were identified and functions were initially analyzed

A total of 195 candidate genes were identified by intersecting the 2,152 DEGs with the 1,006 key module genes (Fig. 2a). Subsequently, GO and KEGG analyses were performed to explore the pathways enriched among these candidate genes (p < 0.05). A total of 824 items were enriched in the GO enrichment analysis: 713 items were significantly enriched in BPs such as muscle contraction, 50 in CCs such as myofibril, and 61 in MFs such as heparin binding. In the KEGG enrichment analysis, 31 pathways were notably enriched, such as focal adhesion and proteoglycans in cancer. The top five pathways for each part of GO and the top 15 KEGG pathways, ranked by p value, were visualized (Fig. 2b-c). The full enrichment pathway information is shown in Supplementary Tables 3 and 4. In the PPI analysis, 58 genes did not interact with other genes and are not shown. The PPI results showed 442 pairs of interactions among the remaining 137 candidate genes, such as TDO2 with MAOB. JUN, ITGB3, DCN and seven other genes were located in the center of the network, indicating that they had more interactions with other candidate genes (Fig. 2d). These 137 candidate genes were included in the subsequent analysis.

Fig. 2

A. Venn Diagram: Blue represents the genes in the WGCNA set, red represents the genes in the DEGs gene set; 195 candidate genes were selected. B1. GO Terms Bar Plot: The three parts of GO analysis were enriched for pathway-specific names, with each color representing a different part. B2. Circular GO Terms Visualization: The circular diagram consists of multiple layers, arranged from the outside inward. The outermost layer displays the IDs of GO functions. The second layer shows significance levels through color gradients, while the length, width, and numeric values correspond to the number of genes enriched in each function. The third layer indicates the number of downregulated genes enriched in the corresponding function. The innermost color segments represent different functions, with segment size indicating the Rich Factor of each pathway; larger segments signify greater enrichment. C1. KEGG Pathway Enrichment Plot: Names of KEGG enriched pathways. Each dot represents a pathway, with its position along the x-axis showing the number of involved genes and the y-axis indicating the -log10(p-value) for statistical significance. C2. Circular KEGG Pathway Visualization: The circular plot layers, from the outermost to the innermost, represent the following: the first (outermost) layer indicates KEGG pathway IDs. The second layer illustrates significance levels using color gradients, with the length, width, and numerical annotations corresponding to the count of genes enriched in each pathway. The third layer indicates the number of upregulated and downregulated genes enriched in each pathway, differentiated by distinct colors. The innermost layer employs colors to distinguish between upregulated and downregulated gene enrichment. D. PPI network: Each oval represents a protein, and the color shade represents the degree (the darker, the higher). The lines represent the connections between proteins.

Six prognostic genes were identified

The 137 candidate genes mentioned above were tested for the PH assumption, and 125 genes passed the PH test (p > 0.05) (Supplementary Table 5). These 125 genes were then subjected to univariate Cox regression analysis, and eight prognosis-related genes were obtained (HR ≠ 1, p < 0.001). As shown in the forest plot, DPYSL3, CCDC80, NIBAN1, CSPG4, PDGFRA, MAP1A, RBMS3, and PCOLCE2 were identified as risk genes (HR > 1) (Fig. 3a). In the LASSO analysis, at the optimal lambda of 0.03441586, six genes had non-zero regression coefficients and were identified as prognostic genes: CCDC80, NIBAN1, CSPG4, PDGFRA, MAP1A, and PCOLCE2 (Fig. 3b-c).

Fig. 3

(A) Forest Plot: The leftmost column lists the genes screened by univariate Cox regression, and the second and third columns are the corresponding p value and HR value, with the 95% confidence interval of the HR in parentheses. In the right plot, the red points are the HR values, and the line segments on both sides indicate the 95% confidence intervals of the HR values. (B) Cross-validation curve: The abscissa is log(λ), and the ordinate represents the cross-validation error; the dashed line marks the position with the minimum cross-validation error. (C) Path plot of LASSO regression coefficients: The graph shows the coefficients of genes as a function of log(λ). The optimal λ value is indicated by the dashed line (obtained from the position of the minimum cross-validation error determined in panel B). Each line represents a gene, and the color corresponds to the gene listed in the legend.

Prognostic model demonstrated a good prognostic outcome on BLCA

The expression of the six prognostic genes, combined with the risk coefficients derived from LASSO regression (Table 1), was used to determine the risk score for each BLCA sample in TCGA-BLCA. Based on the best cut-off value (0.5027401), BLCA samples in TCGA-BLCA were categorized into HRG (N = 148) and LRG (N = 256). In the same manner, the 165 BLCA patients in GSE13507 were classified as HRG (N = 69) and LRG (N = 96) according to the best cut-off value (2.792534). In both TCGA-BLCA and GSE13507, the risk score distribution plots showed that the number of deaths increased as the risk score rose (Fig. 4a-b). The KM curves showed that survival was worse in HRG in both TCGA-BLCA (p < 0.0001) and GSE13507 (p = 0.00018) (Fig. 4c-d). The AUC values of the prognostic model at 1, 3 and 5 years were all at least 0.6 (Fig. 4e-f), which demonstrated its ability to forecast the survival of BLCA patients. The six prognostic genes showed notable differences in expression between the two risk groups; specifically, all six genes were upregulated in the HRG. The gene expression profiles were consistent between TCGA-BLCA and GSE13507 (Fig. 4g-h). Additionally, in both datasets, the six prognostic genes were all significantly more highly expressed in the high-risk group (Supplementary Fig. 1a-b).

Fig. 4

A1. Curve of risk (training set): Distribution of risk scores in the training set, ordered by increasing risk scores. Patients were categorized into high-risk and low-risk groups based on the optimal risk-score cutoff. Red dots represent patients in the high-risk group, whereas blue dots represent those in the low-risk group. A2. Survival status (training set): Red points are dead samples, blue points are surviving samples, and the dashed line is the best cutoff value of the risk score. B1. Curve of risk (validation set). B2. Survival status (validation set). C. Train KM: Kaplan-Meier survival curves for the training dataset. The unit of time is days; red represents the high-risk group, blue represents the low-risk group, and the numbers at the bottom represent the number of samples surviving at each time. D. Verify KM. E. Train ROC: Receiver Operating Characteristic (ROC) curves for the training dataset at 1, 3, and 5 years. The area under the curve (AUC) values were all > 0.6. F. Verify ROC. G. Train Heatmap: The heatmap displays the expression patterns of the selected genes in the training dataset. Each row represents a gene, and each column represents a patient. The color scale indicates the expression level. H. Verify Heatmap.

Nomogram had excellent predictive capability

For BLCA samples in TCGA-BLCA with clinical information and risk scores, risk score, age, T stage and N stage had a p value below 0.05 and an HR not equal to 1 in univariate Cox regression, and a p value greater than 0.05 in the PH assumption test (Fig. 5a) (Table 3). Multivariate regression analysis then indicated that risk score, age, T stage and N stage were each associated with prognosis (p < 0.05), and they were regarded as independent prognostic factors (Fig. 5b). A nomogram was then constructed. The mortality of BLCA patients was predicted based on the total points, with a higher score indicating a lower survival rate (Fig. 5c). Calibration curves at 1, 3, and 5 years displayed good consistency, reflecting reasonable accuracy of the nomogram (Fig. 5d). Additionally, the AUC values for 1-year (0.74), 3-year (0.73), and 5-year (0.75) survival were all above 0.7, suggesting strong predictive capability of the nomogram for overall survival at these time points (Fig. 5e).

Fig. 5

(A) Univariate analysis: Forest plot showing the hazard ratios (HR) and 95% confidence intervals (CI) for each variable in the univariate Cox proportional hazards model. The variables include risk score, age, gender, T stage, and N stage. Each row represents a variable, with the HR indicated by the point estimate and the CI by the horizontal line. (B) Multivariate analysis: In the multivariate Cox model, risk score, age, T stage, and N stage were significantly associated with prognosis (p < 0.05). (C) Nomogram: Nomogram for predicting the probability of overall survival at 1, 3, and 5 years. The nomogram assigns points to each variable (N stage, T stage, age, and risk score), which are summed to predict survival probabilities. (D) Calibration plot: Comparison of the survival probabilities predicted by the nomogram with the observed survival probabilities at 1, 3, and 5 years. The diagonal line represents perfect calibration, and the plotted points show the agreement between predicted and observed probabilities. (E) Independent prognosis ROC: Receiver operating characteristic (ROC) curves for the independent prognosis model at 1, 3, and 5 years. The area under the curve (AUC) values are provided for each time point (AUC > 0.6).

Table 3 PH assumption test.

Six prognostic genes were co-enriched in 42 signaling pathways

Using “h.all.v2023.2.Hs.symbols.gmt” as the reference gene set, 20 pathways were enriched in both risk groups, such as epithelial mesenchymal transition and allograft rejection (Fig. 6A) (Supplementary Table 6). When “c2.cp.kegg.v2023.1.Hs.symbols.gmt” was used as the reference gene set, 64 pathways were enriched in both risk groups, such as ECM receptor interaction (Fig. 6B) (Supplementary Table 7). Subsequently, enrichment analysis of prognostic genes revealed that CCDC80 was significantly enriched in 78 pathways, NIBAN1 in 92 pathways, CSPG4 in 81 pathways, PDGFRA in 82 pathways, MAP1A in 68 pathways, and PCOLCE2 in 76 pathways (Fig. 6c-h) (Supplementary Table 8). These genes were significantly enriched in 20 pathways, including the ribosome pathway.

Fig. 6

Gene Set Enrichment Analysis (GSEA): Each plot is divided into three parts. In the first part, the horizontal axis is the gene list sorted by its ranking metric, and the vertical axis is the corresponding running enrichment score (running ES). In this study, the region on the left (values greater than 0) represents genes up-regulated in the high-risk group, and the region on the right (values less than 0) represents genes up-regulated in the low-risk group. The peak of the line is the enrichment score of the pathway, and the genes before the peak are the core genes of that pathway. The second part is the hit track, which marks with vertical lines the positions of the genes that belong to the tested gene set. The third part shows the distribution of the rank values of all genes; the Signal2Noise metric is used by default. A GSEA enrichment results in the high- and low-risk groups (hallmark). B GSEA enrichment results in the high- and low-risk groups (KEGG). C GSEA enrichment results of prognostic gene CCDC80 (KEGG). D GSEA enrichment results of prognostic gene NIBAN1 (KEGG). E GSEA enrichment results of prognostic gene CSPG4 (KEGG). F GSEA enrichment results of prognostic gene PDGFRA (KEGG). G GSEA enrichment results of prognostic gene MAP1A (KEGG). H GSEA enrichment results of prognostic gene PCOLCE2 (KEGG).
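The running enrichment score described in this caption can be illustrated with a minimal sketch. It follows the usual weighted running-sum formulation of GSEA, not necessarily the exact settings used in this study; the function name, the gene names, and the scores are hypothetical toy values.

```python
def running_enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Walk down a ranked gene list and return (ES, peak index, running sum).

    ranked_genes: genes sorted from most up-regulated in the high-risk group
                  to most up-regulated in the low-risk group.
    scores:       ranking metric for each gene, in the same order.
    gene_set:     genes that belong to the pathway being tested.
    """
    members = set(gene_set)
    hits = [g in members for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_total = sum(abs(s) ** weight for s, h in zip(scores, hits) if h)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    running, total = [], 0.0
    for s, h in zip(scores, hits):
        if h:
            total += abs(s) ** weight / hit_total  # step up on a pathway gene
        else:
            total -= 1.0 / n_miss                  # step down otherwise
        running.append(total)

    # The enrichment score is the running-sum value farthest from zero.
    peak = max(range(len(running)), key=lambda i: abs(running[i]))
    return running[peak], peak, running


# Toy input (hypothetical gene names and scores, not data from the study).
ranked_genes = ["G1", "G2", "G3", "G4", "G5", "G6"]
scores = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
pathway = {"G1", "G2"}
es, peak, running = running_enrichment_score(ranked_genes, scores, pathway)

# For a positive ES, the core (leading-edge) genes are the hits up to the peak.
leading_edge = [g for g in ranked_genes[: peak + 1] if g in pathway]
```

In this toy case both pathway genes sit at the top of the list, so the running sum peaks early and the pathway would be read as enriched in the high-risk group.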

The immune microenvironment and immune checkpoint expression varied notably between the two risk groups

The heatmap illustrates the enrichment levels of 28 immune cell types across the two BLCA risk groups (Fig. 7A), among which central memory CD4 T cells showed higher abundance. The enrichment scores of 25 cell types, such as activated dendritic cells and eosinophils, differed significantly between the high-risk group (HRG) and the low-risk group (LRG) (p < 0.05) (Fig. 7B); these were termed differential immune cells (DICs). Of these, only CD56(bright) natural killer cells had lower infiltration in the HRG, whereas all other DICs had higher infiltration in the HRG. Correlation analysis between risk scores and DICs showed that 21 DICs were positively correlated with the risk score, with natural killer cells showing the strongest positive correlation (cor = 0.64, p < 0.0001) (Fig. 7C) (Supplementary Table 9).
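A correlation of this kind between a risk score and an immune-cell enrichment score can be sketched as follows. The sketch assumes a rank-based (Spearman) coefficient, which this passage does not specify; the function names and the four sample values are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average position of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy values (hypothetical): risk scores and one cell type's ssGSEA score.
risk_scores = [0.2, 0.5, 0.9, 1.4]
cell_scores = [0.10, 0.15, 0.40, 0.38]
rho = spearman(risk_scores, cell_scores)
```

A rank-based coefficient is a common choice for enrichment scores because it does not assume a linear relationship or normally distributed values.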

TIDE scores differed markedly between the high- and low-risk groups (p < 2.2 × 10⁻¹⁶) and were positively correlated with the risk score (cor = 0.53, p = 3.51 × 10⁻³¹) (Fig. 7d). The TIDE, exclusion, and dysfunction scores of the HRG were significantly higher (p < 0.0001), indicating more severe suppression of T cell function and greater difficulty for immune cells to enter the tumor (Fig. 7e). Of 49 immune checkpoint genes, 44 were differentially expressed between the HRG and LRG (p < 0.05). Only TNFRSF14, TNFRSF25, and VEGFA were more highly expressed in the LRG (Fig. 7F). Most differential immune checkpoint genes were positively correlated with the prognostic genes (cor > 0.3, p < 0.05); only a few were negatively correlated, such as TNFRSF25 with MAP1A (cor = −0.34) and VEGFA with CCDC80 (cor = −0.44) (p < 0.05) (Fig. 7G).

Fig. 7

A. Heatmap of immune cell ssGSEA scores: High-risk group samples are shown in orange and low-risk group samples are shown in blue. B. Boxplots of immune cell differences: The ordinate is the ssGSEA score of immune infiltrating cells. Orange represents the high-risk group samples and blue represents the low-risk group samples. Significance indicators: **** (p < 0.0001), *** (p < 0.001), ** (p < 0.01), * (p < 0.05), ns (no significant difference). C. Correlation between risk score and immune cell types: The ordinate represents the differential immune cells and the abscissa represents the correlation R value. D. TIDE prediction score and risk score: The blue line represents the fitted regression line, and the histogram on the right shows the distribution of TIDE prediction scores. E. Differences in scores (TIDE, exclusion, dysfunction) between the high- and low-risk groups. F. Expression of immune checkpoint genes: Differences in immune checkpoint gene expression between the high-risk and low-risk groups; the abscissa represents immune checkpoint genes, and the ordinate represents gene expression. G. Correlation matrix of gene expression: Correlation between prognostic genes and differential immune checkpoint genes. The ordinate represents prognostic genes, the abscissa represents differential immune checkpoints, red indicates positive correlation, and blue indicates negative correlation. Numerical values represent correlation coefficients (r values) and their significance levels.

Mutation status differed between the two risk groups

The three most frequently mutated genes in the HRG were TP53 (51%), TTN (42%), and ARID1A (26%). Missense mutations were the predominant mutation type in TP53, whereas TTN mainly showed missense mutations and multiple mutations in the same sample. ARID1A mainly showed nonsense mutations, frameshift deletions, and multiple mutations in the same sample (Fig. 8A). The three most frequently mutated genes in the LRG were TTN, TP53, and KDM6A, with mutation frequencies of 49%, 48%, and 33%, respectively. The primary mutation type was multiple mutations within the same sample for TTN, missense mutation for TP53, and nonsense mutation for KDM6A (Fig. 8B). Analysis of the mutation profiles of each prognostic gene showed that, except for NIBAN1, missense mutations predominated in all prognostic genes (Fig. 8C).

Fig. 8

Oncoplot: A Oncoplot showing mutation profiles of 144 samples. The most frequently mutated genes include TP53 (51%) and TTN (42%). Mutations are categorized by type, with 90.97% of samples harboring at least one alteration. TMB values are shown at the top. B Oncoplot of 255 samples with lower mutation burden. TTN (49%) and TP53 (48%) are the most frequently mutated genes. Mutations are classified by type, with 97.25% of samples showing at least one alteration. C Oncoplot of 399 samples focusing on prognostic gene mutations. Alterations occur in 11.78% of samples, with low mutation frequencies across genes such as NIBAN1 and PDGFRA. Mutations are categorized by type.

The risk score correlated with chemotherapy sensitivity

Correlation of drug IC50 values with risk scores showed that 50 drugs had significant correlations, of which 41 were negative and 9 were positive (|cor| > 0.3, p < 0.05) (Fig. 9A). The IC50 values of these drugs differed significantly between the HRG and LRG (p < 0.0001). The IC50 values of AP.24534, Bexarotene, BX.795, GSK269962A, NU.7441, Pazopanib, WO2009093972, and XMD8.85 were higher in the LRG, suggesting that the HRG was more sensitive to these drugs. The IC50 values of BIBW2992 and GW.441756 were higher in the HRG, indicating that the LRG was more sensitive to these drugs (Fig. 9B). GW.441756 showed the strongest positive correlation with MAP1A (cor = 0.60, p = 5.71e-41), while pazopanib exhibited the strongest negative correlation with CCDC80 (cor = −0.77, p = 1.10e-81) (Fig. 9C).
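The screening rule and the direction of the sensitivity call can be made explicit with a small sketch. It assumes that correlation coefficients and p-values have already been computed per drug; the function names, drug labels, and all numbers are hypothetical placeholders, not results from this study.

```python
from statistics import median


def filter_drugs(results, min_abs_cor=0.3, max_p=0.05):
    """Keep drugs whose IC50 correlates with the risk score.

    results: list of dicts with keys "drug", "cor", and "p".
    Thresholds mirror the |cor| > 0.3 and p < 0.05 rule in the text.
    """
    return [
        r["drug"]
        for r in results
        if abs(r["cor"]) > min_abs_cor and r["p"] < max_p
    ]


def more_sensitive_group(ic50_high_risk, ic50_low_risk):
    """A lower IC50 means less drug is needed, i.e. higher sensitivity."""
    med_high, med_low = median(ic50_high_risk), median(ic50_low_risk)
    if med_high < med_low:
        return "HRG"
    if med_low < med_high:
        return "LRG"
    return "no difference"


# Toy correlation results (hypothetical).
results = [
    {"drug": "drug_A", "cor": -0.45, "p": 1e-6},
    {"drug": "drug_B", "cor": 0.10, "p": 0.20},
    {"drug": "drug_C", "cor": 0.35, "p": 0.01},
    {"drug": "drug_D", "cor": -0.50, "p": 0.20},
]
selected = filter_drugs(results)

# Toy predicted IC50 values (hypothetical): higher in the LRG.
group = more_sensitive_group([1.2, 1.5, 1.1], [2.0, 2.4, 1.9])
```

A negative correlation between IC50 and risk score therefore points to greater predicted sensitivity in the high-risk group, which is the reading applied to pazopanib above.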

Fig. 9

Correlation between risk scores and drug sensitivity: A. Correlation analysis between drug IC50 values and risk scores; the threshold was |cor| > 0.3 and p < 0.05. There were 50 drugs with significant correlations, of which 41 showed a negative correlation and 9 a positive correlation; the top 5 by |r| are annotated. B. The IC50 values of the 50 drugs differed significantly between the high- and low-risk groups; the top 10 drugs ranked by p value are shown: AP.24534, Bexarotene, BIBW2992, BX.795, GSK269962A, GW.441756, NU.7441, Pazopanib, WO2009093972, and XMD8.85. C. The correlation between prognostic genes and drugs.

Phosphorylation was the main predicted post-translational modification of the prognostic genes

According to the database prediction, 28 miRNAs were predicted for five prognostic genes, and 22 lncRNAs were further predicted. In the lncRNA-miRNA-mRNA network, XIST regulated PDGFRA by targeting hsa-miR-93-5p. MALAT1 regulated PCOLCE2 by targeting hsa-miR-32-5p. AC234582.1 regulated CCDC80 by targeting hsa-miR-141-3p. AC021078.1 regulated NIBAN1 by targeting hsa-miR-20b-5p. TUG1 regulated MAP1A by targeting hsa-miR-9-5p (Fig. 10a). All miRNAs and lncRNAs are listed in Supplementary Table 10. Prediction of post-translational modification types for prognostic genes revealed that most prognostic genes were predominantly phosphorylated, along with other modifications such as acetylation and ubiquitination (Fig. 10b-g).

Fig. 10

A. Sankey diagram: The diagram illustrates the interaction network among long non-coding RNAs (lncRNAs), microRNAs (miRNAs), and prognostic genes. lncRNAs are shown on the left, miRNAs in the middle, and prognostic genes on the right. The width of the connecting lines indicates the strength or significance of the interactions between these molecules. B-G. Potential post-translational modification sites of the prognostic gene proteins predicted by PhosphoSitePlus: From B to G, PDGFRA, NIBAN1, MAP1A, CSPG4, PCOLCE2, CCDC80. Blue spheres represent phosphorylation, green acetylation, brown ubiquitination, and gray others.

Cells were clustered into eight types

To explore cells associated with BLCA, scRNA-seq analysis was performed. Across all samples in the scRNA-seq dataset, 102,232 cells and 30,971 genes were retained after quality control (Supplementary Fig. 23), and the top 2,000 highly variable genes were identified (Fig. 11a). In PCA, 30 PCs were selected for subsequent analyses (Fig. 11b). Clustering yielded 30 cell clusters (Fig. 11c), and PCA plots of the BLCA and control groups showed no clear boundary between the two groups (Supplementary Fig. 4). Based on marker genes, the clustered cells were categorized into eight distinct cell types: mast cells (TPSAB1, KIT), NK cells (NKG7, KLRD1), fibroblasts (COL1A1, COL3A1), endothelial cells (PECAM1, CD34), myeloid cells (CD14, LYZ), B cells (CD79A, MS4A1), epithelial cells (EPCAM, CDH1), and T cells (CD3D, CD3G) (Fig. 11d-e). Within the BLCA group, epithelial cells, fibroblasts, and T cells accounted for the largest proportions. Compared with the control group, the proportions of epithelial cells, fibroblasts, and endothelial cells were significantly higher in the BLCA group, while those of T cells and B cells were significantly lower (Fig. 11f).
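Marker-based annotation of clusters can be sketched in a few lines. The marker genes are those listed above; the scoring rule (highest mean marker expression wins), the function names, and the toy expression values are assumptions for illustration and may differ from the procedure actually used.

```python
from collections import Counter

# Marker genes for the eight cell types, as listed in the text.
MARKERS = {
    "Mast cells": ["TPSAB1", "KIT"],
    "NK cells": ["NKG7", "KLRD1"],
    "Fibroblasts": ["COL1A1", "COL3A1"],
    "Endothelial cells": ["PECAM1", "CD34"],
    "Myeloid cells": ["CD14", "LYZ"],
    "B cells": ["CD79A", "MS4A1"],
    "Epithelial cells": ["EPCAM", "CDH1"],
    "T cells": ["CD3D", "CD3G"],
}


def annotate_cluster(mean_expr):
    """Assign the cell type whose markers have the highest mean expression.

    mean_expr: dict mapping gene symbol to the cluster's mean expression.
    Genes missing from the dict are treated as not expressed (0.0).
    """
    def marker_score(genes):
        return sum(mean_expr.get(g, 0.0) for g in genes) / len(genes)

    return max(MARKERS, key=lambda cell_type: marker_score(MARKERS[cell_type]))


def cell_type_proportions(labels):
    """Return the fraction of cells assigned to each cell type."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cell_type: n / total for cell_type, n in counts.items()}


# Toy cluster profile (hypothetical values).
cluster_profile = {"COL1A1": 3.2, "COL3A1": 2.9, "EPCAM": 0.1, "CD3D": 0.0}
label = annotate_cluster(cluster_profile)

# Toy per-cell labels (hypothetical) for one sample group.
proportions = cell_type_proportions(
    ["T cells", "T cells", "B cells", "Fibroblasts"]
)
```

Computing the proportions separately for the BLCA and control groups gives the kind of comparison summarized in the last two sentences of the paragraph above.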

Fig. 11

(A) Highly variable genes: Pink dots are highly variable genes and black dots are the remaining genes. X-axis: gene expression level (normalized expression values); Y-axis: standardized variance, which represents the variability of gene expression across cells, with higher values indicating greater variability. (B) Left: Comparison of the empirical and theoretical p-values of the principal components (PCs). The X-axis represents the theoretical quantile, and the Y-axis represents the empirical p-value. Significant PCs are labeled with their p-values (for example, PC7: 4.28e−273, PC8: 1.24e−320). Right: Scree plot of the standard deviation explained by each PC. The X-axis represents the PC number and the Y-axis represents the standard deviation. The selected dimensionality of 30 corresponds to the point at which the curve flattens. (C) UMAP cluster map: Each point represents a cell, and the position of each point in the two-dimensional space (umap_1 and umap_2) reflects the similarity of its gene expression profile to other cells. The numbers (0–29) indicate different cell clusters. (D) Marker gene expression: The X-axis denotes cell subtypes, while the Y-axis represents marker genes; different colors represent different cell subtypes, and red represents higher gene expression. (E) UMAP cluster map (annotated): Different colors represent different cell types. (F) Proportion between cell groups: Different colors indicate distinct cell subtypes, with values representing their respective proportions.

Fibroblasts were regarded as the key cell type

Between BLCA and control samples, T cells, B cells, and mast cells showed significant differences and were classified as differential cells (Fig. 12a). Because mast cell infiltration was low, functional enrichment analysis was performed only on T cells and B cells. A total of 15 pathways were enriched, including classical antibody-mediated complement activation, TNFR1-mediated ceramide production, and reuptake of GABA (Fig. 12b). NIBAN1 expression was high in all cell types, and all six prognostic genes were highly expressed in fibroblasts (Fig. 12C-D). The expression differences of prognostic genes between the BLCA and control groups were subsequently plotted, and the expression of all prognostic genes differed significantly in fibroblasts (Fig. 12E). Therefore, fibroblasts were designated as the key cells. Secondary clustering of the fibroblasts yielded 25 subclusters (Supplementary Figs. 57). The key cells were then subjected to pseudotime analysis, which indicated seven possible states of fibroblasts during differentiation (Fig. 12F-G).

The expression of the prognostic genes in the key cells followed distinct patterns during differentiation. In fibroblasts, CCDC80 expression gradually increased in the early stage, decreased in the middle stage, and then rose to its peak in the late stage. CSPG4 expression declined continuously to a trough through the early and middle stages, began to rise in the late stage, and then gradually declined again. MAP1A expression remained largely flat throughout. NIBAN1 expression was low in the early stage, began to rise in the middle and late stages, and then declined in the late stage. PCOLCE2 expression was low in both the early and middle stages and then increased sharply in the late stage. PDGFRA expression was low in the early stage, rose to its peak in the middle stage, and then flattened out in the late stage (Fig. 12H).
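Early, middle, and late trends of this kind can be summarized by averaging a gene's expression within pseudotime bins. The sketch below assumes each cell already has a pseudotime value from a trajectory tool; the function name, the equal-width binning, and the toy numbers are illustrative assumptions, not the method used to draw Fig. 12H.

```python
def binned_trend(pseudotime, expression, n_bins=3):
    """Average expression within equal-width pseudotime bins.

    pseudotime: one pseudotime value per cell.
    expression: expression of one gene in the same cells.
    Returns a list of per-bin means (None for a bin with no cells).
    """
    lo, hi = min(pseudotime), max(pseudotime)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width if all values match
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, value in zip(pseudotime, expression):
        # Clamp so the cell with the largest pseudotime lands in the last bin.
        idx = min(int((t - lo) / width), n_bins - 1)
        sums[idx] += value
        counts[idx] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Toy cells (hypothetical): low early, peak in the middle, lower late.
pseudotime = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
expression = [1.0, 1.0, 5.0, 5.0, 2.0, 2.0]
early, middle, late = binned_trend(pseudotime, expression)
```

With three bins the output reads directly as early, middle, and late stage means; a smoother curve would need more bins or a fitted spline.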

Fig. 12

(A) Box plot of differential cells: The X-axis represents different cell subtypes, while the Y-axis denotes their proportions. Red indicates the disease group, and blue represents the control group. (B) Functional enrichment: Heatmap showing the top 20 most significantly altered pathways; the X-axis represents key immune cells, while the Y-axis displays pathway names. (C) UMAP plot: Distribution of prognostic genes across all cells. (D) Violin plot: The expression of prognostic genes (CCDC80, NIBAN1, CSPG4, PDGFRA, MAP1A, PCOLCE2) in each cell subtype. The x-axis represents the cell subtypes, and the y-axis represents the expression level of the prognostic genes. (E) Violin plot (prognostic gene expression between groups): Red represents the control group, and blue represents the tumor group. The main title of each panel indicates the cell subtype name. The expression of all prognostic genes shows inter-group differences in fibroblasts. (F) Pseudo-time trajectory plot: The color gradient ranges from dark blue to light blue, indicating the position of cells along the pseudo-time trajectory. Dark blue represents early pseudo-time stages, while light blue indicates later pseudo-time stages. (G) Differentiation trajectory (state) diagram: Different colors represent distinct cell states. The cell states illustrate the distribution of cells along the pseudo-temporal trajectory, with each color corresponding to a potential developmental stage or functional state. (H) Prognostic gene dynamics plot: The x-axis represents pseudotime, with progression from left to right indicating the sequence of cells from early to late stages in development or differentiation. The y-axis shows the relative expression levels of genes. The color coding on the right indicates the states of cell subpopulations along the differentiation trajectory, where different colors represent distinct states.

RT-qPCR validation of prognostic genes

In TCGA-BLCA, all prognostic genes (CCDC80, NIBAN1, CSPG4, PDGFRA, MAP1A, PCOLCE2) were significantly downregulated in the BLCA group. These results were consistent with our bioinformatics predictions, further supporting their potential role in BLCA prognosis (Fig. 13A).

Fig. 13

A RT-qPCR validation of six prognostic genes in BLCA: Each bar plot represents individual patient samples (P1–P5), comparing gene expression levels (2^-ΔΔCt) in normal (blue) and BLCA (red) tissues. GAPDH was used as the endogenous control (*p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001).
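The 2^-ΔΔCt value plotted in this figure can be worked through with a short sketch. It uses the standard comparative Ct calculation with a reference gene (GAPDH in the caption) and paired normal tissue as the calibrator; the function name and the Ct values are hypothetical.

```python
def relative_expression(ct_target_tumor, ct_ref_tumor,
                        ct_target_normal, ct_ref_normal):
    """Relative expression of a target gene by the 2^-ddCt method.

    Each argument is a cycle threshold (Ct); "ref" is the endogenous
    control gene and "normal" is the calibrator sample.
    """
    delta_tumor = ct_target_tumor - ct_ref_tumor      # dCt in tumor
    delta_normal = ct_target_normal - ct_ref_normal   # dCt in normal
    delta_delta = delta_tumor - delta_normal          # ddCt
    return 2.0 ** (-delta_delta)


# Toy Ct values (hypothetical): the target amplifies two cycles later in
# tumor than in normal tissue while the reference gene is unchanged.
fold_change = relative_expression(
    ct_target_tumor=26.0, ct_ref_tumor=18.0,
    ct_target_normal=24.0, ct_ref_normal=18.0,
)
```

A value below 1 indicates lower expression in tumor than in normal tissue, which is the direction reported for all six genes; a value of 1 means no change.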

Discussion

BLCA is a prevalent disease worldwide, with a higher incidence observed in developed countries, representing a major public health challenge. Clarifying the risk factors for BLCA is of critical importance2. This study combined bioinformatics analysis with experimental verification to explore the association between MRGs, NETs-EGs, and BLCA from multiple perspectives. After screening, we ultimately identified six genes associated with prognosis: CCDC80, NIBAN1, CSPG4, PDGFRA, MAP1A, and PCOLCE2. The six-gene signature orchestrates immune polarization, metabolic adaptation, and matrix remodeling in BLCA.

CCDC80 (Coiled-Coil Domain Containing 80) has a significant impact on cancer biology53. It is involved in processes such as cell migration, invasion, proliferation, and differentiation. Its expression is frequently increased in tumor cells, correlating with enhanced metastasis and invasion54,55,56. High expression of CCDC80 promotes M1 macrophage polarization while inhibiting M2 polarization, positioning it as a potential prognostic biomarker for immune-invasive gastric cancer56,57. In addition to its role in macrophage polarization, CCDC80 may also influence neutrophil activation or NET formation pathways. In the context of BLCA, upregulation of CCDC80 might not only enhance M1 macrophage-mediated anti-tumor activity but also counteract NET-driven immune evasion mechanisms58. In ovarian cancer, knockdown of CCDC80 inhibits cell proliferation, colony formation, and migration, while promoting apoptosis59. These studies are consistent with our findings, indicating that CCDC80 may serve as a potential therapeutic target and biomarker for BLCA.

NIBAN1, also known as FAM129A, GIG39, and C1orf24, encodes a protein that modulates p53-driven apoptosis. NIBAN1 is strongly expressed in multiple cancer cell types and has potential as a prognostic marker for various cancers, such as oral squamous cell carcinoma and thyroid cancer60,61,62. Anticancer drugs such as phenformin may inhibit cancer cell growth by inducing NIBAN1 expression, thereby promoting both autophagy and apoptosis62. Specifically, in BLCA, NIBAN1 has been shown to enhance gemcitabine resistance through activation of the focal adhesion kinase (FAK) signaling pathway63. Additionally, NIBAN1 may modulate NET formation by influencing neutrophil transcriptional programs, thereby contributing to a pro-inflammatory and apoptosis-resistant tumor microenvironment that promotes cancer cell survival and proliferation64. The NET-induced inflammatory milieu may, in turn, further upregulate NIBAN1 expression, creating a feed-forward loop. Moreover, NIBAN1 may affect cellular sensitivity to chemotherapeutic agents, potentially cooperating with NETs to modulate therapeutic responses in BLCA65.

CSPG4 (chondroitin sulfate proteoglycan 4), commonly known as NG2 or melanoma-associated proteoglycan, is a transmembrane protein that regulates cell proliferation, migration, and angiogenesis through receptor tyrosine kinase (RTK) and phosphatase signaling66. Its aberrant expression in cancers such as glioblastoma (GBM) and melanoma promotes tumor invasion and metastasis67. CSPG4 is expressed in cancer-associated fibroblasts (CAFs) and endothelial cells, where it may interact with extracellular matrix (ECM) components of NETs, such as hyaluronic acid and fibrin, to stabilize NET structures and enhance their pro-invasive functions68. In BLCA, CSPG4 may facilitate tumor cell adhesion to the ECM and promote migration and invasion through interactions with NETs69. Excessive NET formation could disrupt ECM homeostasis, thereby creating a permissive environment for CSPG4-overexpressing tumor cells to disseminate.

PDGFRA (platelet-derived growth factor receptor alpha), a component of the human growth factor receptor family, is a key gene whose mutations are a major driver of gastrointestinal stromal tumors (GISTs)70. PDGFRA signaling is closely associated with tumor angiogenesis, fibrosis, and immune cell recruitment, and may indirectly modulate NET formation71. In the BLCA tumor microenvironment, PDGFRA activation may regulate the function of myofibroblastic cancer-associated fibroblasts (myCAFs), altering ECM stiffness and composition, thereby influencing tumor cell growth, migration, and immune cell infiltration. The synergistic effect of NETs and PDGFRA activation may further exacerbate tumor progression and malignancy72.

MAP1A (microtubule-associated protein 1A) is traditionally recognized for its role in maintaining neuronal architecture by stabilizing microtubules and organizing the cytoskeleton73. In cancer, MAP1A is closely associated with the tumor immune response, and its association with immune infiltration makes it a potential prognostic biomarker for BLCA74,75. Beyond its role in maintaining cytoskeletal architecture, MAP1A is also involved in regulating microtubule dynamics and cellular secretory processes, which may directly influence NET release from neutrophils76. In BLCA, MAP1A may affect immune cell trafficking into tumor tissues by modulating microtubule-dependent motility. Furthermore, NET-associated inflammatory signaling, particularly in coordination with iCAF-derived cytokines, could influence MAP1A expression and thereby shape the tumor immune microenvironment77.

PCOLCE2 (procollagen C-endopeptidase enhancer 2), also known as PCPE2, facilitates collagen maturation by activating procollagen C-proteinase. Through its dual collagen- and heparin-binding capacity, it plays a key role in extracellular matrix (ECM) remodeling78,79. In this study, we observed that PCOLCE2 was expressed at low levels in fibroblasts within the BLCA tumor microenvironment. Such downregulation may impair the fibroblasts’ capacity to process collagen, leading to structural disorganization or altered stiffness of the ECM80. Under these abnormal ECM conditions, NETs may more effectively entrap tumor cells, thereby enhancing their invasive and metastatic potential81. Clinically, high PCOLCE2 expression is correlated with poor outcomes in breast, ovarian, and gastric cancers82,83,84. In colorectal cancer, PCOLCE2 is typically expressed at low levels; however, its high expression is linked to poor prognosis85. Although the precise mechanisms underlying PCOLCE2’s oncogenic effects remain unclear, accumulating evidence highlights its significant potential as a prognostic biomarker.

Integrated analysis reveals critical immunological and molecular factors associated with BLCA. Immune infiltration profiling elucidated dynamic changes in immune cell populations stratified by risk, identifying distinct immune cell subsets, their infiltration levels, and cells significantly correlated with risk scores, each of which may have an essential role in BLCA pathogenesis. The roles of various immune cells in BLCA are heterogeneous. Notably, CD56⁺ bright NK cells, a subset of natural killer cells, are closely linked to improved overall survival, exhibiting a pronounced protective effect through enhanced cytokine production and cytotoxic activity compared to their CD56⁺ dim counterparts86,87,88. Activated dendritic cells (DCs), leveraging their potential in vaccine-based therapies, demonstrate promising therapeutic efficacy as professional antigen-presenting cells. The combination of activated DCs with cytokine-induced killer (CIK) cells further amplifies treatment outcomes, yielding synergistic antitumor effects89,90,91. Eosinophil infiltration in BLCA tissues is closely linked to prognosis, with higher infiltration levels observed in advanced stages. Moreover, eosinophil counts and percentages in peripheral blood act as indicators for predicting disease recurrence in NMIBC patients receiving BCG immunotherapy92,93. Myeloid-derived suppressor cells (MDSCs) show increased infiltration in BLCA tissues, inhibiting T cell-mediated immunity by producing reactive oxygen species (ROS) and secreting immunosuppressive cytokines, including IL-1, IL-6, and TGF-β. These processes facilitate tumor progression and metastasis by enabling immune evasion and are strongly associated with adverse prognostic outcomes94,95,96. Given their critical role in the BLCA immune microenvironment, MDSCs have emerged as a focal point in immunotherapy research.
Strategies targeting MDSCs, such as reducing their levels to restore T cell-mediated antitumor activity or employing chemotherapeutic agents like cisplatin to decrease MDSC proportions, have demonstrated potential in boosting immune function and inhibiting tumor progression97,98. The tumor microenvironment (TME) plays a pivotal role in tumor initiation, progression, and metastasis99. In our study, we observed a significant positive correlation between the risk score and TIDE score, suggesting that patients in the high-risk group typically exhibit a more immunosuppressive TME100. The TIDE (Tumor Immune Dysfunction and Exclusion) score, which integrates measures of T cell dysfunction and immune exclusion, is a widely used predictor of tumor immune evasion101. Notably, high-risk patients exhibited significantly elevated TIDE, exclusion, and dysfunction scores compared to the low-risk group. These findings imply that tumor tissues in high-risk patients may impede immune cell infiltration through mechanisms such as a dense extracellular matrix, chemokine imbalances, or other physical and biochemical barriers102. The overall increase in TIDE-related parameters in the high-risk group reflects a greater capacity for immune evasion by tumor cells103. Collectively, these discoveries underscore the interaction between immune microenvironment remodeling and molecular regulation in BLCA pathogenesis, offering novel insights into the advancement of targeted treatments and immunomodulatory approaches.

GSEA identified several pathways involved in BLCA progression, among which cytochrome P450-mediated xenobiotic metabolism plays a key role in detoxification and drug response. This pathway involves cytochrome P450, epoxide hydrolase, and glutathione S-transferase, essential for metabolizing xenobiotics104. In BLCA, the expression rates of CYP1A, CYP2C, and CYP3A are 68%, 28%, and 68%, respectively, with CYP1A expression correlating with tumor grade and differentiation, suggesting its role in disease progression105,106.

Epithelial-mesenchymal transition (EMT) is another crucial process in BLCA. TWIST1 downregulation and promoter hypermethylation have been proposed as biomarkers for recurrence monitoring. Dibenzolium (DIB) induces apoptosis and inhibits EMT, while GALNT6 promotes BLCA malignancy by facilitating EMT and suppressing CD8⁺ T cell activity107,108,109.

Focal adhesion is critical for tumor interactions and therapy resistance. NIBAN1, highly upregulated in recurrent non-muscle-invasive BLCA, modulates focal adhesion kinase (FAK) signaling, contributing to gemcitabine resistance63. Notably, curcumin, in combination with visible light, inhibits adhesion and migration of BLCA cells, suggesting potential therapeutic value110.

In the treatment of BLCA, chemotherapy remains a critical adjunctive strategy; however, considerable variability in drug sensitivity among patients often leads to inconsistent therapeutic outcomes. Pazopanib, a multi-targeted tyrosine kinase inhibitor, has been shown to inhibit tumor angiogenesis by targeting VEGF receptors (VEGFR)111. In our study, pazopanib sensitivity was negatively correlated with CCDC80 expression, suggesting that patients in the high-risk group (HRG) with elevated CCDC80 levels may derive greater benefit from pazopanib treatment.

Additionally, afatinib (BIBW2992), an irreversible inhibitor of EGFR and HER2, is commonly used in tumors harboring EGFR mutations112. Our findings indicate that patients in the low-risk group (LRG) may harbor EGFR-related mutations or exhibit lower expression of HRG-specific genes, potentially rendering them more responsive to afatinib113. These results highlight distinct patterns of drug sensitivity between HRG and LRG patients, particularly the potential advantage of pazopanib in the high-risk group, offering a novel perspective on risk-based therapeutic stratification in BLCA.

Single-cell data analysis enabled the annotation of diverse cell populations within the BLCA tumor microenvironment. Mast cells, for example, release inflammatory mediators such as histamine, TNF-α, and IL-8 that directly activate neutrophils and induce NET formation114. In BLCA, interactions between mast cells and neutrophils may exacerbate NET-mediated tissue damage, promote chronic tumor-associated inflammation, and accelerate tumor invasion and ECM remodeling115.

Conversely, natural killer (NK) cells degrade NET components via the release of perforin and granzyme, thereby mitigating NET-induced immunosuppression116. However, in BLCA, NK cell function is often impaired by tumor-derived metabolic products such as adenosine and indoleamine 2,3-dioxygenase (IDO). Under these conditions, NETs may further suppress NK cytotoxicity and facilitate immune evasion by tumor cells117.

In addition, fibroblasts and cancer-associated fibroblasts (CAFs) contribute to the tumor ECM, providing an adhesive scaffold that stabilizes NET structures and prolongs their half-life. By expressing transmembrane proteins that bind NETs, CAFs may also enhance their matrix-degrading capacity and promote tumor invasiveness118.

Collectively, these cell types contribute to the generation, persistence, and clearance of NETs through intercellular signaling and immune modulation. Therapeutic strategies aimed at disrupting mast cell–neutrophil signaling or enhancing NK cell-mediated NET degradation may represent promising directions for precision therapy in BLCA.

Single-cell analysis revealed high expression of all prognostic genes in fibroblasts, with significant differences between groups. Fibroblasts were therefore identified as a key cell type, and pseudotime analysis showed that expression of the prognostic genes changed over pseudotime. CAFs play diverse roles in tumor development and disease progression by releasing cytokines, growth factors, and chemokines that influence immune responses, stimulate tumor growth, and drive therapeutic resistance119. Additionally, through EMT induction and ECM remodeling, CAFs strengthen tumor invasiveness and metastatic potential119,120. Recent studies have highlighted the functional heterogeneity of CAF subtypes, particularly myofibroblastic CAFs (myCAFs) and inflammatory CAFs (iCAFs)121. MyCAFs are typically characterized by high expression of α-smooth muscle actin (α-SMA) and vimentin122, whereas iCAFs play critical roles in the tumor microenvironment by producing cytokines and chemokines such as IL-1β and CXCL12, which recruit effector immune cells and promote M1 macrophage polarization to stimulate anti-tumor immune responses123.

Among the prognostic genes identified in our study, PCOLCE2, CSPG4, and PDGFRA are likely associated with myCAF-mediated ECM stiffening and tumor invasion85,124,125, while CCDC80, MAP1A, and NIBAN1 may contribute to immune activation and inflammatory regulation through their association with iCAFs56,61,75. Further research using single-cell spatial transcriptomics, subtype-specific gene knockout models, and pathway validation is warranted to elucidate the crosstalk between CAF subtypes and prognostic genes, potentially guiding novel therapeutic strategies and overcoming treatment resistance in BLCA.

Conclusion

This study identified six prognostic genes (CCDC80, NIBAN1, CSPG4, PDGFRA, MAP1A, and PCOLCE2) that are associated with mitochondrial function and NETs in BLCA. Through bioinformatics analysis, we explored their molecular regulatory mechanisms and developed a prognostic model that accurately classifies BLCA patients according to survival risk. Additionally, RT-PCR validation confirmed that these genes are differentially expressed between clinical samples of muscle-invasive bladder cancer and adjacent tissues, further reinforcing their prognostic significance in BLCA.