Introduction

Bladder cancer (BCa) is one of the most common epithelial tumors in the urinary tract1,2. Despite advances in treatment, including intravesical chemotherapy, immunotherapy, transurethral resection of bladder tumor (TURBT), and radical cystectomy, five-year survival rates for BCa patients remain fairly low and vary significantly among individuals. For patients with non-muscle-invasive bladder cancer (NMIBC), the five-year survival rate is estimated to be around 90%3,4. In contrast, patients diagnosed with muscle-invasive bladder cancer (MIBC) have a significantly lower five-year survival probability because the tumor infiltrates deeper into the layers of the bladder wall5,6.

Whole slide images (WSIs) are essential for accurate pathological evaluation in tumor categorization and staging. However, traditional prognostic methods, such as the TNM staging system, fall short in capturing the complex biological characteristics of bladder tumors, frequently leading to suboptimal therapeutic approaches7. Meanwhile, the variability and subjectivity inherent in manual pathological assessments, which rely on visible morphological features in WSIs, further complicate accurate prognosis and treatment8,9. This subjectivity can lead to inconsistencies in diagnosis and treatment decisions, which illustrates the need for more objective and standardized approaches10.

Recent advances in computational pathology have introduced promising alternatives to traditional methods. Deep learning algorithms have emerged as highly effective tools for improving the accuracy of diagnosis and prognosis prediction11,12,13. However, challenges in interpretability and generalizability often limit the application of these technologies in clinical practice, which are essential considerations for achieving widespread acceptability in the medical field14. Currently, numerous studies employ multiple-instance learning methods on WSIs to extract microscopic patch features for diagnosis and prognosis risk prediction15,16,17,18. Nevertheless, feature-based models sacrifice crucial tissue distribution information, which in turn limits the model’s interpretability19. Prior studies have demonstrated that macroscopic tissue spatial distributions not only enhance prognostic accuracy but also hold potential to find new tissue biomarkers20,21. Building upon this evidence, our study aimed to construct a novel deep learning system that integrates interpretable artificial intelligence (AI) technology with a comprehensive understanding of tissue distribution information in pathology slides. First, we employed the BlaPaSeg tile classification network based on the ResNeXt50 architecture to generate multi-class tissue probability heatmaps and segmentation maps from WSIs. After that, we developed two complementary prognostic networks: MacroVisionNet and UniVisionNet. MacroVisionNet focuses on analyzing broad tissue distribution patterns within the probability heatmaps to identify macro-level prognostic features essential for patient survival. In contrast, UniVisionNet was designed to integrate these macro-level prognostic features with micro-level tumor patch features generated by a self-supervised network, capturing both global and localized tissue characteristics. 
This deep learning-based prognostic framework was validated across multiple medical institutions and The Cancer Genome Atlas (TCGA) cohort. In addition, we explored and validated several potential prognostic biomarkers based on the tissue segmentation maps and MacroVisionNet attribution heatmaps. Finally, an integrated pathology-based prognostic AI system was created to facilitate its application in clinical settings.

Results

Baseline characteristics

In this retrospective, multicenter, prognostic study, 1108 patients with BCa were recruited from three major medical institutions: the First Affiliated Hospital of Chongqing Medical University [CMUFH], the Second Affiliated Hospital of Chongqing Medical University [CMUSH], and the Yongchuan Hospital of Chongqing Medical University [YCH]. Randomly selected by a 7:3 ratio, 621 patients from CMUFH were included in the training dataset, and 266 patients from CMUFH were incorporated in the validation dataset during the development of the prognostic system (1 December 2012 to 30 December 2023). The external validation datasets included 113 patients from CMUSH (1 May 2013 to 30 December 2023), 108 patients from YCH (2 January 2016 to 30 December 2023), and 375 patients from the TCGA-BLCA dataset. The median follow-up time with interquartile range (IQR) was 30.5 (12.7–66.4) months for the CMUFH training group, 29.5 (13.0–58.4) months for the CMUFH validation cohort, 15.3 (5.3–37.4) months for the CMUSH validation cohort, 23.1 (11.2–44.9) months for the YCH validation cohort, and 18.0 (11.1–31.5) months for the TCGA validation cohort (Table 1).

Table 1 Baseline patient characteristics

Performance of BlaPaSeg, MacroVisionNet, and UniVisionNet across enrolled cohorts

The Receiver Operating Characteristic (ROC) curve shows the AUC of the BlaPaSeg tile classification network varied from 0.9906 (95% CI: 0.9899–0.9913) to 0.9945 (0.9939–0.9950) in the training, validation, and external validation cohorts (Fig. 1A). The detailed tissue patch classification results for each cohort are displayed in the Supplementary Fig. 2. After training the BlaPaSeg network, we applied it to infer WSIs, generating tissue probability heatmaps and tissue segmentation maps. Building on these results, we trained and validated the MacroVisionNet and UniVisionNet. For the CMUFH training cohort, the C-index for MacroVisionNet reached 0.834 (0.782–0.879) and the C-index for UniVisionNet reached 0.853 (0.809–0.895). For the CMUFH validation cohort, the C-index for MacroVisionNet achieved 0.787 (0.717–0.855) and the C-index for UniVisionNet achieved 0.797 (0.730–0.862). To further explore the performance of MacroVisionNet and UniVisionNet, we conducted verification across three external validation cohorts. The CMUSH cohort achieved a C-index of 0.788 (0.693–0.881) for MacroVisionNet and a C-index of 0.811 (0.731–0.893) for UniVisionNet. For the YCH cohort, the C-index for MacroVisionNet reached 0.752 (0.557–0.944) and the C-index for UniVisionNet reached 0.820 (0.696–0.954). In the TCGA cohort, MacroVisionNet demonstrated moderate performance with a C-index of 0.655 (0.600–0.705), and UniVisionNet reached 0.661 (0.612–0.708). Figure 1B, C presents the time-dependent area under the curves for MacroVisionNet and UniVisionNet in each cohort.
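The C-index values reported above measure how often a model ranks patients correctly by risk. The following is a minimal sketch of Harrell's concordance index in plain Python, given as an illustration only; the function name and the toy data are assumptions, and the authors' actual evaluation code and confidence interval procedure are not described in this section.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable pairs ranked correctly by risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i has an observed event
            # strictly before patient j's event or censoring time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # earlier death received the higher risk
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: follow-up in months, event flag (1 = death), risk score.
times = [5, 10, 12, 20, 30]
events = [1, 1, 0, 1, 0]
risks = [2.9, 2.1, 1.0, 0.2, 0.3]
c_index = concordance_index(times, events, risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why the TCGA results near 0.66 are described as moderate.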

Fig. 1: Diagnostic performance of BlaPaSeg and time-dependent area under the curves of MacroVisionNet and UniVisionNet.

Area under the receiver operating characteristic curve of BlaPaSeg (A). Time-dependent area under the curves of MacroVisionNet (B) and UniVisionNet (C). AUC Area under the receiver operating characteristic curve, CMUFH The First Affiliated Hospital of Chongqing Medical University, CMUSH The Second Affiliated Hospital of Chongqing Medical University, YCH Yongchuan Hospital of Chongqing Medical University, TCGA The Cancer Genome Atlas set.

Risk score cutoffs and survival analysis in multiple cohorts

Based on the maximally selected rank statistic calculated in the CMUFH training cohort, the cutoff of the MacroVisionNet risk score was 1.93, and the cutoff of the UniVisionNet risk score was 3.34. For both MacroVisionNet and UniVisionNet, patients in the high-risk groups experienced poorer overall survival (OS) than those in the low-risk groups. Specifically, the hazard ratio (HR) of the MacroVisionNet high-risk group for OS was 16.12 (95% CI 10.45–24.87; p < 0.001) in the CMUFH training cohort, 7.58 (4.16–13.81; p < 0.001) in the CMUFH validation cohort, 9.39 (4.04–21.84; p < 0.001) in the CMUSH cohort, 18.58 (6.08–56.79; p < 0.001) in the YCH cohort, and 2.18 (1.61–2.96; p < 0.001) in the TCGA cohort. The HR of the UniVisionNet high-risk group for OS was 14.74 (95% CI 9.56–22.70; p < 0.001) in the CMUFH training cohort, 6.76 (3.83–11.91; p < 0.001) in the CMUFH validation cohort, 10.59 (4.50–24.93; p < 0.001) in the CMUSH cohort, 18.21 (5.62–59.03; p < 0.001) in the YCH cohort, and 2.20 (1.59–3.03; p < 0.001) in the TCGA cohort. In the Kaplan-Meier analysis, the deep-learning risk score effectively stratified the OS risk for BCa patients across all enrolled cohorts (Figs. 2, 3). Detailed HR results and Kaplan-Meier curves for each subgroup within the enrolled cohorts are displayed in Supplementary Figs. 7–16.
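The idea behind a maximally selected rank statistic can be sketched as a scan over candidate cutoffs that keeps the one giving the largest absolute standardized log-rank statistic. The code below is a simplified illustration under that assumption; the function names, the `min_group` constraint, and the toy data are hypothetical, and the sketch omits the multiple-testing adjustment that dedicated statistical packages apply. It is not the authors' implementation.

```python
import math


def logrank_z(times, events, in_high):
    """Standardized log-rank statistic for the high-risk group."""
    observed = expected = variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        at_risk = [k for k in range(len(times)) if times[k] >= t]
        dying = [k for k in at_risk if times[k] == t and events[k] == 1]
        n, d = len(at_risk), len(dying)
        n_high = sum(1 for k in at_risk if in_high[k])
        observed += sum(1 for k in dying if in_high[k])
        expected += d * n_high / n          # expected deaths under the null
        if n > 1:
            frac = n_high / n
            variance += d * frac * (1 - frac) * (n - d) / (n - 1)
    return (observed - expected) / math.sqrt(variance) if variance > 0 else 0.0


def select_cutoff(scores, times, events, min_group=2):
    """Return the cutoff (score > cutoff = high risk) with the largest |Z|."""
    best_cut, best_z = None, 0.0
    for cut in sorted(set(scores)):
        in_high = [s > cut for s in scores]
        n_high = sum(in_high)
        # Skip splits that leave either group too small to be meaningful.
        if n_high < min_group or len(scores) - n_high < min_group:
            continue
        z = abs(logrank_z(times, events, in_high))
        if z > best_z:
            best_cut, best_z = cut, z
    return best_cut, best_z


# Hypothetical toy data: higher risk scores go with shorter survival.
scores = [0.5, 1.0, 1.5, 1.8, 2.5, 3.0, 3.5, 4.0]
times = [60, 55, 50, 48, 10, 8, 6, 5]
events = [1, 1, 1, 1, 1, 1, 1, 1]
cutoff, z_best = select_cutoff(scores, times, events)
```

Because the cutoff is chosen to maximize separation in the training cohort, the hazard ratios in that cohort are optimistic by construction, which is one reason the validation cohorts matter.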

Fig. 2: Kaplan-Meier and multivariable Cox regression analysis in the CMUFH cohort.

Kaplan-Meier curves for overall survival are presented in the training set (A) and validation set (B). Forest plot for multivariable Cox regression analysis in the training set (C) and validation set (D). CMUFH The First Affiliated Hospital of Chongqing Medical University. MacroVisionNet macro vision network. UniVisionNet unified vision network.

Fig. 3: Kaplan-Meier and multivariable Cox regression analysis in the CMUSH, YCH, and TCGA cohorts.

Kaplan-Meier curves for overall survival are presented in the CMUSH cohort (A), YCH cohort (B), and TCGA cohort (C). Forest plot for multivariable Cox regression analysis in the combined CMUSH-YCH cohort (D) and TCGA cohort (E). CMUSH The Second Affiliated Hospital of Chongqing Medical University. YCH Yongchuan Hospital of Chongqing Medical University. TCGA The Cancer Genome Atlas set. MacroVisionNet macro vision network. UniVisionNet unified vision network.

Multivariable Cox regression analysis for prognostic significance

We conducted a multivariable Cox regression analysis to evaluate the prognostic significance of the risk groups, adjusting for established prognostic variables (Figs. 2, 3). In the CMUFH cohort, after adjusting for covariates including age, gender, T stage, N stage, and tumor grade, the HR of the MacroVisionNet risk group for OS was 5.06 (2.44–10.49; p < 0.001) in the training set and 4.54 (1.45–14.18; p = 0.009) in the validation set. Similarly, the HR of the UniVisionNet risk group was 4.01 (1.94–8.27; p < 0.001) in the training set and 3.40 (1.14–10.12; p = 0.028) in the validation set. In the TCGA cohort, the adjusted HR of the MacroVisionNet risk group was 1.97 (1.41–2.76; p < 0.001), and the adjusted HR of the UniVisionNet risk group was 2.13 (1.49–3.04; p < 0.001). To incorporate additional covariates, we performed a multivariable analysis on the combined CMUSH and YCH cohort. After adjusting for age, gender, and T stage, the HR for OS in the MacroVisionNet risk group was 3.69 (1.69–8.06; p = 0.001), while that in the UniVisionNet risk group was 4.26 (95% CI 1.96–9.27; p < 0.001). The deep learning-based risk score consistently demonstrated robustness across all cohorts (Table 2). Following the multivariable Cox regression analysis, we constructed two nomograms that integrated the prediction scores with clinical information. Detailed nomograms, ROC plots, and calibration plots are displayed in Supplementary Figs. 19–22.

Table 2 Detailed performances of MacroVisionNet and UniVisionNet across different cohorts

AI-inspired prognostic biomarker exploration

To interpret which areas contribute most to the predicted risk score, we adopted an attribution method to identify the specific attention areas inside MacroVisionNet. We represented the attribution information as a two-dimensional heatmap and overlaid it on the tissue segmentation map for enhanced visualization and comprehension. For the high-risk group (Supplementary Figs. 3, 4), the attribution heatmap showed that MacroVisionNet focused on the boundary areas between tumor and muscle tissues. In contrast, for the low-risk group, the attribution heatmap indicated that MacroVisionNet focused on areas enriched with lymphocytes. Inspired by the attribution information in the segmentation map, we proposed and validated six potential quantitative prognostic biomarkers for bladder tumors: Integrated Muscle Tumor Score (IMTS), Tumor Muscle Infiltration Fraction (TIM), Tumor-infiltrating lymphocytes (TILs), Tumor Fraction Score (TFS), Inflammation Fraction Score (IFS), and Tumor Co-localization Score (Coloc). To quantify the distribution differences of tumors across tissues, we utilized TIM, Coloc, TFS, and IMTS, while IFS and TILs were employed to measure the distribution of lymphocytes within the tissues. The detailed definitions of these prognostic biomarkers are provided in the “Methods” section. To further validate whether the spatial distribution of tumors in specific tissues is associated with prognosis, we conducted Cox analysis and Kaplan-Meier analysis for each biomarker. The HR of the Coloc high-risk group for OS was 5.42 (95% CI 3.42–8.58; p < 0.001) in the CMUFH training cohort, 5.92 (3.29–10.65; p < 0.001) in the CMUFH validation cohort, 5.09 (1.87–13.82; p = 0.001) in the CMUSH cohort, 3.37 (1.06–10.70; p = 0.039) in the YCH cohort, and 1.41 (1.04–1.92; p = 0.028) in the TCGA cohort. 
The HRs of the IMTS high-risk group for OS were 3.94 (95% CI 2.41–6.44; p < 0.001) in the CMUFH training cohort, 4.21 (2.04–8.69; p < 0.001) in the CMUFH validation cohort, 10.16 (1.29–80.33; p = 0.027) in the CMUSH cohort, 3.60 (1.14–11.38; p = 0.029) in the YCH cohort, and 1.46 (1.07–1.98; p = 0.016) in the TCGA cohort (Supplementary Fig. 5). Both IMTS and Coloc demonstrated statistical significance across all cohorts. After adjusting for age and gender covariates, the multivariate Cox analysis also confirmed that IMTS and Coloc remained statistically significant (Supplementary Table 1). This demonstrated that the infiltration distribution of tumors in muscle tissues on WSIs is related to BCa prognosis. The Kaplan-Meier curves, HR values, and the distribution differences between the high-risk and low-risk groups for each potential biomarker in MacroVisionNet are presented in Supplementary Figs. 5, 6.

Associations between biological markers, immune infiltration, and UniVisionNet risk scores

To further explore the biological associations in UniVisionNet groups, we utilized biomolecular information from the TCGA dataset. We identified a total of 1076 differentially expressed genes (DEGs) between the high-risk and low-risk groups. The GO bubble chart revealed the top 10 associated significant differences in the molecular functions, cellular components, and biological processes. These differences included epidermis development, serine-type peptidase activity, and intermediate filament organization (Supplementary Fig. 17). Immune cell type-specific analysis revealed significant differences in fourteen immune cell types, including CD4 T cells (p = 0.043), CD8 T cells (p = 0.043), neutrophil cells (p = 0.006), and macrophage cells (p < 0.001). The heatmaps present the distribution differences of all immune cells between the high-risk and low-risk groups based on the CIBERSORT and TIMER algorithms (Supplementary Fig. 18). These findings indicate a relationship between the UniVisionNet risk scores and cellular tissue information, enhancing the model’s interpretability.

Discussion

Accurate prediction of OS risk is beneficial for risk stratification and treatment selection for BCa patients. In this study, we developed and validated a deep learning-based prognostic system for BCa risk stratification. The reliability and applicability of the prognostic system rest on the following key factors: (1) It included comprehensive data from 887 BCa patients in CMUFH for model training and internal validation; (2) It demonstrated robust predictive performance across multiple large medical institutions and the TCGA-BLCA dataset, which originate from different countries and ethnicities; (3) It incorporated both macro vision and micro vision information in WSIs to enhance the accuracy of risk prediction for BCa patients; (4) It explored, quantified, and validated several potential BCa prognostic biomarkers in WSIs; (5) Based on the proposed prognostic models, an end-to-end pathology-based AI prognostic system was developed to enhance its utility in clinical practice.

To explore quantitatively interpretable AI biomarker predictions, we employed the attribution method in MacroVisionNet. For individuals with a negative prognosis, the attribution maps of MacroVisionNet focus more on the areas where the tumor invades the muscularis and adipose tissue. For individuals with a positive prognosis, the MacroVisionNet attribution maps focus more on areas where tumor-associated lymphoid tissues accumulate. Inspired by this information, we explored and validated six potential biomarkers and compared the differences between these biomarkers in the low-risk and high-risk groups of the MacroVisionNet. In the univariate Cox regression analysis, both IMTS and Coloc were identified as independent prognostic factors across all enrolled cohorts22. Similarly, the IMTS and Coloc distribution differences in the MacroVisionNet risk groups were also observed in the CMUFH training, CMUFH validation, and TCGA cohort, which suggests that IMTS and Coloc are potential prognostic biomarkers for bladder cancer. The lack of robust risk stratification performance by other prognostic markers may be due to variations in data distribution within the validation and training set and potential interactions among different biomarkers. Further validation with additional datasets is needed to confirm the prognostic utility of these markers. Attribution techniques employed in this study facilitated the identification and interpretation of crucial regions within the WSIs, including places with prominent immune activity and distinct tissue distribution characteristics, aiding urologists and pathologists in understanding the model’s decision-making process. The proposed prognostic risk scores generated by MacroVisionNet offered independent prognostic insights, which were integrated with tissue distribution information to enhance overall prognostic assessment. 
This integration facilitated an improved risk stratification performance, especially in deciding the need and intensity of additional medications in BCa patients. During the construction of UniVisionNet, we employed a pretrained self-supervised histopathology network to extract tumor features from image tiles. By integrating macro-level prognostic features with micro-level tumor characteristics, the TransMIL network demonstrated superior model performance, leading to enhanced prognostic assessment. Additionally, the attention score heatmaps revealed that the UniVisionNet also focuses on regions of tumor muscle invasion and immune cell presence.

Significant variations in immune infiltration distribution were detected between the high-risk and low-risk UniVisionNet groups in the TCGA cohort. Specifically, the high-risk score group showed a notably elevated level of immunological infiltration in CD8 T cells, CD4 T cells, and neutrophil cells. This suggests a potential connection between immune cell infiltration, disease severity, and prognosis for BCa patients. Further research could concentrate on elucidating the precise mechanisms by which immune cell infiltration impacts BCa progression and patient prognosis23. Additionally, integrating multi-omics data and deep learning models could further enhance the predictive accuracy and interpretability of potential pathological biomarkers24,25.

Our study has some limitations. First, all slides used in our study were collected and analyzed retrospectively, which could introduce a degree of selection bias. Despite this, the favorable performance of the prognostic system in consecutive external validation cohorts suggests that the bias is not significant. Second, because some subgroups had a limited number of cases, the sample size should be increased, and further external verification in cohorts with more participants is necessary. Third, high-resolution WSI inference demands large memory and powerful computing hardware, whereas portable digital devices are more commonly used in resource-limited regions of developing countries. Therefore, developing lightweight AI networks for more affordable and accessible devices is essential. Lastly, the lack of a unified framework integrating histopathology, radiomics, and genomics for BCa prognosis restricts the development of multimodal AI prognostic systems. Hence, the development of a multimodal AI prognostic system remains to be explored26,27.

In summary, we developed and validated a deep learning system, integrating macroscopic tissue distribution information with microscopic tumor information, to accurately predict the survival risks of bladder cancer. The output risk scores are an independent prognostic indicator that urologists and pathologists can use to stratify the OS risk in BCa patients.

Methods

Patient cohorts

This study adhered to the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis, and Reporting recommendation for tumor marker prognostic studies28,29. In this retrospective, multicenter, prognostic study conducted in China, we enrolled consecutive patients with bladder cancer who underwent surgery, including radical cystectomy or TURBT, at three medical institutions. Additionally, we incorporated bladder diagnostic slides from the TCGA public database as an international validation cohort. From the medical record files of the three participating hospitals, the authors retrieved baseline participant characteristics, including clinical information, preoperative imaging reports, postoperative care records, cystoscopy follow-up documentation, and postoperative pathology reports. Patients who underwent treatment between 1 December 2012 and 30 December 2023 were enrolled in this retrospective study. The last follow-up was conducted on 28 February 2024. The records omitted ethnicity data. The follow-up data collation and verification period ranged from November 2023 to February 2024. Patients enrolled in CMUFH were randomly divided into the training set and the validation set, in a ratio of 7:3. Patients from CMUSH and YCH were allocated into two separate external validation cohorts. We designated the Cancer Genome Atlas Urothelial Bladder Carcinoma (TCGA-BLCA) set as an extra external validation cohort to enhance the generalizability of the findings. We excluded patients with a postoperative diagnosis of non-urothelial carcinoma due to the heterogeneous nature and small number of these tumors. We excluded low-quality WSIs, such as those with extreme fading, low resolution, or improper scanning. Detailed inclusion and exclusion criteria are displayed in Fig. 4.

Fig. 4: Study profile.

CMUFH The First Affiliated Hospital of Chongqing Medical University. CMUSH The Second Affiliated Hospital of Chongqing Medical University. YCH Yongchuan Hospital of Chongqing Medical University. TCGA-BLCA The Cancer Genome Atlas Urothelial Bladder Carcinoma.

Ethics statement

This retrospective study received approval from the research ethics committee of the First Affiliated Hospital of Chongqing Medical University, the research ethics committee of the Second Affiliated Hospital of Chongqing Medical University, and the research ethics committee of Yongchuan Hospital of Chongqing Medical University. The committees waived the need for informed consent since the study solely relied on existing medical data. The study has been registered on ClinicalTrials.gov (registration number: NCT06389019).

Image acquisition

We collected hematoxylin and eosin-stained slides from the three participating hospitals. These slides were then scanned with digital slide scanners at 40× magnification to obtain the WSI files. The non-TCGA WSIs were generated using three digital slide scanners: KF-PRO-020 (Jiangfeng Bio-Information Technology Co, Ningbo, China) with a specimen-level pixel size of 0.246 μm × 0.246 μm, KF-PRO-005-EX (Jiangfeng Bio-Information Technology Co, Ningbo, China) with a specimen-level pixel size of 0.252 μm × 0.252 μm, and SQS-600P (Shengqiang Technology, Shenzhen, China) with a specimen-level pixel size of 0.206 μm × 0.206 μm. All three scanner types used 40× objective lenses. All WSIs in the TCGA cohort were scanned with Leica Aperio scanners. Comprehensive diagnostic whole-slide data and related scanning information can be accessed from the National Institutes of Health Genomic Data Commons (GDC).

WSI annotation method and deep learning procedures

In the tissue segmentation procedure, tiles in WSIs are classified into eight classes: tumor area, connective tissue area, muscular tissue area, lymphovascular area, non-relevant areas (non-ROI), adipose tissue area, empty area, and lymphocyte area. The WSI annotation method utilizes the pre-trained Segment Anything Model (SAM) to assist pathologists in delineating various regions within the QuPath software. For precise annotation of lymphovascular and lymphocyte regions, pathologists use SAM to rapidly outline the contours and then make minor adjustments30. Pathologists efficiently outline larger regions, such as muscularis and connective tissue areas, by drawing rectangular contours. Each patch is labeled with the component that occupies more than fifty percent of its area. After completing the annotations, we used the DeepZoomGenerator function from the OpenSlide package to extract the corresponding patches based on the annotated coordinates. Each tile annotation is independently performed by two experienced pathologists (BL and YWT, senior expertise in clinical diagnostic pathology). In case of disagreement, a third pathologist (YDC, chief physician with over 30 years of experience in clinical diagnostic pathology) is consulted to resolve the disputed annotations. Figure 5 depicts the workflow of the AI prognostic system, which comprises three primary components: the tissue segmentation procedure, prognostic network construction, and model explanation together with AI-inspired biomarker exploration. A visual representation of the application of the AI prognostic system is available in Fig. 6.

Fig. 5: AI prognostic system workflow chart.

MacroVisionNet macro vision network. UniVisionNet unified vision network.

Fig. 6: Example usage of the bladder cancer AI prognostication prediction system.

The AI prognostication prediction system consists of three main components: the tissue segmentation part (BlaPaSeg inference process), the MacroVisionNet part, and the UniVisionNet part. In the PDF version of this article, please click anywhere on the figure or caption to play the video in a separate window.

BlaPaSeg network augmentation method and training strategy

We employed ResNeXt50 as the backbone of the BlaPaSeg network31. To improve the generalization performance of the BlaPaSeg network, common augmentation methods, including random gamma adjustment, random ninety-degree rotation, RGB shifting, and random brightness and contrast adjustment, were applied during the training phase. Given the scarcity of lymphovascular area samples relative to other categories, dynamic augmentation was specifically applied to lymphovascular patches during training to mitigate the effects of data imbalance. Specifically, we introduced random initial coordinate offsets during lymphovascular patch extraction, maximizing the number of patches containing lymphovascular content. Meanwhile, we tripled the number of lymphovascular patches by duplication and further applied dynamic transformations during training. Additionally, to avoid overfitting, we implemented a two-stage training process focused on identifying and labeling hard samples, that is, patches that the BlaPaSeg model found challenging to classify. Initially, pathologists marked typical regions and areas, followed by an inference step using the initial BlaPaSeg model to identify and annotate areas where the model made errors. These hard samples were subsequently used to retrain the BlaPaSeg network. This iterative strategy was applied twice to generate additional hard samples, ultimately refining the final model.

BlaPaSeg inference procedure

In the BlaPaSeg network inference phase, WSIs were divided into patch images. Initially, we applied Otsu thresholding to eliminate the background in the thumbnail image of the tissue. We then generated patch images (256 × 256 pixels) at 20× magnification, using the coordinates of the non-blank regions retrieved from the thumbnail image of the WSI file. To mitigate information loss caused by the patch size, each patch overlaps with its neighboring patches by half of its own size. Through continuous patch inference by the BlaPaSeg network and coordinate arrangement, multi-class tissue probability heatmaps and multi-class tissue segmentation maps are ultimately generated. In other words, we set the output dimension of the final fully connected layer (\({f}_{fc}\)) of ResNeXt50 to 8. \(p(i,j)\) is the vector of class probabilities for each patch, and \(I(i,j)\) is the input patch at the corresponding coordinates. \(p(i,j)\) is defined as follows:

$$p(i,j)=\mathrm{softmax}({f}_{fc}({f}_{conv}(I(i,j))))$$
(1)

To capture information at both macroscopic and microscopic levels from WSIs, we employed multi-class tissue probability heatmaps created by BlaPaSeg as the macroscopic component, complemented by microscopic tissue patch images as the microscopic component.
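The tiling and softmax steps of the inference procedure can be sketched as follows. This is an illustration only: the function names are hypothetical, the sketch tiles a full rectangular region rather than only the non-blank coordinates retained after background removal, and the logits are placeholders for the output of the final fully connected layer.

```python
import math


def patch_origins(width, height, patch=256, overlap=0.5):
    """Top-left (x, y) coordinates of patches overlapping neighbours by `overlap`."""
    stride = int(patch * (1 - overlap))          # 128 pixels for half overlap
    xs = range(0, max(width - patch, 0) + 1, stride)
    ys = range(0, max(height - patch, 0) + 1, stride)
    return [(x, y) for y in ys for x in xs]


def softmax(logits):
    """Numerically stable softmax over a list of class logits (Eq. 1)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# A 512 x 512 region yields a 3 x 3 grid of half-overlapping 256-pixel patches.
origins = patch_origins(512, 512)

# Placeholder logits for the eight tissue classes of one patch.
probs = softmax([2.0, 0.1, 0.3, -1.0, 0.0, 0.2, -2.0, 0.5])
predicted_class = max(range(len(probs)), key=probs.__getitem__)
```

Writing each patch's eight probabilities back to its grid position produces the eight-channel probability heatmap, and taking the most probable class per position produces the segmentation map.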

MacroVisionNet construction

To predict survival outcomes from the probability heatmaps created by BlaPaSeg, we developed the MacroVisionNet by building upon the ResNeXt50 network. The MacroVisionNet is designed to focus on the broader view of WSIs, learning to identify and represent key features that are important for predicting patient survival. It works by analyzing probability heatmaps, which indicate different tissue types within the WSIs. Unlike the original ResNeXt50 network, we modified the initial convolutional layer to accept eight input channels. This adjustment aligns the MacroVisionNet with the multi-class tissue probability heatmaps. To make the model more efficient, we reduced the output dimension of the final fully connected layer by incorporating a fully connected layer followed by batch normalization and a ReLU activation function. Finally, a linear layer translates these feature vectors into the final survival risk scores.

UniVisionNet construction

The UniVisionNet network was developed to integrate and leverage both macro-level and micro-level information in WSIs. First, the trained MacroVisionNet generates the macroscopic prognostic features (a one-dimensional vector of 2048 features). Simultaneously, micro-level prognostic features are extracted from tumor patches identified by the BlaPaSeg network. Specifically, we selected the top 200 patches with the highest tumor probabilities, each sized at 1024 × 1024 pixels. In cases where fewer than 200 patches are available, the highest-probability patches are used, with zero-padding applied if necessary to maintain the required number of patches. All selected patches were color normalized using the Macenko method. Subsequently, each patch is processed through a pretrained self-supervised histopathology network to extract microscopic features. We then replicate the macroscopic feature for each selected patch and concatenate it with the corresponding microscopic feature, resulting in a comprehensive feature representation for each patch. To determine the most effective architecture for these multi-level features, we evaluated several models from previous research, including AttMIL, Patch-GCN, Perceiver, Multi-perceiver, and TransMIL. Meanwhile, we compared several self-supervised histopathology networks on the downstream task of BCa prognosis, including CTransPath, Virchow, Uni, and Prov-Gigapath32,33,34,35. Among these, the combination of TransMIL with self-supervised features extracted by the Uni network demonstrated slightly more stable performance across all validation cohorts, which contributed to its selection as the backbone of UniVisionNet. By combining macro-level and micro-level information within the TransMIL framework, UniVisionNet effectively captures both broad and detailed patterns in WSIs, leading to improved survival predictions. Detailed ablation experiment results are presented in Supplementary Table 2.
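The patch selection and feature concatenation described above can be sketched in plain Python. The function name and the toy dimensions are hypothetical, and one detail is an assumption: the text does not say whether zero-padded slots also receive the macroscopic feature, and this sketch appends it to every slot.

```python
def build_patch_features(tumor_probs, micro_feats, macro_feat, top_k=200):
    """Pick the top-k tumor patches, zero-pad to k slots, append the macro feature."""
    # Indices of patches sorted by tumor probability, highest first.
    order = sorted(range(len(tumor_probs)),
                   key=lambda i: tumor_probs[i], reverse=True)[:top_k]
    micro_dim = len(micro_feats[0]) if micro_feats else 0
    selected = [list(micro_feats[i]) for i in order]
    # Zero-pad so that every slide yields exactly top_k patch slots.
    selected += [[0.0] * micro_dim for _ in range(top_k - len(selected))]
    # Assumption: the slide-level macro feature is replicated onto every slot,
    # including padded ones.
    return [row + list(macro_feat) for row in selected]


# Toy example with 4 patches, 2-dimensional micro features, 3-dimensional macro feature.
probs = [0.2, 0.9, 0.5, 0.7]
micro = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
macro = [9.0, 9.0, 9.0]
bag = build_patch_features(probs, micro, macro, top_k=5)
```

The resulting list of per-patch vectors is the kind of fixed-size bag that a multiple-instance model such as TransMIL takes as input.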

Attribution methods and attention score visualization

To explore the relationship between MacroVisionNet and prognosis, we used saliency maps, generated by calculating the gradient of the risk-score loss function with respect to the input pixels, and interpreted them together with the tissue segmentation maps. To enhance visualization, we amplified the top 30% of the saliency map values and overlaid the saliency map on the corresponding segmentation map. Similarly, to investigate the relationship between UniVisionNet and prognosis, we extracted the attention scores generated by the UniVisionNet network. These attention scores were mapped to the corresponding patch coordinates within the WSI and overlaid on the WSI thumbnail. This approach allowed us to visualize the specific patch regions on which the UniVisionNet network focused.

Quantification of AI-inspired prognostic biomarkers

Tissue fraction calculation: Utilizing the segmentation map S, the tissue fraction for each class among the six tissue classes (excluding empty areas and non-ROI areas) can be expressed as:

$$Fractio{n}_{t}=\frac{{N}_{t}}{N-{N}_{empty}-{N}_{non-ROI}}$$
(2)

\({N}_{t}\) represents the number of pixels belonging to class t in set S, \({N}_{empty}\) represents the number of empty pixels in set S, \({N}_{non-ROI}\) indicates the number of pixels in set S that correspond to areas to be ignored, and N indicates the total number of pixels in set S. The segmentation map (S) is calculated by applying an argmax function to the tissue probability heatmap (\(Mp\)).

$$S=argmax(Mp)$$
(3)
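As a minimal sketch of Eqs. (2) and (3), the snippet below derives a segmentation map from a per-class probability heatmap and computes the tissue fractions. It uses nested Python lists for portability; the function names, the `prob_map[c][y][x]` layout, and the label strings `"empty"` and `"non-ROI"` are assumptions for illustration rather than the study's implementation.

```python
from collections import Counter
from typing import Dict, List, Sequence


def segmentation_map(prob_map: Sequence[Sequence[Sequence[float]]],
                     class_names: Sequence[str]) -> List[List[str]]:
    """Eq. (3): label each pixel with the class of highest probability.

    prob_map[c][y][x] is the probability of class c at pixel (y, x).
    """
    height, width = len(prob_map[0]), len(prob_map[0][0])
    seg = []
    for y in range(height):
        row = []
        for x in range(width):
            best = max(range(len(class_names)),
                       key=lambda c: prob_map[c][y][x])
            row.append(class_names[best])
        seg.append(row)
    return seg


def tissue_fractions(seg: Sequence[Sequence[str]],
                     ignored: Sequence[str] = ("empty", "non-ROI"),
                     ) -> Dict[str, float]:
    """Eq. (2): pixel share of each tissue class, excluding ignored labels."""
    counts = Counter(label for row in seg for label in row)
    total = sum(counts.values())
    denom = total - sum(counts[name] for name in ignored)
    if denom == 0:
        return {}  # no tissue pixels at all
    return {label: n / denom
            for label, n in counts.items() if label not in ignored}
```

Under these assumptions, the tumor fraction score and the infiltrating lymphocytes score defined below are simply the entries of the returned dictionary for the tumor and lymphocyte labels.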

TFS: Tumor fraction score (TFS) represents the tumor tissue fraction in the segmentation map (S). The TFS is defined as follows:

$$TFS=\frac{{N}_{TUM}}{N-{N}_{empty}-{N}_{non-ROI}}$$
(4)

\({N}_{TUM}\) denotes the number of pixels corresponding to the tumor area.

\(IFS\): Infiltrating lymphocytes score (\(IFS\)) represents the infiltrating lymphocytes fraction in the segmentation map (S). The \(IFS\) is defined as follows:

$$IFS=\frac{{N}_{INF}}{N-{N}_{empty}-{N}_{non-ROI}}$$
(5)

\({N}_{INF}\) denotes the number of pixels corresponding to lymphocytes.

\(TILs\): Tumor-infiltrating lymphocytes (\(TILs\)) have been identified as a significant prognostic indicator for various cancers. In our study, we assessed \(TILs\) based on the segmentation map \(S\) created by BlaPaSeg and the TIL abundance (TILAb) score. To quantify \(TILs\), we partitioned the segmentation map S into an m × n grid of equal-sized cells, with each grid cell having a dimension of ten pixels. We then defined the lymphocyte co-localization score M using the Morisita–Horn index. M is defined as follows:

$$M=\frac{2{\sum }_{i=1}^{m}{\sum }_{j=1}^{n}({p}_{ij}^{INF}\times {p}_{ij}^{TUM})}{{\sum }_{i=1}^{m}{\sum }_{j=1}^{n}{({p}_{ij}^{INF})}^{2}+{\sum }_{i=1}^{m}{\sum }_{j=1}^{n}{({p}_{ij}^{TUM})}^{2}}$$
(6)

\({p}_{ij}^{INF}\) and \({p}_{ij}^{TUM}\) denote the percentages of inflammation and tumor regions in grid cell (i, j). Because we regard inflammatory proliferation within the tumor as a favorable prognostic factor for patient survival, the quantified \(TILs\) can be expressed as:

$$TILs=\left\{\begin{array}{cc}\frac{M}{2}\times \frac{{\sum }_{i=1}^{m}{\sum }_{j=1}^{n}({p}_{ij}^{INF})}{{\sum }_{i=1}^{m}{\sum }_{j=1}^{n}({p}_{ij}^{TUM})}, \hfill & \mathop{\sum }\limits_{i=1}^{m}\mathop{\sum }\limits_{j=1}^{n}({p}_{ij}^{TUM}) > 0\\ 1, \hfill & \mathop{\sum }\limits_{i=1}^{m}\mathop{\sum }\limits_{j=1}^{n}({p}_{ij}^{TUM})\le 0\end{array}\right.$$
(7)
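Equations (6) and (7) can be sketched in a few lines of plain Python. The sketch assumes square grid cells of 10 × 10 pixels, truncated cells at the image border, and the label strings `"TUM"` and `"INF"`; these choices and all function names are illustrative assumptions. Fractions are used instead of percentages, which leaves both M and \(TILs\) unchanged because each is a ratio.

```python
from typing import List, Sequence


def grid_fractions(seg: Sequence[Sequence[str]], label: str,
                   cell: int = 10) -> List[float]:
    """Fraction of pixels equal to `label` in each grid cell (row-major)."""
    height, width = len(seg), len(seg[0])
    fracs = []
    for top in range(0, height, cell):
        for left in range(0, width, cell):
            pixels = [seg[y][x]
                      for y in range(top, min(top + cell, height))
                      for x in range(left, min(left + cell, width))]
            fracs.append(sum(p == label for p in pixels) / len(pixels))
    return fracs


def morisita_horn(p_a: Sequence[float], p_b: Sequence[float]) -> float:
    """Eq. (6): Morisita-Horn overlap of two per-cell fraction lists."""
    denom = sum(a * a for a in p_a) + sum(b * b for b in p_b)
    if denom == 0:
        return 0.0  # neither tissue type is present
    return 2 * sum(a * b for a, b in zip(p_a, p_b)) / denom


def tils_score(seg: Sequence[Sequence[str]], cell: int = 10,
               tumor: str = "TUM", lymph: str = "INF") -> float:
    """Eq. (7): co-localization weighted by lymphocyte-to-tumor abundance."""
    p_tum = grid_fractions(seg, tumor, cell)
    p_inf = grid_fractions(seg, lymph, cell)
    tumor_total = sum(p_tum)
    if tumor_total <= 0:
        return 1.0  # second branch of Eq. (7): no tumor in the map
    return morisita_horn(p_inf, p_tum) / 2 * sum(p_inf) / tumor_total
```

The score rises when lymphocytes and tumor occupy the same grid cells and when lymphocytes are abundant relative to tumor.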

\(TIM\) and \(Coloc\): Inspired by \(TILs\), we defined and verified two novel prognostic biomarkers, the Tumor Muscle Infiltration Fraction (\(TIM\)) and the tumor co-localization score (\(Coloc\)). \(TILs\) quantifies the spatial distribution and interaction between tumor and inflammation to characterize tumor-infiltrating lymphocytes. Similarly, \(TIM\) and \(Coloc\) quantify the spatial overlap of tumor boundaries and muscularis boundaries, representing the interaction and spatial distribution between tumor and muscularis. We regard muscularis tissue within the tumor as a negative prognostic factor for patient survival: higher \(TIM\) and \(Coloc\) values reflect more extensive infiltration of muscle tissue by the tumor, indicating a potentially more aggressive or advanced stage of the disease3. The quantified \(TIM\) and \(Coloc\) can be expressed as:

$$Coloc=\frac{2{\sum }_{i=1}^{m}{\sum }_{j=1}^{n}({p}_{ij}^{MUS}\times {p}_{ij}^{TUM})}{{\sum }_{i=1}^{m}{\sum }_{j=1}^{n}{({p}_{ij}^{MUS})}^{2}+{\sum }_{i=1}^{m}{\sum }_{j=1}^{n}{({p}_{ij}^{TUM})}^{2}}$$
(8)
$$TIM=\left\{\begin{array}{cc}\frac{Coloc}{2}\times \frac{{\sum }_{i=1}^{m}{\sum }_{j=1}^{n}({p}_{ij}^{MUS})}{{\sum }_{i=1}^{m}{\sum }_{j=1}^{n}({p}_{ij}^{TUM})}, \hfill & \mathop{\sum }\limits_{i=1}^{m}\mathop{\sum }\limits_{j=1}^{n}({p}_{ij}^{TUM}) > 0\\ 0, \hfill & \mathop{\sum }\limits_{i=1}^{m}\mathop{\sum }\limits_{j=1}^{n}({p}_{ij}^{TUM})\le 0\end{array}\right.$$
(9)

\({p}_{ij}^{MUS}\) and \({p}_{ij}^{TUM}\) denote the percentages of muscularis and tumor regions in grid cell (\(i,j\)).

\(IMTS\): Integrated Muscle Tumor Score (\(IMTS\)) is a novel prognostic biomarker that combines the extent of tumor infiltration into muscle tissue with the overall tumor burden. It is calculated as \(IMTS=TIM\times TFS\). A higher \(IMTS\) indicates more extensive muscle infiltration by the tumor and a larger overall tumor presence, suggesting a potentially more aggressive or advanced disease state. The quantified \(IMTS\) can be expressed as:

$$IMTS=\left\{\begin{array}{cc}\frac{Coloc}{2}\times \frac{{\sum }_{i=1}^{m}{\sum }_{j=1}^{n}({p}_{ij}^{MUS})}{{\sum }_{i=1}^{m}{\sum }_{j=1}^{n}({p}_{ij}^{TUM})}\times TFS, \hfill & \mathop{\sum }\limits_{i=1}^{m}\mathop{\sum }\limits_{j=1}^{n}({p}_{ij}^{TUM}) > 0\\ 0, \hfill& \mathop{\sum }\limits_{i=1}^{m}\mathop{\sum }\limits_{j=1}^{n}({p}_{ij}^{TUM})\le 0\end{array}\right.$$
(10)
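To make the relationship between Eqs. (8) to (10) concrete, the following sketch computes \(Coloc\), \(TIM\), and \(IMTS\) from per-cell muscularis and tumor fractions that are assumed to be already available as flat lists (one entry per grid cell). The function name and the dictionary return type are hypothetical.

```python
from typing import Dict, Sequence


def muscle_invasion_scores(p_mus: Sequence[float],
                           p_tum: Sequence[float],
                           tfs: float) -> Dict[str, float]:
    """Return Coloc (Eq. 8), TIM (Eq. 9) and IMTS (Eq. 10).

    p_mus / p_tum: muscularis and tumor fractions per grid cell.
    tfs: tumor fraction score of the same slide (Eq. 4).
    """
    if len(p_mus) != len(p_tum):
        raise ValueError("both lists must describe the same grid cells")

    # Eq. (8): Morisita-Horn overlap between muscularis and tumor.
    denom = sum(m * m for m in p_mus) + sum(t * t for t in p_tum)
    overlap = sum(m * t for m, t in zip(p_mus, p_tum))
    coloc = 2 * overlap / denom if denom > 0 else 0.0

    # Eq. (9): TIM is zero when the slide contains no tumor.
    tumor_total = sum(p_tum)
    tim = coloc / 2 * sum(p_mus) / tumor_total if tumor_total > 0 else 0.0

    # Eq. (10): IMTS scales TIM by the overall tumor burden.
    return {"Coloc": coloc, "TIM": tim, "IMTS": tim * tfs}
```

Because \(IMTS\) multiplies \(TIM\) by \(TFS\), a slide with strong tumor and muscle overlap but little tumor overall receives a lower score than one with the same overlap and a large tumor area.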

Statistical analyses

The concordance index (C-index) and the area under the receiver operating characteristic curve (AUC) were employed to assess the predictive performance for overall survival (OS). OS is defined as the duration from the time of surgery to death from any cause or to the date of the final follow-up. Using the threshold determined by the maximally selected rank statistic in the training set, risk scores across all cohorts were divided into two categories: patients with a risk score equal to or greater than the threshold were assigned to the high-risk group, and patients with a risk score below the threshold were assigned to the low-risk group. Survival differences between the groups were compared using Kaplan-Meier analysis and the log-rank test. A Cox proportional hazards model was subsequently applied to these groups. Multivariable analyses were conducted using a Cox proportional hazards model from the survival package. The cutoff values for potential biomarkers were similarly based on the maximally selected rank statistic in the training set. The edgeR package was utilized to identify differentially expressed genes (DEGs) between the low- and high-risk groups in UniVisionNet36. Gene Ontology (GO) enrichment analysis was performed on these DEGs. To further explore the correlation between risk score and immune infiltration, the CIBERSORT and TIMER algorithms were utilized to compute the proportions of tumor-infiltrating immune cells in the TCGA cohort37,38. All statistical tests were two-sided, and a p-value of less than 0.05 was considered statistically significant.
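For readers unfamiliar with these quantities, the sketch below shows one common way to compute Harrell's C-index and to apply a fixed threshold to risk scores. It is a plain-Python illustration under simplifying assumptions (pairs with tied survival times are skipped, tied risk scores count as 0.5), not the routine used in the study, and the threshold is taken as a given input rather than derived from the maximally selected rank statistic.

```python
from typing import List, Sequence


def concordance_index(times: Sequence[float],
                      events: Sequence[int],
                      risks: Sequence[float]) -> float:
    """Harrell's C: share of comparable pairs ranked correctly by risk.

    A pair (i, j) is comparable when patient i has an observed event
    (events[i] == 1) strictly before patient j's follow-up time.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # earlier death, higher risk
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied risk scores
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def risk_groups(risks: Sequence[float], threshold: float) -> List[str]:
    """Scores at or above the threshold are high-risk, the rest low-risk."""
    return ["high" if r >= threshold else "low" for r in risks]
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by survival time.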

Computational hardware and software

In a Python (version 3.9.12) environment, several specialized packages were used: PyTorch (version 2.0.0) for deep learning model construction, Lifelines (version 0.27.8) for survival analyses, NumPy (version 1.24.1) and Pandas (version 2.1.2) for data handling, Albumentations (version 1.3.1) and OpenCV (version 4.8.1) for image transformations, and OpenSlide (version 1.1.2) for handling WSIs. Models were trained on a dual-GPU (Nvidia RTX 4090) workstation.