Development and validation of interpretable machine learning models to predict distant metastasis and prognosis of muscle-invasive bladder cancer patients

Deng, Qian; Li, Shan; Zhang, Yuxiang; Jia, Yuanyuan; Yang, Yanhui

doi:10.1038/s41598-025-96089-1

Article
Open access
Published: 06 April 2025

Scientific Reports volume 15, Article number: 11795 (2025)

Abstract

Muscle-invasive bladder cancer (MIBC) is more aggressive than non-muscle-invasive bladder cancer (NMIBC) and carries a greater risk of metastasis. We sought to develop machine learning (ML) models to predict distant metastasis (DM) and prognosis in MIBC patients. Clinical data of MIBC cases from 2000 to 2020 were sourced from the Surveillance, Epidemiology, and End Results (SEER) database. Clinical variables used to predict DM were identified through univariate and multivariate logistic regression and recursive feature elimination (RFE). Thirteen ML models predicting DM were evaluated based on AUC, PRAUC, accuracy, sensitivity, specificity, precision, cross-entropy, Brier score, balanced accuracy, and F-beta score. The SHapley Additive exPlanations (SHAP) framework was used to interpret the best model. Additionally, we used combinations of ML algorithms to predict prognosis in MIBC patients with metastasis. A total of 43,951 T2-T4 MIBC patients older than 18 years from the SEER database were enrolled consecutively. Nine clinical variables were selected to predict DM. The CatBoost model was identified as the optimal predictor, with AUC values of 0.956 [0.933, 0.969] for the training set, 0.882 [0.857, 0.919] for the internal test set, and 0.839 [0.723, 0.936] for the external test set. The model achieved an accuracy of 0.875 [0.854, 0.896], sensitivity of 0.869 [0.851, 0.889], specificity of 0.883 [0.823, 0.912], and precision of 0.917 [0.885, 0.944]. SHAP analysis revealed that tumor size was the most influential factor in predicting DM. For prognosis, the “RSF + Enet[alpha = 0.8]” model emerged as the top performer, with C-index values of 0.683 in the training set, 0.688 in the internal test set, and 0.666 in the external test set. Our ML models provide high accuracy and dependability, delivering refined, individualized predictions of metastasis risk and prognosis in MIBC patients.

Introduction

Bladder cancer (BLCA) is the leading malignancy affecting the urinary tract and ranks as the world’s ninth most common cancer¹. Urothelial cell carcinoma (UCC), which generally originates in the lower urinary tract such as the bladder and urethra but can also develop in upper urinary structures like the renal pelvis and ureters, is the predominant subtype of BLCA. Clinically, bladder cancer is classified into non-muscle-invasive bladder cancer (NMIBC) and muscle-invasive bladder cancer (MIBC) according to tumor penetration depth. NMIBC includes early-stage lesions limited to the mucosa or submucosa, such as carcinoma in situ (CIS; Tis), non-invasive papillary tumors (Ta), and tumors that extend into the lamina propria (T1)². In comparison, MIBC, identified by detrusor muscle invasion and classified as T2-T4 disease, carries a greater likelihood of lymph node spread and distant metastasis³. This underscores the importance of early detection of metastatic MIBC to improve treatment strategies and patient outcomes.

MIBC is highly aggressive, with a strong tendency for both local and distant spread, greatly elevating mortality risk in affected individuals. Around 50% of MIBC cases show micrometastases at the time of diagnosis, primarily involving lymph nodes, lungs, liver, and bones⁴. Such metastatic occurrences are linked to a poor outlook, as the five-year survival rate for metastatic MIBC patients remains low, between 5% and 15%⁵. Additionally, 20–30% of those initially diagnosed with NMIBC experience progression to MIBC during the disease course⁶. The primary treatment for MIBC includes cisplatin-based neoadjuvant chemotherapy (NAC), followed by radical cystectomy (RC) and pelvic lymph node dissection (PLND)³. However, RC is a highly invasive surgery with notable perioperative risks, so a significant proportion of MIBC patients are considered ineligible for the procedure^7,8. Furthermore, postoperative complications, especially those related to urinary diversion, can significantly diminish quality of life (QoL), making RC an unfavorable choice for some. Although recent urinary assays have enhanced early detection of bladder cancer, treatment options for MIBC remain limited, and patient prognosis often remains poor⁹. Trimodality therapy (TMT), which includes maximal transurethral resection of the bladder tumor (TURBT) followed by concurrent chemoradiotherapy, has emerged as a promising option for patients with T2N0M0 disease¹⁰. Previously, MIBC patients unfit for RC due to physical or psychological constraints would choose TMT, but it is generally limited to those who meet certain clinical criteria¹¹. This highlights the urgent need to further investigate the risk factors influencing MIBC metastasis and to advance therapeutic strategies that enhance patient survival.

Recently, nomograms have gained popularity for prognosis prediction, though their sensitivity and specificity can be limited. Machine learning (ML), a branch of artificial intelligence, offers a practical alternative due to its strength in managing complex, non-linear relationships, making it especially effective for large datasets and unstructured information¹². With advances in big data analytics, creating new tools to identify risk factors impacting metastasis and prognosis has become achievable. However, no ML-based model currently exists for predicting metastasis in MIBC, highlighting the need for an accurate model’s development and validation. The Surveillance, Epidemiology, and End Results (SEER) database (https://seer.cancer.gov/) provides extensive clinicopathological data and follow-up records for cancer patients, making it an invaluable asset for developing and validating ML models in healthcare. Although ML is widely applied across numerous fields, there is still a significant lack of research dedicated to forecasting metastasis and prognosis in MIBC patients. This study seeks to fill this gap by creating and validating predictive models with various ML algorithms, offering clinicians personalized tools to more precisely evaluate metastasis risk and prognostic outcomes for individuals with MIBC.

Materials and methods

Data collection and patient demographics

Clinicopathological information of MIBC patients from 2000 to 2020 was sourced from the SEER database. Additionally, retrospective data from The First Affiliated Hospital of Henan University of Science and Technology (2010–2020) were gathered through its electronic medical records system. This study adhered to the Declaration of Helsinki and received approval from the Ethics Committee of The First Affiliated Hospital of Henan University of Science and Technology (protocol code: 2024-03-K0144, approved on 2024-03-20), with written informed consent obtained from all participants. Inclusion criteria specified patients over 18 years old with T2-T4 MIBC and an initial diagnosis of bladder cancer (C67.0-C67.9) per the Third Edition of the International Classification of Diseases for Oncology (ICD-O-3). Exclusion criteria were missing follow-up information on survival or cause of death, survival of less than one month, lack of histological confirmation, absence of surgical data, and incomplete TNM staging or grading. Using one month as the landmark time point excludes patients who died early, whose outcomes could otherwise confound the analysis, and allows a more accurate evaluation of the long-term effects of clinical factors in patients who survived to that time point. Metastasis was defined as the spread to distant sites at initial diagnosis.

We gathered patient data covering demographics (age, gender, race, marital status, household location, and income), disease attributes (pathological grade, TNM stage, tumor size, primary tumor location, pathology, and metastasis details), treatment information (surgery, lymph node surgery, lymph node biopsy, radiotherapy, chemotherapy), and follow-up outcomes (overall survival (OS) and cancer-specific survival (CSS) status, survival duration in months). Age and tumor size, both continuous variables, were transformed into categorical variables based on established clinical practices and prior research findings. Age was grouped into “<40,” “40–59,” “60–69,” “70–79,” and “>=80” categories^13,14, while tumor size was categorized as “<=2 cm,” “2.1-4 cm,” “4.1-6 cm,” “>6 cm,” and “Unknown”^15,16,17. Tumor size was measured based on imaging and pathology reports according to SEER database guidelines. Cases were classified as having “Distant Metastasis” if metastasis was identified in the brain, bone, liver, lung, or distant lymph nodes during follow-up, or if tumors were verified as M1 stage via pathology or imaging at diagnosis. Missing data from either the database or hospital records were categorized as “Unknown.” The minimum sample size required for the external validation cohort was calculated using Riley et al.’s formula¹⁸.
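The categorization of age and tumor size described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, labels use plain ASCII hyphens, and the handling of boundary values (for example, a tumor measuring between 2.0 and 2.1 cm) is an assumption, since the text does not specify it.

```python
from typing import Optional


def age_group(age_years: int) -> str:
    """Map age in years to one of the five age categories named in the text."""
    if age_years < 40:
        return "<40"
    if age_years <= 59:
        return "40-59"
    if age_years <= 69:
        return "60-69"
    if age_years <= 79:
        return "70-79"
    return ">=80"


def tumor_size_group(size_cm: Optional[float]) -> str:
    """Map tumor size in cm to a category; None stands for a missing value."""
    if size_cm is None:
        return "Unknown"
    if size_cm <= 2.0:
        return "<=2 cm"
    # Assumption: sizes just above 2.0 cm fall into the "2.1-4 cm" group.
    if size_cm <= 4.0:
        return "2.1-4 cm"
    if size_cm <= 6.0:
        return "4.1-6 cm"
    return ">6 cm"
```

Treating missing sizes as their own "Unknown" level, as the text describes, keeps those patients in the cohort instead of dropping them.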

Model development and validation for predicting distant metastasis

To identify predictive variables, we employed univariate and multivariate logistic regression analyses in the training cohort, carrying variables with P-values below 0.05 forward to feature selection. We then applied the recursive feature elimination (RFE) technique to perform feature selection based on six ML algorithms, namely categorical boosting (CatBoost), random forest (RF), support vector machine (SVM), extreme gradient boosting (XGB), decision tree (DT), and gradient boosting machine (GBM), with 10-fold cross-validation¹⁹. RFE operates by training an ML model and ranking features based on their predictive strength, iteratively evaluating and removing features until all are assessed and prioritized²⁰. We applied the Robust Rank Aggregation (RRA) method to consolidate the importance ranks of variables across the six ML algorithms in the RFE process, yielding an integrated ranking of clinical variables²¹. The random seed was set to “123”. After feature selection, we developed models with 13 ML algorithms, including CatBoost, RF, SVM, XGB, DT, GBM, k-nearest neighbor (KNN), logistic regression (LR), naive Bayes classifier (NBC), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), neural network (NNET), and generalized linear model (GLM), instead of building a mixed model, using the “mlr3” R package²². This approach enabled us to compare each algorithm’s performance to identify the most effective predictive model for distant metastasis. To address potential class imbalance, which could bias performance metrics, we used the Synthetic Minority Over-sampling Technique (SMOTE) during model training²³. We refined the model development by applying nested resampling, which used a two-level k-fold cross-validation structure: an inner layer for hyperparameter optimization and an outer layer for performance estimation and model selection. Additionally, we conducted a random search with 1,000 evaluations within a 10-fold cross-validation framework, repeating the process five times for each model.
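The core idea of SMOTE can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the implementation used in the study (which relied on R tooling): the function name `smote`, the toy points, and the choice of `k` are hypothetical, and plain interpolation as written here assumes numeric features, whereas the study's predictors are largely categorical.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=123):
    """Create synthetic minority samples by interpolating between a minority
    point and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: four minority-class points with two numeric features.
minority_points = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(minority_points, n_synthetic=6, k=2)
```

Because every synthetic point lies on a segment between two real minority points, oversampling of this kind is normally applied to training folds only, so that test data remain untouched.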

Subsequently, metrics including area under the curve (AUC), area under the precision-recall curve (PRAUC), accuracy, sensitivity, specificity, precision, cross-entropy, Brier score, balanced accuracy (bacc), and F-beta score (beta = 1) were calculated to identify the top-performing ML model. For internal validation, we employed 10-fold cross-validation. Precision-recall curves (PRC) were generated to evaluate each model’s handling of imbalanced data, while calibration curves assessed the agreement between predicted probabilities and observed outcomes. Decision curve analysis (DCA) was used to gauge the clinical benefits of the ML model, implemented through the “runway” R package (https://github.com/ML4LHS/runway/). The best model was selected based on high AUC, high PRAUC, low Brier score, an accurate calibration curve, and high balanced accuracy and F-beta score (beta = 1). To determine each variable’s significance, we calculated its mean contribution to the AUC as a percentage of the full model using the “DALEX” R package²⁴. SHapley Additive exPlanations (SHAP) values, visualized in a bee swarm plot via the “shapviz” R package (https://github.com/ModelOriented/shapviz)²⁵, were applied to interpret the black-box ML model.
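For reference, the threshold-based and probability-based metrics listed above can be computed from first principles as in the sketch below. The function name, the 0.5 decision threshold, and the toy labels and probabilities are assumptions for illustration only; they are not taken from the study.

```python
import math


def _ratio(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def binary_metrics(y_true, y_prob, threshold=0.5, beta=1.0):
    """Compute common binary classification metrics from labels and probabilities."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = len(y_true)
    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    precision = _ratio(tp, tp + fp)
    b2 = beta ** 2
    # Clip probabilities so the logarithm in cross-entropy is always defined.
    eps = 1e-15
    clipped = [min(max(p, eps), 1 - eps) for p in y_prob]
    return {
        "accuracy": _ratio(tp + tn, n),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f_beta": _ratio((1 + b2) * precision * sensitivity, b2 * precision + sensitivity),
        "brier": sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / n,
        "cross_entropy": -sum(
            t * math.log(p) + (1 - t) * math.log(1 - p) for t, p in zip(y_true, clipped)
        ) / n,
    }


# Hypothetical toy example: 3 positives and 5 negatives.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
probs = [0.9, 0.8, 0.3, 0.1, 0.2, 0.4, 0.1, 0.7]
metrics = binary_metrics(labels, probs)
```

Balanced accuracy averages sensitivity and specificity, which is why it is less flattering than raw accuracy when only about 8% of cases are positive.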

Integrating ML algorithms for prognosis prediction

Univariate and multivariate Cox analyses were conducted to identify clinical variables with P-values below 0.05 that could predict overall survival (OS). We applied 10 ML algorithms, including random survival forest (RSF), elastic network (Enet), Lasso, Ridge, stepwise Cox, CoxBoost, partial least squares regression for Cox (plsRcox), supervised principal components (SuperPC), GBM, and survival support vector machine (survival-SVM), to estimate prognosis in terms of OS for MIBC patients with distant metastasis. In total, 101 ML algorithm configurations were tested on the training set, using a leave-one-out cross-validation (LOOCV) framework to identify the optimal prognostic model. Models utilizing fewer than three clinical variables were excluded from consideration. The concordance index (C-index) for each ML configuration was then computed across training, internal validation, and external validation sets. The top five ML configurations, achieving the highest average C-index across the three datasets, were selected for subsequent assessment using k-fold cross-validation to address overfitting and ensure model stability and generalizability. Logarithmic loss, recall, and decision calibration were calculated to identify the most effective prognostic ML model using the “mlr3proba” R package²⁶. Patient risk scores were derived through a linear combination for each prognostic ML configuration. The median risk score in the training set was used as a threshold, allowing us to categorize patients in the training, internal validation, and external validation sets into high- or low-risk groups. Kaplan-Meier (KM) survival analysis and log-rank testing were conducted with the “survival” and “survminer” R packages. Additionally, AUC, time-dependent ROC curves, calibration plots, and DCA were performed to evaluate the precision, discrimination capability, and clinical value of the optimal model.
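The concordance index used to rank the prognostic configurations can be illustrated with a direct pairwise computation. This sketch follows the usual definition of Harrell's C-index for right-censored data; the function name and toy values are hypothetical, and the packages used in the study may treat ties differently.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index: fraction of comparable pairs in which the subject
    with the earlier observed event has the higher predicted risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i has an observed event
            # strictly before subject j's event or censoring time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: survival in months, event flag (1 = death), risk score.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.6, 0.4, 0.1]
c_perfect = harrell_c_index(toy_times, toy_events, toy_risk)
c_reversed = harrell_c_index(toy_times, toy_events, toy_risk[::-1])
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives context for the C-index values of roughly 0.67 to 0.69 reported for the prognostic model.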

Results

Demographic and clinical baseline characteristics

To develop a predictive model for distant metastasis, data from 43,951 MIBC patients in the SEER database and 125 MIBC patients from The First Affiliated Hospital of Henan University of Science and Technology (used as the external validation set) were collected. SEER patients were randomly divided into training and internal test sets with a 7:3 split. In the training cohort, 2,513 patients (8.17%) exhibited distant metastasis, while 28,253 patients (91.8%) did not. The internal test cohort included 1,048 patients (8.02%) with metastasis and 12,012 patients (92.0%) without it. Detailed clinical characteristics for both groups are provided in Table 1.

Table 1 Clinicopathological characteristics of MIBC patients in the training, internal validation and external validation cohorts.

To construct a prognostic model for predicting outcomes in MIBC patients with distant metastasis, patients from the SEER database with distant metastasis were similarly divided into training and internal test sets in a 7:3 ratio. An additional 100 MIBC patients with distant metastasis from The First Affiliated Hospital of Henan University of Science and Technology served as the external validation group. In the training set, 2,394 patients (95.5%) had died by the last follow-up, while 114 (4.55%) remained alive (Table 2). For the internal test set, 1,016 patients (96.5%) had died, and 37 (3.51%) were alive (Table 2). Median follow-up durations for the training, internal, and external validation sets were 7.00 [3.00;15.0] months, 7.00 [3.00;14.0] months, and 7.00 [2.00;14.0] months, respectively (Table 2). Figure 1 outlines the selection process for MIBC patients from the SEER database.

Table 2 Clinicopathological characteristics of MIBC patients with distant metastasis in the training, internal validation and external validation cohorts.

Feature selection for the predictive model

We began by calculating the correlation coefficients of baseline clinical variables and visualizing the results with a lollipop chart, which indicated that “N stage” had the strongest association with metastasis (Fig. 2A). Based on clinical expertise, 19 variables were selected for logistic regression analysis (Table 3), while variables with correlation coefficients above 0.6 were excluded (Supplementary Fig. 1A). Next, univariate and multivariate logistic regression analyses were conducted on the training set, identifying “Age,” “Primary Site,” “Histology,” “Grade,” “T Stage,” “N Stage,” “Surgery Type,” “Lymph Node Surgery,” “Lymph Node Biopsy,” “Surgery in Other Sites,” “Radiotherapy,” “Chemotherapy,” and “Tumor Size” as significant predictors for metastasis (P < 0.05, Table 3; Fig. 2C). Further correlation analysis between model variables and metastasis highlighted “N stage” as the most influential predictor (Fig. 2B). We applied RFE with six ML algorithms (GBM, SVM, RF, DT, XGB, and CatBoost) to identify the relevant clinical features (Fig. 2D–I). RFE analysis revealed that RF was the optimal model, retaining 11 variables and achieving the highest AUC of 0.844 (Fig. 2F). The RRA method was then used to create an overall ranking of each clinical variable across the six ML algorithms, with “Age” emerging as the most influential factor (Supplementary Table 1). Given the strong associations between “Surgery Type,” “Lymph Node Surgery,” “Lymph Node Biopsy,” and “Surgery in Other Sites,” only “Surgery Type” and “Surgery in Other Sites” were retained for further analysis (Supplementary Fig. 1A). Ultimately, nine clinical variables with a frequency greater than three in RRA analysis, indicating substantial importance in most ML selection methods, were included in the subsequent model development steps (Supplementary Table 1). In summary, we selected the final model variables through logistic regression analysis, RFE based on six ML algorithms, and the RRA algorithm, supporting the robustness of our feature selection process.
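The two ideas in this selection pipeline, recursive elimination and keeping variables chosen by more than three of the six algorithms, can be sketched as follows. Both functions are simplified illustrations with hypothetical names: `importance_fn` stands in for refitting a real model on each subset, the frequency filter is a plain vote count and not the RRA statistic itself, and all toy values are invented for the example and are not the study's results.

```python
from collections import Counter


def recursive_feature_elimination(features, importance_fn):
    """Rank features from most to least important by repeatedly refitting
    on the remaining features and dropping the weakest one."""
    remaining = list(features)
    eliminated = []
    while len(remaining) > 1:
        scores = importance_fn(remaining)  # refit on the current subset
        weakest = min(remaining, key=lambda f: scores[f])
        remaining.remove(weakest)
        eliminated.append(weakest)
    eliminated.extend(remaining)
    return eliminated[::-1]


def frequent_features(selected_sets, min_count):
    """Keep features selected by more than min_count of the algorithms."""
    counts = Counter(f for selected in selected_sets for f in set(selected))
    return sorted(f for f, c in counts.items() if c > min_count)


# Hypothetical importances; a real run would train a model per subset.
toy_importance = {"N Stage": 0.9, "Tumor Size": 0.7, "Age": 0.5, "Grade": 0.1}
ranking = recursive_feature_elimination(
    list(toy_importance), lambda subset: {f: toy_importance[f] for f in subset}
)

# Hypothetical selections from six algorithms.
toy_selections = [
    ["Age", "N Stage", "Tumor Size"],
    ["Age", "N Stage"],
    ["Age", "N Stage", "Grade"],
    ["Age", "Tumor Size"],
    ["N Stage", "Tumor Size"],
    ["Age", "N Stage", "Tumor Size", "Grade"],
]
consensus = frequent_features(toy_selections, min_count=3)
```

Requiring agreement across several algorithms guards against a variable being retained only because of one model's idiosyncratic importance measure.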

Table 3 Univariate and multivariate logistic regression analyses of MIBC patients for distant metastasis in the training cohort.

Design and assessment of the predictive model

To accurately predict metastasis, “Age,” “Chemotherapy,” “N Stage,” “Histology,” “Radiotherapy,” “Surgery in Other Sites,” “Surgery Type,” “Tumor Size,” and “T Stage” were selected through the feature selection process. Thirteen ML algorithms, including CatBoost, RF, SVM, XGB, DT, GBM, KNN, LR, NBC, LDA, QDA, NNET, and GLM, were employed to build the predictive model in the training set, with hyperparameters fine-tuned via 10-fold cross-validation and random search. Model performance was then evaluated in both internal and external validation sets. ROC curves (Figs. 3A, 4A and 5A) and AUC values (Figs. 3B, 4B and 5B) highlighted CatBoost as the leading model, achieving the highest AUC scores of 0.956 [0.933, 0.969], 0.882 [0.857, 0.919], and 0.839 [0.723, 0.936] across the training, internal test, and external test sets. To thoroughly evaluate model performance, we calculated accuracy, sensitivity, specificity, precision, cross-entropy, Brier scores, balanced accuracy (bacc), and F-beta score (beta = 1) across the 13 ML models. Results indicated that CatBoost was the most precise and reliable for predicting distant metastasis (Figs. 3C, 4C and 5C). CatBoost achieved an accuracy of 0.875 [0.854, 0.896], sensitivity of 0.869 [0.851, 0.889], specificity of 0.883 [0.823, 0.912], and precision of 0.917 [0.885, 0.944] (Fig. 3C). Random search during hyperparameter tuning identified optimal CatBoost settings: depth of 4, learning rate of 0.03927546, iterations of 669, and leaf regularization of 6.098357. Precision-recall curves (PRC) demonstrated CatBoost’s effectiveness in managing imbalanced data (Figs. 3D, 4D and 5D). Calibration curves demonstrated that the CatBoost model excelled in both fit and predictive accuracy (Figs. 3E, 4E and 5E). DCA indicated that CatBoost provided the greatest benefit for predicting distant metastasis (Figs. 3F, 4F and 5F). 10-fold cross-validation further confirmed that CatBoost surpassed other ML algorithms (Supplementary Fig. 1B). Thus, CatBoost was identified as the most effective algorithm for building an ML model to predict distant metastasis, supporting clinical decision-making.
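AUC values with bracketed intervals like those reported above can be obtained in several ways. The sketch below computes the AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, and attaches a percentile bootstrap interval. The bootstrap is an assumption made for illustration, since the text does not state how the intervals were derived; the function names and toy data are hypothetical.

```python
import random


def auc(y_true, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly
    (ties count as half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=123):
    """Percentile bootstrap confidence interval for the AUC (labels must be 0/1)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical toy data.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(toy_labels, toy_scores)
toy_ci = bootstrap_auc_ci(toy_labels, toy_scores, n_boot=200)
```

Interval width grows as the sample shrinks, which is consistent with the wider interval reported for the 125-patient external test set than for the SEER cohorts.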

Model interpretation

To provide interpretability for the ML models, we calculated feature importance scores for each model, which were derived based on each algorithm’s intrinsic properties, showing that “N Stage” was the most influential risk factor for metastasis (Fig. 6A). We then applied the SHAP framework to further explain the top-performing CatBoost model. By evaluating the mean absolute SHAP values, it was evident that “Tumor size” had the greatest predictive power (Fig. 6B). Additionally, a bee swarm plot was used to illustrate the effect of each risk factor on metastasis (Fig. 6C). The y-axis indicates the strength of each risk factor, while the x-axis shows their impact on the model output, specifically the probability of metastasis, as quantified by SHAP values. The SHAP analysis revealed that factors such as absence of surgery, high N stage, advanced T stage, older age, lack of surgery in additional sites, and no radiotherapy are associated with a higher probability of metastasis (Fig. 6C). To demonstrate model interpretability, we examined two representative patients. SHAP values were employed to evaluate the influence of each feature on the model’s predictions. In this study, lower SHAP values corresponded to a reduced likelihood of metastasis, while higher values indicated an increased probability. We set the median SHAP value (0.0359) as the threshold for distinguishing between low and high metastasis risk. For example, the first patient, who did not experience metastasis, had a SHAP value and prediction score of -0.0143, suggesting a low metastasis risk (Fig. 6D). Conversely, the second patient, with confirmed metastasis, showed a SHAP value and prediction score of 0.0984, indicating a high probability of metastasis (Fig. 6E).
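The additive property behind these SHAP explanations, namely that per-feature contributions sum to the difference between a patient's prediction and a reference prediction, can be shown with an exact brute-force Shapley computation. This is a didactic sketch only: the toy model and its coefficients are hypothetical, a single baseline vector stands in for the background data, and enumerating all feature orderings is feasible only for a handful of features, unlike the tree-based algorithms used in practice.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values by averaging each feature's marginal contribution
    over all orderings in which features switch from baseline to x."""
    n = len(x)
    contributions = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = model(current)
        for i in order:
            current[i] = x[i]  # reveal feature i
            updated = model(current)
            contributions[i] += updated - previous
            previous = updated
    return [c / len(orderings) for c in contributions]


def toy_model(v):
    """Hypothetical risk model with an interaction between two features."""
    return 0.2 * v[0] + 0.5 * v[1] + 0.1 * v[0] * v[1]


patient = [1.0, 1.0]
reference = [0.0, 0.0]
phi = shapley_values(toy_model, patient, reference)
```

Here the interaction term is split evenly between the two features, and the contributions add up exactly to `toy_model(patient) - toy_model(reference)`.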

Prognostic model for metastatic MIBC patients

To build an accurate model for predicting OS in MIBC patients with metastasis, we performed univariate and multivariate Cox analyses. These analyses identified “Histology,” “T Stage,” “N Stage,” “Surgery Type,” “Lymph Node Surgery,” “Chemotherapy,” “Tumor Size,” “Bone Metastasis,” “Brain Metastasis,” “Liver Metastasis,” and “Lung Metastasis” as independent prognostic factors for OS in metastatic MIBC (P < 0.05, Table 4). Using the clinical variables identified, we generated 101 prognostic ML combinations within a LOOCV framework. The C-index of each model was calculated across the training, internal validation, and external validation sets (Fig. 7A). Among the top five combinations with the highest C-index across all datasets, we assessed model performance through logarithmic loss, recall, and decision calibration, which highlighted the “RSF + Enet [alpha = 0.8]” model for its strong calibration and accuracy (Supplementary Fig. 1C). The optimal model was “RSF + Enet [alpha = 0.8],” employing RSF for feature selection (Fig. 7B) and an elastic network with alpha set to 0.8 for model construction (Fig. 7C), achieving the highest average C-index (0.679) across the three datasets (Fig. 7A). In this model, “Chemotherapy” was the most influential factor in both the RSF and Enet [alpha = 0.8] algorithms (Fig. 7B, C). Using risk scores calculated by Enet [alpha = 0.8], we set the median risk score in the training cohort as a threshold to classify patients into high- and low-risk groups. Kaplan-Meier (K-M) curves validated the model’s effectiveness in risk stratification, showing that low-risk patients had significantly longer OS than high-risk patients in the training, internal validation, and external validation cohorts (Fig. 7D). Additionally, ROC curves for 1-, 3-, and 5-year OS highlighted the model’s discriminative ability (Fig. 7E). Time-dependent ROC curves showed that the “RSF + Enet [alpha = 0.8]” model consistently outperformed others across most time points, demonstrating superior discrimination and predictive power over traditional clinical variables (Fig. 7F). Calibration curves (Fig. 7G) and DCA curves (Fig. 7H) further highlighted the model’s accuracy and clinical value. Together, these evaluation metrics indicate that the “RSF + Enet [alpha = 0.8]” model provides a robust tool for predicting OS in MIBC patients with metastasis, outperforming other ML models in reliability and predictive performance.
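The risk stratification and survival comparison described here rest on two simple building blocks, sketched below: splitting patients at the training-set median risk score, and the Kaplan-Meier product-limit estimator. The function names and toy numbers are hypothetical, and assigning scores equal to the median to the low-risk group is an assumption, since the text does not say how ties were handled.

```python
from statistics import median


def split_by_training_median(train_scores, scores):
    """Label each score as high or low risk relative to the training median."""
    threshold = median(train_scores)
    # Assumption: scores equal to the threshold are treated as low risk.
    return ["high" if s > threshold else "low" for s in scores]


def kaplan_meier(times, events):
    """Kaplan-Meier estimate: list of (time, survival probability) at each
    distinct time with at least one observed event (event flag 1)."""
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical toy data: survival in months, 1 = death, 0 = censored.
groups = split_by_training_median([0.1, 0.4, 0.6, 0.9], [0.2, 0.7])
km_curve = kaplan_meier([1, 2, 2, 3], [1, 1, 0, 1])
```

Fixing the threshold on the training cohort and reusing it unchanged in the validation cohorts keeps the stratification from being tuned to the validation data.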

Table 4 Univariate and multivariate Cox regression analyses of MIBC patients with distant metastasis for overall survival in the training cohort.

Discussion

Muscle-invasive bladder cancer (MIBC) is a particularly aggressive form of bladder cancer, known for its ability to invade the detrusor muscle. Once this barrier is breached, the risk of metastasis increases significantly, leading to a poorer outlook for patients. Metastasis typically spreads through lymphatic and vascular pathways, often reaching organs like the lungs, liver, and bones. The development of metastatic disease is associated with sharply reduced survival rates, as treatment options become increasingly limited and less effective²⁷. Therefore, accurately predicting metastasis and identifying prognostic risk factors are crucial for managing MIBC. This study aimed to develop innovative ML models to predict metastasis at an early stage and to assess prognosis in MIBC patients with metastasis. By collecting clinical data on key variables and building ML models through a benchmark framework, we generated risk scores to aid in both metastasis prediction and prognosis estimation, supporting precise risk stratification. The model evaluates clinical and demographic factors to generate a risk score for metastasis and prognosis, assisting clinicians in decision-making and personalized treatment planning.

Previous research has identified various biomarkers, molecular subtypes, immune cell infiltration patterns, and protein expression levels as possible indicators of metastasis in MIBC. For example, combining CD8 + T cell infiltration with molecular subtype analysis has revealed that patients with the Basal-CD8 + T low subtype tend to have poorer OS and a higher risk of metastasis compared to other subtypes²⁸. Matrix metalloproteinase-14 (MMP14), associated with the basement membrane, has emerged as a critical marker linked to poor prognosis and heightened metastatic risk in MIBC. Elevated MMP14 expression correlates with aggressive tumor behavior and reduced treatment response²⁹. Additionally, recent studies using deep learning models have shown that lymphocytic inflammation within the tumor stroma may predict lymph node metastasis in MIBC, underscoring the significance of tumor microenvironment features in metastasis prediction³⁰. These findings imply that integrating molecular, immune, and histological biomarkers could improve the precision of metastasis predictions, assisting clinicians in creating more targeted treatment strategies.

Our research emphasizes the use of clinical variables for predicting metastasis, noting that older MIBC patients typically experience higher rates of metastasis and mortality compared to younger patients. This is partially due to their reduced likelihood of undergoing aggressive treatments like NAC and RC³¹. Additionally, the T stage, reflecting the degree of tumor invasion into the bladder wall and nearby tissues, plays a crucial role in influencing the risk of distant metastasis in MIBC³. As the T stage advances from T2 (invasion of the muscle layer) to T4 (invasion into nearby organs or tissues), the risk of metastasis rises. Higher T stages, particularly T3 and T4, are linked to an increased likelihood of cancer spreading to distant organs, including the liver, lungs, and bones. This underscores the importance of precise staging for prognosis and guiding treatment choices. The N stage, indicating regional lymph node involvement, is a critical factor in assessing the risk of distant metastasis for MIBC patients. Those with lymph node metastasis (N1-N3) face a substantially higher chance of developing distant metastasis than patients without lymph node involvement (N0). This relationship between lymph node status and distant spread highlights the importance of precise lymph node staging to inform the selection of neoadjuvant or adjuvant therapies, ultimately aiming to improve patient outcomes³. These three clinical variables, namely age, T stage, and N stage, were not only significant predictors of metastasis in our ML model but also strong predictors in non-ML nomograms, confirming their substantial impact on clinical decision-making³².

The choice of surgical treatment for MIBC has a major impact on the risk of metastasis and long-term survival outcomes. RC combined with PLND is considered the gold standard, but it is not a frequent combination during clinical practice. Research has indicated that RC could offer equivalent OS and CSS compared to bladder-preserving trimodal therapy (TMT) particularly in T2N0M0 disease¹⁰. Although short-term results (1–2 years) show minimal differences between TMT and RC, RC provides better outcomes at 5-year and 10-year follow-ups, especially in lowering the risk of metastasis^33,34. For patients unable to undergo RC due to health conditions or personal preference, bladder-preserving options like TMT remain feasible, although they may carry a greater risk of recurrence and metastasis^35,36. Tumor size is also a crucial factor in determining prognosis and the likelihood of metastasis in MIBC. Larger tumors are linked to higher metastasis risks and poorer OS. Research has shown that patients with larger tumors are often at more advanced disease stages, which correlates with an increased chance of distant spread³⁷. Additionally, tumor size, along with factors like pathological response to treatment, is frequently used to estimate long-term survival and recurrence risks in patients undergoing RC or NAC³⁸.

The value of our study lies in its potential to enhance management and treatment strategies for patients with MIBC. By providing a dependable risk stratification tool, the model supports clinicians in making tailored and data-informed decisions for treatment. For example, patients deemed high-risk for metastasis could be prioritized for more intensive surgical options, supplemental therapies, and rigorous post-operative monitoring, which may contribute to improved survival rates. On the other hand, patients classified as low-risk could benefit from less aggressive treatments, reducing the likelihood of unnecessary side effects and complications linked to overtreatment. Additionally, the model’s predictions can help tailor adjuvant therapy choices, set appropriate follow-up intervals, and determine when extra lab tests are needed. Incorporating this predictive tool into clinical workflows is likely to enhance data-driven decision-making, improving both patient outcomes and resource efficiency. Furthermore, it can aid in standardizing care across providers and institutions, thereby minimizing variation in treatment practices and outcomes for MIBC patients.

Although developing an ML model to predict metastasis is an innovative step, it should be noted that clinical utility depends on the interpretability and practicality of the selected variables³⁹. One of the main strengths of our study is the use of interpretable ML methods, particularly SHAP values and the DALEX framework, to pinpoint the factors that most affect metastasis predictions. The CatBoost algorithm, a gradient boosting method built on symmetric (oblivious) decision trees, delivers high accuracy with minimal parameter tuning and excels at handling categorical data⁴⁰. CatBoost’s performance is on par with other advanced ML techniques, and it has proved effective across various applications. However, the model’s “black-box” nature calls for interpretability, which we provided using SHAP summary plots and force plots. These visualization tools give clinicians a straightforward way to understand the primary factors driving the model’s predictions, making its insights more accessible for informed clinical decisions. Furthermore, incorporating advanced techniques, such as RFE with cross-validation (RFECV), GridSearchCV for hyperparameter tuning, and SMOTE oversampling to address class imbalance, has improved the model’s accuracy in predicting metastasis. With this ML model, clinicians can create individualized treatment plans and adjust therapy in a timely manner, which may enhance diagnostic and prognostic outcomes for MIBC patients.
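To make the idea behind SHAP attributions concrete, the following is a minimal sketch of exact Shapley values computed by enumerating feature coalitions. It is not the study's pipeline: the `toy_risk` function, the feature names, and the patient and reference values are all hypothetical, and the SHAP library uses faster tree-specific algorithms rather than brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one instance.

    Features outside a coalition are set to their baseline value, so each
    value is the weighted average marginal contribution of a feature.
    """
    names = list(instance)
    n = len(names)

    def value(coalition):
        x = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return predict(x)

    phi = {}
    for f in names:
        others = [k for k in names if k != f]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Hypothetical toy risk score (an assumption for illustration only).
def toy_risk(x):
    score = 0.02 * x["tumor_size_mm"] + 0.3 * x["n_stage"]
    if x["tumor_size_mm"] > 50 and x["n_stage"] > 0:
        score += 0.2  # interaction between large tumor and nodal involvement
    return score


patient = {"tumor_size_mm": 60, "n_stage": 1, "age": 70}
reference = {"tumor_size_mm": 30, "n_stage": 0, "age": 65}
phi = shapley_values(toy_risk, patient, reference)
```

The attributions sum to the difference between the patient's prediction and the reference prediction, and a feature the model never uses (here `age`) receives zero. This additivity is what SHAP summary and force plots visualize.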

Although this study has several strengths, it also has some limitations. Firstly, approximately 8–9% of MIBC patients in the SEER database developed metastasis, which is lower than anticipated based on general survival trends. The primary reason for this discrepancy lies in the limitations of the retrospective database. In retrospective studies, reporting of metastasis may be incomplete, particularly in patients who transition to palliative care. Although the SEER database contains systematic follow-up data, metastatic screening and documentation were not consistently applied across all cases. Some patients may have been lost to follow-up or received treatment at other institutions, which may have led to an underrepresentation of metastatic cases. We acknowledge that this limitation may impact the generalizability of our results, particularly when assessing overall prognosis and metastasis-related risks. Future studies should aim to include a more representative sample of metastatic patients, or explore ways to improve metastatic screening and follow-up processes to minimize such biases. Secondly, over 50% of MIBC patients in the SEER database received only ‘Local tumor destruction/excision,’ with a subset also undergoing radiotherapy (17%). We acknowledge that this high proportion may influence overall dataset composition and outcomes, as these patients generally have a poor prognosis and shorter follow-up. However, we deliberately included these patients to ensure our study reflects real-world clinical practice, where certain patients, due to age, comorbidities, or treatment constraints, may receive limited interventions rather than standard radical treatment. Excluding them entirely could introduce selection bias and limit the generalizability of our findings. Nonetheless, we recognize that the inclusion of these patients may impact survival-related analyses and should be carefully considered when interpreting the results. Future studies may benefit from stratified analyses or alternative modeling approaches to account for the potential influence of treatment heterogeneity on outcomes.

In addition, although we calculated the required sample size for external validation, the number of patients with complete follow-up data was limited, resulting in a smaller validation cohort than ideal. Acknowledging that larger sample sizes provide more reliable evaluations, we gathered the maximum feasible number of samples given current constraints. To mitigate this limitation, we used 10-fold cross-validation to obtain more stable performance estimates and reduce the risk of overfitting. Future studies will focus on enlarging the external validation cohort to further validate the model’s reliability. Moreover, differences in data collection methods across the hospitals contributing to the SEER database, along with the study’s retrospective design, led to some missing clinical variables. Additionally, essential clinicopathological details like imaging type and scope, comorbidities, inflammatory markers, genetic information, and occupational exposures were excluded because they are absent from the SEER database. Although we included a wide array of features to enhance predictive accuracy, this complexity might limit the model’s practicality in real-world clinical applications. Finally, the model has not yet been tested in clinical practice, highlighting the need for prospective, multicenter, large-scale validation studies to thoroughly evaluate its generalizability and suitability in real-world settings. Such future research will be essential for confirming the model’s effectiveness across varied patient groups and clinical contexts.
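As an illustration of how 10-fold cross-validation partitions an imbalanced cohort, the sketch below builds stratified folds so that each held-out fold keeps roughly the same share of metastatic cases. The function name `stratified_kfold` and the synthetic labels (about 9% positives) are assumptions for illustration, not the study's code or data, and cross-validation of this kind complements rather than replaces external validation.

```python
import random


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs with similar class proportions per fold."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)

    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # deal each class round-robin across folds

    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


# Hypothetical labels: 90 metastatic (1) and 910 non-metastatic (0) patients.
labels = [1] * 90 + [0] * 910
splits = list(stratified_kfold(labels, k=10))
positives_per_fold = [sum(labels[i] for i in test) for _, test in splits]
```

Each patient appears in exactly one held-out fold, so averaging a metric over the ten folds uses every case for evaluation once while never scoring a model on data it was trained on.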

Conclusions

In this study, we identified key risk factors for predicting metastasis in MIBC patients and developed a CatBoost-based prediction model within an ML benchmark framework, which outperformed both traditional clinical metrics and other ML models in accuracy and effectiveness. Additionally, we created an RSF + Enet [alpha = 0.8] prognostic model to forecast outcomes for MIBC patients with metastasis. Our work provides a foundation for future advancements in metastasis prediction and prognosis assessment, aiding clinicians in making informed treatment decisions and personalizing therapy plans.
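The prognostic model is assessed with the C-index, and the following minimal sketch shows how Harrell's concordance index can be computed for right-censored data. The function name and the five-patient toy cohort are hypothetical, and the study's own evaluation may handle ties or censoring differently.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable pairs that the model ranks correctly.

    A pair (i, j) is comparable when subject i had an event and a shorter
    observed time than subject j. A higher risk score should go with the
    shorter survival time; tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: survival time in months, event flag (1 = death,
# 0 = censored), and a model risk score for each patient.
times = [5, 8, 12, 20, 30]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.4, 0.6, 0.5, 0.1]
c_index = concordance_index(times, events, risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for reported values in the 0.67 to 0.69 range.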

Data availability

The original contributions presented in the study are included in the article or Supplementary Material. Further inquiries about the raw data and code can be directed to the corresponding authors.

References

1. Antoni, S. et al. Bladder cancer incidence and mortality: a global overview and recent trends. Eur. Urol. 71(1), 96–108 (2017).
2. Knowles, M. A. & Hurst, C. D. Molecular biology of bladder cancer: new insights into pathogenesis and clinical diversity. Nat. Rev. Cancer. 15(1), 25–41 (2015).
3. Alfred Witjes, J. et al. Updated 2016 EAU guidelines on muscle-invasive and metastatic bladder cancer. Eur. Urol. 71(3), 462–475 (2017).
4. Stein, J. P. et al. Radical cystectomy in the treatment of invasive bladder cancer: long-term results in 1,054 patients. J. Clin. Oncol. 41(22), 3772–3781 (2023).
5. Galsky, M. D. et al. Real-world effectiveness of chemotherapy in elderly patients with metastatic bladder cancer in the United States. Bladder Cancer. 4(2), 227–238 (2018).
6. Cumberbatch, M. G. K. et al. Epidemiology of bladder cancer: a systematic review and contemporary update of risk factors in 2018. Eur. Urol. 74(6), 784–795 (2018).
7. Smith, A. B. et al. Muscle-invasive bladder cancer: evaluating treatment and survival in the National Cancer Data Base. BJU Int. 114(5), 719–726 (2014).
8. Kang, N. W. et al. Comparative survival analysis of bladder preservation therapy versus radical cystectomy in muscle-invasive bladder cancer. Cancer Med. 13(2), e6972 (2024).
9. Chen, X. et al. Urine DNA methylation assay enables early detection and recurrence monitoring for bladder cancer. J. Clin. Invest. 130(12), 6278–6289 (2020).
10. Zlotta, A. R. et al. Radical cystectomy versus trimodality therapy for muscle-invasive bladder cancer: a multi-institutional propensity score matched and weighted analysis. Lancet Oncol. 24(6), 669–681 (2023).
11. Jiang, D. M., Chung, P., Kulkarni, G. S. & Sridhar, S. S. Trimodality therapy for muscle-invasive bladder cancer: recent advances and unanswered questions. Curr. Oncol. Rep. 22(2), 14 (2020).
12. Collins, G. S. et al. Protocol for development of a reporting guideline (TRIPOD-AI) and risk of bias tool (PROBAST-AI) for diagnostic and prognostic prediction model studies based on artificial intelligence. BMJ Open. 11(7), e048008 (2021).
13. Wen, P., Wen, J., Huang, X. & Wang, F. Development and validation of nomograms predicting the 5- and 8-year overall and cancer-specific survival of bladder cancer patients based on SEER program. J. Clin. Med. 12(4) (2023).
14. Fang, W., Yang, Z. Y., Chen, T. Y., Shen, X. F. & Zhang, C. Ethnicity and survival in bladder cancer: a population-based study based on the SEER database. J. Transl Med. 18(1), 145 (2020).
15. Karakiewicz, P. I. et al. Precystectomy nomogram for prediction of advanced bladder cancer stage. Eur. Urol. 50(6), 1254–1260; discussion 1261–1262 (2006).
16. Li, S. et al. Establishment and validation of nomograms to predict the overall survival and cancer-specific survival for non-metastatic bladder cancer patients: A large population-based cohort study and external validation. Med. (Baltim). 103(11), e37492 (2024).
17. Sylvester, R. J. et al. Predicting recurrence and progression in individual patients with stage Ta T1 bladder cancer using EORTC risk tables: a combined analysis of 2596 patients from seven EORTC trials. Eur. Urol. 49(3), 466-5; discussion 475-7 (2006).
18. Riley, R. D. et al. Minimum sample size for external validation of a clinical prediction model with a binary outcome. Stat. Med. 40(19), 4230–4251 (2021).
19. Li, S., Wang, J., Zhang, Z., Ren, C. & He, D. Individual risk and prognostic value prediction by interpretable machine learning for distant metastasis in neuroblastoma: A population-based study and an external validation. Int. J. Med. Inform. 105813 (2025).
20. Guyon, I., Weston, J., Barnhill, S. & Vapnik, V. Gene selection for cancer classification using support vector machines. Mach. Learn. 46(1), 389–422 (2002).
21. Kolde, R., Laur, S., Adler, P. & Vilo, J. Robust rank aggregation for gene list integration and meta-analysis. Bioinformatics 28(4), 573–580 (2012).
22. Lang, M. et al. mlr3: A modern object-oriented machine learning framework in R. J. Open. Source Softw. 4, p1903 (2019).
23. Wang, K. et al. Improving risk identification of adverse outcomes in chronic heart failure using SMOTE + ENN and machine learning. Risk Manag Healthc. Policy. 14, 2453–2463 (2021).
24. Biecek, P. Dalex: explainers for complex predictive models in R. J. Mach. Learn. Res. 19 (2018).
25. Lo, Y. T., Liao, J. C., Chen, M. H., Chang, C. M. & Li, C. T. Predictive modeling for 14-day unplanned hospital readmission risk by using machine learning algorithms. BMC Med. Inf. Decis. Mak. 21(1), 288 (2021).
26. Sonabend, R., Király, F. J., Bender, A., Bischl, B. & Lang, M. mlr3proba: An R package for machine learning in survival analysis. Bioinformatics 37(17), 2789–2791 (2021).
27. Soukup, V. et al. Follow-up after surgical treatment of bladder cancer: a critical analysis of the literature. Eur. Urol. 62(2), 290–302 (2012).
28. Li, B. et al. Integrating molecular subtype and CD8(+) T cells infiltration to predict treatment response and survival in muscle-invasive bladder cancer. Cancer Immunol. Immunother. 73(4), 66 (2024).
29. Zhang, X. et al. Basement membrane-related MMP14 predicts poor prognosis and response to immunotherapy in bladder cancer. BMC Cancer. 24(1), 746 (2024).
30. Zheng, Q. et al. Predicting lymph node metastasis status from primary muscle-invasive bladder cancer histology slides using deep learning: a retrospective multicenter study. Cancers 15(11) (2023).
31. Posielski, N., Koenig, H., Ho, O., Porter, C. & Flores, J. P. Use of neoadjuvant chemotherapy in elderly patients with muscle-invasive bladder cancer: a population-based study, 2006–2017. Oncology 36(1), 21–33 (2022).
32. Shi, S., Peng, G., Luo, L. & Li, D. Predictive nomograms for risk and prognostic factors in metastatic bladder cancer: a population-based study. Transl Cancer Res. 12(12), 3284–3302 (2023).
33. Matsukawa, A. et al. Trimodality therapy versus radical cystectomy for muscle-invasive bladder cancer: a systematic review and meta-analysis of matched cohort studies. Eur. Urol. Focus (2024).
34. Qiu, J. et al. Comparing long-term survival outcomes for muscle-invasive bladder cancer patients who underwent with radical cystectomy and bladder-sparing trimodality therapy: a multicentre cohort analysis. J. Oncol. 2022, 7306198 (2022).
35. Fong, K. Y. et al. Trimodality therapy versus radical cystectomy for muscle-invasive bladder cancer: A systematic review and meta-analysis. Urol. Oncol. (2025).
36. Su, X., Dong, C., Liao, W. & Liu, W. Oncological effectiveness of bladder-preserving trimodal therapy versus radical cystectomy for the treatment of muscle-invasive bladder cancer: a system review and meta-analysis. World J. Surg. Oncol. 21(1), 271 (2023).
37. Yin, M. et al. Neoadjuvant chemotherapy for muscle-invasive bladder cancer: a systematic review and two-step meta-analysis. Oncologist 21(6), 708–715 (2016).
38. van Ginkel, N. et al. Survival outcomes of patients with muscle-invasive bladder cancer according to pathological response at radical cystectomy with or without neo-adjuvant chemotherapy: a case-control matching study. Int. Urol. Nephrol. 54(12), 3145–3152 (2022).
39. Yan, M. et al. Comparative analysis of nomogram and machine learning models for predicting axillary lymph node metastasis in early-stage breast cancer: a study on clinically and ultrasound-negative axillary cases across two centers. Ultrasound Med. Biol. 51(3), 463–474 (2025).
40. Zhang, C. et al. Using catboost algorithm to identify middle-aged and elderly depression, National health and nutrition examination survey 2011–2018. Psychiatry Res. 306, 114261 (2021).

Acknowledgements

We are grateful to the Surveillance, Epidemiology, and End Results (SEER) database for providing data. We especially thank Ting Wei and Chunqian Yang from Zhujiang Hospital of Southern Medical University for their generous help in methodology.

Author information

Qian Deng and Shan Li contributed equally to this work.

Authors and Affiliations

Luoyang Central Hospital Affiliated of Zhengzhou University, Henan, China
Qian Deng
Department of Urology, Children’s Hospital of Chongqing Medical University, Chongqing, China
Shan Li
Department of Urology Surgery, The First Affiliated Hospital, and College of Clinical Medicine of Henan University of Science and Technology, Luoyang, Henan, China
Yuxiang Zhang
Department of Oncology, Huai’an Second People’s Hospital, Affiliated to Xuzhou Medical University, Huai’an, Jiangsu, China
Yuanyuan Jia
Department of Emergency Surgery (Trauma Center), The First Affiliated Hospital, and College of Clinical Medicine of Henan University of Science and Technology, Luoyang, Henan, China
Yanhui Yang

Contributions

QD and SL: Collection and assembly of data, Investigation, Visualization, Validation, Data analysis and interpretation, Data curation, Methodology, Writing of the original manuscript. YZ: Collection and assembly of data. YJ and YY: Conception and design, Administrative support, Manuscript editing, Funding acquisition. All authors contributed to this article and approved the submitted version.

Corresponding authors

Correspondence to Yuanyuan Jia or Yanhui Yang.

Ethics declarations

Ethics statement

The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Ethics Committee of The First Affiliated Hospital of Henan University of Science and Technology (protocol code: 2024-03-K0144, approved on 2024-03-20).

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

Supplementary Material 3

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Deng, Q., Li, S., Zhang, Y. et al. Development and validation of interpretable machine learning models to predict distant metastasis and prognosis of muscle-invasive bladder cancer patients. Sci Rep 15, 11795 (2025). https://doi.org/10.1038/s41598-025-96089-1

Received: 04 November 2024
Accepted: 26 March 2025
Published: 06 April 2025
DOI: https://doi.org/10.1038/s41598-025-96089-1
