Interpretable multimodal deep learning improves postoperative risk stratification in intrahepatic cholangiocarcinoma in multicentre cohorts

Wan, Mingyu; Ding, Yongfeng; Wang, Yanli; Jia, Yunlu; Wu, Siqi; Qu, Wenxin; Xu, Yifan; Fu, Wenguang; Timko, Michael P.; Wan, Ledong; Ying, Le; Ye, Chanqi; Chen, Ruyin; Li, Qiong; He, Yuqing; Xu, Keyi; Xu, Nong; Chen, Jinzhang; Zheng, Dayong; Shen, Yifei; Ruan, Jian

doi:10.1038/s41746-025-02282-x


Article
Open access
Published: 29 December 2025

Mingyu Wan¹^na1,
Yongfeng Ding¹^na1,
Yanli Wang²^na1,
Yunlu Jia¹,
Siqi Wu³,
Wenxin Qu⁴,
Yifan Xu⁴,
Wenguang Fu⁵,
Michael P. Timko⁶,
Ledong Wan⁷,
Le Ying⁸,
Chanqi Ye¹,
Ruyin Chen¹,
Qiong Li¹,
Yuqing He¹,
Keyi Xu¹,
Nong Xu¹,
Jinzhang Chen⁹,
Dayong Zheng¹⁰,
Yifei Shen^4,11 &
Jian Ruan¹

npj Digital Medicine volume 9, Article number: 95 (2026)


Abstract

Surgical resection is the primary curative treatment for intrahepatic cholangiocarcinoma (ICC), yet high postoperative recurrence rates pose a significant challenge. We developed an interpretable, transformer-based deep-learning pipeline that integrates multimodal data (clinical variables, radiomic features, and whole-slide pathology images) by fusing a pre-trained encoder with a transformer network. To biologically validate our model, we leveraged spatial transcriptomics and proteomics to decipher the attention mechanisms underlying its predictions. The model demonstrated robust performance in predicting 2-year overall survival, with area under the curve (AUC) values of 0.952 (95% CI: 0.909–0.983), 0.924 (95% CI: 0.804–1.000), and 0.924 (95% CI: 0.828–0.993) in three independent validation cohorts. Interrogation via spatial multi-omics revealed that the model’s attention was preferentially focused on regions histologically and molecularly associated with tumor invasion and aggressive behavior. We present a novel, interpretable multimodal deep-learning framework that achieves superior postoperative risk stratification for ICC patients.

Introduction

Cholangiocarcinoma is the second most common primary liver tumor^1,2. Complete surgical resection remains the only potentially curative treatment for intrahepatic cholangiocarcinoma (ICC). However, only 20–30% of patients are eligible for surgery at diagnosis^3,4. Despite surgical resection, the recurrence rate remains as high as 60–70%, leading to a 5-year overall survival (OS) rate of <35%⁵. Lymph node metastasis, greater tumor burden, and aggressive pathological features such as vascular and perineural invasion have been consistently associated with poorer prognosis⁶. Because early ICC is often silent, patients frequently present with large, multifocal, or vascularly invasive tumors that preclude curative-intent surgery. Even among surgical candidates, achieving a negative margin (R0) is challenging. Failure patterns are dominated by early intrahepatic recurrence and distant metastasis, reflecting aggressive biology within a permissive microenvironment⁷. Although adjuvant therapy is increasingly employed, benefits remain modest and heterogeneous³. Together, these realities highlight the need for accurate postoperative risk stratification to guide patient counseling, select systemic therapy, and tailor surveillance intervals.

ICC is marked by a dense stromal reaction—rich in cancer-associated fibroblasts, extracellular matrix remodeling, and variable immune infiltration—that drives phenotypic diversity beyond what routine clinicopathologic variables can capture^3,8. At the molecular level, recurrent alterations (IDH1/2 mutations, FGFR2 rearrangements, and changes in chromatin-remodeling genes) frequently co-occur with canonical drivers, compounding both intertumoral and intratumoral heterogeneity⁹. These features argue for integrated strategies that combine clinical variables, radiomics, whole-slide image–based histomorphometry, and targeted genomics to harness complementary biological and anatomical signals. The modest performance of existing postoperative prognostic models (reported AUCs 0.704–0.840) further underscores the limitations of unimodal designs and the need for robust, generalizable multimodal approaches to improve survival prediction and guide clinical decision-making^10,11,12,13.

Artificial intelligence (AI) has revolutionized numerous industries, including healthcare, which is experiencing similar transformative growth¹⁴. As a cornerstone of these advances, transformer-based architecture can enhance real-time clinical decision support, thereby improving precision medicine^15,16. Computational pathology increasingly leverages weakly supervised multiple‑instance learning—which treats each whole‑slide image (WSI) as a bag of tiles—and transformer‑based aggregators to capture long‑range spatial dependencies and highlight salient regions via attention heatmaps^8,17. In multimodal pipelines, fusion can be implemented through early feature concatenation, cross‑modal attention, or statistically robust late fusion—the latter often exhibiting stronger external validity¹⁸. Interpretability methods—such as attention rollouts on WSIs and global feature-importance scoring for tabular variables—are essential to build clinician trust and support hypothesis generation. Equally important, rigorous calibration and external validation across clinically relevant time horizons are needed to translate model outputs into actionable risk estimates. CLAM, an attention-based multiple instance learning framework for whole-slide image analysis, aggregates patch-level embeddings to produce slide-level predictions¹⁹. Paired with UNI—a self-supervised ViT pretrained on one of the largest histopathology slide collections—CLAM achieves strong classification performance (AUC 0.98 for LUAD vs LUSC; 0.97 for skin cancer) but has not been employed for survival modeling, with no applications reported in ICC²⁰. High-throughput quantitative nuclear features, capturing extensive cell-level characteristics, possess significant prognostic value²¹, and integrated multimodal models consistently outperform unimodal approaches in predicting cancer outcomes²². 
To our knowledge, no prior study has developed and validated a comprehensive prognostic model for a large-scale cohort of patients with postoperative ICC.
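The attention-based multiple-instance aggregation described above can be illustrated with a minimal sketch. It is not the CLAM or ICC-PPS implementation: the function name `attention_pool` is hypothetical, and the attention logits are passed in directly, whereas a trained model would produce them with a small learned network.

```python
import math

def attention_pool(patch_embeddings, patch_scores):
    """Aggregate patch embeddings into one slide-level embedding.

    patch_scores are unnormalised attention logits, one per patch.
    Assumption: they are supplied by the caller; a trained model would
    compute them from the embeddings with a learned scoring network.
    """
    # Softmax over patches, shifted by the maximum for numerical stability.
    top = max(patch_scores)
    exps = [math.exp(s - top) for s in patch_scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Attention-weighted sum of the embeddings, one dimension at a time.
    dim = len(patch_embeddings[0])
    slide = [
        sum(w * emb[d] for w, emb in zip(weights, patch_embeddings))
        for d in range(dim)
    ]
    return slide, weights
```

The returned `weights` are the quantities that would be rendered as an attention heatmap once mapped back to patch coordinates.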

Spatial transcriptomics complements pathomics by mapping gene-expression programs onto tissue architecture, directly linking WSI–derived morphologic phenotypes to underlying biological states²³. Co-registering spatial transcriptomics with attention heatmaps or histomorphometric clusters can validate model-salient regions and disentangle spatial niches. Proteomics adds an orthogonal layer—capturing post-transcriptional regulation, protein abundance and modifications, and pathway activity—that refines RNA-only interpretations²⁴. In ICC, proteomic signatures of extracellular-matrix remodeling, epithelial–mesenchymal transition, metabolic rewiring, and immune-checkpoint signaling can contextualize risk scores^7,25. Together, spatial transcriptomics and proteomics provide mechanistic grounding for multimodal predictions and enhance biological interpretability, with the potential to open avenues for biomarker discovery and rational therapeutic stratification in ICC.

We previously developed an ensemble machine‑learning approach for metastasis prediction and risk stratification in ICC²⁶. Here we present an advanced multimodal transformer-based ICC Prognostic Prediction System (ICC-PPS) that integrates a powerful pretrained model and quantitative high-throughput nuclear and radiomic features with essential clinical information to address this gap. The model was validated across three medical centers, and spatial transcriptomics and proteomic analyses were performed to elucidate the biological basis of our findings (Fig. 1, Supplementary Figs. 1, 2). External validation demonstrated that our model effectively predicted post-surgical survival in patients with ICC, identified poor prognostic factors, and supported more informed clinical decision-making.

**Fig. 1: Workflow overview and model interpretability.**

Results

Multimodal dataset and cohort characteristics in postoperative ICC study

This multicenter study included 654 patients with ICC who underwent curative‑intent surgery at four centers between January 2012 and June 2022 and met strict eligibility criteria. Data from 402 patients of the First Affiliated Hospital of Zhejiang University School of Medicine (FAHZU cohort) were used for training, and 151, 62, and 39 patients from Nanfang Hospital, Southern Medical University (NFSMU cohort), Shunde Hospital of Southern Medical University (SDSMU cohort), and Affiliated Hospital of Southwest Medical University (AHSMU cohort), were included as validation cohorts 1, 2, and 3, respectively. Detailed inclusion and exclusion criteria are provided in the Methods section (Fig. 2a). No significant differences in survival were observed across cohorts. Median survival was 40.0 months (95% confidence interval (CI): 35.1–47.0, n = 402, 275 events) in the training cohort, 48.3 months (95% CI: 37.6–53.9, n = 151, 87 events) in validation cohort 1, 41.9 months (95% CI: 32.9–56.8, n = 62, 40 events) in validation cohort 2, and 42.1 months (95% CI: 28.7–72.0, n = 39, 24 events) in validation cohort 3 (Fig. 2b). The Kaplan–Meier survival curves for patients at different stages across cohorts are presented in Supplementary Fig. 3. The training cohort predominantly comprised stage I, II, and III disease, with 156 (38.8%), 62 (15.4%), and 163 (40.5%) patients, respectively. Notably, the training cohort also included 21 patients (5.2%) with metastatic disease, all presenting with solitary metastases that were considered completely resectable after multidisciplinary evaluation. No statistically significant differences in baseline characteristics were identified between the training and validation cohorts (Fig. 2c, d, Supplementary Table 1).
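For readers unfamiliar with how a median survival time is read off a Kaplan–Meier curve, the following is a minimal sketch of the product-limit estimator. The software used for the reported estimates is not stated in this section, so the function names and the toy inputs are assumptions for illustration only.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  : follow-up in months
    events : 1 if death was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the fraction surviving time t.
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        # Both deaths and censored patients leave the risk set after t.
        n_at_risk -= removed
    return curve


def median_survival(curve):
    """First time at which estimated survival is 0.5 or lower, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None
```

A `None` result corresponds to a median that has not been reached within the available follow-up.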

**Fig. 2: Clinicopathological characteristics of training and validation cohorts of ICC-PPS.**

ICC-PPS: superior multimodal integration for enhanced predictive performance

The tissue region of each slide was first automatically segmented and then divided into numerous smaller patches (256 × 256 pixels) (Fig. 3a). These patches were projected onto the corresponding coordinates of the original WSI (Fig. 3b), and a UNI pretrained model was loaded to generate 1024-dimensional features for each coordinate (Fig. 3c). We then precisely segmented tumor lesions from WSIs (512 × 512 pixels), breaking them down into over 3,263,600 image tiles, and extracted more than 1,194,477,600 rows of tumor cell nuclear features (Fig. 3d). The distribution of pathomic nuclear features showed that Zernike features predominated at 43.5% of the total, followed by Intensity features at 23.2% (Fig. 3e). From an initial set of 345 features per tile, a refined subset of nuclear characteristics was selected using Spearman correlation analysis (cutoff = 0.8), Cox multivariate regression (P < 0.1), and near-zero variance filtering (Supplementary Table 2). The resulting heatmap illustrates that related Zernike parameters exhibit positive intercorrelations while remaining largely independent of other measurements (Fig. 3f).
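The Spearman correlation step (cutoff = 0.8) can be sketched as a greedy redundancy filter. The text does not say which member of a correlated pair was retained, so the first-come rule used here is an assumption, as are the function names and the toy feature values.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant feature carries no rank information
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def drop_correlated(features, cutoff=0.8):
    """Keep a feature only if |rho| <= cutoff against every feature kept so far.

    Assumption: features are visited in dictionary order and the earlier
    feature of a correlated pair is the one retained.
    """
    kept = []
    for name in features:
        if all(abs(spearman(features[name], features[k])) <= cutoff for k in kept):
            kept.append(name)
    return kept
```

Because the filter is order-dependent, a real pipeline would usually fix the visiting order by some prior ranking, for example univariate association with outcome.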

**Fig. 3: Modality summary and multimodal performance comparison.**

Radiomics features of the region of interest (ROI) were extracted from MRI T2‑weighted images. A comprehensive analysis of histogram, texture, and shape characteristics generated a robust dataset of 704 quantitative features per patient (Fig. 3g). Figure 3h illustrates the radiomic feature distribution, in which texture-based features predominate. GLCM features contributed the largest proportion at 25.0%, followed by First-order features at 20.5%. GLSZM and GLRLM features each accounted for 18.2% of the distribution, whereas GLDM features represented 15.9%. Several candidate prognostic radiomic features were identified after feature selection, which included high-correlation screening and FDR correction; however, none was a statistically significant predictor of survival in Cox regression (Fig. 3i and Supplementary Table 3).
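The FDR correction mentioned above is not specified further. A common choice is the Benjamini–Hochberg procedure, sketched below under that assumption; the function name and the example p-values are illustrative and do not come from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: the study's "FDR correction" is treated here as the
    Benjamini-Hochberg step-up procedure, which the text does not confirm.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values stay monotone in the rank of the raw p-value.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A feature would then be retained when its adjusted p-value falls below the chosen FDR level.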

The ICC-PPS exhibited superior performance in predicting survival compared with conventional TNM staging and CLAM classification in the external validation cohorts. The ICC-PPS effectively stratified patients by predicted 2-year survival across all validation cohorts: cohort 1 (AUC: 0.952 [95% CI: 0.915–0.989]), cohort 2 (AUC: 0.924 [95% CI: 0.814–1.000]), and cohort 3 (AUC: 0.924 [95% CI: 0.839–1.000]). ICC-PPS demonstrated markedly high sensitivity across the three validation cohorts (0.981–1.000), while specificity was more variable (0.822–1.000). The ICC-PPS precision exceeded 0.929 in all cohorts (Fig. 3k and Supplementary Table 4). In comparison, the CLAM model showed substantially lower AUC values, ranging from 0.620 (95% CI: 0.518–0.722) to 0.714 (95% CI: 0.532–0.895), while TNM staging demonstrated even lower performance, with AUC values ranging from 0.561 (95% CI: 0.391–0.731) to 0.690 (95% CI: 0.605–0.774).

The ICC-PPS model yielded the highest AUC compared to other models in the validation cohorts. For 1-year OS prediction (n = 252), the AUC was 0.846 (95% CI: 0.782–0.909); for 2-year OS prediction (n = 252), the AUC was 0.936 (95% CI: 0.900–0.973); and for 3-year OS prediction (n = 217), the AUC was 0.806 (95% CI: 0.746–0.866). Although TNM staging achieved an AUC of 0.716 (95% CI: 0.653–0.779) for predicting 3-year OS, the comprehensive multimodal ICC-PPS model consistently outperformed both TNM staging and the CLAM model across all time periods (Fig. 3l, Supplementary Tables 5, 6). In prespecified subgroup analyses, the ICC-PPS maintained consistently high performance across key clinical strata. Across disease stages I–IV, F1-scores ranged from 0.941 to 0.971 and AUCs from 0.934 to 0.972, indicating similar discrimination in both early and advanced disease. Comparable robustness was observed in subgroups defined by tumor size, nerve and lymphovascular invasion, lymph node status, and pre-/postoperative treatment, with F1-scores ranging from 0.911 to 0.971 and AUCs from 0.896 to 0.972 (Supplementary Table 7, Supplementary Fig. 4).
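To make the reported discrimination metrics concrete, the sketch below computes an AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, and attaches a percentile bootstrap interval. How the published confidence intervals were obtained is not stated here, so the bootstrap, the function names, and the labelling convention (1 = alive at the time horizon) are all assumptions.

```python
import random

def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (illustrative choice)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only one class; AUC is undefined there.
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

The pairwise formulation is quadratic in cohort size, which is acceptable for cohorts of a few hundred patients such as those reported here.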

Based on the ICC-PPS model predictions, patients with scores >0.5 were classified as low-risk, while those with scores ≤0.5 were assigned to the high-risk group. This risk stratification demonstrated significantly different survival outcomes across the three independent validation cohorts, with median survival times of 53.8 versus 14.7 months (HR = 0.07 [95% CI: 0.05–0.11], P < 0.001) (Fig. 3j).
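The cutoff rule is simple but easy to invert, because a higher score corresponds to lower risk. A minimal sketch, with a hypothetical function name, makes the direction and the boundary case explicit:

```python
def stratify(scores, threshold=0.5):
    """Map ICC-PPS scores to risk groups.

    Scores strictly above the threshold are low-risk; scores equal to or
    below it are high-risk, following the rule stated in the text.
    """
    return ["low-risk" if s > threshold else "high-risk" for s in scores]
```

Note that a score of exactly 0.5 falls in the high-risk group.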

Our attention visualization analysis revealed that the ICC-PPS model primarily focused on tumor regions. The attention heatmaps demonstrated that beyond the tumor core, tumor margins represent a critical area of model focus. Concentrated attention patterns were observed across multiple tumor margin regions (Fig. 3m). Notably, in several patient cases, the model’s attention to tumor margins was particularly pronounced—even exceeding focus on the tumor itself—highlighting the importance of tumor-stromal interfaces in predicting clinical outcomes (Fig. 3n, Supplementary Fig. 5).

Spatial omics reveals the relationship between attention and tumor margin invasiveness

The attention heatmap from ICC-PPS suggested a potential correlation between patient survival and tumor margin invasion. To further investigate, we selected two patients representing distinct ICC-PPS risk strata, P1 (ICC-PPS low-risk, long-term survivor) and P2 (ICC-PPS high-risk, short-term survivor), for single-cell and spatial transcriptomic analyses, focusing on attention mechanisms at the tumor margin. UMAP plots illustrating cell type and patient distribution are presented in Fig. 4a, b. We inferred tumor cells based on copy number variation (Fig. 4c and Supplementary Fig. 6). Following the Li et al. findings in ICC²⁷, we defined proliferating tumor cells (Prolif) as the top 25% of cells overexpressing MKI67, TOP2A, and UBE2C. We assessed common behavioral characteristics of tumor cells and found significantly increased epithelial-mesenchymal transition (EMT) and proliferation scores using the “addmodulescore” algorithm (P < 0.001, Fig. 4d). KEGG pathway enrichment analysis indicated activation of cell junction and migration pathways (Fig. 4e). CellChat analysis revealed strong bidirectional communication between tumor cells and fibroblasts, depicted by thick connecting lines in both interaction weight and interaction number networks. This robust tumor-fibroblast signaling axis was particularly prominent among all cell-cell interactions (Fig. 4f). We identified and distinguished various cell types and delineated the boundaries between tumor and normal tissues by employing SPOTlight and SpaCET analyses to annotate spatial transcriptomics with single-cell data. P2 showed increased signaling from tumor tissues (WNT, TGF-β, and VEGF pathways), suggesting abnormal activation of these oncogenic pathways and suppression of the regulatory functions of normal tissues, potentially promoting tumor progression and shortening survival (Fig. 4g).
Integrated spatial and single-cell transcriptomic analysis revealed significantly higher expression of the proliferation markers MKI67 and TOP2A and the mesenchymal markers CDH2 and VIM in short-term survivors (P < 0.001), with distinct spatial enrichment at tumor-normal interfaces. This combined approach demonstrated enhanced proliferative activity and mesenchymal features at invasive margins in patients with poorer prognosis, providing both cellular and spatial contexts for molecular determinants of survival outcomes (Fig. 4h). Both patients showed prominent integrin-mediated interactions; P2 exhibited denser and thicker connections, particularly involving SDC1/2/4 and LRP1 (Fig. 4i). The increased intensity and complexity of ligand-receptor interactions in P2 suggest enhanced intercellular signaling that may contribute to more aggressive tumor behavior. The shift from primarily proliferation-focused processes in P1 to trafficking/autophagy/chromosomal processes in P2 suggests tumor evolution toward more invasive and adaptive phenotypes (Fig. 4j). P2 tumors appeared to have enhanced abilities for protein secretion, membrane remodeling, and stress adaptation through autophagy, characteristics associated with more aggressive disease, therapy resistance, and ultimately poorer patient outcomes. Simultaneously, we observed spatial localization of CAFs, with poor-prognosis patients exhibiting higher CAF expression (Fig. 4k). This observation was consistent with CellChat analysis, which revealed strong communication networks between fibroblasts and tumor cells, reflecting the functional significance of these spatial relationships in the tumor microenvironment.
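The definition of proliferating tumor cells used above (the top 25% of cells by MKI67, TOP2A, and UBE2C expression) can be sketched as a simple quantile rule. How the three markers were combined into one ranking is not described, so averaging them, as done here, is an assumption; the function name and data layout are likewise hypothetical.

```python
def label_proliferating(cells, markers=("MKI67", "TOP2A", "UBE2C"), top_fraction=0.25):
    """Return the IDs of cells in the top fraction by mean marker expression.

    cells maps a cell ID to a {gene: expression} dictionary.
    Assumption: the per-cell score is the unweighted mean of the markers.
    """
    scores = {
        cell_id: sum(expr.get(gene, 0.0) for gene in markers) / len(markers)
        for cell_id, expr in cells.items()
    }
    # Always label at least one cell so tiny inputs do not return nothing.
    n_top = max(1, int(len(scores) * top_fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n_top])
```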

**Fig. 4: Spatial transcriptomics reveals tumor margin invasion characteristics in patients with different risk levels.**

Proteomic landscape across patient risk groups validates metastasis and CAF-associated signatures

Cell-cell colocalization analysis revealed that CAFs exhibited a strong positive spatial association, particularly with endothelial cells and M1 macrophages (Fig. 5a, Supplementary Fig. 7). The correlation plot highlighted the prevalence of these CAF-centered interactions, illustrating multiple significant CAF-paired correlations, including CAF-endothelial, -macrophage M1, -mast cell, and -pDC interactions. To further investigate proteomic differences between risk-stratified patients, we examined the protein profiles of 112 patients incorporated into our model, comprising 33 ICC-PPS high-risk and 79 ICC-PPS low-risk individuals. Compared to low-risk patients, high-risk patients exhibited markedly elevated expression in both the metastasis- and CAF-related protein modules, demonstrating a clear molecular distinction between these prognostic groups (Fig. 5b). GSVA highlighted a stromal/mesenchymal program in high-risk tumors, spanning CAF, EMT, ECM remodeling, angiogenesis, adhesion, and TGF-β signaling (all P < 0.05, Fig. 5c), whereas GSEA showed concordantly lower CAF and EMT signatures in low-risk cases (NES = 2.22 and 1.51; both P < 0.05), indicating a less stromalized and less invasive microenvironment (Fig. 5d).

**Fig. 5: Proteomic analysis validates differential protein expression profiles between patient risk groups.**

Through LASSO regression analysis of relevant genes, we constructed an ICC Protein Score (ICPS) comprising 7 proteins (Supplementary Table 8). This protein-based signature demonstrated robust predictive performance for 1-, 2-, 3-, 4-, and 5-year survival, with AUC values of 0.804 (95% CI: 0.719–0.890), 0.898 (95% CI: 0.834–0.962), 0.828 (95% CI: 0.748–0.909), 0.774 (95% CI: 0.680–0.868), and 0.778 (95% CI: 0.675–0.880), respectively (Fig. 5e). Patients stratified by ICPS exhibited significantly different OS outcomes: low-risk patients had a median survival of 63.5 months (95% CI: 50.2–NA, n = 56, 26 events) and high-risk patients 22.3 months (95% CI: 17.1–41.8, n = 56, 47 events), underscoring the prognostic utility of this proteomic signature (Fig. 5f). As an extension of this work, we further validated the ICC-PPS as part of an ongoing Phase II prospective cohort study (NCT06298968). Patients enrolled from the prospective cohort are listed in Supplementary Table 9. All 9 enrolled patients with recurrent disease following surgical resection were classified as low-risk by the ICC-PPS model. At the time of analysis, only one patient had less than 2 years of follow-up, whereas the remaining patients had all achieved OS exceeding 2 years, supporting the predictive value of our signature in identifying patients with better prognosis (Fig. 5g, h). The ICC-PPS demonstrated robust predictive performance, effectively stratifying patients according to their prognostic risk profiles.

Discussion

We developed a deep learning-based model for early assessments of patients with ICC undergoing radical surgery by integrating quantitatively preprocessed radiomic and pathomic features. The ICC-PPS model accurately predicts patient survival and demonstrates good AUC, high sensitivity, specificity, and accuracy in three external validation cohorts, outperforming traditional predictive models. The robustness and generalizability of ICC-PPS have been further validated in a subset of patients enrolled in a Phase II prospective clinical trial. Our research provides a reliable and reproducible tool for predicting postoperative survival in patients undergoing radical surgery for ICC, facilitating computer-assisted personalized management.

The substantial disparity in postoperative outcomes among patients with ICC, ranging from long-term survival to rapid recurrence, reflects the distinct biological behaviors that characterize this disease²⁸. This heterogeneity stems from the inherent multi-scale complexity of the tumor²⁹. Although previous studies have attempted to predict prognosis by investigating this heterogeneity through single modalities such as clinical characteristics, radiomics, or conventional pathology, these approaches are limited by inherent constraints. By providing only a partial perspective, these methods failed to capture the comprehensive landscape of tumor heterogeneity, thereby restricting their predictive capacity^30,31. The systematic characterization of tumor heterogeneity necessitates the synergistic integration of multimodal information. Accumulating evidence from other malignancies has demonstrated that models integrating multiple data sources exhibit significantly superior predictive performance compared to unimodal approaches^32,33. Therefore, our study integrated advanced deep-learning features from a large-scale, pretrained model, high-throughput nuclear features, preoperative MRI-based texture signatures, and key clinical risk factors, confirming that this multimodal strategy not only outperformed the accuracy of any single modality model but also provided a comprehensive view of ICC heterogeneity.

Conventional machine-learning methods such as logistic regression (LR) and support vector machines (SVM) can be reliable in constrained settings but have limited capacity and feature-learning ability for high-dimensional data with nonlinear relationships and complex interactions, making them susceptible to the curse of dimensionality and overfitting³³. Conversely, deep-learning frameworks with deep nonlinear architectures and automated representation learning have demonstrated clear advantages for similarly complex tasks^33,34. Most prognostic models for ICC still rely on Cox proportional hazards or random-forest-based nomograms and have not leveraged advanced frameworks to integrate heterogeneous modalities³⁵. To our knowledge, no ICC study has used a deep-learning framework to integrate high-dimensional multi-omics data. Recent studies have demonstrated the potential of machine learning approaches for predicting postoperative outcomes in ICC; however, most existing models rely on single‑modality data and offer limited biological interpretability. A pathomics‑driven ensemble model developed by Zhang et al. integrated WSIs with clinical variables and achieved AUCs of 0.902, 0.853, and 0.807 for 1‑, 2‑, and 3‑year OS, respectively, while further identifying TPM2 and PLOD2 as prognostic biomarkers through genomic correlation³⁶. For early liver recurrence prediction, Jolissaint et al. reported an AUC of 0.84 by combining texture features with tumor size and demonstrated associations with somatic driver mutations³⁷. More recently, interpretable ML approaches have emerged; for instance, XGBoost models with SHAP‑based visualization have been used to predict perineural invasion, attaining AUCs of 0.79–0.88 across retrospective and prospective cohorts³⁸. Overall, a common limitation of these models is the marked loss of performance when evaluated in external datasets, as exemplified by the AUC decrease from 0.904 to 0.779 in Alaimo et al.³⁹.
Compared with these recent ML‑based ICC models, ICC‑PPS provides higher and more consistent discrimination for 2‑year OS, with AUCs of 0.924–0.952 across three external cohorts, and offers biologically grounded interpretability supported by spatial transcriptomics and proteomics. Nevertheless, its dependence on digital pathology, MRI, and carefully curated feature‑extraction pipelines may pose practical challenges for implementation relative to simpler MRI‑based or purely clinicopathologic tools. For clinical translation, the proposed approach can be integrated into existing diagnostic workstations to serve as an assistive tool. A semi-automated workflow, where clinicians verify the target regions prior to analysis, would ensure high-quality data input while maintaining clinical efficiency. This allows the risk stratification model to be seamlessly incorporated into routine decision-making processes.

By leveraging the intrinsic interpretability of our transformer framework, we circumvented the typical “black box” problem of deep learning and discovered that the model’s attention systematically concentrated on the invasive front of the tumor in ICC. This data-driven insight prompted our hypothesis that the marginal tumor microenvironment is a key driver of poor prognosis. We validated this using targeted spatial transcriptomics and large-scale proteomics, confirming a significant enrichment of invasion- and EMT-related pathways, along with substantial infiltration of activated CAFs, which remodel the extracellular matrix to directly promote tumor invasion^40,41. Ultimately, our study not only provided a biologically grounded prognostic model but also demonstrated the power of advanced AI as a hypothesis-generating engine, effectively linking macroscopic patterns to the microscopic mechanisms of ICC invasion. An increasing number of AI-driven studies have demonstrated a pronounced trend toward integrating high-throughput data with sequencing information, revealing substantial potential for enhanced predictive modeling. For instance, transformer-based AI algorithms have exhibited remarkable efficacy in integrating various omics data, including cfDNA methylation profiles, and achieving outstanding performance in clinical applications⁴². Consistent with this paradigm, our investigation revealed that spatial transcriptomics and proteomics offer powerful complementary approaches by effectively combining sequencing data with spatial context. Multi-omics integration strategies underscore the promising prospects for personalized therapeutic approaches guided by comprehensive molecular profiling, positioning multi-dimensional data fusion as a cornerstone for precision oncology.
Nonetheless, it must be noted that, owing to the currently limited number of spatial transcriptomic datasets and the still exploratory nature of studies specifically interrogating the tumor margin, these findings should be regarded as preliminary and warrant confirmation in larger, well‑annotated cohorts.

Our study yielded promising results, but it has several limitations. First, the ICC-PPS model was trained and validated using highly curated radiomic and pathomic data, underscoring the necessity of establishing standardized data acquisition and quality control protocols for future clinical implementation. Second, this study did not apply specific harmonization techniques to account for parametric variations introduced by different scanners across multiple institutions, which may introduce potential bias. Third, the current reliance on manual delineation of tumor regions of interest is labor-intensive and time-consuming. To enhance clinical workflow, developing a user-friendly, fully automated segmentation system is a key priority for future work.

We developed and validated ICC-PPS, a novel AI model that integrates MRI with digital pathology to predict postoperative survival in patients with ICC. The high performance and generalizability of the model highlight its potential as a clinical decision-support tool. By enabling more accurate risk stratification, ICC-PPS could help guide personalized treatment strategies and ultimately improve outcomes for patients with this challenging disease.

Methods

Cohort curation

This retrospective study included patients who underwent surgical resection between January 2012 and June 2022 at four institutions: the First Affiliated Hospital, Zhejiang University School of Medicine (FAHZU; training cohort); Nanfang Hospital, Southern Medical University (NFSMU; validation cohort 1); Shunde Hospital of Southern Medical University (SDSMU; validation cohort 2); and the Affiliated Hospital of Southwest Medical University (AHSMU; validation cohort 3). All cases were confirmed postoperatively as ICC by pathological examination. Patients were eligible for inclusion if they met the following criteria: (1) pathologically confirmed ICC; (2) primary surgical resection with curative intent; (3) available preoperative MRI within 30 days before surgery; (4) complete pathological assessment and clinical documentation; (5) no previous surgical intervention for ICC. The exclusion criteria were as follows: (1) patients without pathologically confirmed cholangiocarcinoma; (2) patients who did not undergo surgery and those with missing pathological images, missing pretreatment MRI sequences including T2-weighted imaging (T2WI), or missing clinical information; (3) patients with follow-up shorter than 2 years.
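The eligibility rules above can be written as a single predicate, which is a convenient way to audit a cohort table. The sketch below is illustrative: the `Patient` fields are hypothetical names, and the handling of the follow-up criterion (excluding only patients censored before 24 months, while keeping those who died earlier) is an interpretation that the text does not spell out.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Patient:
    pathology_confirmed_icc: bool
    curative_resection: bool
    days_mri_before_surgery: Optional[int]  # None if no preoperative MRI
    has_wsi: bool
    has_clinical_data: bool
    prior_icc_surgery: bool
    followup_months: float
    died: bool


def is_eligible(p: Patient) -> bool:
    """Apply the stated inclusion and exclusion criteria to one patient."""
    if not (p.pathology_confirmed_icc and p.curative_resection):
        return False
    if p.prior_icc_surgery:
        return False
    # Preoperative MRI must exist and fall within 30 days before surgery.
    if p.days_mri_before_surgery is None or not (0 <= p.days_mri_before_surgery <= 30):
        return False
    if not (p.has_wsi and p.has_clinical_data):
        return False
    # Assumption: "follow-up shorter than 2 years" excludes patients who were
    # censored before 24 months, not patients who died before that point.
    if not p.died and p.followup_months < 24:
        return False
    return True
```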

To assess the generalizability of ICC-PPS in routine clinical practice, a subset of patients enrolled in a Phase II prospective clinical trial (ClinicalTrials.gov: NCT06298968; registered 7 March 2024) at Nanfang Hospital, Southern Medical University, was included as an external validation cohort for model performance evaluation. The study adhered to the Declaration of Helsinki and was approved by the institutional review board. To ensure comparability and minimize selection bias, the inclusion criteria for the prospective cohort were aligned with those used for the retrospective cohorts, with the additional requirement of prespecified data completeness for ICC-PPS calculation.

Clinicopathological information assessment

At each participating hospital, H&E-stained slides from the surgical resection specimen were evaluated according to the hospital’s standard pathology reporting protocol. A central pathology laboratory conducted an additional review of clinicopathological information. Reviews were performed by two experienced hepatobiliary pathologists and one experienced clinical hepatobiliary specialist. In cases of disagreement between the two pathologists, a third expert pathologist (Dr. Yanli Wang) provided the final adjudication.

Each center assigned two clinicians to participate in clinical information collection. Cohort data were obtained from institutional medical information systems documenting patients who underwent liver resection. From these systems, postoperative pathological stage, age, and other key clinicopathological variables were extracted using a prespecified data dictionary and standardized case report forms (CRFs). When postoperative pathological stage was unavailable, clinical stage was recorded as a substitute, using the same prespecified staging framework. A clinical expert with 15 years of experience (Dr. Yunlu Jia) conducted a final quality-control review of the clinical characteristics data to ensure accuracy. Clinical covariates matched those reported in Supplementary Table 1 (age, sex, TNM stage, tumor location, tumor size, tumor differentiation, nerve invasion, lymphovascular invasion, lymph node metastasis, preoperative treatment, postoperative treatment, and survival status). Binary variables were encoded as 1/0 (yes/no), and multi‑level categorical variables were encoded as integers.

Follow-up and survival evaluation

At each participating medical center, two trained clinical research assistants (CRAs) were responsible for patient follow‑up. CRAs conducted structured telephone contacts at prespecified intervals and continued until an event (death) or censoring. A standardized script was used to ensure consistency across centers. During each contact, CRAs recorded vital status (alive/deceased) and date of the contact. When a patient could not be reached on the scheduled date, repeat call attempts were made at different times/days. Vital status and dates of death were verified against hospital records.

The primary endpoint was OS, defined as the time from date of surgical resection to death from any cause. Patients who were alive at the database lock were censored at the date of their last successful contact. Patients lost to follow‑up were censored at the last date they were known to be alive. All follow‑up data were entered into a centralized, time‑stamped electronic database with routine cross‑center consistency checks and queries for missing or implausible values. Data collection followed site‑specific ethical approvals and written informed consent procedures, and all telephone interactions adhered to confidentiality and data‑protection requirements.

Annotation of MRI scan

For each participant, axial high‑resolution T2‑weighted images were acquired on 1.5‑T or 3.0‑T MRI systems at the collaborating hospitals. The associated MRI scans were reviewed according to the following inclusion criteria: (1) the scan must include T2-weighted imaging sequences; (2) absence of severe artifacts or motion-related blur obscuring the ROI; and (3) adequate signal-to-noise ratio. All MRI scans were performed before surgery. All MRI scans were available in the Digital Imaging and Communications in Medicine (DICOM) format through the institution’s Picture Archiving and Communication System. The native MRI data were exported using RadiAnt DICOM Viewer (version 5.0.1), then manually annotated with ITK‑SNAP (www.itksnap.org)⁴³. ROIs were delineated along the margins of the tumor. Two radiation oncologists, each with at least 10 years’ experience in hepatobiliary MRI, performed the tumor ROI contouring under the supervision of an expert radiologist with 15 years of diagnostic imaging experience, Dr. Chengzhi Li, from the Department of Radiology, The First Affiliated Hospital, Zhejiang University School of Medicine.

Annotation of WSI

To collect H&E images, clinical research assistants reviewed the pathology information system to identify pathology cases corresponding to the patients’ surgical resection specimens, and expert pathologists reviewed the slides to select high-quality tumor region specimens for digitization. All slides included in the analysis were prepared from primary surgical resection specimens. Formalin‑fixed, paraffin‑embedded tissue was sectioned into 4‑μm H&E‑stained slides and digitized as WSIs at 20× magnification (0.5 μm/pixel) using panoramic scanning instruments (Aperio AT2 or CS2, Leica Biosystems, Germany) at each participating site. Pathologists verified that every slide contained tumor and met high image‑quality standards for evaluation. Two senior pathologists annotated tumor cell regions in ImageScope (www.leicabiosystems.com), and a third senior professor re‑examined the annotated ROIs to ensure accurate delineation.

Feature extraction of MRI

Radiomics features were extracted from the tumor region of each patient using the open-source PyRadiomics (version 3.0.1) Python package⁴⁴. All input images (.nii format) and their ROI masks were derived from the preoperative hepatobiliary MRI scans. First, image signal intensities were normalized by subtracting the mean and dividing by the standard deviation (Z-score normalization), followed by multiplication with a scaling factor of 100 to reduce variability across different scanners. For robust texture analysis, the image gray levels were discretized using a fixed bin width of 5. Feature extraction was performed on the original images as well as on derived images obtained through Laplacian of Gaussian (LoG, with σ values of 1.0, 3.0, and 5.0) and various Wavelet transformations. A forced 2D mode was employed, extracting features slice-by-slice from each axial plane containing the tumor. Finally, features were extracted from the following classes: Shape features, First-order statistics, Gray Level Co-occurrence Matrix (GLCM), Gray Level Run Length Matrix (GLRLM), Gray Level Size Zone Matrix (GLSZM), and Gray Level Dependence Matrix (GLDM). This process generated a comprehensive set of radiomics features for each patient’s tumor region, which were used for subsequent analysis.
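The normalization and gray-level discretization steps described above can be illustrated with a minimal standard-library sketch. It is not the PyRadiomics implementation: the function names and ROI intensities are hypothetical, the population standard deviation is an assumption, and the binning rule (equal-width bins counted from the bin containing the minimum intensity) is the usual fixed-bin-width convention.

```python
import math
import statistics


def normalize_intensities(values, scale=100.0):
    """Z-score the intensities, then multiply by a scaling factor."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population SD: an assumption of this sketch
    if std == 0:
        raise ValueError("constant image: standard deviation is zero")
    return [(v - mean) / std * scale for v in values]


def discretize_fixed_bin_width(values, bin_width=5.0):
    """Assign 1-based gray-level bins of equal width, counted from the minimum's bin."""
    lowest = math.floor(min(values) / bin_width)
    return [math.floor(v / bin_width) - lowest + 1 for v in values]


# Hypothetical intensities from one tumor ROI (illustrative values only).
roi = [210.0, 220.0, 230.0, 240.0, 250.0]
normalized = normalize_intensities(roi)          # mean 0, SD 100
bins = discretize_fixed_bin_width(normalized)    # bin width 5, as in the text
```

After scaling to a standard deviation of 100, a bin width of 5 corresponds to 0.05 standard deviations per gray level, which is why the two settings are usually chosen together.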

Feature extraction of tumor nucleus characteristics

All eligible slides were processed at 20× magnification and tessellated into non-overlapping 512 × 512-pixel tiles. Each tile was first normalized using the Macenko stain method and subsequently standardized with Z-score normalization⁴⁵. Pathomic features were extracted using CellProfiler (version 4.2.5, https://cellprofiler.org/)⁴⁶. After separating color channels within ROIs using the UnmixColors module, tumor cell nuclei in ROI tiles were semi-automatically segmented with IdentifyPrimaryObjects (including threshold estimation, morphological operations, and declumping), with minimum/maximum nuclear diameters tuned to image resolution. Per-nucleus measurements included: (i) morphological AreaShape features (e.g., area, perimeter, eccentricity, solidity, feret diameters, and major/minor axis lengths), together with Zernike moments of orders 0–9 (coefficients 0_0 to 9_9); (ii) intensity features on the Hematoxylin channel only—Integrated, Mean, Median, Max, Min, Quartile, and Std, including edge-based metrics—via MeasureObjectIntensity; and (iii) Location features capturing the intensity centroid and coordinates of maximum intensity on the Hematoxylin channel as well as geometric centroids. For each slide, features were aggregated over non-overlapping batches of 50 tiles by computing the mean, median, standard deviation, and the 25th and 75th percentiles. This procedure yielded 345 tumor nucleus feature dimensions per patient.
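The batch-wise aggregation of per-tile nuclear features can be sketched as follows. This is an illustration only: the function names and the example feature values are hypothetical, the percentile interpolation method and the use of the population standard deviation are assumptions, and dropping an incomplete final batch is a simplification that the text does not specify.

```python
import statistics


def summarize(values):
    """Return mean, median, SD and the 25th/75th percentiles of one feature."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "std": statistics.pstdev(values),  # population SD: an assumption
        "p25": q1,
        "p75": q3,
    }


def aggregate_in_batches(tile_values, batch_size=50):
    """Summarize one per-tile feature over consecutive, non-overlapping full batches."""
    last_start = len(tile_values) - batch_size
    return [
        summarize(tile_values[start:start + batch_size])
        for start in range(0, last_start + 1, batch_size)
    ]


# Hypothetical per-tile values of a single nuclear feature for one slide.
nuclear_area_per_tile = [float(i) for i in range(120)]
batches = aggregate_in_batches(nuclear_area_per_tile)  # two full batches of 50 tiles
```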

Whole-slide image data preparation

To convert raw WSIs into structured inputs for downstream transformer models, we adopted the CLAM framework’s tissue segmentation and tiling pipeline¹⁹. We first performed tissue–background segmentation to remove non-informative regions and common artifacts; within the resulting tissue mask, slides were tiled into fixed 256 × 256-pixel patches, retaining only tiles that met the default tissue-coverage threshold and recording their positions in slide coordinates. For feature extraction, we employed UNI, a foundation model tailored to computational pathology²⁰. Each patch was preprocessed according to UNI’s recommended normalization and encoded into a dense 1,024-dimensional feature vector. Compared with general-purpose vision encoders, UNI yields representations that more faithfully capture cellular morphology and the tissue microenvironment, both of which are critical for pathological diagnosis. The final slide-level representation comprises a “bag” of patch embeddings with associated spatial metadata, which can be ingested directly by set- or sequence-based transformer architectures while preserving spatial provenance for positional encoding and end-to-end traceability. Unless otherwise specified, we followed the default parameters from the CLAM and UNI reference implementations to ensure consistency and reproducibility.

Feature selection and modeling strategy

We developed a multimodal transformer classifier that integrates standardized clinical covariates, radiomic features, pathomic nuclear features, and patch-level WSI tokens within a unified sequence model. A transformer encoder serves as the backbone: modality-specific lightweight fully connected projection heads map clinical, radiomic, and pathomic features into a shared embedding space, after which they are concatenated with WSI tokens⁴⁷. Positional encodings and attention masks are applied to preserve ordering and to robustly accommodate variable-length sequences across patients and slides. Classification is performed by a two-layer multilayer perceptron with ReLU activation and dropout, applied to the transformer output representation. Models were optimized with Adam under a cosine-annealed learning‑rate schedule using class‑balanced binary cross‑entropy with logits. Radiomics, nuclear pathomic features, and H5 features were reduced by principal component analysis (PCA) fitted on the training set, retaining 100, 50, and 100 components, respectively, while clinical covariates (11 dimensions) were kept at original dimensionality. The multimodal transformer was trained with an initial learning rate of 0.0001 and a batch size of 16. To manage GPU memory, WSIs were downsampled by a factor of three prior to patch extraction; empirically, this did not degrade validation performance. Training proceeded for up to 50 epochs with automatic mixed precision on 8× NVIDIA Tesla V100‑SXM2‑32GB GPUs, employing early stopping and distributed data parallelism with gradient scaling for stability and efficiency.
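Two of the optimization components named above, the cosine-annealed learning rate and the class-balanced binary cross-entropy on logits, can be written out in plain Python. This sketch does not reproduce the training code: the minimum learning rate of 0, annealing per epoch over the 50-epoch budget, and the positive-class weight defined as the ratio of negatives to positives are all assumptions, and the function names are hypothetical.

```python
import math


def cosine_annealed_lr(epoch, total_epochs=50, base_lr=1e-4, min_lr=0.0):
    """Learning rate after `epoch` epochs of cosine annealing."""
    cosine = math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cosine)


def balanced_bce_with_logits(logits, labels):
    """Mean binary cross-entropy on raw logits, up-weighting the positive class."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present to balance the loss")
    pos_weight = n_neg / n_pos  # assumed weighting: negatives / positives
    total = 0.0
    for z, y in zip(logits, labels):
        # softplus(-z) = log(1 + exp(-z)), computed in a numerically stable way
        softplus_neg = max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
        # -log(sigmoid(z)) = softplus(-z); -log(1 - sigmoid(z)) = z + softplus(-z)
        total += pos_weight * y * softplus_neg + (1 - y) * (z + softplus_neg)
    return total / len(labels)


# Learning rate at the start, midpoint and end of a 50-epoch schedule.
schedule = [cosine_annealed_lr(e) for e in (0, 25, 50)]
```

Working on logits rather than probabilities avoids taking the logarithm of values that round to 0 or 1, which matters under mixed-precision training.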

Prediction and evaluation metrics

The combined data were input into distinct prediction models to generate prediction values (ranging from 0 to 1) for individuals and patches. Individuals with a prediction value higher than 0.5 were defined as “long survival” cases, while those with a value lower than or equal to 0.5 were defined as “short survival” cases. Let TP, TN, FP, and FN denote, respectively, true positives, true negatives, false positives, and false negatives for the long‑survival class. We report the following metrics on the evaluation data.

(1) Acc (Accuracy): overall proportion of correct classifications across both classes:

$$\mathrm{Acc}=\frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{TP}+\mathrm{TN}+\mathrm{FP}+\mathrm{FN}}$$

(2) Rec (Recall/Sensitivity for long survival): ability to correctly identify long‑survival cases:

$$\mathrm{Rec}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}$$

(3) Spec (Specificity for short survival): ability to correctly recognize short‑survival cases as short:

$$\mathrm{Spec}=\frac{\mathrm{TN}}{\mathrm{TN}+\mathrm{FP}}$$

(4) Prec (Precision/Positive Predictive Value for long survival): among cases predicted as long survival, the fraction that are truly long survival:

$$\mathrm{Prec}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}}$$

(5) F1 (F1-score): harmonic mean of precision and recall, summarizing the balance between “being correct” and “finding” long‑survival cases:

$$F_{1}=2\cdot\frac{\mathrm{Prec}\cdot\mathrm{Rec}}{\mathrm{Prec}+\mathrm{Rec}}$$

(6) MCC (Matthews Correlation Coefficient): a balanced correlation‑like measure that uses all four entries of the confusion matrix and is robust to class imbalance:

$$\mathrm{MCC}=\frac{\mathrm{TP}\cdot\mathrm{TN}-\mathrm{FP}\cdot\mathrm{FN}}{\sqrt{\left(\mathrm{TP}+\mathrm{FP}\right)\left(\mathrm{TP}+\mathrm{FN}\right)\left(\mathrm{TN}+\mathrm{FP}\right)\left(\mathrm{TN}+\mathrm{FN}\right)}}$$

Range: −1 (inverse), 0 (random), 1 (perfect).

(7) Kappa (Cohen’s kappa): agreement between predictions and ground truth beyond chance:

$$\kappa =\frac{{p}_{o}-{p}_{e}}{1-{p}_{e}}$$

where p_o is the observed agreement (akin to accuracy) and p_e is the agreement expected by chance based on the marginal class distributions.
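The seven metrics defined above can be computed directly from the four confusion-matrix counts. The sketch below is illustrative: the function name and the example counts are hypothetical, and it assumes every denominator is non-zero (both classes present and both predicted at least once).

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Compute Acc, Rec, Spec, Prec, F1, MCC and Cohen's kappa from counts."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    rec = tp / (tp + fn)
    spec = tn / (tn + fp)
    prec = tp / (tp + fp)
    f1 = 2 * prec * rec / (prec + rec)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Chance agreement from the marginal totals of predictions and labels.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (acc - p_e) / (1 - p_e)  # observed agreement p_o equals accuracy
    return {
        "acc": acc, "rec": rec, "spec": spec, "prec": prec,
        "f1": f1, "mcc": mcc, "kappa": kappa,
    }


# Hypothetical counts for the long-survival class (not taken from the study).
example = classification_metrics(tp=40, tn=30, fp=10, fn=20)
```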

Attention-based visualization

Tile‑level attention weights were extracted from the multimodal transformer classifier. For each patient, the model produced a one‑dimensional attention vector, where each element represented a scalar attention score assigned by the last transformer layer to a pathology tile. These scores were computed using the patient‑level token as the query and internally averaging across attention heads. Each attention value was then aligned to its corresponding tile coordinate on the WSI, with coordinates obtained from the HDF5 file. Tiles sharing the same coordinate were aggregated by averaging their attention scores, which were subsequently converted to within‑patient percentile ranks to yield normalized importance scores in the range [0, 1]. Because pathology tiles possess explicit spatial locations, these percentile‑normalized scores were visualized as heatmaps by plotting each tile at its coordinate and coloring it according to its percentile score.
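The coordinate-wise averaging and percentile normalization described above can be sketched as below. The function names and the example coordinates are hypothetical, and the percentile convention (zero-based average rank divided by n − 1, so that the lowest tile maps to 0 and the highest to 1) is an assumption chosen to match the stated [0, 1] range.

```python
from collections import defaultdict


def aggregate_by_coordinate(coords, scores):
    """Average attention scores of tiles that share the same (x, y) coordinate."""
    grouped = defaultdict(list)
    for xy, score in zip(coords, scores):
        grouped[tuple(xy)].append(score)
    return {xy: sum(vals) / len(vals) for xy, vals in grouped.items()}


def percentile_ranks(score_by_coord):
    """Map each tile's score to a within-patient percentile rank in [0, 1]."""
    items = sorted(score_by_coord.items(), key=lambda kv: kv[1])
    n = len(items)
    if n == 1:
        return {items[0][0]: 1.0}
    ranks = {}
    i = 0
    while i < n:
        j = i
        while j + 1 < n and items[j + 1][1] == items[i][1]:
            j += 1  # extend over a run of tied scores
        mean_rank = (i + j) / 2  # zero-based average rank shared by the ties
        for k in range(i, j + 1):
            ranks[items[k][0]] = mean_rank / (n - 1)
        i = j + 1
    return ranks


# Hypothetical tile coordinates and attention scores for one patient.
coords = [(0, 0), (256, 0), (256, 0), (0, 256)]
scores = [0.10, 0.30, 0.50, 0.20]
heat = percentile_ranks(aggregate_by_coordinate(coords, scores))
```

Percentile ranks make heatmaps comparable across patients, because raw attention weights depend on the number of tiles in each slide.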

Data resources and descriptions—tissue acquisition, storage, and sequencing

ICC specimens, including adjacent normal liver tissue, the leading-edge area, and tumor core tissues, were collected from ICC patients with voluntary written consent and with the approval of each participating medical center’s medical ethics committee. These specimens were stored in tissue storage solution and then used for single-cell sequencing, spatial transcriptome sequencing, and paraffin embedding.

Samples processed for scRNA‐Seq

Tissue samples were rinsed with PBS at 4 °C and dissected into fragments. The fragments were immersed in 10 mL of EDTA-containing solution and agitated for 1 h at 37 °C. Subsequently, the tissues were treated with 10 mL of DTT-containing PBS (65 mM DTT, with 10% FBS) for 15 min at 37 °C with continuous agitation. EDTA and DTT were removed by rinsing the tissues twice with PBS. The cells were then strained through a 100-μm filter, centrifuged, and rinsed twice with PBS. The resulting cell suspensions were prepared for scRNA-seq, performed by Novo Technology (Beijing, China), following the standard operating procedure on the 10X Genomics Chromium platform.

Samples processed for spatial transcriptomics

Spatial transcriptomic libraries were prepared using the 10x Genomics Visium Spatial Gene Expression Reagent Kit. The leading-edge tissue samples were bisected; the well-preserved half was embedded in OCT and stored at −80 °C for subsequent analysis, and the other half was allocated for scRNA sequencing. Before the complete procedure was run, a tissue optimization test was conducted using the 10x Genomics Visium Spatial Tissue Optimization with cryosections from ICC patient tumors, accompanied by fluorescence signal visualization, which determined 24 min as the ideal permeabilization duration. Samples were prepared for spatial transcriptomic sequencing in accordance with the protocol provided by Novo Technology (Beijing, China).

Spatial genomics data preparation and computational processing

The FASTQ files were aligned to the GRCh38 human genome using the Space Ranger (v2.1.0) software from 10x Genomics, yielding a count matrix based on the genome annotation. Spatial transcriptomics samples underwent quality control in the Loupe Browser, where undamaged spots free of tissue folds were selected. No additional filtering on gene count, mitochondrial read percentage, or ribosomal fraction was applied, because all selected spots met these criteria.

Processed gene-spot matrices were subsequently analyzed using the Seurat package in R (version 4.2.1)⁴⁸. The “SCTransform” function was applied for normalization across spots. Dimensionality reduction and clustering were conducted using PCA, selecting the top 9–11 principal components. Signature scoring was executed with the “AddModuleScore” function in Seurat. The gene lists for particular signatures were generated from scRNA-seq data and from pathways in the “msigdbr” library. Spatial feature expression was visualized with Seurat’s “SpatialFeaturePlot” function.

scRNA‐seq data processing

The raw data were processed with the “Seurat” package in R (version 4.2.1). Cells exceeding 25% mitochondrial gene content were filtered out, along with genes detected in fewer than 20 cells and cells expressing fewer than 500 genes. Seurat’s “FindVariableFeatures” function pinpointed the genes with the highest variability. Dimensionality reduction was achieved through PCA, and batch effects were mitigated using Seurat’s CCA (canonical correlation analysis) method. UMAP clustering revealed six primary clusters, with cell type annotations determined by integrating known and differentially expressed marker genes. Signature scores were derived by averaging the scaled and centered expression values of genes within each signature, as calculated by Seurat’s “AddModuleScore” function.
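The three quality-control thresholds above can be illustrated on a toy count table. This sketch is written in Python for illustration and is not the Seurat workflow used in the study: the function name and the toy data are hypothetical, identifying mitochondrial genes by the `MT-` prefix is an assumption, and so is the order of operations (cells filtered first, genes second).

```python
def filter_counts(counts, max_mito_frac=0.25, min_genes=500, min_cells=20):
    """Filter a {cell: {gene: count}} table by mitochondrial fraction and detection."""
    kept_cells = {}
    for cell, genes in counts.items():
        total = sum(genes.values())
        mito = sum(c for g, c in genes.items() if g.startswith("MT-"))
        n_detected = sum(1 for c in genes.values() if c > 0)
        if total > 0 and mito / total <= max_mito_frac and n_detected >= min_genes:
            kept_cells[cell] = genes
    # Count, for each gene, the retained cells in which it is detected.
    cells_per_gene = {}
    for genes in kept_cells.values():
        for g, c in genes.items():
            if c > 0:
                cells_per_gene[g] = cells_per_gene.get(g, 0) + 1
    kept_genes = {g for g, n in cells_per_gene.items() if n >= min_cells}
    return {
        cell: {g: c for g, c in genes.items() if g in kept_genes}
        for cell, genes in kept_cells.items()
    }


# Toy data with relaxed thresholds so the effect is visible on four cells.
toy = {
    "c1": {"MT-CO1": 1, "EPCAM": 5, "KRT19": 4},  # 10% mitochondrial: kept
    "c2": {"MT-CO1": 6, "EPCAM": 2, "KRT19": 2},  # 60% mitochondrial: removed
    "c3": {"EPCAM": 3, "KRT19": 1, "CD3E": 2},    # kept
    "c4": {"EPCAM": 7},                           # one detected gene: removed
}
filtered = filter_counts(toy, min_genes=2, min_cells=2)
```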

Cancer hallmarks and GSEA analyses

Gene sets for analysis were sourced from the “msigdbr” library (http://software.broadinstitute.org/gsea/msigdb/). The scoring methodology for gene signatures adhered to a previously reported approach. Essentially, each cell’s gene signature score was derived from averaging the scaled expressions of the genes within the signature. To detect significant shifts in gene expression between leading‐edge and core tumor cells, the “FindAllMarkers” algorithm was applied.
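The scoring rule stated above, averaging the scaled expression of the signature genes within each cell, can be sketched as follows. This follows the textual description only and is not Seurat’s “AddModuleScore” implementation; the function name, the toy expression table, and the use of the population standard deviation for scaling are assumptions.

```python
import statistics


def signature_scores(expression, signature):
    """Average per-gene z-scores over the signature genes, one score per cell.

    expression: {gene: [value for each cell]}; genes absent from it are skipped.
    """
    genes = [g for g in signature if g in expression]
    if not genes:
        raise ValueError("no signature gene found in the expression table")
    n_cells = len(expression[genes[0]])
    scaled = []
    for g in genes:
        values = expression[g]
        mean = statistics.fmean(values)
        sd = statistics.pstdev(values)  # population SD: an assumption
        scaled.append([(v - mean) / sd if sd > 0 else 0.0 for v in values])
    return [
        statistics.fmean([row[i] for row in scaled]) for i in range(n_cells)
    ]


# Toy expression for three cells; "C" is absent and therefore ignored.
toy_expression = {"A": [1.0, 2.0, 3.0], "B": [2.0, 4.0, 6.0]}
cell_scores = signature_scores(toy_expression, ["A", "B", "C"])
```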

InferCNV analysis

The R package “inferCNV” (version 1.20.0), accessible at its GitHub repository (https://github.com/broadinstitute/inferCNV), was employed to distinguish malignant cells by inferring chromosomal CNVs from gene expression profiles. With T/NK cells, B cells, and myeloid cells serving as the normal reference populations, epithelial cells with high CNV levels and high epithelial marker expression were inferred to be malignant. A gene ordering file based on the human GRCh38 genome assembly was prepared, listing the chromosomal start and end coordinates of each gene; this file was supplied through the “gene_order_file” parameter required by inferCNV. The raw count matrix and annotation file were input into inferCNV, and a cutoff threshold of 0.1 was applied to identify significant CNVs, refining the detection of tumor cells.

Cell-cell interactions analysis

Interactions between cells, as indicated by the expression patterns of established Ligand‐Receptor (L‐R) pairs across various cell types, were quantified using CellChat (version 2.2.0), available on its GitHub repository (https://github.com/sqjin/CellChat). Briefly, CellChat was applied to gene expression data to map overexpressed ligands and receptors onto a protein‐protein interaction network, identifying significant ligand‐receptor interactions. Probability assessment and permutation testing within CellChat were applied to infer significant cell-cell communications. The resulting interaction networks were visualized using bubble plots.

Tandem mass tag (TMT)-based proteomic analysis

Formalin-fixed paraffin-embedded (FFPE) samples (0.5–1 mg) were dewaxed and rehydrated and then subjected to acidic hydrolysis with formic acid (FA). Proteins were denatured with 6 M urea (Sigma-Aldrich, Germany) and 2 M thiourea (Sigma-Aldrich, Germany), and then digested into peptides with trypsin (1:20, Hualishi, Beijing, China) and Lys-C (1:80, Hualishi, Beijing, China) with the assistance of pressure-cycling technology (PCT). Peptides were labeled with TMTpro™ 16plex (Thermo Fisher Scientific™, San Jose, USA). Each batch contained 15 experimental samples and one pooled sample in the TMT126 channel for normalization. The fractions (60 per batch) were separated using offline high-pH reversed-phase chromatography with a Thermo Dionex Ultimate 3000 RSLCnano System and then combined to a total of 30 fractions per batch. Subsequently, the fractionated samples were separated with a Thermo Dionex Ultimate 3000 RSLCnano System and then analyzed with a Q Exactive HF mass spectrometer using the data-dependent acquisition (DDA) mode (Thermo Fisher Scientific™, San Jose, USA). The database searching included all reviewed human entries from UniProt (downloaded on 14 April 2020, containing 20,365 proteins) using Proteome Discoverer (version 2.4, Thermo Fisher Scientific™, Waltham, MA).

Pathway analysis based on proteomic dataset

Differentially expressed proteins (DEPs) of cell subgroups were identified with DESeq2, using |FC| > 2 and a q value (false discovery rate) <0.05 as the cut-off criteria. GO enrichment analysis was performed on these DEPs with the R package clusterProfiler. Single-sample GSEA (ssGSEA), conducted with the R package GSVA, was applied to evaluate the enrichment scores of each sample.

The REACTOME gene sets were downloaded from The Broad Institute (https://www.gsea-msigdb.org/gsea/msigdb/genesets.jsp?collection=CP:REACTOME).

Statistical analysis

All data underwent suitable statistical evaluations. For variables exhibiting normal distribution, differences between two groups were assessed using an unpaired two‐tailed t‐test, while non‐normally distributed variables were compared with the Mann-Whitney U test. Receiver operating characteristic (ROC) curves were generated by plotting the proportion of true positive cases (sensitivity) against the proportion of false positive cases (1-specificity) based on various predictive probability thresholds. The 95% CIs for the AUC were estimated using the non-parametric DeLong method. A LASSO‑penalized Cox regression model was implemented using the glmnet package. Ten‑fold cross‑validation was applied and the penalty parameter λ was selected as lambda.min. Survival differences were examined with the Kaplan-Meier method complemented by the log‐rank test. A P value < 0.05 was set as the threshold for statistical significance (*P < 0.05, **P < 0.01, ***P < 0.001). Analysis was conducted in ITK-SNAP v.4.0.1, ImageScope v.12.4.3, CellProfiler v.4.2.5, and custom code written in Python v.3.10.0 (using Pandas v.2.2.2, NumPy v.1.26.4, PyTorch v.2.3.0, TorchVision v.0.18.0, OpenSlide v.4.0.0, Seaborn v.0.13.22, Matplotlib v.3.9.0, SciPy v.1.13.1, scikit-learn v.1.5.0 and PyRadiomics v.3.0.1).
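The DeLong estimate of the AUC and its confidence interval can be sketched in a few lines. This is a direct, quadratic-time illustration of the non-parametric method rather than the code used in the study: the function name and the example scores are hypothetical, the interval is a normal approximation, and clipping it to [0, 1] is an assumption. Each class needs at least two samples.

```python
import math


def delong_auc_ci(pos_scores, neg_scores, z=1.959963984540054):
    """Return (AUC, lower, upper) with a DeLong variance and normal-approximation CI."""
    m, n = len(pos_scores), len(neg_scores)

    def psi(x, y):
        # Mann-Whitney kernel: 1 if the positive outranks the negative, 0.5 on ties.
        return 1.0 if x > y else 0.5 if x == y else 0.0

    # Structural components: one per positive case and one per negative case.
    v10 = [sum(psi(x, y) for y in neg_scores) / n for x in pos_scores]
    v01 = [sum(psi(x, y) for x in pos_scores) / m for y in neg_scores]
    auc = sum(v10) / m
    s10 = sum((v - auc) ** 2 for v in v10) / (m - 1)
    s01 = sum((v - auc) ** 2 for v in v01) / (n - 1)
    se = math.sqrt(s10 / m + s01 / n)
    # Clipping to [0, 1] is an assumption of this sketch.
    return auc, max(0.0, auc - z * se), min(1.0, auc + z * se)


# Hypothetical predicted probabilities for long- and short-survival patients.
long_survivors = [0.9, 0.8, 0.7, 0.4]
short_survivors = [0.6, 0.3, 0.2, 0.5]
auc, lower, upper = delong_auc_ci(long_survivors, short_survivors)
```

With small cohorts the unclipped upper bound can exceed 1, which is one way an interval ending at exactly 1.000 can arise.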

Ethics approval

This study was approved by the Ethics Committee of the First Affiliated Hospital of Zhejiang University School of Medicine (approval number: [2025B] IIT Ethics Approval No.0738), the Ethics Committee of Shunde Hospital of Southern Medical University (approval number: KYLS20250601), the Ethics Committee of Nanfang Hospital of Southern Medical University (approval number: NFEC-2024–074) and the Ethics Committee of the Affiliated Hospital of Southwest Medical University (approval number: KY2025264).

Data availability

The raw radiomic and pathomic data are not publicly available owing to hospital regulations that protect patient privacy. The data that support the findings of this study are available from the corresponding authors with a signed data access agreement. The omics data generated in this study have been deposited in the Human Research Archive (HRA) at CNCB-NGDC (https://ngdc.cncb.ac.cn) under accession HRA012232.

Code availability

The code associated with this study is publicly available at https://github.com/wmy123-zju/ICC-PPS.

References

Siegel, R. L., Miller, K. D., Fuchs, H. E. & Jemal, A. Cancer statistics, 2022. CA Cancer J. Clin. 72, 7–33 (2022).
Beal, E. W. et al. Cohort contributions to trends in the incidence and mortality of intrahepatic cholangiocarcinoma. Hepatobiliary Surg. Nutr. 7, 270–276 (2018).
Moris, D. et al. Advances in the treatment of intrahepatic cholangiocarcinoma: an overview of the current and future therapeutic landscape for clinicians. CA Cancer J. Clin. 73, 198–222 (2023).
Endo, I. et al. Intrahepatic cholangiocarcinoma: rising frequency, improved survival, and determinants of outcome after resection. Ann. Surg. 248, 84–96 (2008).
El-Diwany, R., Pawlik, T. M. & Ejaz, A. Intrahepatic cholangiocarcinoma. Surg. Oncol. Clin. N. Am. 28, 587–599 (2019).
Farges, O. et al. Influence of surgical margins on outcome in patients with intrahepatic cholangiocarcinoma: a multicenter study by the AFC-IHCC-2009 study group. Ann. Surg. 254, 824–829 (2011). discussion 830.
Dong, L. Q. et al. Proteogenomic characterization identifies clinically relevant subgroups of intrahepatic cholangiocarcinoma. Cancer Cell 40, 70 (2022).
Xu, H. et al. A whole-slide foundation model for digital pathology from real-world data. Nature 630, 181–188 (2024).
Rizvi, S., Khan, S. A., Hallemeier, C. L., Kelley, R. K. & Gores, G. J. Cholangiocarcinoma—evolving concepts and therapeutic strategies. Nat. Rev. Clin. Oncol. 15, 95–111 (2018).
Fiz, F. et al. Radiomics of intrahepatic cholangiocarcinoma and peritumoral tissue predicts postoperative survival: development of a CT-based clinical-radiomic model. Ann. Surg. Oncol. 31, 5604–5614 (2024).
Ding, G. Y. et al. Mining the interpretable prognostic features from pathological image of intrahepatic cholangiocarcinoma using multi-modal deep learning. BMC Med. 22, https://doi.org/10.1186/s12916-024-03482-0 (2024).
Ma, X. et al. MRI features predict microvascular invasion in intrahepatic cholangiocarcinoma. Cancer Imaging 20, 40 (2020).
Mao, S. et al. Development and validation of a novel preoperative clinical model for predicting lymph node metastasis in perihilar cholangiocarcinoma. BMC Cancer 24, 297 (2024).
He, J. et al. The practical implementation of artificial intelligence technologies in medicine. Nat. Med. 25, 30–36 (2019).
Bera, K., Braman, N., Gupta, A., Velcheti, V. & Madabhushi, A. Predicting cancer outcomes with radiomics and artificial intelligence in radiology. Nat. Rev. Clin. Oncol. 19, 132–146 (2022).
Haug, C. J. & Drazen, J. M. Artificial intelligence and machine learning in clinical medicine, 2023. N. Engl. J. Med. 388, 1201–1208 (2023).
Wang, X. et al. Transformer-based unsupervised contrastive learning for histopathological image classification. Med Image Anal. 81, 102559, https://doi.org/10.1016/j.media.2022.102559 (2022).
Chen, Z. F. et al. Predicting gastric cancer response to anti-HER2 therapy or anti-HER2 combined immunotherapy based on multi-modal data. Signal Transd. Target. Ther. 9, https://doi.org/10.1038/s41392-024-01932-y (2024).
Lu, M. Y. et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat. Biomed. Eng. 5, 555–570 (2021).
Chen, R. J. et al. Towards a general-purpose foundation model for computational pathology. Nat. Med. 30, 850–862 (2024).
Chen, D. et al. Prognostic and predictive value of a pathomics signature in gastric cancer. Nat. Commun. 13, 6903 (2022).
Jee, J. et al. Automated real-world data integration improves cancer outcome prediction. Nature 636, 728–736 (2024).
Chelebian, E., Avenel, C. & Wählby, C. Combining spatial transcriptomics with tissue morphology. Nat. Commun. 16, https://doi.org/10.1038/s41467-025-58989-8 (2025).
Aebersold, R. & Mann, M. Mass spectrometry-based proteomics. Nature 422, 198–207 (2003).
Hong, L. B. et al. Spatial single-cell proteomics landscape decodes the tumor microenvironmental ecosystem of intrahepatic cholangiocarcinoma. Hepatology, https://doi.org/10.1097/hep.0000000000001283 (2025).
Ruan, J. et al. EMLI-ICC: an ensemble machine learning-based integration algorithm for metastasis prediction and risk stratification in intrahepatic cholangiocarcinoma. Brief Bioinform. 23, https://doi.org/10.1093/bib/bbac450 (2022).
Zuyin, L. et al. Single-cell and spatial transcriptomics delineate the microstructure and immune landscape of intrahepatic cholangiocarcinoma in the leading-edge area. Adv. Sci. 12, e2412740, https://doi.org/10.1002/advs.202412740 (2025).
Banales, J. M. et al. Cholangiocarcinoma 2020: the next horizon in mechanisms and management. Nat. Rev. Gastroenterol. Hepatol. 17, 557–588 (2020).
Jusakul, A. et al. Whole-genome and epigenomic landscapes of etiologically distinct subtypes of cholangiocarcinoma. Cancer Discov. 7, 1116–1135 (2017).
Fiz, F. et al. Radiomic analysis of intrahepatic cholangiocarcinoma: non-invasive prediction of pathology data: a multicenter study to develop a clinical–radiomic model. Cancers 15, 4204 (2023).
Cao, Q.-H. et al. Pathomics-integrated multimodal model evaluate chemoimmunotherapy efficacy in unresectable intrahepatic cholangiocarcinoma. JHEP Rep. 101557, https://doi.org/10.1016/j.jhepr.2025.101557 (2025).
Lu, M. Y. et al. AI-based pathology predicts origins for cancers of unknown primary. Nature 594, 106–110 (2021).
Sammut, S. J. et al. Multi-omic machine learning predictor of breast cancer therapy response. Nature 601, 623–629 (2022).
Rajkomar, A. et al. Scalable and accurate deep learning with electronic health records. NPJ Digit. Med. 1, 18 (2018).
Wang, Y. et al. Prognostic nomogram for intrahepatic cholangiocarcinoma after partial hepatectomy. J. Clin. Oncol. 31, 1188–1195 (2013).
Zhang, X. et al. Development and interpretation of a pathomics-driven ensemble predictive model for prognosis of intrahepatic cholangiocarcinoma. Eur. J. Surg. Oncol. 51, 110295, https://doi.org/10.1016/j.ejso.2025.110295 (2025).
Jolissaint, J. S. et al. Machine learning radiomics can predict early liver recurrence after resection of intrahepatic cholangiocarcinoma. HPB 24, 1341–1350 (2022).
Liu, Z. et al. Noninvasive prediction of perineural invasion in intrahepatic cholangiocarcinoma by clinicoradiological features and computed tomography radiomics based on interpretable machine learning: a multicenter cohort study. Int. J. Surg. 110, 1039–1051 (2024).
Alaimo, L. et al. Development and validation of a machine-learning model to predict early recurrence of intrahepatic cholangiocarcinoma. Ann. Surg. Oncol. 30, 5406–5415 (2023).
Sahai, E. et al. A framework for advancing our understanding of cancer-associated fibroblasts. Nat. Rev. Cancer 20, 174–186 (2020).
Kalluri, R. The biology and function of fibroblasts in cancer. Nat. Rev. Cancer 16, 582–598 (2016).
Li, G. et al. Transformer-based AI technology improves early ovarian cancer diagnosis using cfDNA methylation markers. Cell Rep. Med. 5, 101666, https://doi.org/10.1016/j.xcrm.2024.101666 (2024).
Yushkevich, P. A. et al. User-guided 3D active contour segmentation of anatomical structures: significantly improved efficiency and reliability. Neuroimage 31, 1116–1128 (2006).
van Griethuysen, J. J. M. et al. Computational radiomics system to decode the radiographic phenotype. Cancer Res. 77, e104–e107 (2017).
Macenko, M. et al. A method for normalizing histology slides for quantitative analysis. In 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro 1107–1110, https://doi.org/10.1109/ISBI.2009.5193250 (2009).
Carpenter, A. E. et al. CellProfiler: image analysis software for identifying and quantifying cell phenotypes. Genome Biol. 7, https://doi.org/10.1186/gb-2006-7-10-r100 (2006).
Vaswani, A. et al. Attention Is All You Need. In Advances in Neural Information Processing Systems, Vol. 30 (eds Guyon, I. et al.) 5998–6008 (Curran Associates, Inc., 2017).
Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411 (2018).

Acknowledgements

This work was supported by the “Pioneer” and “Leading Goose” R&D Program of Zhejiang (2024C03175), the National Natural Science Foundation of China (82473004, 82402729), the Zhejiang Provincial Natural Science Foundation of China (LY22H160019, LQ23H200003), the Beijing Xisike Clinical Oncology Research Foundation (Y-MSDZD2022-0161), and the Scientific Research Fund of Zhejiang Provincial Education Department (Y202045631).

Author information

These authors contributed equally: Mingyu Wan, Yongfeng Ding, Yanli Wang.

Authors and Affiliations

Department of Medical Oncology, The First Affiliated Hospital, Zhejiang University School of Medicine, & Key Laboratory of Cancer Prevention and Intervention, Ministry of Education, Hangzhou, China
Mingyu Wan, Yongfeng Ding, Yunlu Jia, Chanqi Ye, Ruyin Chen, Qiong Li, Yuqing He, Keyi Xu, Nong Xu & Jian Ruan
Department of Pathology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
Yanli Wang
The First Clinical School of Medicine, Zhengzhou University, Zhengzhou, China
Siqi Wu
Department of Laboratory Medicine, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
Wenxin Qu, Yifan Xu & Yifei Shen
Department of Hepatobiliary Surgery, The Affiliated Hospital of Southwest Medical University, Luzhou, China
Wenguang Fu
Departments of Biology and Public Health Sciences, University of Virginia, Charlottesville, VA, USA
Michael P. Timko
Department of Pharmacological Sciences, Renaissance School of Medicine, Stony Brook University, Stony Brook, NY, USA
Ledong Wan
Department of Medicine, Monash University, Clayton, VIC, Australia
Le Ying
Department of Oncology, Nanfang Hospital, Southern Medical University, Guangzhou, China
Jinzhang Chen
Department of Oncology, Shunde Hospital, Southern Medical University, Shunde, China
Dayong Zheng
Department of Bioinformatics and Computational Biology, MD Anderson Cancer Center, Houston, TX, USA
Yifei Shen

Contributions

J.R., Y.S., Y.D., and N.X. contributed to study concept, design, and supervision; M.W., L.W., and Y.S. contributed to bioinformatics and statistical analysis; W.F., Y.W., J.C., Y.J., and D.Z. contributed to the acquisition of samples and data; W.Q., Y.X., M.P.T., and S.W. contributed to experimental procedures; L.Y., C.Y., R.C., and Q.L. contributed to data interpretation; M.W., Y.D., Y.H., K.X., and S.W. contributed to the discussion of results; M.W., Y.D., and J.R. contributed to drafting and writing the final manuscript. All authors have read and approved the final version of the manuscript.

Corresponding authors

Correspondence to Jinzhang Chen, Dayong Zheng, Yifei Shen or Jian Ruan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Wan, M., Ding, Y., Wang, Y. et al. Interpretable multimodal deep learning improves postoperative risk stratification in intrahepatic cholangiocarcinoma in multicentre cohorts. npj Digit. Med. 9, 95 (2026). https://doi.org/10.1038/s41746-025-02282-x

Received: 14 October 2025
Accepted: 11 December 2025
Published: 29 December 2025
Version of record: 29 January 2026
DOI: https://doi.org/10.1038/s41746-025-02282-x