Fig. 5: Classification model.

a Schematic overview of classifier generation. b Violin plots of AUC, sensitivity, and specificity across 100 randomly selected four-protein classifier combinations. Boxplots show the median (center line), interquartile range (IQR, from the 1st to the 3rd quartile; box), and 1.5 × IQR (whiskers). c 100-fold bootstrapped ROC curves of the classifier as trained and tested on tumor samples in the MSKCC-ICC dataset, which was iteratively split into randomized training (75%) and test (25%) sets. Shaded regions represent 95% bootstrap confidence intervals around the mean ROC curves. d PCA of the FU-ICC cohort indicating assigned clusters (ECM in blue; proliferation in orange). e Kaplan–Meier curve with log-rank test comparing overall survival in the FU-ICC cohort between classifier-assigned ECM (n = 141) and proliferation (n = 73) clusters. Dotted lines in Kaplan–Meier curves indicate median survival (ECM in blue; proliferation in orange). f Correlation of protein logFC values (ECM vs. proliferation clusters) between the MSKCC-ICC and FU-ICC cohorts. Colors indicate proteins with concordant enrichment in the ECM (blue) or proliferation (orange) cluster across both cohorts. g 100-fold bootstrapped ROC curves of the applied classifier for the UKF-ICC cohort, as trained and tested on tumor samples in the MSKCC-ICC dataset, which was iteratively split into randomized training (75%) and test (25%) sets. Shaded regions represent 95% bootstrap confidence intervals around the mean ROC curves. h PCA of the UKF-ICC cohort indicating assigned clusters (ECM in blue; proliferation in orange). i Kaplan–Meier curve with log-rank test comparing overall survival in the UKF-ICC cohort between classifier-assigned ECM (n = 18) and proliferation (n = 42) clusters. Dotted lines in Kaplan–Meier curves indicate median survival (ECM in blue; proliferation in orange). Source data are provided as a Source Data file.
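
The resampling scheme described for panels c and g (100 randomized 75%/25% training/test splits, a mean ROC curve, and a 95% band) can be illustrated with the minimal sketch below. It is not the authors' implementation. The caption does not name the classifier algorithm, so a simple mean-difference score stands in for it. The data are synthetic, the splits are stratified by class, and the band is taken as the 2.5th to 97.5th percentile of the interpolated true-positive rates. All of these choices, along with the function names `make_toy_data`, `roc_curve_points`, and `bootstrap_roc`, are assumptions made for illustration.

```python
import numpy as np


def make_toy_data(n_per_class=100, n_features=4, shift=1.0, seed=1):
    """Synthetic stand-in for a four-protein panel (NOT the study data)."""
    rng = np.random.default_rng(seed)
    X0 = rng.normal(0.0, 1.0, size=(n_per_class, n_features))
    X1 = rng.normal(shift, 1.0, size=(n_per_class, n_features))
    X = np.vstack([X0, X1])
    y = np.concatenate([np.zeros(n_per_class, int), np.ones(n_per_class, int)])
    return X, y


def roc_curve_points(y_true, scores):
    """Return (fpr, tpr) obtained by sweeping a threshold from high to low."""
    order = np.argsort(-scores, kind="stable")
    y_sorted = y_true[order]
    tps = np.cumsum(y_sorted == 1)
    fps = np.cumsum(y_sorted == 0)
    tpr = np.concatenate([[0.0], tps / tps[-1]])
    fpr = np.concatenate([[0.0], fps / fps[-1]])
    return fpr, tpr


def bootstrap_roc(X, y, n_rounds=100, test_frac=0.25, seed=0):
    """Repeated random train/test splits; mean ROC with a 95% percentile band."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 101)          # common FPR grid
    tprs = np.empty((n_rounds, grid.size))
    aucs = np.empty(n_rounds)
    for r in range(n_rounds):
        # Stratified split so every test set contains both classes.
        test_mask = np.zeros(y.size, dtype=bool)
        for label in (0, 1):
            idx = rng.permutation(np.flatnonzero(y == label))
            n_test = max(1, int(round(test_frac * idx.size)))
            test_mask[idx[:n_test]] = True
        X_tr, y_tr = X[~test_mask], y[~test_mask]
        X_te, y_te = X[test_mask], y[test_mask]
        # Placeholder classifier (assumption): project onto the difference
        # of the class means estimated on the training set.
        w = X_tr[y_tr == 1].mean(axis=0) - X_tr[y_tr == 0].mean(axis=0)
        fpr, tpr = roc_curve_points(y_te, X_te @ w)
        tprs[r] = np.interp(grid, fpr, tpr)
        tprs[r, 0] = 0.0                        # anchor the curve at the origin
        # Trapezoidal area under the un-interpolated curve.
        aucs[r] = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0)
    mean_tpr = tprs.mean(axis=0)
    lower = np.percentile(tprs, 2.5, axis=0)
    upper = np.percentile(tprs, 97.5, axis=0)
    return grid, mean_tpr, lower, upper, aucs
```

Calling `bootstrap_roc(*make_toy_data())` returns the FPR grid, the mean curve, and the lower and upper band edges, which can be passed to a plotting routine such as Matplotlib's `fill_between` to draw a shaded region of the kind shown in the figure.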