Fig. 6: Performance of Survival Prediction on 16 datasets.
From: A multimodal knowledge-enhanced whole-slide pathology foundation model

a Comparison of C-Index between mSTAR and the compared methods on 9 held-out datasets. b Comparison of C-Index between mSTAR and the compared methods on 4 external datasets. In a and b, the red lines and the values reported at the top of each plot give the performance averaged across datasets. Each point represents a dataset, and the size of the point indicates the standard deviation. c Task distribution of the survival endpoints across the different evaluation settings. d Performance (C-Index and 95% CI) on independent cohorts. 'out' refers to partitions held out from the pretraining data. 'idpt' refers to independent datasets whose data source differs from that of the pretraining data. 'ext' refers to external datasets whose data come from a source distinct from the training data used for fine-tuning; these datasets are used solely for testing, without any training. In all bar plots, error bars represent 95% CIs estimated from 1000 bootstrap replicates. The P-value for each group of experiments is computed with a one-sided Wilcoxon signed-rank test between mSTAR and the second-best foundation model (FM). * denotes P < 0.05, ** denotes P < 0.01 and *** denotes P < 0.001. Detailed performance on every dataset is presented in Supplementary Table 18. Source data are provided as a Source Data file.
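
The statistics named in this caption can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' code. It assumes Harrell's definition of the C-Index, a percentile bootstrap for the 95% CI, and an exact (enumeration-based) one-sided Wilcoxon signed-rank test, none of which the caption specifies. The function names `concordance_index`, `bootstrap_ci` and `wilcoxon_one_sided_exact` are hypothetical, as are all inputs used with them.

```python
import itertools
import random


def concordance_index(times, events, risks):
    """Harrell's C-Index (assumed variant): share of comparable pairs ranked correctly."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is anchored on a subject with an observed event
        for j in range(n):
            if times[i] < times[j]:  # subject i failed before j was last seen
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def bootstrap_ci(times, events, risks, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus percentile-bootstrap CI (assumed CI construction)."""
    point = concordance_index(times, events, risks)  # raises if undefined
    rng = random.Random(seed)
    n = len(times)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        try:
            stats.append(concordance_index([times[k] for k in idx],
                                           [events[k] for k in idx],
                                           [risks[k] for k in idx]))
        except ValueError:
            continue  # resample had no comparable pairs; draw again
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return point, lower, upper


def wilcoxon_one_sided_exact(x, y):
    """Exact one-sided signed-rank P-value for H1: x tends to exceed y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences dropped
    order = sorted(range(len(diffs)), key=lambda k: abs(diffs[k]))
    ranks = [0.0] * len(diffs)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]])):
            end += 1
        avg_rank = (pos + end) / 2 + 1  # average rank for tied |difference|
        for k in order[pos:end + 1]:
            ranks[k] = avg_rank
        pos = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Enumerate all 2^n sign assignments; feasible for a handful of datasets.
    hits = total = 0
    for signs in itertools.product((0, 1), repeat=len(ranks)):
        total += 1
        if sum(r for r, s in zip(ranks, signs) if s) >= w_plus:
            hits += 1
    return hits / total
```

In this sketch, `bootstrap_ci` would be applied per dataset to the survival times, event indicators and predicted risk scores, while `wilcoxon_one_sided_exact` would take the per-dataset C-Index values of two models as paired inputs. For larger numbers of pairs, a library routine with a normal approximation is the more practical choice than full enumeration.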