Fig. 1: Overview of the study.
From: A multimodal knowledge-enhanced whole-slide pathology foundation model

a The clinical workflow for diagnosis, treatment and prognosis in oncology, which primarily involves data from three common modalities: WSIs, pathology reports and gene expression profiles. b Overview of the mSTAR paradigm. mSTAR consists of two stages: 1) Slide-level Contrastive Learning and 2) Patch-level Self-Taught Training. c–e Statistics of the data used in this study: c Venn diagram of cases across modalities; d number of cases in the pretraining data for each cancer type; e distribution of word counts in pathology reports. f Evaluation schemes used in this study: held-out, independent, external and zero-shot. The illustration is presented in Sec. ? g Distribution of datasets across task types for each evaluation scheme; detailed information on every dataset is presented in Supplementary Table 1. h Average performance on 97 tasks of 15 types, spanning 7 categories of applications: Pathological Diagnosis, Molecular Prediction, Report Generation, Survival Prediction, Multimodal Fusion, Zero-shot Slide Classification and Zero-shot Slide Retrieval. Zero-shot tasks, which require a well-aligned vision-language space, are evaluated only for vision-language models, i.e., PLIP, CONCH and mSTAR. Source data are provided as a Source Data file and are also presented in Supplementary Table 2. This figure was created in BioRender. Zhou, Z. (https://BioRender.com/r035ixv).