Fig. 1: Overview of TITAN.
From: A multimodal whole-slide foundation model for pathology

a, Tissue site distribution of Mass-340K used for TITAN_V pretraining (stage 1). Mass-340K includes 335,645 WSIs across 20 organs, scanned with diverse scanner types. The tissue sections cover a mix of stains, namely H&E (89.7%), IHC (7.9%), special stains (2.3%) and others (0.1%), and a mix of neoplastic (70.0%), tissue damage response (8.4%), normal (4.7%), inflammatory (3.4%) and other (13.5%) tissue. TITAN pretraining (stages 2 and 3) uses a subset of Mass-340K with paired captions and medical reports. b–d, Block diagram of TITAN_V pretraining. b, TITAN uses a ViT to encode a WSI into a slide embedding. c, TITAN_V (stage 1) is pretrained using SSL with student–teacher knowledge distillation. d, TITAN (stages 2 and 3) is pretrained using vision-language modeling, first by aligning the slide embedding with synthetic captions (stage 2) and then with medical reports (stage 3). e, UMAP visualization of TCGA slide embeddings obtained with TITAN, color-coded by organ. UMAP, uniform manifold approximation and projection; px, pixel.