Fig. 1: Overview of Prov-GigaPath.
From: A whole-slide foundation model for digital pathology from real-world data

a, Flow chart showing the model architecture of Prov-GigaPath. Prov-GigaPath first serializes each input WSI into a sequence of 256 × 256 image tiles in row-major order and uses an image tile-level encoder to convert each image tile into a visual embedding. Prov-GigaPath then applies a slide-level encoder based on the LongNet architecture to generate contextualized embeddings, which can serve as the basis for various downstream applications. b, Image tile-level pretraining using DINOv2. c, Slide-level pretraining with LongNet using a masked autoencoder. [CLS] is the classification token.
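
The row-major serialization in panel a can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `tile_origins`, the decision to drop partial tiles at the right and bottom edges, and the example slide dimensions are all assumptions made for illustration. Any tile filtering and the encoders themselves are left out.

```python
from typing import Iterator, Tuple

TILE_SIZE = 256  # tile edge length in pixels, as stated in the caption


def tile_origins(width: int, height: int, tile: int = TILE_SIZE) -> Iterator[Tuple[int, int]]:
    """Yield the (x, y) top-left corner of each full tile in row-major order.

    Assumption (not from the source): partial tiles at the right and
    bottom edges are dropped rather than padded.
    """
    for y in range(0, height - tile + 1, tile):      # rows, top to bottom
        for x in range(0, width - tile + 1, tile):   # columns, left to right
            yield (x, y)


# Hypothetical 768 x 512 pixel image: 3 tiles per row, 2 rows.
origins = list(tile_origins(width=768, height=512))
print(origins)
```

Each coordinate pair marks one tile, and its position in the list is the position of the corresponding tile embedding in the sequence passed to the slide-level encoder.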