Fig. 6

Overview of the pipeline, architectures, and endpoints. (a) Whole slide images are passed through a tissue segmentation network to isolate the desired tissue type. After segmentation, the remaining tissue regions are partitioned into smaller image patches. A pretrained feature extractor encodes the patches into feature representations. (b) During training, the feature vectors are passed through an attention layer to obtain attention scores for each tile indicating importance towards diagnosis. These scores are pooled with the feature vectors and passed to the final classification layer to obtain the slide-level classification. The strongly attended regions are used in the classification pipeline to provide self-guided supervision towards attending to important patches. (c) The inference pipeline reconstructs heatmaps from the attention scores to visualize important regions contributing to the final WSI prediction.