Fig. 4: Cancer subtype clustering across four datasets.
From: VOLTA: an enVironment-aware cOntrastive ceLl represenTation leArning for histopathology

a, c Ovarian cancer and b, d endometrial cancer datasets are hierarchically clustered based on cell cluster proportions. To achieve this, we first train our model to learn cell representations in a self-supervised manner. For the ovarian cases (a, c), the model is applied to patches, a graph of cells is built based on the cluster predictions, and the distribution of cell type clusters around each cell is measured. Lastly, this distribution is used to cluster the cases into distinct cohorts. For endometrial cancer (b, d), we found that the distribution of cell counts across patches captures enough information to separate the subtypes. Therefore, after applying the model to each patch, we measure the distribution of cell type clusters across all patches and use this distribution for hierarchical clustering. In panel b, the supercluster on the right (yellow) is a cohort of patients who mostly have the POLE subtype (only one p53abn sample is in this group), the supercluster in the middle (red) contains mainly the MMRd patients (with only one POLE case misclassified), and the supercluster on the left (purple) contains the p53abn cases, with only one POLE case misplaced. Source data are provided as a Source Data file.
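
The endometrial procedure (b, d) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the toy case names and labels, the Euclidean distance, the average linkage, and the helper names `cluster_proportions` and `agglomerate` are all assumptions made for illustration, since the caption does not state the metric or linkage used. In practice, a library routine such as `scipy.cluster.hierarchy.linkage` would likely replace the hand-written loop.

```python
import math
from collections import Counter
from itertools import combinations


def cluster_proportions(cell_labels, n_clusters):
    """Fraction of a case's cells that fall in each cell type cluster."""
    counts = Counter(cell_labels)
    total = len(cell_labels)
    return [counts.get(k, 0) / total for k in range(n_clusters)]


def euclidean(u, v):
    # Assumed distance; the caption does not name the metric.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def agglomerate(vectors, n_groups):
    """Average-linkage agglomerative clustering, stopped at n_groups groups."""
    groups = [[i] for i in range(len(vectors))]

    def linkage(g, h):
        # Mean pairwise distance between the members of two groups.
        pairs = [(i, j) for i in g for j in h]
        return sum(euclidean(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)

    while len(groups) > n_groups:
        # Merge the two closest groups (a < b, so deleting b keeps a valid).
        a, b = min(
            combinations(range(len(groups)), 2),
            key=lambda p: linkage(groups[p[0]], groups[p[1]]),
        )
        groups[a] = groups[a] + groups[b]
        del groups[b]
    return groups


# Hypothetical toy data: per-case cell cluster labels pooled over all patches.
cases = {
    "case_1": [0, 0, 0, 1],
    "case_2": [0, 0, 1, 0],
    "case_3": [2, 2, 2, 1],
    "case_4": [2, 2, 1, 2],
    "case_5": [1, 1, 1, 0],
    "case_6": [1, 1, 1, 2],
}
names = list(cases)
profiles = [cluster_proportions(cases[n], n_clusters=3) for n in names]

# Cut the hierarchy at three superclusters, mirroring the three groups in panel b.
cohorts = [sorted(names[i] for i in g) for g in agglomerate(profiles, n_groups=3)]
print(cohorts)
```

Each case is reduced to a vector of cluster proportions, so cases with similar cell type compositions merge early in the hierarchy regardless of how many cells each case contributes.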