Fig. 5: Experimental pipeline of attMIL-based prediction models.

Schematic of the experimental setup employed in this study, showing (a) the architecture of the attention-based multiple instance learning pipeline; (b) the cohorts used for training and cross-validation (TCGA-BRCA FBC) and for external validation (FBC and MBC); and (c) the 5-fold cross-validation scheme, in which the cohort is divided into 5 equal sets. In each fold, the model is trained on four-fifths of the data and tested on the remaining one-fifth. This is repeated 5 times, such that each set is used as the test set once. The test sets are therefore mutually exclusive and together cover the whole cohort, so the model is evaluated on every case in the dataset. Figure created with BioRender.com. *The FBC external validation cohort is composed of cases from the Breast Cancer Now Tissue Bank and the CPTAC-BRCA dataset. **The MBC cohort is composed of cases from the Male Breast Cancer Consortium, NHS Greater Glasgow and Clyde Biorepository, NHS Grampian Biorepository, Northern Ireland Biobank, Wales Cancer Biobank, Breast Cancer Now Tissue Bank, and the TCGA-BRCA dataset.
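
The 5-fold scheme in panel (c) can be illustrated with a minimal sketch. This is not the study's code: the function name `five_fold_splits`, the case identifiers, and the random seed are hypothetical, and the sketch assumes a plain random split at the case level with no stratification by label, which the caption does not specify.

```python
import random


def five_fold_splits(case_ids, n_folds=5, seed=0):
    """Yield (train_ids, test_ids) pairs, one per fold.

    Hypothetical helper: each case lands in exactly one test set.
    """
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)  # fixed seed for a reproducible split
    # Strided slicing gives folds whose sizes differ by at most one case.
    folds = [ids[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test_ids = folds[k]
        # Training data is everything outside the held-out fold.
        train_ids = [c for j, fold in enumerate(folds) if j != k for c in fold]
        yield train_ids, test_ids


# Hypothetical cohort of 100 cases, so each fold holds exactly 20.
cases = [f"case_{i:03d}" for i in range(100)]
splits = list(five_fold_splits(cases))

for fold_index, (train_ids, test_ids) in enumerate(splits):
    print(f"fold {fold_index}: train={len(train_ids)} test={len(test_ids)}")
```

With a cohort size divisible by 5, every fold trains on four-fifths of the cases and tests on the remaining one-fifth, and the five test sets partition the cohort. In practice a library routine such as scikit-learn's `KFold` or `StratifiedKFold` would typically replace this helper.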