Fig. 2: The unsupervised approach generates clusters enriched in tiles from patients with poor outcome, with good representation of the three cohorts, and predicts disease-free survival while identifying the tile clusters important for that prediction.

a UMAP with the 26 Leiden clusters found at resolution 0.75. b PAGA representation of the Leiden clusters with node connections. The size of each node is proportional to its number of tiles, and its color reflects the proportion of tiles associated with good/poor outcome patients. c UMAP with colors showing tiles associated with good/poor outcome patients (green/orange). Each dot is a tile. d Univariate analysis comparing the c-index (average of a 3-fold cross-validation) for the prediction of RFS on the development cohort (NYU + UCSF) and on the external cohorts (CAUSA, Mayo). A c-index below 0.5 (green) indicates lower risk of poor outcome, while a c-index above 0.5 (orange) indicates higher risk of poor outcome. e Details of panel c for two clusters where the development cohort and the external cohorts show the same trend (see Supplementary Fig. 6 for all clusters). Error bars show the confidence interval. f Projection on the PAGA of the HPCs showing coherent trends for both the cross-validation on the development cohort and the external cohorts. g Kaplan–Meier curves for patients predicted to be at high or low risk of poor outcome by the unsupervised HPL approach, using a Cox regression with 3-fold cross-validation on the development cohort (NYU + UCSF). The first row is computed using the whole dataset, while the second and third rows show only the subsets of patients with stage T2a (BWH staging) and T2 (AJCC staging), respectively. Error bars show the 95% confidence interval (CI). The 95% CI of the hazard ratio (log-rank) is shown in brackets. The median value computed on the whole dataset is used to split low-risk from high-risk patients. h Same as g but using the Mayo cohort as the test cohort. i Same as g but using the CAUSA cohort as the test cohort.
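
As a reading aid for panels d and g, the following minimal Python sketch shows how a concordance index and a median-based risk split can be computed. It is an illustration under stated assumptions, not the pipeline used to produce the figure: the function names `concordance_index` and `split_by_median`, the toy follow-up times, event flags, and risk scores are all hypothetical, and the c-index follows Harrell's standard pairwise definition, which may differ in detail from the implementation behind the figure.

```python
from statistics import median


def concordance_index(times, events, risks):
    """Harrell's c-index: among comparable pairs, the fraction in which the
    patient with the higher risk score has the shorter observed time.
    Ties in risk score count as half-concordant."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's last follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def split_by_median(risks, threshold=None):
    """Label each patient 'high' or 'low' risk relative to a threshold.
    By default the threshold is the median of the supplied risk scores."""
    if threshold is None:
        threshold = median(risks)
    return ["high" if r > threshold else "low" for r in risks]


# Hypothetical toy data (not from the study).
times = [5, 10, 15, 20, 25, 30]            # follow-up time per patient
events = [1, 1, 0, 1, 0, 0]                # 1 = event observed, 0 = censored
risks = [0.9, 0.7, 0.8, 0.4, 0.3, 0.1]     # predicted risk score per patient

c = concordance_index(times, events, risks)
groups = split_by_median(risks)
inverted_c = concordance_index(times, events, [-r for r in risks])
```

With scores that mostly rank short follow-up times as high risk, `c` lies above 0.5; negating the scores gives `inverted_c` below 0.5, which mirrors the orange/green reading of panel d. The `threshold` argument lets a median computed on one dataset be reused to split another.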