Fig. 6: HIF-based prediction of molecular phenotypes.

a Receiver operating characteristic (ROC) curves for (i) PD-1, (ii) PD-L1, (iii) CTLA-4, (iv) HRD, and (v) TIGIT hold-out predictions across cancer types and pan-cancer. Skin cutaneous melanoma (SKCM) predictions were conducted only for TIGIT due to low sample sizes. Pan-cancer predictions use binary labels thresholded independently by cancer type. For TIGIT predictions, pan-cancer includes all five cancer types. For the remainder of predictions, pan-cancer includes all cancer types excluding SKCM. Random classifiers correspond to area under the ROC curve (AUROC) = 0.50. b Visualization of predictive human-interpretable image features (HIFs) for each molecular phenotype. Boxplots show the top five most predictive HIF clusters for each phenotype in pan-cancer models. For TIGIT predictions, pan-cancer models only included three non-zero HIF clusters. Clusters are ranked by the maximum absolute ensemble beta across HIFs in a given cluster. Ensemble betas are computed per HIF as the average across the three models incorporated into the final ensemble evaluated on the hold-out set. The center and bounds of each boxplot represent the median and interquartile range (IQR; 25th, 75th percentiles) for HIF betas in each cluster, respectively. Upper and lower boxplot whiskers represent the smaller of the maximum beta value or the 75th percentile + 1.5 × IQR and the larger of the minimum beta value or the 25th percentile − 1.5 × IQR, respectively. Each cluster is labeled with a representative HIF corresponding to the maximum absolute ensemble beta value. The number of ensemble betas (HIFs) used to derive each boxplot is: 32, 49, 32, 9, and 11 (from top to bottom) for PD-1 clusters; 8, 30, 49, 20, and 70 for PD-L1 clusters; 38, 4, 14, 77, and 20 for CTLA-4 clusters; 7, 15, 11, 8, and 19 for HRD clusters; and 26, 22, and 2 for TIGIT clusters (see Supplementary Data 1 for the number of HIFs per cluster).
Where a cluster's representative HIF (the HIF with the maximum absolute ensemble beta) is difficult to interpret, a more interpretable HIF whose ensemble beta is within a fivefold difference of the maximum is presented instead (indicated by a black asterisk). As absolute values were used for ranking, HIFs with negative ensemble betas are denoted by a red asterisk. Boxplots of predictive HIF clusters for cancer type-specific models are included in Supplementary Fig. 11. Radar charts show the normalized magnitude of ensemble betas in pan-cancer models stratified across nine HIF axes, corresponding to the five cell types, three tissue types, and cancer–stroma interface (CSI). Normalized magnitudes were computed as the sum of absolute ensemble betas for HIFs associated with each axis divided by the total number of HIFs associated with that axis (e.g., all HIFs involving fibroblasts). Multiple predictive HIFs are visualized with overlaid cell- or tissue-type heatmaps in Fig. 3. Tumor regions include cancer tissue (CT), cancer-associated stroma (CAS), and combined CT + CAS.
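
The summary statistics described in this caption can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy HIF, cluster, and axis names, and the quartile interpolation method are all assumptions made for illustration. The whisker rule follows the caption's wording literally.

```python
from statistics import mean, quantiles


def ensemble_betas(model_betas):
    """Average each HIF's beta across the models in the ensemble."""
    hifs = model_betas[0].keys()
    return {h: mean(m[h] for m in model_betas) for h in hifs}


def rank_clusters(betas, clusters):
    """Order clusters by the maximum absolute ensemble beta among their HIFs."""
    score = {c: max(abs(betas[h]) for h in hifs) for c, hifs in clusters.items()}
    return sorted(score, key=score.get, reverse=True)


def box_stats(values):
    """Median, quartiles, and whiskers as worded in the caption.

    Assumption: quartiles use linear interpolation ("inclusive" method);
    the caption does not state which interpolation was used.
    """
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    upper = min(max(values), q3 + 1.5 * iqr)  # smaller of max or Q3 + 1.5 x IQR
    lower = max(min(values), q1 - 1.5 * iqr)  # larger of min or Q1 - 1.5 x IQR
    return {"median": med, "q1": q1, "q3": q3, "lower": lower, "upper": upper}


def radar_magnitudes(betas, axes):
    """Sum of absolute betas on each axis divided by that axis's HIF count."""
    return {a: sum(abs(betas[h]) for h in hifs) / len(hifs)
            for a, hifs in axes.items()}


# Toy inputs (hypothetical): three models, four HIFs.
models = [
    {"hif_a": 0.25, "hif_b": -0.5, "hif_c": 0.0, "hif_d": 0.125},
    {"hif_a": 0.5, "hif_b": -1.0, "hif_c": 0.0, "hif_d": 0.25},
    {"hif_a": 0.75, "hif_b": -1.5, "hif_c": 0.0, "hif_d": 0.0},
]
clusters = {"cluster_1": ["hif_a", "hif_d"], "cluster_2": ["hif_b", "hif_c"]}
axes = {"fibroblast": ["hif_a", "hif_b"],
        "lymphocyte": ["hif_b", "hif_c", "hif_d"]}

betas = ensemble_betas(models)
print(rank_clusters(betas, clusters))
print(radar_magnitudes(betas, axes))
print(box_stats([1, 2, 3, 4, 100]))
```

Because ranking uses absolute values, a cluster whose strongest HIF has a negative ensemble beta (here the hypothetical `hif_b`) can still rank first, which is the case the red asterisk marks. An HIF may also contribute to more than one radar axis, as `hif_b` does in this toy example.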