Fig. 5: Prediction of clinically relevant downstream tasks with virtually multiplexed data.

a, Patch extraction and computation of patch features with a frozen ResNet-50 model (blue trapezoid). b, Overview of the GT model, which first constructs a patch-level graph representation and then processes it with a transformer to predict clinically relevant endpoints. c, Training of GT models (green trapezoid) under three different settings, depending on the integration strategy. d, Prediction results for overall survival status (left: 0, alive/censored; 1, prostate cancer-related death) and disease progression (right: 0, no recurrence; 1, recurrence). Bar colours indicate which of the five combinations of training setting and input data was used (see legend). For each combination, bar heights and error bars indicate the mean and standard deviation of the weighted F1 score, computed on the held-out test set over three independent runs with different initializations. The exact number of training samples used in each case is given above the bars. ᵃFor all multimodal models, the reported number refers to the union across all markers. MM-R-L, multimodal–real–late fusion; MM-V-E, multimodal–virtual–early fusion; MM-V-L, multimodal–virtual–late fusion.
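
As an illustration of how the bar heights and error bars in panel d could be derived, the following minimal sketch computes a support-weighted F1 score per run and then summarizes the scores across runs. The function names `weighted_f1` and `summarize_runs` are hypothetical and not taken from the authors' code, and the use of the population standard deviation is an assumption, since the caption does not state which estimator was used.

```python
from collections import Counter
from statistics import mean, pstdev


def weighted_f1(y_true, y_pred):
    """Average of per-class F1 scores, weighted by each class's support in y_true."""
    support = Counter(y_true)
    total = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n * f1
    return total / len(y_true)


def summarize_runs(y_true, predictions_per_run):
    """Return (mean, standard deviation) of the weighted F1 across runs.

    Assumption: population standard deviation (pstdev); swap in
    statistics.stdev for the sample estimator.
    """
    scores = [weighted_f1(y_true, y_pred) for y_pred in predictions_per_run]
    return mean(scores), pstdev(scores)


if __name__ == "__main__":
    # Toy binary labels (0 = no event, 1 = event) and three hypothetical runs.
    labels = [0, 0, 1, 1]
    runs = [[0, 0, 1, 0], [0, 0, 1, 1], [0, 1, 1, 1]]
    avg, sd = summarize_runs(labels, runs)
    print(f"weighted F1: {avg:.3f} +/- {sd:.3f}")
```

Weighting each class by its support keeps the metric informative when the two outcome classes are imbalanced, which is a common situation for survival and recurrence endpoints.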