Fig. 3: Graph representation of patient data and integration of both EHR and genomics data are essential for identifying patient subgroups with differential IO-treatment benefits.

In both volcano bubble plots, each bubble represents a patient subgroup. The x axis shows the difference in estimated median survival time between a patient subgroup and the overall cohort, which serves as the control. The vertical line marks zero median survival difference: bubbles to its right indicate a tendency toward IO-beneficial outcomes, and bubbles to its left indicate a tendency toward IO non-beneficial outcomes. The y axis shows −log10(FDR) of the log-rank test comparing each subgroup with the overall cohort, adjusted for multiple comparisons by the Benjamini–Hochberg procedure; it represents the statistical significance of the observed survival difference. The horizontal dashed line marks the statistical significance cutoff of FDR = 0.05.

a Integrating both EHR and genomics data is important for effective patient subgroup discovery on IO treatment benefits, and setting the number of clusters (subgroups) to five outperforms setting it to three or ten. We compared patient subgrouping using both types of features against subgrouping using EHR or genomics features alone. To make the comparison robust, we explored different numbers of resulting subgroups: three, five, and ten. Integrating both types of features discovers patient subgroups with significant IO-beneficial and non-beneficial outcomes, whereas either feature type alone does not identify any subgroup with significant IO-beneficial or non-beneficial outcomes. When incorporating both genomic and clinical features, we obtained the best results with the target number of clusters set to five, which identifies more patients with significant IO-beneficial and non-beneficial outcomes and does so with stronger statistical confidence.

b Graph representation of patient clinico-genomic data is important for effective patient subgroup discovery on IO treatment benefits. Comparison of subgrouping results with other methods demonstrates that graph representations of patient data (MGAE and GAE) discover patient subgroups with significant IO-beneficial and IO non-beneficial outcomes, whereas non-graph-based approaches (t-SNE, UMAP, autoencoder, and denoising autoencoder) do not identify any subgroup with significant IO-beneficial or IO non-beneficial outcomes.
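
As an illustration of how the bubble coordinates described in this caption could be computed, the following is a minimal sketch using only the Python standard library. It is not the authors' code. The function names (`km_median`, `logrank_p`, `bh_adjust`, `volcano_points`) and the input layout (lists of survival times and 0/1 event indicators) are assumptions made for this example. It also assumes that the median is the Kaplan-Meier median and that the overall cohort is treated as an ordinary second sample in the log-rank test.

```python
import math
from fractions import Fraction

def km_median(times, events):
    """Kaplan-Meier median: earliest time at which S(t) falls to 0.5 or below."""
    data = sorted(zip(times, events))
    at_risk, surv, i = len(data), Fraction(1), 0
    while i < len(data):
        t, deaths, ties = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:   # group tied times
            deaths += data[i][1]
            ties += 1
            i += 1
        if deaths:
            surv *= Fraction(at_risk - deaths, at_risk)  # exact arithmetic
            if surv <= Fraction(1, 2):
                return t
        at_risk -= ties
    return math.inf  # median not reached within follow-up

def logrank_p(t1, e1, t2, e2):
    """Two-sample log-rank test p-value (chi-square with 1 degree of freedom)."""
    all_t, all_e = list(t1) + list(t2), list(e1) + list(e2)
    event_times = sorted({t for t, e in zip(all_t, all_e) if e})
    obs_minus_exp = var = 0.0
    for t in event_times:
        n1 = sum(x >= t for x in t1)               # at risk in group 1
        n2 = sum(x >= t for x in t2)               # at risk in group 2
        d1 = sum(e for x, e in zip(t1, e1) if x == t)
        d2 = sum(e for x, e in zip(t2, e2) if x == t)
        n, d = n1 + n2, d1 + d2
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    if var == 0:
        return 1.0
    chi2 = obs_minus_exp ** 2 / var
    return math.erfc(math.sqrt(chi2 / 2))          # chi-square(1) upper tail

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (FDR), in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):                   # largest p-value first
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def volcano_points(cohort, subgroups):
    """Map each subgroup name to (median survival difference, -log10 FDR).

    cohort is (times, events); subgroups is {name: (times, events)}.
    """
    base = km_median(*cohort)
    names = list(subgroups)
    pvals = [logrank_p(*subgroups[k], *cohort) for k in names]
    fdrs = bh_adjust(pvals)
    return {
        k: (km_median(*subgroups[k]) - base, -math.log10(max(q, 1e-300)))
        for k, q in zip(names, fdrs)
    }
```

Under these assumptions, a bubble lies above the horizontal dashed line when its second coordinate exceeds −log10(0.05) ≈ 1.3, and to the right of the vertical line when its first coordinate is positive. If a median is not reached, `km_median` returns infinity, so such subgroups would need separate handling before plotting.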