Fig. 8: Top markers discovered using the feature importance modules of Flexynesis in predicting the drug response values (trained on the CCLE dataset and evaluated on the GDSC2 dataset).

A fully connected network (DirectPred), a supervised variational autoencoder (supervised_vae), and a graph-convolutional network (GNN-SAGE) were trained on three combinations of data modalities: mutations only; mutations and RNA expression; and mutations, RNA expression, and copy number alterations. A The best-performing model and data type combination for each drug. The color scale (blue to red) reflects the Pearson correlation score. B The top 10 markers (on the y-axis) discovered for each drug, based on the best-performing model and data type combination shown in panel A. The markers are labeled and colored by data modality (dark blue: RNA expression, light blue: mutation). Markers already known to be indicators for the corresponding drug according to the CIViC (Clinical Interpretation of Variants in Cancer) database are labeled “Known Target (CIVIC)”. The x-axis displays the relative importance of the top markers, where the best marker has a value of 1. While mutation markers predominate in the top 10 for most drugs, the best-performing models always include RNA expression as an additional data modality. Source data are provided as a Source Data file.
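
The relative importance on the x-axis of panel B can be illustrated with a minimal sketch. It assumes that raw importance scores are ranked by absolute value and divided by the largest one, so that the best marker gets a value of 1; this is one plausible reading of the caption, not necessarily how Flexynesis computes it. The function name `top_relative_importance` and the marker names and scores are hypothetical.

```python
def top_relative_importance(scores, k=10):
    """Return the k markers with the largest absolute score,
    rescaled so that the best marker has a value of 1.

    Assumption: scaling by the maximum absolute score. The function
    name and this exact scaling are illustrative, not taken from Flexynesis.
    """
    # Rank markers by absolute importance, largest first, and keep the top k.
    ranked = sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)[:k]
    if not ranked:
        return []
    best = abs(ranked[0][1])
    if best == 0:
        # All scores are zero, so there is nothing to rescale.
        return [(name, 0.0) for name, _ in ranked]
    return [(name, abs(value) / best) for name, value in ranked]


# Hypothetical raw scores, for illustration only (not results from the paper).
raw_scores = {"marker_a": 0.8, "marker_b": 0.4, "marker_c": 0.2}
top = top_relative_importance(raw_scores, k=2)
print(top)
```

With this scaling, values are comparable within a drug (each marker relative to that drug's best marker) but not across drugs.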