Fig. 4: Performance of different KRAS dependency modeling strategies and predictor analysis in KRASwt cell lines. | npj Systems Biology and Applications

From: Diffusion kernel-based predictive modeling of KRAS dependency in KRAS wild type cancer cell lines

a Correlation analysis (Pearson’s r) in independent test sets between the experimentally determined KRAS dependency of cancer cell lines and our machine learning-based predictions for varying sets of predictors (see Methods). Results are shown for models using all available predictors of the RNA sequencing data (total), all available predictors of the protein interaction network (net), and predictors selected by the diffusion kernel with hyperparameter optimization (kernel). Models were based on KRASwt cell lines of the different datasets (crispr: Achilles CRISPR effect data, n = 567; rnai: DRIVE RNAi (DEMETER2) data, n = 487). For the diffusion kernel variable selection workflow, maximum correlation was reached with a hyperparameter constellation using 500 predictors. For complete results of hyperparameter tuning, see Supplementary Data 4.

b Performance (Pearson’s r) of KRAS dependency models in the KRASwt group, compared between the different approaches. RAE: RAE-based models (CRISPR data). Loboda: models using RNA expression of the gene selection by Loboda et al. (CRISPR data). Singh: models using RNA expression of the gene selection by Singh et al. (CRISPR data). CRISPR: best-performing models using RNA expression of the gene selection by the diffusion kernel with optimized hyperparameters (CRISPR data). RNAi: best-performing models using RNA expression of the gene selection by the diffusion kernel with optimized hyperparameters (RNAi data). For CRISPR and RNAi, correlation analysis was performed as in (a). Correlation coefficients for RAE, Loboda and Singh were determined as described above.

c Absolute error of CRISPR/RNAi models for each cell line using the mutation- and best-performing RNA-predictor set. Summarized results of 400 unique models are shown in the two waterfall plots. Cell lines are ordered by ascending observed KRAS dependency from left to right. The absolute error was estimated by summing the individual absolute differences between the predicted and the observed values.

d Correlation analysis (Pearson’s r; n = 567 for CRISPR, n = 487 for RNAi) performed as in (a), this time comparing models using different algorithms (enet: Elastic Net regression; forest: Random Forest regression; lasso: Lasso regression). Neither Elastic Net nor Random Forest regression improved on the Lasso predictions of KRAS dependency.

e Occurrence frequency of RNA predictors in 12,000 unique models of KRAS dependency (CRISPR/RNAi) in KRASwt cancer cell lines. Only models using the variable selection by the diffusion kernel were included. Negative values indicate how often the predictor had a negative coefficient in the models (associated with higher KRAS dependency); positive values indicate how often it had a positive coefficient (associated with lower KRAS dependency). The 25 most redundant genes are shown.
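
The two summary statistics described for panels c and e can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, cell line labels, gene labels and toy values are hypothetical, and it assumes that each model supplies its predictions and fitted coefficients as plain dictionaries. The caption does not say whether the frequencies in panel e were normalised, so the sketch returns raw counts.

```python
from collections import Counter


def summed_absolute_error(observed, predictions):
    """Sum |predicted - observed| per cell line over all models (panel c).

    observed: dict mapping cell line -> observed dependency score
    predictions: list of dicts, one per model, mapping cell line -> prediction
    """
    totals = {line: 0.0 for line in observed}
    for model_pred in predictions:
        for line, pred in model_pred.items():
            totals[line] += abs(pred - observed[line])
    return totals


def signed_occurrence(models):
    """Count negative and positive coefficients per predictor (panel e).

    models: list of dicts, one per model, mapping gene -> fitted coefficient
    Returns gene -> (negative_count, positive_count).
    Zero coefficients (predictor not selected) are not counted.
    """
    neg, pos = Counter(), Counter()
    for coefs in models:
        for gene, coef in coefs.items():
            if coef < 0:
                neg[gene] += 1
            elif coef > 0:
                pos[gene] += 1
    return {gene: (neg[gene], pos[gene]) for gene in set(neg) | set(pos)}


# Hypothetical toy data: two cell lines, two models.
observed = {"LINE_A": -0.5, "LINE_B": 0.25}
predictions = [
    {"LINE_A": -0.25, "LINE_B": 0.25},
    {"LINE_A": -1.0, "LINE_B": 0.75},
]
errors = summed_absolute_error(observed, predictions)

# Order cell lines by ascending observed dependency, as in the waterfall plots.
ordered_lines = sorted(errors, key=lambda line: observed[line])

# Hypothetical coefficients from three models.
models = [
    {"GENE_X": -0.5, "GENE_Y": 0.25},
    {"GENE_X": -0.125, "GENE_Y": 0.0},
    {"GENE_X": 0.5},
]
counts = signed_occurrence(models)
```

To mimic the layout of panel e, the negative count for each gene would be plotted below zero and the positive count above it.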
