Fig. 2: Overview of the study workflow, from data collection to model evaluation.
From: Establishing predictive machine learning models for drug responses in patient derived cell culture

Four datasets (GDSC1, GDSC2, PRISM, RX) were pre-processed to remove missing values, with imputation applied where necessary. The data were randomly split into training (80%), validation (10%), and test (10%) sets, with RX using leave-one-out cross-validation. A machine learning model was trained on historical drug response data to predict drug sensitivities for new patient-derived cell lines. Model performance was evaluated using Pearson correlation, Spearman correlation, RMSE, accuracy, and hit rate.
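
As an illustration of the splitting and scoring steps in this workflow, the following is a minimal sketch in plain Python, not the authors' implementation. The function names (`split_indices`, `leave_one_out`, `pearson`, `ranks`, `spearman`, `rmse`), the fixed seed, and the rounding of split sizes are assumptions made for the example. Accuracy and hit rate are omitted because the caption does not define them.

```python
import math
import random


def split_indices(n, train_frac=0.8, val_frac=0.1, seed=0):
    """Randomly partition range(n) into train/validation/test index lists.

    The seed and the use of round() for split sizes are illustrative choices.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def leave_one_out(n):
    """Yield (train_indices, held_out_index) pairs, one per sample."""
    for i in range(n):
        yield [j for j in range(n) if j != i], i


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(
        sum((a - b) ** 2 for a, b in zip(observed, predicted)) / len(observed)
    )
```

In this sketch, `split_indices(100)` yields 80, 10, and 10 indices for the three sets, while `leave_one_out(n)` produces one held-out sample per fold, as would suit a small dataset such as RX.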