Fig. 4: PDGrapher generalizes to new (previously unseen) cell lines and learns optimal chemical perturbagens in held-out folds that contain both new cell lines and new samples.
From: Combinatorial prediction of therapeutic perturbations using causally inspired neural networks

a,b, PDGrapher shows improved performance when trained on each of nine chemical perturbation datasets spanning various diseases and evaluated on the remaining eight cell lines. It achieves up to 8.67% more accurately predicted samples in the testing sets compared with the second-best baseline (for example, when trained on chemical-PPI-prostate-PC3, 12.81% versus 4.13% (a)) and an nDCG value of up to 0.03 higher than the second-best baseline (for example, when trained on chemical-PPI-colon-HT29, 0.19 versus 0.16 (b)). In a and b, the bars show the average performance across five cross-validation test splits for each of the nine chemical datasets. The overlaid points represent performance values from individual data splits (n = 5 per cell line). Each data split contains 20% of the samples in the dataset, with each sample corresponding to a perturbation-response instance. Where replicates exist for a given drug, they are treated as independent inputs during training and evaluation. c, PDGrapher recovers ground-truth therapeutic targets at higher rates (evaluated by recall 1–100) compared with competing methods for chemical-PPI datasets. d, Box plots show the distribution of average model rankings across nine cell lines (n = 9); each dot corresponds to the ranking value for a distinct cell line, aggregated across cross-validation splits, training cell lines and all metrics. A higher value indicates better performance. The central line inside the box represents the median, while the bottom and top edges correspond to the first and third quartiles. The whiskers extend to the smallest and largest values within 1.5× the interquartile range from the quartiles. P values from the statistical tests are provided in the Source data. e, The difference in shortest-path distances to ground-truth therapeutic genes between genes predicted by PDGrapher and a random reference, across nine cell lines.
Predominantly negative values indicate that PDGrapher predicts sets of therapeutic genes that are closer in the network to ground-truth therapeutic genes than would be expected by chance (average shortest-path distance across cell lines: 2.75 for PDGrapher versus 3.11 for the random reference).
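The quantity in panel e can be illustrated with a minimal sketch. The caption does not state how distances are aggregated, so the following assumptions are illustrative only: the network is treated as unweighted, distances are averaged over all reachable (gene, ground-truth gene) pairs, and the toy graph, gene names and the fixed `reference` set (a stand-in for randomly drawn genes) are hypothetical.

```python
from collections import deque
from statistics import mean


def bfs_distances(graph, source):
    """Unweighted shortest-path lengths from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def mean_distance(graph, genes, truth):
    """Average shortest-path length over all reachable (gene, truth gene) pairs.

    Assumption: pairwise averaging; the source may aggregate differently.
    """
    lengths = []
    for gene in genes:
        dist = bfs_distances(graph, gene)
        lengths.extend(dist[t] for t in truth if t in dist)
    return mean(lengths)


# Hypothetical toy network: a path A - B - C - D - E (adjacency lists).
ppi = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
truth = ["A"]           # ground-truth therapeutic gene
predicted = ["B"]       # gene predicted by the model
reference = ["D", "E"]  # fixed stand-in for a random reference set

# Negative values mean the predicted genes lie closer to the ground truth
# than the reference genes do.
difference = mean_distance(ppi, predicted, truth) - mean_distance(ppi, reference, truth)
print(difference)
```

In this toy case the predicted gene is one step from the ground-truth gene, while the reference genes are three and four steps away, so the difference is negative, matching the direction of the effect described for panel e.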