Fig. 5: PDGrapher shows robust performance across training strategies, PPI networks and data availability settings. | Nature Biomedical Engineering


From: Combinatorial prediction of therapeutic perturbations using causally inspired neural networks

Fig. 5

a, Performance of PDGrapher in predicting unseen approved drug targets that reverse disease effects, across all cell lines with healthy counterparts in the chemical perturbation datasets. Individual data points represent individual cell lines (n = 6). b, Sensitivity analysis of PDGrapher to PPI network edge confidence, evaluated by the percentage of accurately predicted samples for cell lines MDAMB231 (chemical perturbations) and MCF7 (genetic perturbations). The PPI network is from STRING (string-db.org), which assigns a confidence score to each edge. Edges are filtered using the 0.1, 0.2, 0.3, 0.4 and 0.5 quantiles of the confidence scores as cut-offs, yielding five PPI networks with 625,818, 582,305, 516,683, 443,051 and 296,451 edges, respectively. Data are presented as mean values across five cross-validation data splits per PPI confidence quantile; shaded bands represent ±1 s.d. from the mean (n = 5 computational replicates per quantile), and each point corresponds to performance on one data split. c, Ablation study on the components of PDGrapher’s objective function, evaluated by the percentage of accurately predicted samples: PDGrapher-Cycle is trained using only the cycle loss, PDGrapher-SuperCycle using both the supervision and cycle losses, and PDGrapher-Super using only the supervision loss. PDGrapher-Cycle performs so poorly that its bars are barely visible in the plot. d, Second ablation study, on PDGrapher’s input data: PDGrapher—no disease intervention data uses only treatment intervention data, whereas PDGrapher uses both disease and treatment intervention data. The disease and treatment intervention data are organized as ‘healthy, mutation, disease’ and ‘diseased, drug, treated’, respectively. In c and d, bars show the average performance across five cross-validation test splits for each of the nine chemical datasets, overlaid points represent performance on individual data splits (n = 5 per cell line), and dashed horizontal lines represent the average performance across all cell lines. Each data split contains 20% of the samples in the dataset, with each sample corresponding to a perturbation-response instance. Where replicates exist for a given drug, they are treated as independent inputs during training and evaluation. P values from the statistical tests are provided in the Source data.
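The quantile-based edge filtering and per-quantile summary described for panel b can be sketched roughly as follows. This is a minimal, hypothetical Python illustration, not the authors' code, and it rests on several assumptions not stated in the caption: edges are kept when their confidence is at or above the cut-off, quantiles use linear interpolation between sorted scores (the default convention in NumPy), and ±1 s.d. is taken to be the sample standard deviation over the five splits. The toy edge list, the accuracy values and the function names `quantile`, `filter_edges_by_quantile` and `summarize_splits` are all invented for illustration.

```python
import math
import statistics
from typing import List, Sequence, Tuple

# Hypothetical edge representation: (protein_a, protein_b, confidence_score).
Edge = Tuple[str, str, float]


def quantile(values: Sequence[float], q: float) -> float:
    """Linear-interpolation quantile (assumed convention; not stated in the caption)."""
    if not values or not 0.0 <= q <= 1.0:
        raise ValueError("need non-empty values and 0 <= q <= 1")
    xs = sorted(values)
    pos = q * (len(xs) - 1)  # fractional rank of the q-th quantile
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def filter_edges_by_quantile(edges: List[Edge], q: float) -> List[Edge]:
    """Keep edges whose confidence is at or above the q-quantile cut-off (assumption)."""
    cutoff = quantile([score for _, _, score in edges], q)
    return [e for e in edges if e[2] >= cutoff]


def summarize_splits(per_split: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample s.d. (n - 1 denominator) of a metric across CV splits."""
    return statistics.mean(per_split), statistics.stdev(per_split)


if __name__ == "__main__":
    # Toy network: ten edges with confidence scores 100, 200, ..., 1000 (invented).
    toy_edges = [(f"P{i}", f"P{i + 1}", 100.0 * i) for i in range(1, 11)]
    for q in (0.1, 0.2, 0.3, 0.4, 0.5):
        kept = filter_edges_by_quantile(toy_edges, q)
        print(f"quantile {q}: {len(kept)} edges kept")

    # Toy per-split accuracies for one quantile (invented values, n = 5 splits).
    mean, sd = summarize_splits([0.80, 0.82, 0.78, 0.81, 0.79])
    print(f"mean = {mean:.3f}, s.d. = {sd:.3f}")
```

A higher quantile gives a stricter cut-off, so fewer edges survive. This matches the decreasing edge counts listed for panel b. With real data, the edge list would be loaded from a downloaded STRING network file rather than built from toy values.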

Source data
