Fig. 2: Simulation benchmarking. | Nature Communications


From: Prioritizing perturbation-responsive gene patterns using interpretable deep learning


A Six different perturbations were applied in silico to the control slice (slice 0), yielding six new slices (details in Methods). B The control slice (slice 0) was compared with each of slices 1–6 using River. C Benchmarking outcome for each method on the six datasets, evaluated by F1-score. River achieves the highest F1-score across all six experiments with statistical significance (p < 0.05). Box plots display the median (center lines), interquartile range (hinges) and 1.5 × IQR (whiskers). One-sided Wilcoxon rank-sum tests were applied to F1-scores (n = 6 slices per method). P-values = 2.42 × 10⁻³, 2.16 × 10⁻³, 1.34 × 10⁻³, 1.34 × 10⁻³, 2.46 × 10⁻³, 2.46 × 10⁻³, 1.34 × 10⁻³, 2.46 × 10⁻³, 1.34 × 10⁻³, 1.34 × 10⁻³, 1.34 × 10⁻³, 1.34 × 10⁻³, 1.34 × 10⁻³, 1.34 × 10⁻³, respectively. D Summary of F1-scores across different values of k for the methods that depend on a top-k parameter. X-axis: k. Y-axis: F1-score. E Comparison of score distributions between River and Sepal. River uses IG (integrated gradients) as its attribution method in this panel; comparisons of Sepal with River's other two attribution methods are shown in Supplementary Fig. 2. For each dataset, the line chart on the left shows the score of each gene; positive genes (ground-truth DSEP genes) are expected to obtain larger scores than the remaining negative genes. Violin plots on the right show the score distributions of DSEP (positive) and non-DSEP (negative) genes for River and Sepal. Each dataset contains 200 DSEP genes and 900 non-DSEP genes (n = 200 for River(pos) and Sepal(pos); n = 900 for River(neg) and Sepal(neg)); each point represents the score of a single gene. Violin plots display the full distribution, with internal box plots indicating the median (center lines), interquartile range (hinges) and 1.5 × interquartile range (whiskers). One-sided Wilcoxon rank-sum tests were used to assess group differences.
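Panel D reports F1-scores as a function of k for methods that take a top-k parameter. The caption does not spell out the exact procedure, so the following is only a minimal sketch under an assumption: the k highest-scoring genes are called positive and compared against the ground-truth DSEP gene set. The helper name `f1_at_k` and the toy scores are hypothetical and are not taken from River or Sepal.

```python
# Hypothetical helper, not River's published code: F1-score when the top-k
# genes by score are called positive and compared with ground-truth DSEP genes.


def f1_at_k(scores: dict[str, float], positives: set[str], k: int) -> float:
    """Return the F1-score of calling the k highest-scoring genes positive."""
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of genes")
    # Rank genes by descending score; break ties by name so results are deterministic.
    ranked = sorted(scores, key=lambda gene: (-scores[gene], gene))
    predicted = set(ranked[:k])
    tp = len(predicted & positives)  # true positives among the top k
    if tp == 0:
        return 0.0
    precision = tp / k
    recall = tp / len(positives)
    return 2 * precision * recall / (precision + recall)


# Toy example with made-up scores: g1 and g3 are the "ground-truth" positives.
scores = {"g1": 0.9, "g2": 0.8, "g3": 0.1, "g4": 0.05}
positives = {"g1", "g3"}
for k in range(1, len(scores) + 1):
    print(k, round(f1_at_k(scores, positives, k), 3))
```

When k equals the number of ground-truth positives (200 DSEP genes per dataset here), precision and recall coincide, so the F1-score at that k equals the precision.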
Exact p-values for River(pos) vs. River(neg) across datasets 1–6 are: 2.28 × 10⁻⁵⁶, 1.17 × 10⁻⁶⁰, 1.00 × 10⁻⁶⁷, 9.11 × 10⁻⁶³, 3.18 × 10⁻⁵⁷, and 3.98 × 10⁻⁶⁴. Corresponding p-values for Sepal(pos) vs. Sepal(neg) are: 0.380, 0.693, 0.261, 0.506, 0.579, and 0.711.
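The one-sided Wilcoxon rank-sum tests in panels C and E ask whether one group's values tend to be larger than the other's. The caption does not state which implementation or approximation was used, so this is only a standard-library sketch under one common convention: the large-sample normal approximation to the Mann-Whitney U statistic, with tie correction and a 0.5 continuity correction (the same convention as `scipy.stats.mannwhitneyu` with `method="asymptotic"`). The function name `rank_sum_greater` is hypothetical.

```python
import math
from typing import Sequence


def rank_sum_greater(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """One-sided Wilcoxon rank-sum (Mann-Whitney U) test of H1: x tends to exceed y.

    Returns (U statistic of x, p-value) from the large-sample normal
    approximation with tie correction and a 0.5 continuity correction.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    # Pool the samples, remembering which group each value came from (0 = x, 1 = y).
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0  # sum of (t^3 - t) over tie groups of size t
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for m in range(i, j + 1):
            ranks[m] = mid_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        raise ValueError("all observations are tied; the test is undefined")
    z = (u_x - mean_u - 0.5) / math.sqrt(var_u)
    # Upper-tail standard normal probability P(Z >= z).
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return u_x, p


u, p = rank_sum_greater([5, 6, 7], [1, 2, 3])
print(u, p)
```

For small samples such as the n = 6 F1-scores per method in panel C, an exact permutation p-value may be preferable to this approximation; with two groups of six and complete separation, the smallest attainable exact one-sided p-value is 1/924 ≈ 1.08 × 10⁻³.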
