Fig. 5: Benchmarking stringency cutoffs in simulated imbalanced datasets. | Nature Communications

From: Image-based DNA sequencing encoding for detecting low-mosaicism somatic mobile element insertions

a Precision-recall (PR) curves of RetroNet for identifying resampled L1 insertions mixed with increasing levels of noise, at signal-to-noise ratios (SNRs) of 1:1 (yellow), 1:10 (light green), 1:100 (dark green), and 1:1000 (purple). Solid lines represent the average PR curve from 100 simulations, and the surrounding ribbons indicate the 95% confidence interval. b The impact of more stringent probability cutoffs (from 0.5 to 0.99, blue to red) on the precision and recall of RetroNet, when applied to imbalanced datasets with an SNR of 1:1 (cross), 1:10 (circle), 1:100 (triangle), and 1:1000 (square). Despite the additional noise, RetroNet still achieved high precision in highly imbalanced datasets while maintaining reasonable recall when a higher stringency cutoff was chosen.
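The trade-off in panel b can be illustrated with a minimal sketch: raising the probability cutoff removes low-confidence calls, which lowers recall but can restore precision when noise candidates vastly outnumber true insertions. This is not RetroNet's code or the authors' simulation. The score model (Beta-distributed probabilities for true insertions and noise) and the helper names `simulate_imbalanced` and `precision_recall_at_cutoff` are hypothetical and chosen only for illustration.

```python
import random
from typing import List, Tuple


def precision_recall_at_cutoff(
    scores: List[float], labels: List[int], cutoff: float
) -> Tuple[float, float]:
    """Precision and recall when every candidate with score >= cutoff is called."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 1)
    # Convention (assumption): precision is 1.0 when no calls are made.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def simulate_imbalanced(
    n_signal: int, noise_ratio: int, rng: random.Random
) -> Tuple[List[float], List[int]]:
    """Hypothetical score model: n_signal true insertions (label 1) plus
    n_signal * noise_ratio noise candidates (label 0), i.e. SNR 1:noise_ratio."""
    scores: List[float] = []
    labels: List[int] = []
    for _ in range(n_signal):
        scores.append(rng.betavariate(8, 2))  # true insertions skew toward 1
        labels.append(1)
    for _ in range(n_signal * noise_ratio):
        scores.append(rng.betavariate(2, 8))  # noise skews toward 0
        labels.append(0)
    return scores, labels


if __name__ == "__main__":
    rng = random.Random(0)
    cutoffs = [0.5, 0.7, 0.9, 0.95, 0.99]
    for ratio in (1, 10, 100, 1000):
        scores, labels = simulate_imbalanced(100, ratio, rng)
        for c in cutoffs:
            p, r = precision_recall_at_cutoff(scores, labels, c)
            print(f"SNR 1:{ratio} cutoff {c:.2f}: precision={p:.3f} recall={r:.3f}")
```

Because the number of true calls above a threshold can only shrink as the threshold rises, recall is non-increasing in the cutoff, whereas precision typically rises as the noise calls that dominate at high imbalance are filtered out.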
