Fig. 6: Interpretation of the RetroNet neural network reveals known L1 retrotransposition hallmarks. | Nature Communications

From: Image-based DNA sequencing encoding for detecting low-mosaicism somatic mobile element insertions

a RetroNet’s predicted values are represented as pre-normalized probabilities (logits) and modeled by a generalized additive model, with the average shown as a blue line and the 95% confidence intervals (CIs) in green. The logit (blue) for L1 insertions is positively correlated with the 5’-end positions in the true images (orange) but not in the false images (gray). b RetroNet’s logit (blue) for L1 insertions is also positively correlated with the 3’-end positions in the true images (orange) but not in the false images (gray). The scarcity of true L1 3’-end deletions results in wider 95% CIs (green) for 3’-ends in L1:0-5000 bp. c Box plots of logits by supporting read orientation: upstream + downstream, two upstream, and two downstream. d Box plots of logits by supporting read category: clipped paired-end (clipped PE), paired-end (PE), and split reads (SR). e Per-base perturbation of L1Hs:3-6062. Each site was tested by sampling 300 training images and permuting the allele to the three alternative bases; the resulting probability changes are colored as significantly down-regulated (blue, probability change < -0.001 and adjusted P < 0.05), significantly up-regulated (red, probability change > 0.001 and adjusted P < 0.05), or not significant (gray). f Categories of perturbations with significant down-regulation in RetroNet, grouped by type of L1 consensus allele: A (blue), C (dark brown), T (green), and G (light brown). g Three-base perturbations at L1Hs hallmark alleles (L1:5927-5929 ACA/G to GAG and L1:6010-6012 TAG to TAA) caused significant decreases in probabilities. h Permutation of 328 alleles from 37 active L1s showed that five high-frequency alleles (found in over 19 of the 37 active L1s: 706/C, 3952/C, 5389/T, 5533/T, 5536/G) significantly increased probabilities. All statistics in (a–h) were based on images derived from the 1000 Genomes Project offspring (n = 549 individuals).
P-values in (c–e, g–h) were calculated using two-sided Wilcoxon tests and adjusted by the Benjamini-Hochberg correction. Box plots show the median (horizontal line within box), the first to third quartiles (Q1–Q3, 25th–75th percentiles) as the box, and whiskers extending to values within 1.5 × IQR (the interquartile range).
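
The multiple-testing step and the color rule in panel e can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the use of a single summary probability change per site, and the example numbers are assumptions made for illustration. Only the Benjamini-Hochberg adjustment and the thresholds (probability change beyond ±0.001, adjusted P < 0.05) come from the caption. The raw P-values are taken as given; the Wilcoxon test itself is not reimplemented here.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def classify_site(prob_change: float, adj_p: float,
                  delta: float = 0.001, alpha: float = 0.05) -> str:
    """Label one perturbed site using the thresholds stated in the caption.

    prob_change is a hypothetical per-site summary of the change in
    predicted probability (how it is summarized is an assumption).
    """
    if adj_p < alpha and prob_change < -delta:
        return "down"  # blue in panel e
    if adj_p < alpha and prob_change > delta:
        return "up"    # red in panel e
    return "ns"        # gray in panel e


# Made-up example values, not data from the paper.
raw_p = [0.01, 0.04, 0.03, 0.005]
changes = [-0.02, 0.0005, 0.004, -0.0002]
adj_p = benjamini_hochberg(raw_p)
labels = [classify_site(c, p) for c, p in zip(changes, adj_p)]
```

A site is labeled significant only when both conditions hold, so a small adjusted P-value with a negligible probability change still falls into the gray category.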
