Fig. 4: Competitive decoy-donors are specifically surrounding annotated-donors.
From: Empirical prediction of variant-activated cryptic splice donors using population-based RNA-Seq data

a The relative Donor Frequency (DF) of all decoy-donors within 150 nt of annotated-donors (decoy-donor DF / annotated-donor DF). Plots are restricted to ±150 nt of annotated-donors because the number of exons longer than this declines steeply. Decoy-donors with a higher DF than the annotated-donor are shown in red, all others in grey. b Depletion of GT decoy-splice sites (observed/expected) (see Methods and Supplementary Fig. 4). Exonic decoy-donors whose use would be in-frame are shown in orange, whereas those that would be out-of-frame (or are intronic) are shown in red. GT decoy-donors show increasing exonic depletion approaching the annotated-donor, and negligible depletion in the intron. c Decoy-donors that are in-frame and closer to the annotated-donor are more likely to be present in 40,233 publicly available RNA-seq samples (40K-RNA). At each distance from the annotated splice-site, the number of decoy-donors present in 40K-RNA is divided by the total number of naturally occurring decoy-donors at that position. d Depletion of GT decoy-donors as in b, split according to decoy-donor DF relative to the annotated-donor (decoy-donor DF / annotated-donor DF). There is negligible depletion of decoy-donor sequences that do not exist as a bona fide donor in GRCh37 (DF = 0, grey), with increasing depletion of exonic decoy-donors as their DF approaches that of the annotated-donor (blue gradient). e Proportion of GT decoy-donors seen in 40K-RNA as in c, split as in d. Decoy-donors closer to the annotated-donor and with higher DF relative to the annotated-donor are more likely to be present in 40K-RNA. Lines show LOESS smoothing (locally weighted smoothing, i.e. trendlines) with confidence bands in grey.
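
The two ratios used in panels a and c can be illustrated with a minimal sketch. This is not the authors' pipeline: the record layout, the toy values, and the function names (`relative_df`, `panel_a_colour`, `proportion_seen_by_distance`) are assumptions made here purely for illustration, and the LOESS smoothing step is omitted.

```python
from collections import defaultdict

# Hypothetical toy records, one per decoy-donor:
# (distance from annotated-donor in nt, decoy-donor DF, annotated-donor DF, seen in 40K-RNA)
decoys = [
    (-10, 0.02, 0.40, True),
    (-10, 0.00, 0.50, False),
    (5, 0.60, 0.30, True),
    (5, 0.10, 0.40, False),
    (5, 0.00, 0.25, False),
]

def relative_df(decoy_df, annotated_df):
    """Relative DF as defined in panel a: decoy-donor DF / annotated-donor DF."""
    return decoy_df / annotated_df

def panel_a_colour(decoy_df, annotated_df):
    """Red if the decoy-donor DF exceeds the annotated-donor DF, otherwise grey."""
    return "red" if relative_df(decoy_df, annotated_df) > 1 else "grey"

def proportion_seen_by_distance(records):
    """Panel c ratio: decoy-donors present in 40K-RNA / all decoy-donors, per distance."""
    seen = defaultdict(int)
    total = defaultdict(int)
    for distance, _decoy_df, _annotated_df, in_rna in records:
        total[distance] += 1
        seen[distance] += int(in_rna)
    return {distance: seen[distance] / total[distance] for distance in total}

proportions = proportion_seen_by_distance(decoys)
colours = [panel_a_colour(d_df, a_df) for _, d_df, a_df, _ in decoys]
```

In this toy input, only the decoy-donor with DF 0.60 against an annotated-donor DF of 0.30 would be coloured red; the per-distance proportions are what a smoother such as LOESS would then be fitted to.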