Fig. 1: Reference dataset statistics.

A The proportion of mutations in each variant classification that are retained in the missplicing and deleterious missplicing subsets shows that predicted deleterious mutations arise from all regions, but disproportionately from splice sites. B The distribution of variants per gene per patient shows that a mutated gene typically carries one or two mutations. C The cancer types analyzed and the number of patients in each project, with BRCA having the largest cohort; the number of cancer-specific deleterious mutations in each cancer type is also shown (cancer-specific mutations are variants found only in patients with a single cancer type). D Most deleterious mutations that induce a missed acceptor or a missed donor fall on or near the splice-site motif, although, for both junction types, several variants disrupt the junction from hundreds of nucleotides away. E The proportion of unique mutations in each variant-type category, in the full available TCGA set and in the predicted deleterious subset, shows that most somatic mutations analyzed are SNVs, whereas insertions appear to induce splice-site alterations proportionally more often, as indicated by their larger share among deleterious variants.