Fig. 2: Benchmarking Oncosplice predictions using RNAseq and allele frequencies.

A, B Plotting the junction allele fraction of variant-induced cryptic splice sites against their SpliceAI probability reveals a significant correlation between the SpliceAI probability and the observed splice junction penetrance (as measured using RNAseq and MiSplice). We used a SpliceAI detection threshold of 0.25; any site with a change in penetrance below this threshold was considered undetected. Moreover, we considered only positions for which the lifted-over hg38 coordinate of the MiSplice-identified cryptic splice site was within 3 nucleotides of the SpliceAI-identified cryptic splice site. C We calculated the discovery ratio as the proportion of splicing events identified in two separate RNAseq-based computational investigations that Oncosplice correctly predicted. D Variants that occur in the general population are depleted among predicted deleterious mutations, indicating that Oncosplice adds insight beyond simply identifying missplicing mutations with SpliceAI. E We binned variants by gnomAD MAF into similarly sized sets (~1.4E4 mutations per bin) and analyzed the mean Oncosplice score of each bin, which revealed a significant correlation. F The mean gnomAD MAF is significantly lower among predicted deleterious variants than among missplicing variants, and significantly lower among missplicing variants than among non-missplicing variants. G A splice site mutation in intron 10 of MET causes exon skipping and deletes a large part of the protein’s functional domain.
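The site-matching criteria for panels A and B and the discovery ratio in panel C can be illustrated with a minimal Python sketch. This is not the authors' code: the `CrypticSite` record, its field names, and the helper functions are hypothetical, and it assumes that "within 3 nucleotides" means an absolute offset of at most 3 and that a site is detected when its SpliceAI probability is at or above 0.25.

```python
from dataclasses import dataclass

SPLICEAI_THRESHOLD = 0.25  # detection threshold stated in the caption
MAX_OFFSET_NT = 3          # assumed inclusive bound for "within 3 nucleotides"


@dataclass(frozen=True)
class CrypticSite:
    """Hypothetical record pairing a MiSplice call with a SpliceAI call."""
    chrom: str
    misplice_pos_hg38: int   # MiSplice position after liftover to hg38
    spliceai_pos_hg38: int   # SpliceAI-identified position (hg38)
    spliceai_prob: float     # SpliceAI probability for the cryptic site


def is_matched(site: CrypticSite) -> bool:
    """True if the two site calls lie within MAX_OFFSET_NT of each other."""
    return abs(site.misplice_pos_hg38 - site.spliceai_pos_hg38) <= MAX_OFFSET_NT


def is_detected(site: CrypticSite) -> bool:
    """True if the SpliceAI probability reaches the detection threshold."""
    return site.spliceai_prob >= SPLICEAI_THRESHOLD


def discovery_ratio(observed_events, predicted_events) -> float:
    """Fraction of RNAseq-observed splicing events that were also predicted."""
    observed = set(observed_events)
    if not observed:
        raise ValueError("no observed events to evaluate")
    return len(observed & set(predicted_events)) / len(observed)
```

Events are compared here as hashable identifiers (for example, a tuple of gene, event type, and coordinate); how events are keyed in practice depends on the upstream RNAseq analyses.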