Extended Data Fig. 2: Splicing track performance. | Nature

From: Advancing regulatory variant effect prediction with AlphaGenome

(a) Schematic overview of the splice site (SS) classification, splice site usage (SSU) prediction, and splice junction (SJ) read count prediction tasks. (b) (left) Performance comparison (AUPRC) of SS classification and SJ classification against reference methods. ‘Baseline’ is the fraction of positive splice junctions in the evaluated data. Splice site classification is evaluated both with GTF-annotated (GENCODE v46) splice sites only and with splice sites derived from GTEx RNA-seq data (Methods). Splice junction classification discriminates between true splice junctions observed in RNA-seq data and false junctions not observed in RNA-seq data (but whose splice sites are observed). Splice junction classification was evaluated per tissue, and the mean AUPRC across tissues is reported. (right) Performance comparison (Pearson r) of predicted vs. measured SSU and SJ counts (log(1+x) transformed). (c) Scatter plot of predicted versus measured donor SSU across seven example human tissues (from GTEx). Pearson r in each tissue is displayed as text. (d) Scatter plot of predicted versus measured splice junction counts across seven human tissues (from GTEx). Pearson r in each tissue is displayed as text. (e) Distribution of Pearson correlation coefficients between predicted and measured PSI3 per tissue (left), PSI5 per tissue (middle), and junction counts across tissues (right; measuring tissue specificity of the splice junction predictions).
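
The metrics named in panel (b) can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. The function names and toy values are hypothetical, and average precision is assumed here as one common estimator of AUPRC; the caption does not state which estimator was used. The sketch shows the 'Baseline' as the fraction of positives, and Pearson r computed on log(1+x)-transformed counts.

```python
import math

def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (assumed AUPRC estimator)."""
    # Rank examples from highest to lowest score (ties keep input order).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(labels)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            total += hits / rank  # precision at the rank of each positive
    return total / n_pos

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

# Toy junction classification data (hypothetical): 1 = observed junction.
labels = [1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
baseline = sum(labels) / len(labels)   # fraction of positives
auprc = average_precision(labels, scores)

# Toy junction read counts (hypothetical), compared after log(1+x).
measured = [0.0, 3.0, 10.0, 120.0]
predicted = [0.5, 2.0, 14.0, 90.0]
r = pearson_r([math.log1p(v) for v in measured],
              [math.log1p(v) for v in predicted])

print(f"baseline={baseline:.3f} auprc={auprc:.3f} pearson_r={r:.3f}")
```

A classifier with no discriminative power would be expected to score an AUPRC near the baseline, which is why the positive fraction is a useful reference line. The log(1+x) transform keeps zero counts defined while reducing the influence of a few very highly expressed junctions on the correlation.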
