Fig. 6: Interpreting variant effects across modalities with AlphaGenome.
From: Advancing regulatory variant effect prediction with AlphaGenome

a, Non-coding cancer mutations in T-ALL. Overview of groups of mutations affecting TAL1 in patients with T-ALL. b, Detailed ALT–REF predictions for an oncogenic insertion (chr. 1: 47239296: C>ACG) characterized in ref. 6. Shown are the differences between AlphaGenome predictions for the ALT and REF sequences of the variant in CD34+ CMP tracks. The ALT sequence increases expression of the TAL1 gene 7.5 kb away. c, Predicted TAL1 expression change (ALT–REF) in CD34+ CMPs. RNA-seq variant scores for TAL1 expression in CD34+ CMPs. Oncogenic mutations (orange) are compared with randomly sampled, length-matched indels (grey). d, Multimodal heat map of predicted variant effects. Each column is a distinct variant from c. Each row is a variant effect score associated with a genome track in CD34+ CMPs, except for contact map variant effect scores, which were averaged across tissues (as there is no CD34+ CMP contact map in our data). Background mutations are included alongside oncogenic mutations. Variants were grouped by their insertion length and position (as displayed in c), and scores were min-max scaled. e, In silico mutagenesis (ISM) results for DNase, H3K27ac and TAL1 RNA-seq expression prediction by AlphaGenome in CD34+ CMPs. Top, ISM on the reference sequence; bottom, ISM on the oncogenic insertion sequence (chr. 1: 47239296: C>ACG). The Myb motif is from a previous study (ref. 6), originally from UniPROBE (ref. 54). f, Multimodality in trait-altering non-coding variants. Fraction of trait-affecting variants (ref. 55; ‘candidate causal’; 338 for Mendelian and 1,140 for complex traits), as well as matched control variants (ref. 55; ‘control’; 3,042 and 10,260, respectively), that exceed varying quantile-score thresholds in at least one predicted track. Here, surpassing a quantile-score threshold of 0.99 implies a predicted effect in excess of 99% of common variants (Methods).
Variants are categorized by the tracks in which the threshold was exceeded: ‘local regulation’ (ChIP/DNase/ATAC), ‘expression only’ (RNA/CAGE) and ‘multimodal’ (both of the above). Numbers above the bars indicate the relative enrichment of detected variants (sum of the three categories) among candidate causal variants compared with the control variants. Enrichment increases with stricter thresholds, at the cost of reduced recall (x axis).
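
The categorization and enrichment described for panel f can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the track-type labels, the use of the absolute quantile score, the strict comparison against the threshold and the toy scores are all assumptions made for illustration only.

```python
# Hypothetical sketch of the panel f logic: label each variant by which
# track types exceed a quantile-score threshold, then compare detection
# rates between candidate causal and control variants.

LOCAL = {"ChIP", "DNase", "ATAC"}   # assumed labels for local-regulation tracks
EXPRESSION = {"RNA", "CAGE"}        # assumed labels for expression tracks


def categorize(scores, track_types, threshold):
    """Return one category label per variant.

    scores: list of per-variant lists of quantile scores (one per track).
    track_types: track-type label for each score column.
    Assumption: a track counts as passed when |score| > threshold.
    """
    labels = []
    for row in scores:
        local_hit = any(
            abs(s) > threshold for s, t in zip(row, track_types) if t in LOCAL
        )
        expr_hit = any(
            abs(s) > threshold for s, t in zip(row, track_types) if t in EXPRESSION
        )
        if local_hit and expr_hit:
            labels.append("multimodal")
        elif local_hit:
            labels.append("local regulation")
        elif expr_hit:
            labels.append("expression only")
        else:
            labels.append("none")
    return labels


def enrichment(causal_labels, control_labels):
    """Ratio of detected fractions (any of the three categories)."""
    causal_frac = sum(l != "none" for l in causal_labels) / len(causal_labels)
    control_frac = sum(l != "none" for l in control_labels) / len(control_labels)
    if control_frac == 0:
        return float("inf")  # no control variant detected at this threshold
    return causal_frac / control_frac


# Toy, made-up scores for four tracks.
track_types = ["DNase", "ATAC", "RNA", "CAGE"]
causal_scores = [
    [0.995, 0.2, 0.1, 0.3],
    [0.1, 0.2, 0.999, 0.3],
    [0.995, 0.2, 0.1, 0.998],
    [0.1, 0.2, 0.3, 0.4],
]
control_scores = [
    [0.5, 0.1, 0.2, 0.3],
    [0.992, 0.1, 0.2, 0.3],
    [0.1, 0.2, 0.3, 0.4],
    [0.3, 0.2, 0.1, 0.0],
]

causal_labels = categorize(causal_scores, track_types, threshold=0.99)
control_labels = categorize(control_scores, track_types, threshold=0.99)
ratio = enrichment(causal_labels, control_labels)
```

In this toy example, three of four candidate causal variants and one of four control variants are detected, so the enrichment is 3. Raising the threshold would typically detect fewer variants in both groups, which is the recall trade-off noted in the caption.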