Fig. 5: AlphaGenome accurately predicts variant effects on chromatin accessibility and SPI1 transcription factor binding. | Nature

From: Advancing regulatory variant effect prediction with AlphaGenome

a, Schematic of the centre-mask variant scoring strategy used for accessibility and ChIP-seq predictions (Methods). b,c, Performance comparison of AlphaGenome, Borzoi and ChromBPNet on QTL causality (b; average precision) and QTL effect size (c; Pearson r) across QTL types and ancestries. d, Predicted versus observed effect sizes for causal caQTLs (African ancestry). The scatterplot displays GM12878 cell line DNase predictions. Signed Pearson r = 0.74; unsigned Pearson r = 0.45. Signed Pearson r uses raw values; unsigned Pearson r uses absolute values. Red and blue circles highlight variants in e and f. e, Example ALT–REF differences in predicted DNase (GM12878) for variants in d. f, ISM-derived sequence logos for REF/ALT alleles from e, suggesting variant disruption or modulation of transcription factor binding motifs. Putative binding factors and JASPAR (ref. 53) matrix IDs (MA0105.1 and MA0105.3) are indicated on the right. g, Predicted versus observed effect sizes for causal SPI1 bQTLs using the GM12878 SPI1 ChIP-seq track. Signed Pearson r = 0.55; unsigned Pearson r = 0.12. Red and blue circles highlight variants in h and i. h, Example AlphaGenome predictions for selected SPI1 bQTLs. Shown are ALT–REF differences in predicted SPI1 ChIP-seq track (GM12878) around the variants highlighted in g. i, ISM-derived sequence logos for REF and ALT alleles of example SPI1 bQTLs from h, suggesting potential impacts such as creation or disruption of SPI1 or related motifs. The putative binding factors and JASPAR matrix IDs (MA0081.2 and MA0080.5) are indicated on the right. j, CAGI5 MPRA challenge performance (average across loci; mean Pearson r). Top, zero-shot using cell-type-matched DNase; middle, LASSO regression using cell-type-matched or agnostic DNase; bottom, LASSO regression using multimodal features (DNase + RNA + histone ChIP-seq output types for AlphaGenome and Borzoi; DNase + CAGE output types for Enformer) and all cell types. TF, transcription factor.
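The signed and unsigned Pearson r values reported for panels d and g differ only in whether the effect sizes are passed through an absolute value first. A minimal sketch of that distinction follows, assuming NumPy; the function name `signed_and_unsigned_pearson` and the toy effect sizes are hypothetical illustrations, not the authors' code or data.

```python
import numpy as np


def signed_and_unsigned_pearson(predicted, observed):
    """Return (signed r, unsigned r) for paired effect sizes.

    Hypothetical helper for illustration only.
    Signed r correlates the raw values, so effect direction counts.
    Unsigned r correlates the absolute values, so only magnitude counts.
    """
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    signed_r = np.corrcoef(predicted, observed)[0, 1]
    unsigned_r = np.corrcoef(np.abs(predicted), np.abs(observed))[0, 1]
    return signed_r, unsigned_r


# Toy effect sizes (made up, not taken from the figure).
predicted = [-0.8, -0.3, 0.1, 0.4, 0.9]
observed = [-1.0, -0.2, 0.3, 0.5, 0.7]
signed_r, unsigned_r = signed_and_unsigned_pearson(predicted, observed)
print(f"signed r = {signed_r:.2f}, unsigned r = {unsigned_r:.2f}")
```

Signed r rewards a model for getting the direction of each effect right, whereas unsigned r only asks whether larger predicted magnitudes go with larger observed magnitudes. That is one way the two values can diverge as much as they do in panels d and g.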