Extended Data Fig. 9: Interpreting trait-affecting variants using AlphaGenome.
From: Advancing regulatory variant effect prediction with AlphaGenome

(a) AlphaGenome predictions for the variant chr11:116837649:T>G, previously reported to result in underexpression of APOA1 and hypoalphalipoproteinemia (ref. 57). The variant strongly reduces the predicted expression of the APOA1 transcript in heart tissue. In silico mutagenesis (ISM) of the reference sequence suggests that the variant disrupts a TATA-like motif.

(b) AlphaGenome predictions for the variant chr16:173692:A>G, previously reported to cause hemoglobin H disease (ref. 58). The model predicts that the variant strongly reduces expression of the HBA2 transcript. ISM of the reference sequence suggests that the variant disrupts the polyadenylation hexamer.

(c) AlphaGenome predictions for the variant chr17:44001600:C>A, previously reported to result in reduced binding of HNF1, underexpression of NAGS and ultimately N-acetylglutamate synthase deficiency (ref. 59). AlphaGenome predicts reduced binding of HNF1A and reduced local DNA accessibility, as well as a strong reduction in expression of the NAGS transcript in liver and related cell types. ISM of the reference sequence suggests that the variant disrupts an HNF1 motif.

(d) AlphaGenome predictions for the variant chrX:55028202:A>G, previously reported to result in reduced GATA binding and reduced expression of ALAS2, thereby possibly causing sideroblastic anemia (ref. 60). The model predicts reduced binding of GATA factors (GATA2 shown) and of TAL1, and drastically reduced expression of the ALAS2 transcript in K562 cells. ISM of the reference sequence suggests that the variant disrupts a GATA-TAL composite motif.

(e) AlphaGenome predictions for the variant chr5:1295046:T>G, previously reported to cause TERT overexpression associated with malignant melanoma (ref. 61). AlphaGenome predicts increased binding of ETS factors (ELF1 shown) and an increase in TERT expression in melanocytes. Comparing the ISM around the reference and alternative alleles suggests that the variant creates an ETS factor binding motif. The phyloP (ref. 62) score for this position is −3.29, indicating faster-than-neutral evolution, which is consistent with the fact that gain-of-function variants can occur in fast-evolving parts of the genome.

(f) AlphaGenome predictions for the variant chr11:5254983:G>C, previously implicated in hereditary persistence of fetal hemoglobin (ref. 63). The model predicts increased CTCF binding, as well as increased HBG2 expression (shown for CD14+ monocytes). Comparing the ISM around the reference and alternative alleles suggests that the variant creates a CTCF binding motif. The phyloP score for this position is −0.67; given the low magnitude of the score, the variant position appears to be under relatively neutral selection.

(g) AlphaGenome predictions for the variant rs12740374, previously found to be associated with low-density lipoprotein cholesterol levels (ref. 64). Additionally, the variant is a plasma pQTL for CELSR2, as well as a GTEx eQTL for CELSR2 and PSRC1 (refs. 17,65). Consistent with the observed molecular QTL effects, AlphaGenome predicts increased expression of both of these genes in liver tissue. Moreover, the model predicts increased CEBP binding on the alternative allele. Comparing the ISM around the reference and alternative alleles suggests that the variant creates a CEBP binding motif. The phyloP score for this position is −0.20; given the low magnitude of the score, the variant position appears to be under relatively neutral selection.

(h) AlphaGenome predictions for the variant rs570639864, previously found to be associated with reduced bone mineral density (ref. 66). AlphaGenome predicts decreased binding of JUN/FOS factors (JUNB shown), as well as a strong reduction in expression of WNT7B, a gene implicated in bone formation (ref. 67). ISM of the reference sequence suggests that the variant disrupts a JUN/FOS motif.

(i) AlphaGenome predictions for the variant chrX:101386224:T>C, previously reported to result in a lack of functional BTK, causing X-linked agammaglobulinemia (OMIM #300755). The model predicts reduced binding of SPI1 at the BTK promoter and diminished local DNA accessibility, as well as a reduction in expression of the BTK transcript in B cells and lymphoblastoid cells. ISM of the reference sequence suggests that the variant disrupts a SPI-like motif (SPI1.H13.0.PB shown). Also shown is the 100-vertebrate phyloP conservation track, which indicates high conservation around the variant.
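
Every panel above relies on ISM to relate a variant to a motif, so a minimal sketch of the general idea may help. This is not AlphaGenome's code or API. `toy_score` is a hypothetical stand-in for a model prediction that scores the best match to a TATA-like consensus, and the sequence, the function names and the averaging used for the per-position importance are all assumptions made for illustration.

```python
BASES = "ACGT"


def toy_score(seq: str, motif: str = "TATAAA") -> float:
    """Hypothetical stand-in for a model output: best fractional motif match in seq."""
    best = 0
    for i in range(len(seq) - len(motif) + 1):
        matches = sum(a == b for a, b in zip(seq[i:i + len(motif)], motif))
        best = max(best, matches)
    return best / len(motif)


def ism(seq: str, score_fn) -> dict:
    """Return {position: {alt_base: score(mutant) - score(reference)}}."""
    ref_score = score_fn(seq)
    effects = {}
    for pos, ref_base in enumerate(seq):
        effects[pos] = {}
        for alt in BASES:
            if alt == ref_base:
                continue  # only score the three substitutions at each position
            mutant = seq[:pos] + alt + seq[pos + 1:]
            effects[pos][alt] = score_fn(mutant) - ref_score
    return effects


def importance(effects: dict) -> list:
    """Per-position importance: negated mean effect of the three substitutions."""
    return [-sum(e.values()) / len(e) for _, e in sorted(effects.items())]


# Assumed toy reference sequence with a TATA-like motif at positions 4-9.
reference = "GCGCTATAAAGCGC"
effects = ism(reference, toy_score)
scores = importance(effects)

# Effect of a single hypothetical variant (A>G at position 5, inside the motif).
variant_effect = effects[5]["G"]
```

In this toy setting, positions inside the motif receive positive importance and flanking positions receive zero, which is the kind of pattern read as "the variant disrupts a motif" in the panels. Running the same scan on the alternative-allele sequence and comparing the two importance profiles corresponds to the reference-versus-alternative comparison described for the gain-of-function variants.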