Extended Data Fig. 7: AlphaGenome improves enhancer-gene linking using input gradients and shows enhanced sensitivity to distal enhancers. | Nature


From: Advancing regulatory variant effect prediction with AlphaGenome


(a) Zero-shot performance of AlphaGenome and Borzoi on the ENCODE-rE2G benchmark. Bars indicate the area under the precision-recall curve (auPRC) for predicting enhancer-gene links. Two scoring methods derived from each model were evaluated: input gradient scores and RNA-seq variant effect scores.

(b) Impact of incorporating AlphaGenome's input gradient score as a feature in the ENCODE-rE2G extended logistic regression model, evaluated on the ENCODE-rE2G benchmark. ENCODE-rE2G is a logistic regression model trained to predict enhancer-gene interactions from features (ref. 12). Precision-recall curves are shown, colored by the feature set used to train the regression model (auPRC values indicated in the legend). The feature sets are: 'rE2G extended with AlphaGenome features': all ENCODE-rE2G extended model features plus a single AlphaGenome input × gradient score; 'AlphaGenome features only': the AlphaGenome input × gradient score alone; 'TSS distance with AlphaGenome features': the AlphaGenome input × gradient score plus the distance-to-TSS feature; 'rE2G extended': all features from the ENCODE-rE2G extended model (ref. 12); 'TSS distance': the distance-to-TSS feature from ref. 12; 'ABC features only': the subset of 'rE2G extended' containing only features related to the Activity-By-Contact (ABC) model (ref. 12).

(c) Precision-recall curves for the ENCODE-rE2G benchmark, as in panel (b), evaluating the ENCODE-rE2G extended regression model with different feature sets. auPRC values for each feature set are indicated in the legend. In this configuration, 'AlphaGenome features' consist of a more comprehensive set of K562 cell line-specific variant effect scores. These include Allele-Specific Activity Scores (AAS) and variant effect scores calculated as the difference between alternate (ALT) and reference (REF) allele predictions (ALT-REF Diff scores). These scores were derived from AlphaGenome predictions for the following genomic assays: RNA-seq of the target gene, ChIP-TF EP300, ChIP-Histone H3K27ac, CAGE, PRO-cap and H1-ESC contact maps.

(d) Relationship between enhancer perturbation effects (ENCODE-rE2G dataset, ref. 12) and enhancer-promoter distance. The scatter plot shows experimentally observed percentage changes in gene expression upon enhancer knockout (grey points and trend line) against the genomic distance between the enhancer and the target gene's transcription start site (TSS). Overlaid are trend lines for the predictions of these expression changes by AlphaGenome (AG, dark blue) and Borzoi (green), derived from each model's input gradient scores. Each point corresponds to a validated enhancer-gene pair. Error bars (grey) represent 95% confidence intervals from the linear regression.
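
The auPRC values reported in panels (a) to (c) summarize how well a score ranks true enhancer-gene links above non-links. The following is a minimal sketch of one common way to compute it, the step-wise average precision, and is not taken from the authors' evaluation code, which is not described here. The function name `average_precision` and the toy labels and scores are hypothetical and purely illustrative.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    labels: 1 for a validated enhancer-gene link, 0 otherwise.
    scores: model-derived score for each candidate pair (higher = more likely).
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")

    # Rank candidate pairs from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    ap = 0.0
    true_pos = 0
    i = 0
    while i < len(ranked):
        # Treat all pairs sharing the same score as a single threshold.
        j = i
        group_pos = 0
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            group_pos += ranked[j][1]
            j += 1
        true_pos += group_pos
        precision = true_pos / j          # j pairs are called positive so far
        recall_gain = group_pos / n_pos   # increase in recall at this threshold
        ap += recall_gain * precision
        i = j
    return ap


# Hypothetical toy data: six candidate enhancer-gene pairs.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
toy_auprc = average_precision(toy_labels, toy_scores)
```

Comparing two scoring methods, as in panel (a), would then amount to calling the function once per score column on the same set of labels.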
