Extended Data Fig. 3: Track-level performance benchmarking. | Nature

From: Advancing regulatory variant effect prediction with AlphaGenome

Performance comparison of AlphaGenome with Enformer and Borzoi on held-out genomic track prediction. (a, b) Comparison of AlphaGenome test set performance on Enformer human tracks (each dot is one track) against Enformer models either (a) not fine-tuned or (b) fine-tuned on human data (the main released Enformer version). The AlphaGenome model was retrained for direct comparability using matched training intervals and an additional Enformer prediction head (Methods). (c) Evaluation of RNA-seq prediction performance at base and gene resolution using the same source of RNA-seq data as Borzoi, but processed at base resolution and not scaled (Methods). Borzoi's 32 bp RNA-seq predictions were upsampled and their scaling was reversed to recover the original scale for comparison. The larger performance difference observed on the normal (untransformed) scale (first column) likely reflects resolution differences at exon-intron boundaries. This difference decreased when using log(1+x)-transformed values (second column), suggesting better agreement on overall gene expression levels. A similar trend was observed when aggregating expression per gene (average exon coverage, third column). Cell-type specificity was evaluated by correlating quantile-normalized, mean-subtracted expression profiles across genes (fourth column) and across tracks (fifth column). (d) Test set performance comparison of AlphaGenome against Borzoi (fold 1) on Borzoi track data at 32 bp resolution (each dot is one track). AlphaGenome was fine-tuned with an additional Borzoi head at matched resolution (Methods).
(e) Stratification of cell-type-specific prediction accuracy. The per-gene log-fold change correlation performance (from panel c, fourth column) was stratified by gene characteristics: median expression level across tissues (Median TPM; quintile breakpoints: 5.5×10⁻⁹, 4.1×10⁻⁴, 8.1×10⁻⁴, 0.17, 4.1, 3.6×10⁴ TPM), number of tissues with the gene expressed (TPM ≥ 0.001; quintile breakpoints: 9.4×10⁻⁸, 9.4×10⁻⁴, 8.0, 52, 54, 54 tissues), and housekeeping gene status. Sample sizes in brackets are the number of genes in each category. Box plots display the median (center line) and the 25th and 75th percentiles (box bounds); whiskers extend to 1.5 times the interquartile range from the box bounds, and points beyond the whiskers indicate outliers. (f, g) Performance comparison of AlphaGenome against (f) ProCapNet (on PRO-Cap data) and (g) ChromBPNet (on ATAC and DNase). Evaluation was performed on ProCapNet fold 5 and ChromBPNet fold 0 test peak regions, respectively, excluding regions that overlap AlphaGenome fold 0 training intervals. Performance is quantified by track Pearson r, Pearson r on the log total count, and Jensen-Shannon distance (JSD; lower indicates better performance). AlphaGenome outperforms the baselines across all metrics, modalities and cell lines. For (g), only tracks with matching experiment accessions between AlphaGenome and ChromBPNet training sets were considered.
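The three metrics named for panels (f) and (g) can be illustrated with a minimal NumPy sketch. This is not the authors' evaluation code: the function names, the `log1p` pseudocount for the total count, the base-2 logarithm in the JSD, and the per-region normalization of profiles to sum to one are all assumptions made for illustration. The exact definitions are given in the Methods.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation between two arrays, flattened to 1D."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    # Undefined when either input is constant.
    return float((xc * yc).sum() / denom) if denom > 0 else float("nan")


def log_total_count_r(pred, obs):
    """Pearson r across regions of the log total count per region.

    pred, obs: arrays of shape (n_regions, region_length).
    Assumption: log(1 + total) is used to avoid log(0).
    """
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return pearson_r(np.log1p(pred.sum(axis=1)), np.log1p(obs.sum(axis=1)))


def js_distance(p, q):
    """Jensen-Shannon distance between two non-negative profiles.

    Assumptions: each profile is normalized to sum to one, and base-2
    logarithms are used so the distance lies in [0, 1]. Lower is better.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.sum() <= 0 or q.sum() <= 0:
        raise ValueError("profiles must have positive total signal")
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute zero
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    # Guard against tiny negative values from floating-point rounding.
    return float(np.sqrt(max(divergence, 0.0)))
```

Under these assumptions, the track Pearson r compares signal values position by position, the log total count Pearson r compares overall signal magnitude per region, and the JSD compares only the shape of the profile within a region, independent of its total.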
