Fig. 2: Models often disagree on predicted direction of effect of cis-regulatory variation.
From: Personal transcriptome variation is poorly explained by current genomic deep learning models

a, Predictions from all four deep learning models for an example gene, SNHG5, which has strong negative cross-individual correlations for Enformer, Basenji2 and ExPecto and a positive cross-individual correlation for Xpresso. Points are colored by the corresponding individual’s dosage of the most statistically significant eQTL for this gene. Dashed lines indicate the predicted expression levels of the reference (Ref) and alternate (Alt) alleles of that eQTL. b, Comparison of cross-individual Spearman rank correlations for Enformer versus each other model. A kernel density estimate of each scatterplot is overlaid (red). Note the increased density of genes along the diagonals y = x and y = −x. Related plots for all pairs of models are shown in Extended Data Fig. 4. c, Cross-individual Spearman rank correlations for Enformer compared with the P value of the most statistically significant eQTL in each gene (top left), the distance from that eQTL to the TSS (top right), the median observed expression level of the gene (bottom left) and the coefficient of variation of the predicted expression levels of the gene (bottom right). Note that negative cross-individual correlations are observed even for genes with strong eQTLs. For each plot, Pearson correlations and lines of best fit using ordinary least squares are shown in black when computed using all genes, and in orange or green when computed using only genes with positive or negative cross-individual correlations, respectively. Related plots for all tested models are shown in Extended Data Figs. 5–10.
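The per-gene quantities behind panels b and c can be sketched as follows. This is a minimal, standard-library-only Python illustration under stated assumptions, not the authors' analysis code: it assumes that, for each gene, predicted and observed expression are paired lists over the same individuals, that tied values receive average ranks, that the coefficient of variation uses the population standard deviation, and that in panel c the correlation is regressed on the covariate (covariate on the x axis). All function names are hypothetical.

```python
from statistics import mean, pstdev


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def cross_individual_spearman(predicted, observed):
    """Spearman correlation for one gene: Pearson r of ranks across individuals."""
    return pearson(average_ranks(predicted), average_ranks(observed))


def ols_fit(x, y):
    """Ordinary least squares line y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx


def coefficient_of_variation(predicted):
    """Std/mean of a gene's predicted expression across individuals.

    Assumption: population standard deviation (pstdev); the figure does not say which.
    """
    return pstdev(predicted) / mean(predicted)


def stratified_fits(rhos, covariate):
    """Pearson r and OLS line of per-gene rho against a per-gene covariate,
    for all genes (black) and the rho > 0 (orange) and rho < 0 (green) subsets.

    Each subset needs at least two distinct covariate values.
    """
    pairs = list(zip(covariate, rhos))
    groups = {
        "all": pairs,
        "positive": [(c, r) for c, r in pairs if r > 0],
        "negative": [(c, r) for c, r in pairs if r < 0],
    }
    fits = {}
    for name, group in groups.items():
        xs = [c for c, _ in group]
        ys = [r for _, r in group]
        slope, intercept = ols_fit(xs, ys)
        fits[name] = (pearson(xs, ys), slope, intercept)
    return fits
```

Splitting genes by the sign of their correlation before fitting mirrors the orange and green lines in panel c; in this sketch, genes with a correlation of exactly zero fall into neither subset.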