Fig. 1: Boxplots of gene expression prediction \({R}^{2}\) for simulations with varying proportions of true causal SNPs \({p}_{{\mbox{causal}}}=\left(0.001, 0.01, 0.05, 0.1\right)\) and true expression heritability \({h}_{e}^{2}=\left(0.1, 0.2, 0.5\right)\).

Expression was simulated 1000 times per scenario. Avg-valid+SR models obtained the highest test \({R}^{2}\) for gene expression imputation in 11 of the 12 scenarios, while SR-TWAS performed best in the scenario with \({p}_{{\mbox{causal}}}=0.01\) and \({h}_{e}^{2}=0.5\). This is because the test samples were simulated under the same genetic architecture as the ROSMAP training cohort and the validation cohort, which share only ~50% of their true causal SNPs with the GTEx training cohort. Gene expression prediction \({R}^{2}\) is the squared Pearson correlation between predicted GReX and simulated true expression. The lower and upper bounds of each box are the 1st and 3rd quartiles, respectively. The horizontal line is the median, the diamond inside the box is the mean, whiskers extend to the most extreme observation within 1.5 × IQR of the nearest box boundary, and points beyond the whiskers are outliers. Color indicates model: blue = PrediXcan-GTEx, gold = TIGAR-ROSMAP, green = TIGAR-ROSMAP-valid, black = Naive, red = SR-TWAS, purple = Avg-valid+SR.
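
As an illustration of the quantities named in this caption, the following minimal sketch computes a prediction \({R}^{2}\) as a squared Pearson correlation and the boxplot summary statistics (quartiles, mean, whisker ends, outliers) for a set of values. The function names `prediction_r2` and `boxplot_summary` are hypothetical, and the quartiles use NumPy's default linear interpolation, which may differ slightly from the plotting software used for the figure.

```python
import numpy as np


def prediction_r2(predicted, observed):
    """Squared Pearson correlation between predicted and observed values."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    r = np.corrcoef(predicted, observed)[0, 1]
    return r ** 2


def boxplot_summary(values):
    """Box, whisker, and outlier statistics under the 1.5 x IQR rule.

    Assumption: quartiles use NumPy's default (linear) interpolation.
    """
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations inside the fences.
    inside = values[(values >= low_fence) & (values <= high_fence)]
    outliers = values[(values < low_fence) | (values > high_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "mean": values.mean(),
        "whisker_low": inside.min(),
        "whisker_high": inside.max(),
        "outliers": outliers,
    }
```

In a setting like the one described, `prediction_r2` would be applied once per simulation replicate, and `boxplot_summary` to the resulting collection of \({R}^{2}\) values for each model and scenario.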