Table 1 Summary of imputation accuracy

From: Deep learning-based phenotype imputation on population-scale biobank data increases genetic discoveries

              Cardiometabolic                                         Psychiatric disorders
Method        r2            r2 binary     AUPR          AUROC         r2            r2 binary     AUPR          AUROC
GAIN          0.071 (0.002) 0.015 (0.002) 0.245 (0.004) 0.587 (0.007) 0.020 (0.000) 0.013 (0.001) 0.281 (0.001) 0.428 (0.001)
KNN           0.237 (0.002) 0.025 (0.001) 0.259 (0.004) 0.600 (0.003) 0.049 (0.001) 0.041 (0.001) 0.398 (0.001) 0.596 (0.001)
HI-VAE        0.193 (0.002) 0.067 (0.003) 0.337 (0.001) 0.693 (0.001) 0.072 (0.001) 0.070 (0.001) 0.430 (0.001) 0.696 (0.001)
SoftImpute    0.269 (0.003) 0.064 (0.002) 0.327 (0.006) 0.689 (0.007) 0.087 (0.001) 0.071 (0.001) 0.425 (0.002) 0.658 (0.002)
AutoComplete  0.297 (0.002) 0.096 (0.004) 0.361 (0.006) 0.726 (0.005) 0.112 (0.001) 0.099 (0.001) 0.450 (0.002) 0.701 (0.001)

Metrics are averaged across all simulations (1%, 5%, 10%, 20% and 50% missing data) and shown separately for cardiometabolic and psychiatric disorder phenotypes. We report the squared correlation coefficient (r2) between imputed and observed values, r2 restricted to binary-valued phenotypes (r2 binary), and, for binary-valued phenotypes only, the area under the precision-recall curve (AUPR) and the area under the receiver operating characteristic curve (AUROC). Standard errors are shown in parentheses.
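To make the metrics in the note concrete, here is a minimal sketch of how they could be computed for a single phenotype from held-out observed values and the corresponding imputed values. It is not the paper's evaluation code and rests on stated assumptions: r2 is taken to be the squared Pearson correlation, AUPR is estimated as average precision (one common estimator among several), and AUROC is computed from pairwise case/control comparisons (the Mann-Whitney formulation). The function names `r2`, `aupr` and `auroc` are hypothetical. Under these assumptions, "r2 binary" is simply `r2` applied to binary-valued phenotypes only. Averaging each metric over the five missingness levels would then give one table entry. The note does not say how the standard errors were derived, so the sketch does not attempt to reproduce them.

```python
# Minimal sketch (not the paper's code) of the metrics named in Table 1's note.
# Assumptions: r2 = squared Pearson correlation between held-out observed values
# and imputed values; AUPR = average precision; AUROC = Mann-Whitney estimate.
from statistics import mean


def r2(observed, imputed):
    """Squared Pearson correlation between observed and imputed values."""
    mo, mi = mean(observed), mean(imputed)
    cov = sum((o - mo) * (p - mi) for o, p in zip(observed, imputed))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_i = sum((p - mi) ** 2 for p in imputed)
    return cov * cov / (var_o * var_i)


def auroc(labels, scores):
    """Fraction of (case, control) pairs where the case scores higher; ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def aupr(labels, scores):
    """Average precision: mean precision at the rank of each true case.

    Tied scores keep their input order (stable sort), so ties are not averaged.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k
    if hits == 0:
        raise ValueError("AUPR needs at least one case")
    return total / hits


# Toy binary phenotype: held-out true labels and the imputed (continuous) values.
observed = [1, 0, 1, 1, 0, 0]
imputed = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]
print(round(r2(observed, imputed), 3))     # 0.486
print(round(aupr(observed, imputed), 3))   # 0.917
print(round(auroc(observed, imputed), 3))  # 0.889
```

The pairwise AUROC loop scales as the number of cases times the number of controls. That is fine for a toy example, but rank-based or library implementations are the practical choice at biobank scale.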