Table 4 Prediction accuracy in simulation studies in which the phenotype is associated with SNPs only (heritability = 0.10).

From: Machine learning for effectively avoiding overfitting is a crucial strategy for the genetic prediction of polygenic psychiatric phenotypes

Columns are grouped by the number of true susceptibility variants (100, 200, 500, 2000, 5000).

| Distribution of the true SNP effects | Prediction model | 100: Mean (SE) PCC | 100: Power^a | 200: Mean (SE) PCC | 200: Power^a | 500: Mean (SE) PCC | 500: Power^a | 2000: Mean (SE) PCC | 2000: Power^a | 5000: Mean (SE) PCC | 5000: Power^a |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Laplace distribution | STMGP | 0.1520 (0.0293) | 1.00 | 0.1029 (0.0408) | 1.00 | 0.0521 (0.0252) | 0.80 | 0.0241 (0.0193) | 0.35 | 0.0217 (0.0171) | 0.25 |
| Laplace distribution | PRS | 0.0454 (0.0434) | 0.75 | 0.0421 (0.0247) | 0.85 | –0.0018 (0.0283) | 0.10 | 0.0128 (0.0203) | 0.15 | 0.0004 (0.0203) | 0.10 |
| Laplace distribution | GBLUP | 0.0137 (0.0134) | 0.05 | 0.0201 (0.0143) | 0.15 | 0.0163 (0.0190) | 0.20 | 0.0198 (0.0133) | 0.25 | 0.0199 (0.0201) | 0.15 |
| Laplace distribution | SBLUP | 0.0140 (0.0148) | 0.05 | 0.0186 (0.0143) | 0.10 | 0.0150 (0.0200) | 0.20 | 0.0189 (0.0159) | 0.25 | 0.0186 (0.0189) | 0.15 |
| Laplace distribution | BayesR | 0.1217 (0.0680) | 0.90 | 0.0782 (0.0475) | 0.85 | 0.0345 (0.0337) | 0.35 | 0.0202 (0.0195) | 0.25 | 0.0172 (0.0222) | 0.15 |
| Laplace distribution | Ridge | 0.0183 (0.0158) | 0.20 | 0.0188 (0.0138) | 0.20 | 0.0215 (0.0212) | 0.30 | 0.0184 (0.0111) | 0.10 | 0.0171 (0.0192) | 0.15 |
| Normal distribution | STMGP | 0.1045 (0.0281) | 1.00 | 0.0638 (0.0205) | 0.95 | 0.0236 (0.0122) | 0.30 | 0.0208 (0.0156) | 0.25 | 0.0195 (0.0186) | 0.15 |
| Normal distribution | PRS | 0.0258 (0.0305) | 0.50 | 0.0177 (0.0220) | 0.30 | 0.0079 (0.0224) | 0.15 | 0.0053 (0.0216) | 0.10 | 0.0015 (0.0233) | 0.00 |
| Normal distribution | GBLUP | 0.0220 (0.0168) | 0.30 | 0.0202 (0.0172) | 0.15 | 0.0161 (0.0147) | 0.15 | 0.0172 (0.0191) | 0.15 | 0.0204 (0.0132) | 0.15 |
| Normal distribution | SBLUP | 0.0215 (0.0173) | 0.30 | 0.0195 (0.0174) | 0.15 | 0.0173 (0.0150) | 0.15 | 0.0185 (0.0198) | 0.20 | 0.0206 (0.0129) | 0.15 |
| Normal distribution | BayesR | 0.0943 (0.0489) | 0.90 | 0.0444 (0.0224) | 0.70 | 0.0210 (0.0171) | 0.15 | 0.0189 (0.0135) | 0.20 | 0.0130 (0.0127) | 0.05 |
| Normal distribution | Ridge | 0.0251 (0.0156) | 0.40 | 0.0269 (0.0180) | 0.40 | 0.0187 (0.0184) | 0.15 | 0.0170 (0.0162) | 0.15 | 0.0154 (0.0179) | 0.10 |

  1. Abbreviations: PCC, predictive correlation coefficient; SE, standard error; STMGP, Smooth-Threshold Multivariate Genetic Prediction; PRS, polygenic risk scores; GBLUP, genomic best linear-unbiased prediction; SBLUP, summary-data-based best linear-unbiased prediction; NEG, normal–exponential–gamma.
  2. ^a Power is the proportion of replicates achieving a significant prediction at P value < 0.05.
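
The summary columns can be illustrated with a minimal sketch that takes per-replicate PCCs and P values and returns the mean PCC, a spread estimate, and the power as defined in the footnote. The function name `summarize_replicates`, its inputs, and the example numbers are hypothetical and not taken from the study. The table does not state how its "SE" was computed, so the sketch assumes the sample standard deviation across replicates.

```python
from statistics import mean, stdev


def summarize_replicates(pccs, p_values, alpha=0.05):
    """Summarize simulation replicates for one model and one setting.

    pccs     -- predictive correlation coefficient of each replicate
    p_values -- P value of the prediction test for each replicate
    alpha    -- significance threshold (0.05 in the table footnote)
    """
    if len(pccs) != len(p_values):
        raise ValueError("pccs and p_values must have the same length")
    if len(pccs) < 2:
        raise ValueError("at least two replicates are needed")

    # Power: proportion of replicates with a significant prediction.
    n_significant = sum(p < alpha for p in p_values)

    return {
        "mean_pcc": mean(pccs),
        # Assumption: spread reported as the sample standard deviation.
        "sd_pcc": stdev(pccs),
        "power": n_significant / len(p_values),
    }


# Hypothetical values for four replicates (illustration only).
pccs = [0.10, 0.20, 0.00, 0.10]
p_values = [0.01, 0.20, 0.60, 0.03]
summary = summarize_replicates(pccs, p_values)
print(summary)
```

With these made-up inputs, two of the four replicates fall below the 0.05 threshold, so the power is 0.5.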