Table 2 Prediction accuracy for depressive states.

From: Machine learning for effectively avoiding overfitting is a crucial strategy for the genetic prediction of polygenic psychiatric phenotypes

 

| Method | Partial correlations in the independent validation datasets (SE) | P value | Partial correlations in the training datasets (SE) | Number of variants included in prediction models |
| --- | --- | --- | --- | --- |
| STMGP | 0.0530 (0.0180) | 3.424 × 10⁻³ | 0.3230 (0.0151) | 102 |
| PRS | 0.0247 (0.0178) | 0.1724 | 0.9025 (0.0076) | 13,421 |
| GBLUP | 0.0211 (0.0178) | 0.2431 | 0.9623 (0.0017) | 601,239 |
| SBLUP | 0.0134 (0.0178) | 0.3663 | 0.9554 (0.0019) | 599,149 |
| BayesR | 0.0190 (0.0185) | 0.2871 | 0.9633 (0.0015) | 615,386 |
| Ridge | 0.0160 (0.0178) | 0.4321 | 0.9998 (0.0000) | 30,333 |

  1. PCC, predictive correlation coefficient; SE, standard error; STMGP, Smooth-Threshold Multivariate Genetic Prediction; PRS, polygenic risk scores; GBLUP, genomic best linear-unbiased prediction; SBLUP, summary-data-based best linear-unbiased prediction; SNP, single-nucleotide polymorphism; PC, principal component.
  2. Partial correlations were adjusted for covariates such as sex, age, and PC1 to PC26.
  3. Because ridge regression on the raw SNP data was too computationally costly to run in our environment, the genome data were clumped to approximately 30,000 SNPs for these analyses, in a manner similar to a previous study (ref. 51).
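
The footnotes state that the partial correlations were adjusted for covariates, but they do not spell out the estimation procedure. The following is only a minimal sketch of one common way to compute such a quantity: regress both the genetic prediction and the phenotype on the covariates, then correlate the two sets of residuals. The function names (`residualize`, `partial_correlation`) and the simulated data are hypothetical and are not taken from the study.

```python
import math
import random


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def residualize(vec, covariates):
    """Residuals of `vec` after least-squares regression on an intercept
    plus the covariate columns (orthogonalised by modified Gram-Schmidt)."""
    n = len(vec)
    basis = []
    for col in [[1.0] * n] + [list(c) for c in covariates]:
        q = list(col)
        for b in basis:
            coef = _dot(q, b) / _dot(b, b)
            q = [qi - coef * bi for qi, bi in zip(q, b)]
        # Skip columns that are (numerically) collinear with earlier ones.
        if _dot(q, q) > 1e-12 * _dot(col, col):
            basis.append(q)
    r = list(vec)
    for b in basis:
        coef = _dot(r, b) / _dot(b, b)
        r = [ri - coef * bi for ri, bi in zip(r, b)]
    return r


def partial_correlation(pred, pheno, covariates):
    """Correlation between `pred` and `pheno` after removing the linear
    effect of the covariates from both."""
    rp = residualize(pred, covariates)
    ry = residualize(pheno, covariates)
    # Residuals have zero mean because the intercept is in the model,
    # so the Pearson correlation reduces to a normalised dot product.
    return _dot(rp, ry) / math.sqrt(_dot(rp, rp) * _dot(ry, ry))


# Toy data (simulated, not from the study): age and sex as covariates.
random.seed(0)
n = 200
age = [random.gauss(50, 10) for _ in range(n)]
sex = [float(random.random() < 0.5) for _ in range(n)]
score = [random.gauss(0, 1) + 0.02 * a for a in age]
pheno = [0.3 * s + 0.05 * a + 0.5 * x + random.gauss(0, 1)
         for s, a, x in zip(score, age, sex)]
r_partial = partial_correlation(score, pheno, [age, sex])
print(f"partial correlation: {r_partial:.4f}")
```

In an analysis like the one in the table, the covariate list would also contain the principal-component columns. The large gap between training and validation correlations for most methods is the kind of pattern this statistic is meant to expose when computed separately on each dataset.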