Fig. 1: Detailed flowchart of PROSPER.
From: An ensemble penalized regression method for multi-ancestry polygenic risk prediction

The analysis of \(M\) populations in PROSPER involves three key steps: (1) separate single-ancestry analysis for each population \(i=1,\ldots,M\); (2) joint analysis across populations using penalized regression; (3) ensemble regression. In step 1, the training GWAS data are used to train lassosum2 models, and the tuning data are used to obtain the optimal tuning parameters in a single-ancestry analysis. In step 2, the training GWAS data and the optimal tuning parameter values from step 1 are used to train the joint cross-population penalized regression model and to obtain a solution \(\boldsymbol{\beta}_{\lambda,c,i}\) for each \(\lambda\) and \(c\). In step 3, the tuning data are used to train the super learning model for the ensemble of PRSs computed from the solutions in step 2, \(\mathbf{PRS}_{\lambda,c,i}=\mathbf{X}\boldsymbol{\beta}_{\lambda,c,i}\). The final PRS is computed as \(\mathbf{PRS}=\mathbf{X}\left(\sum_{\lambda,c,i} w_{\lambda,c,i}\,\boldsymbol{\beta}_{\lambda,c,i}\right)\), where \(w_{\lambda,c,i}\) are the weights from the super learning model. Refer to the “Method Overview” section in the main text for a full explanation of all notations in the flowchart.
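The ensemble step can be illustrated with a minimal NumPy sketch. It is not the PROSPER implementation: ordinary least squares stands in for the super learning model, the data are simulated, and the function names `fit_linear_weights` and `ensemble_prs` are hypothetical. Each column of `betas` plays the role of one solution \(\boldsymbol{\beta}_{\lambda,c,i}\) from step 2.

```python
import numpy as np


def fit_linear_weights(X_tune, y_tune, betas):
    """Fit one weight per candidate PRS on the tuning data.

    Assumption: plain least squares is used here as a simple stand-in
    for the super learning model described in the caption.
    X_tune: (n, p) genotypes, y_tune: (n,) phenotype, betas: (p, K).
    """
    prs_matrix = X_tune @ betas  # column k is the PRS from the k-th solution
    weights, *_ = np.linalg.lstsq(prs_matrix, y_tune, rcond=None)
    return weights


def ensemble_prs(X, betas, weights):
    """Final score: PRS = X (sum_k w_k * beta_k)."""
    combined_beta = betas @ weights  # weighted sum of effect-size vectors
    return X @ combined_beta


# Simulated toy data (hypothetical sizes, not from the paper).
rng = np.random.default_rng(0)
n, p, K = 200, 50, 4
X = rng.standard_normal((n, p))
betas = 0.1 * rng.standard_normal((p, K))
true_w = np.array([0.5, 0.0, 0.3, 0.2])
y = X @ (betas @ true_w) + 0.01 * rng.standard_normal(n)

w = fit_linear_weights(X, y, betas)
prs = ensemble_prs(X, betas, w)
```

Because the final score is linear in the effect sizes, weighting the individual PRSs and weighting the \(\boldsymbol{\beta}\) vectors before multiplying by \(\mathbf{X}\) give the same result, which is why the final PRS can be written with a single combined effect-size vector.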