Fig. 1: Framework for optimizing polygenic scores for a new population. | Nature Communications

Fig. 1: Framework for optimizing polygenic scores for a new population.

From: Clinical utility of polygenic scores for cardiometabolic disease in Arabs

A pragmatic framework for optimizing polygenic scores for a target population using publicly accessible datasets and methods consists of four steps. First, genomic data are prepared using standard quality control and imputation, and multiple polygenic scores are computed from available large and diverse GWAS results using various score derivation methods. The raw scores are then adjusted for population structure using principal components (PCs) of ancestry. Second, the best-performing score is identified by splitting the dataset into training and validation sets: the best model is selected in the training set, and the association between the ancestry-specific optimized scores and traits is assessed in the validation set. Third, each individual's risk percentile rank is derived from the score distribution in a reference population of the same ancestry, in order to identify individual relative risk levels and study the interplay between genetic risk and conventional risk factors. Fourth, to validate the ancestry-specific optimized scores, ancestry-matched samples can be identified in a large biobank dataset using genetic distance.
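
The PC adjustment in the first step and the percentile ranking in the third step can be illustrated with a minimal sketch. The caption does not specify how either is computed, so the code below assumes one common approach: residualizing the raw score on the ancestry PCs by ordinary least squares, then ranking a score against an empirical reference distribution. The function names `adjust_for_pcs` and `percentile_rank` are hypothetical and do not come from the article.

```python
import numpy as np


def adjust_for_pcs(raw_scores, pcs):
    """Residualize raw polygenic scores on ancestry PCs, then standardize.

    Assumption: a linear adjustment (score ~ intercept + PCs); the
    article may use a different model.
    """
    raw_scores = np.asarray(raw_scores, dtype=float)
    pcs = np.asarray(pcs, dtype=float)
    # Design matrix: intercept column followed by the PCs.
    X = np.column_stack([np.ones(len(raw_scores)), pcs])
    # Ordinary least-squares fit of the raw score on the PCs.
    beta, *_ = np.linalg.lstsq(X, raw_scores, rcond=None)
    resid = raw_scores - X @ beta
    # Scale residuals to unit sample standard deviation.
    return resid / resid.std(ddof=1)


def percentile_rank(score, reference_scores):
    """Percentage of reference scores that are at or below `score`."""
    ref = np.sort(np.asarray(reference_scores, dtype=float))
    return 100.0 * np.searchsorted(ref, score, side="right") / len(ref)
```

In this sketch, the reference scores passed to `percentile_rank` would be the adjusted scores of an ancestry-matched reference sample, so that an individual's percentile reflects relative risk within the same ancestry.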
