Figure 2
From: Identifying and characterizing disease subpopulations that most benefit from polygenic risk scores

NG and NG + PRS model performance across diseases. CAD coronary artery disease, BC breast cancer, SZ schizophrenia, NG model with non-genetic feature set, NG + PRS model with combined non-genetic features and polygenic risk scores, AUC area under the curve. (a–c) Plot of first-time incidence rate for each percentile of risk for both NG and NG + PRS risk scores for each disease on the test set across 10 trials. The points in the scatterplot represent the incidence at each percentile for each model in each of the 10 trials (i.e., raw data). The curves visualize the overall trend of the results and were obtained by fitting a fifth-order polynomial to the data from each trial and calculating the mean and standard error across trials at each point in the curves. (d) The boxplot illustrates the distribution of the AUC_NG and AUC_NG+PRS performance metrics for each disease across 10 trials. (e) (Left) Top 10% Subgroup. For the top 10% subgroup there is a substantial improvement in risk discrimination attributable to adding PRSs. For the bottom 30 percentiles of the NG + PRS score, the incidence is 0% and remains <1% through the 60th percentile. At the top 2 percentiles of the NG + PRS score, the incidence ranges from ~47% to ~85%, with a higher incidence rate than the NG-only model for the top seven percentiles of the score. (Right) Bottom 10% Subgroup. For the bottom 10% subgroup, the NG model achieves strong risk discrimination, with incidence rates ranging from 0.00% at the lowest percentile up to 23.61% ± 10.92% at the highest percentile of risk, whereas the NG + PRS model performs worse at higher percentiles of the score, such that <1% of those at the highest percentile of the risk score end up as incident cases.
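
The trend curves described for panels (a–c) can be illustrated with a minimal sketch. This is not the authors' code: the function name `fit_trend`, the rescaling of the percentile axis, the use of the sample standard deviation (`ddof=1`) for the standard error, and the synthetic data are all assumptions made for illustration. It fits one fifth-order polynomial per trial and then takes the mean and standard error of the fitted curves across trials at each percentile.

```python
import numpy as np

def fit_trend(percentiles, incidence_by_trial, degree=5):
    """Fit one polynomial per trial, then summarize the fitted curves.

    incidence_by_trial has shape (n_trials, n_percentiles).
    Returns (mean_curve, standard_error), each of shape (n_percentiles,).
    """
    y_all = np.asarray(incidence_by_trial, dtype=float)
    x = np.asarray(percentiles, dtype=float)
    # Assumption: rescale x to [-1, 1] so the degree-5 fit is well conditioned.
    x_scaled = 2 * (x - x.min()) / (x.max() - x.min()) - 1
    # One least-squares polynomial fit per trial, evaluated at every percentile.
    fitted = np.array([
        np.polyval(np.polyfit(x_scaled, y, degree), x_scaled)
        for y in y_all
    ])
    n_trials = fitted.shape[0]
    mean_curve = fitted.mean(axis=0)
    # Assumption: standard error = sample SD across trials / sqrt(n_trials).
    standard_error = fitted.std(axis=0, ddof=1) / np.sqrt(n_trials)
    return mean_curve, standard_error

# Synthetic, illustrative data only (not values from the study):
# 10 trials x 100 percentiles of noisy incidence around a smooth curve.
rng = np.random.default_rng(0)
percentiles = np.arange(1, 101)
true_curve = 0.3 * (percentiles / 100) ** 4
trials = true_curve + rng.normal(0, 0.01, size=(10, 100))

mean_curve, standard_error = fit_trend(percentiles, trials)
```

Plotting `mean_curve` with a band of ± `standard_error` over the raw per-trial points would give a figure of the same general form as panels (a–c).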