Fig. 1: Distribution of performance measurements and the number of individuals in the Polygenic Score Catalog.
From: Utility of polygenic scores across diverse diseases in a hospital cohort for predictive modeling

A Distribution of performance measurement records. B Distribution of AUC values and covariate usage across different ancestry cohorts. In A and B, the orange plot represents records in the PGS Catalog that do not account for covariates; the blue plot represents records that do account for covariates. C Comparison of the distribution of AUC values between the CMUH model and different ancestry cohorts. The orange box represents the AUC values in the CMUH model; the blue band represents the AUC values recorded in the PGS Catalog. D Distribution of the number of individuals at different process stages. The blue plot represents the PGS records used before the initial screening step. The green plot represents the PGS records used after the initial screening step. The orange plot represents the PGS records used for the optimized model. In C and D, the box represents the interquartile range (IQR), which spans from the 25th percentile (Q1) to the 75th percentile (Q3) of the data. The whiskers below and above the box extend to the smallest and largest observations excluding outliers. The line inside the box represents the median (50th percentile) of the data. Each violin plot displays a smoothed kernel density estimate of the data distribution within the group; its bottom and top ends mark the minimum and maximum values of the data. The two-sided Wilcoxon rank-sum test was used to calculate the P value. Bold text indicates a P value < \(1\times {10}^{-5}\). E Cumulative distribution of the number of individuals at each process stage. The blue line represents the PGS records used before the initial screening step. The green line represents the PGS records used after the initial screening step. The orange line represents the PGS records used for the optimized model.
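
The box plot elements described for panels C and D can be illustrated with a minimal sketch. It is not the authors' plotting code. The function name `box_summary`, the sample values, and the outlier rule (points more than 1.5 × IQR beyond the box, the common Tukey convention) are assumptions, since the legend does not state how outliers were defined.

```python
import statistics


def box_summary(values, whisker=1.5):
    """Return the quantities a box plot displays for one group.

    Assumption: an outlier is any point farther than `whisker` * IQR
    from the box (Tukey's convention); the legend does not specify this.
    """
    data = sorted(values)
    # Q1, median and Q3, interpolating between observed points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    # Whiskers end at the most extreme observations inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical sample sizes, for illustration only.
summary = box_summary([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(summary)
```

With these illustrative values, the value 100 lies outside the upper fence, so the upper whisker stops at 8 and 100 is reported separately as an outlier.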