Fig. 4: The performance of polygenic scores is attenuated when imputed data are used. The attenuation affects both population-level measures, such as variance explained in a simulated phenotype, and individual-level metrics, such as rank concordance with scores computed from true genotypes.

a Attenuation in the variance explained by a PGS when the error introduced by imputed dosages enters at either the discovery stage or the scoring stage. Green bars are the reference, where true genotypes were used for both discovery and scoring. Red bars indicate that imputed dosages were used in discovery; blue bars indicate that imputed dosages were used for scoring. b Rank concordance between individuals in iPSYCH2015i ranked by their PGS calculated from true genotypes (X axis) and from imputed dosages (Y axis), with one panel for each of the four data integration protocols. Rank concordance with the truth set is higher (heatmap gradient: red, low; blue, high) when employing the separate or two-stage protocols than when employing the union or intersection protocols.
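
As a rough illustration of the two metrics named in this caption, the sketch below computes variance explained and rank concordance for a PGS scored on true genotypes versus noisy dosages. Everything in it is an assumption made for illustration: the genotypes and effect weights are simulated, imputation error is mimicked by Gaussian noise clipped to the dosage range [0, 2], rank concordance is summarised with a Spearman correlation, and the function names are hypothetical. It covers only the case where imputed dosages are used for scoring, and it does not reproduce the imputation protocols or the analysis behind the figure.

```python
import numpy as np


def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages per individual."""
    return dosages @ weights


def rank_concordance(a, b):
    """Spearman correlation: Pearson correlation of the ranks.

    Assumes continuous scores, so ties are not handled.
    """
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return float(np.corrcoef(rank_a, rank_b)[0, 1])


def variance_explained(score, phenotype):
    """Squared Pearson correlation between score and phenotype."""
    return float(np.corrcoef(score, phenotype)[0, 1] ** 2)


rng = np.random.default_rng(0)
n_individuals, n_variants = 2000, 200

# Simulated "true" genotypes (0, 1 or 2 copies of the effect allele).
maf = rng.uniform(0.05, 0.5, size=n_variants)
true_geno = rng.binomial(2, maf, size=(n_individuals, n_variants)).astype(float)

# Hypothetical stand-in for imputation error: Gaussian noise, clipped to [0, 2].
noise = rng.normal(0.0, 0.5, size=true_geno.shape)
imputed = np.clip(true_geno + noise, 0.0, 2.0)

# Simulated effect weights, shared by both scores (discovery is not modelled).
weights = rng.normal(0.0, 0.1, size=n_variants)
pgs_true = polygenic_score(true_geno, weights)
pgs_imputed = polygenic_score(imputed, weights)

# Simulated phenotype: true genetic score plus environmental noise.
phenotype = pgs_true + rng.normal(0.0, pgs_true.std(), size=n_individuals)

r2_true = variance_explained(pgs_true, phenotype)
r2_imputed = variance_explained(pgs_imputed, phenotype)
rho = rank_concordance(pgs_true, pgs_imputed)

print(f"variance explained, true genotypes:  {r2_true:.3f}")
print(f"variance explained, imputed dosages: {r2_imputed:.3f}")
print(f"rank concordance (Spearman):         {rho:.3f}")
```

Under this toy noise model the score built from noisy dosages explains less phenotypic variance and ranks individuals imperfectly relative to the true-genotype score, which is the qualitative pattern the caption describes.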