Fig. 2: Validation of thresholds derived through simulations on real GWAS data for 19 phenotypes.
From: Cost-effective non-additive GWAS across 2329 diseases in 500,349 individuals

A–B Solid lines show the additive p-value thresholds derived from simulations, plotted on a negative log scale, with theoretically derived discovery rates of 99%, 80%, and 50% for the recessive (A) and dominant (B) models. Blue dots show the additive p-values and allele frequencies of variants that are missed by the additive model but are genome-wide significant in the recessive (A) and dominant (B) models in the analysis of 19 phenotypes in FinnGen. A variant that falls below a threshold would be missed by a filtering procedure that uses that threshold. A two-proportion Z-test was used to calculate p-values. C–D Empirical discovery rates (proportions of new non-additive associations retained) for thresholds with different theoretical discovery rates derived from simulations, for the recessive (C) and dominant (D) models. The curve lies above the diagonal where a threshold's empirical discovery rate is greater than its theoretical discovery rate. E–F The reduction in the number of variants remaining after filtering, for thresholds with different theoretical discovery rates, for the recessive (E) and dominant (F) models. G–H The reduction in the number of variants analyzed per newly identified non-additive association, for thresholds with different theoretical discovery rates, for the recessive (G) and dominant (H) models.
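The quantities in panels C–H can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that a variant is retained when its additive -log10(p) is at or above the threshold for its allele frequency, and that "reduction" means a fold change relative to the unfiltered analysis. The function names, the `threshold` callable, and the toy numbers are hypothetical.

```python
from typing import Callable, Sequence, Tuple

# (allele_frequency, -log10 of the additive p-value) for one variant.
Variant = Tuple[float, float]


def empirical_discovery_rate(
    hits: Sequence[Variant],
    threshold: Callable[[float], float],
) -> float:
    """Proportion of non-additive associations that survive the additive filter.

    `hits` are variants significant only under the non-additive model.
    `threshold` maps allele frequency to a -log10(p) cut-off (hypothetical;
    in the figure it is the simulation-derived solid line).
    """
    if not hits:
        raise ValueError("at least one association is required")
    retained = sum(1 for af, nlp in hits if nlp >= threshold(af))
    return retained / len(hits)


def fold_reduction(n_before: float, n_after: float) -> float:
    """Fold reduction of a count after filtering (assumed definition)."""
    if n_after <= 0:
        raise ValueError("n_after must be positive")
    return n_before / n_after


def variants_per_association_reduction(
    n_total: int, n_hits: int, n_retained: int, n_hits_retained: int
) -> float:
    """Fold reduction in variants analyzed per identified association."""
    if n_hits <= 0 or n_hits_retained <= 0:
        raise ValueError("association counts must be positive")
    before = n_total / n_hits
    after = n_retained / n_hits_retained
    return fold_reduction(before, after)


# Toy data, invented for illustration only.
toy_hits = [(0.10, 3.0), (0.20, 1.0), (0.05, 2.5), (0.30, 0.5)]
flat_threshold = lambda af: 2.0  # a constant cut-off, for simplicity

rate = empirical_discovery_rate(toy_hits, flat_threshold)
variant_reduction = fold_reduction(1000, 250)
per_hit_reduction = variants_per_association_reduction(1000, 4, 250, 2)
```

In this toy case half of the associations are retained while the number of tested variants drops fourfold, so the number of variants analyzed per retained association is halved.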