Fig. 4: Correction factor adjustment between studies and across genes.

The distribution of iCFs for 6,082 healthy controls from 8 cohorts in MIGen was determined according to rare, pathogenic alleles across all protein-coding genes (a). iCF values were calculated as the exome-wide ratio of the sum of per-individual observed allele counts (OAC) to the sum of per-individual expected allele counts in gnomAD (EAC) according to Eq. (5). The violins show the spread of iCF values among all 8 MIGen cohorts. The horizontal line in each boxplot indicates the median iCF value, while the top and bottom lines represent the 75th and 25th percentiles of the iCF distribution, respectively. The length of each box represents the interquartile range of iCF values. The dashed line represents an iCF corresponding to equal total allele counts between a MIGen participant and gnomAD (i.e., iCF = 1). gCF values were computed as the ratio of the sum of OAC to the sum of iCF-adjusted EAC across (1) all individuals and (2) all genes assigned to one of 50 gene bins according to Eq. (7). Gene bins were defined by quintile of iCF-adjusted EAC and decile of P-value (5 × 10 = 50 bins), where the P-value was obtained from a rare variant association test (using burden of rare, pathogenic alleles) conducted in the ranking cohort, which consisted of the remaining 2,730 healthy control participants in MIGen as “cases” and gnomAD non-Finnish Europeans as controls. The iCF- and gCF-adjusted EAC was thereafter calculated according to Eq. (8). The cumulative delta allele count was calculated as the cumulative sum of the per-gene difference between OAC and either the iCF-adjusted EAC (red points) or the iCF- and gCF-adjusted EAC (blue points) according to Eq. (9). The cumulative delta allele count demarcates genes that are systematically enriched (ENR), well-calibrated (WC), and depleted (DEP) for rare, pathogenic alleles among MIGen control participants (b). The height of the mountain demonstrates the degree of adjustment offered by the iCF to achieve calibration between MIGen controls and gnomAD.
Implementing the gCF mitigates residual gene-level biases that cannot be accounted for by the iCF alone. iCF indicates individual correction factor, and gCF indicates gene correction factor. Equations are defined in the “Methods” section. Source data are provided as a Source data file.
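The quantities described in this legend can be illustrated with a minimal Python sketch. It follows only the verbal definitions given here, not Eqs. (5) and (7)–(9) themselves, which are defined in the “Methods” section. Several points are assumptions made for illustration: that adjustment is applied by multiplying EAC by the correction factor, that genes are accumulated in the column order supplied, and that the bin assignment is given as input. All function names, variable names, and toy numbers are hypothetical.

```python
import numpy as np

def individual_cf(oac, eac):
    """iCF per individual: exome-wide sum of OAC over exome-wide sum of EAC.

    oac, eac: arrays of shape (n_individuals, n_genes).
    """
    return oac.sum(axis=1) / eac.sum(axis=1)

def gene_cf(oac, eac_icf, gene_bin, n_bins):
    """gCF per gene bin: sum of OAC over sum of iCF-adjusted EAC,
    taken across all individuals and all genes in the bin."""
    obs = np.bincount(gene_bin, weights=oac.sum(axis=0), minlength=n_bins)
    exp = np.bincount(gene_bin, weights=eac_icf.sum(axis=0), minlength=n_bins)
    return obs / exp

def cumulative_delta(oac, eac_adj):
    """Cumulative sum of the per-gene difference between OAC and adjusted EAC.
    Assumption: genes are accumulated in the column order given."""
    return np.cumsum(oac.sum(axis=0) - eac_adj.sum(axis=0))

# Toy data (made up): 3 individuals x 4 genes, 2 gene bins.
oac = np.array([[2, 0, 1, 1],
                [1, 1, 0, 0],
                [3, 1, 1, 1]], dtype=float)
eac = np.full((3, 4), 0.5)
gene_bin = np.array([0, 0, 1, 1])  # hypothetical bin index per gene

icf = individual_cf(oac, eac)
# Assumption: adjustment is multiplicative.
eac_icf = eac * icf[:, None]
gcf = gene_cf(oac, eac_icf, gene_bin, n_bins=2)
eac_icf_gcf = eac_icf * gcf[gene_bin][None, :]

delta_icf = cumulative_delta(oac, eac_icf)          # analogous to the red points
delta_icf_gcf = cumulative_delta(oac, eac_icf_gcf)  # analogous to the blue points
```

In this toy example the iCF-only curve rises and then returns to zero, because the iCF equalizes total counts but not per-gene counts. The curve with the additional gCF adjustment stays closer to zero within each bin.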