Fig. 4: Genotyping large cohorts.

a, The hexbin plots show the relationship between AFs and heterozygosities of the PanGenie genotypes for all 200 unrelated samples from the 1000 Genomes Project. The barplots show the one-dimensional distributions of both features (top: AF, right: heterozygosity). All large insertions (≥50 bp, n = 84,836) and deletions (≥50 bp, n = 34,290) contained in our lenient set were taken into account. b, Comparison of AFs computed from the PanGenie genotypes for 200 samples and the corresponding AFs observed in the 11 assembly samples from which variants were called. As in a, we consider all large insertions (≥50 bp, n = 84,836) and deletions (≥50 bp, n = 34,290) contained in our lenient set. In the boxplots, the lower and upper limits of the box represent the lower and upper quartiles (Q1 and Q3); the median is marked in yellow. Lower and upper whiskers are defined as Q1 − 1.5 × (Q3 − Q1) and Q3 + 1.5 × (Q3 − Q1), respectively, and outliers are marked by dots. c, Length distribution of common insertions and deletions (AF ≥ 5%) contained in the PanGenie lenient callset and in gnomAD.
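The per-variant quantities plotted in a and the whisker limits used in b can be illustrated with a minimal Python sketch. It is not taken from PanGenie or from the authors' plotting code. The genotype encoding (pairs of 0/1 alleles, `None` for a missing call), the function names `af_and_heterozygosity` and `whisker_limits`, and the quartile interpolation method are all assumptions made for illustration; the caption does not state how quartiles were computed.

```python
from statistics import quantiles


def af_and_heterozygosity(genotypes):
    """Return (allele frequency, heterozygosity) for one biallelic variant.

    genotypes: list of (a1, a2) tuples with alleles 0 (reference) or
    1 (alternative); None marks a missing call (hypothetical encoding).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    alt_alleles = sum(a1 + a2 for a1, a2 in called)
    het_samples = sum(1 for a1, a2 in called if a1 != a2)
    # AF: fraction of alternative alleles among all called alleles.
    # Heterozygosity: fraction of called samples that are heterozygous.
    return alt_alleles / (2 * len(called)), het_samples / len(called)


def whisker_limits(values):
    """Return (Q1 - 1.5 * IQR, Q3 + 1.5 * IQR), with IQR = Q3 - Q1.

    The "inclusive" quartile method is an assumption of this sketch.
    """
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


if __name__ == "__main__":
    # Four samples: one hom-ref, two het, one hom-alt.
    print(af_and_heterozygosity([(0, 0), (0, 1), (1, 1), (0, 1)]))
    print(whisker_limits([1, 2, 3, 4, 5]))
```

Values falling outside the two limits returned by `whisker_limits` correspond to the outlier dots described for panel b.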