Extended Data Fig. 3: Quality assessment and population characteristics.
From: Structural variation in 1,019 diverse humans based on long-read sequencing

a) Quality of the genotypes produced by Giggles on the HPRC_mg_44 + 966 graph after filtering. Genotyping quality is shown as a Hardy-Weinberg equilibrium (HWE) plot of the allele frequency of the genotyped allele against the percentage of samples heterozygous for that allele (using only the 908 unrelated samples from our dataset). b) SV allele sharing across continental populations. Grey: shared by at least two (but fewer than all) continental groups. Black: shared by all continental groups. Deletions (top left), insertions (top right), all biallelic SVs (bottom left), all multiallelic SVs (bottom right). c) Linkage disequilibrium (LD) of all SVs (MAF ≥ 1%) with nearby single nucleotide polymorphisms (SNPs). d) As c), with SVs restricted to Genome in a Bottle high-confidence regions of the CHM13 genome (2.3 Gbp, 74.2%). e) SV-based admixing spectrum using five reference populations. f) Principal component analysis using all SVs. g) Relationship between the variant allele count and the number of variant sites with that allele count, shown in log space, for the SV genotypes on the HPRC_mg_44 + 966 graph, annotated by SVAN. Duplications (DUP), mobile element insertions and deletions (MEI (non-reference) and MEI (reference), respectively), nuclear mitochondrial DNA integration (NUMT), processed pseudogene integration (PSD). h) Relationship between the inversion allele count (AC) and the number of variant sites with that allele count, shown in log space, for the GeONTIpe-based inversion genotypes. The majority of inversions are rare, with most exhibiting an AC < 10. A small subset of inversions is observed more frequently across populations, with 37 inversions exceeding an AC of 1,000, potentially corresponding to reference genome inversions.
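
For readers unfamiliar with the HWE plot in panel a), the following minimal sketch shows the two quantities such a plot compares: the observed heterozygote fraction at a biallelic site and the fraction expected under Hardy-Weinberg equilibrium, 2p(1 - p), where p is the allele frequency. The function names and the dosage-list input format (0, 1 or 2 copies of the genotyped allele per sample) are assumptions made for illustration; this is not the code used to produce the figure.

```python
from typing import Sequence, Tuple


def expected_het_fraction(allele_freq: float) -> float:
    """Heterozygote fraction 2p(1 - p) expected under HWE at a biallelic site."""
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    return 2.0 * allele_freq * (1.0 - allele_freq)


def observed_freq_and_het(dosages: Sequence[int]) -> Tuple[float, float]:
    """Return (allele frequency, observed heterozygote fraction).

    `dosages` is a hypothetical input: one value per diploid sample,
    counting copies (0, 1 or 2) of the genotyped allele.
    """
    n_samples = len(dosages)
    if n_samples == 0:
        raise ValueError("at least one sample is required")
    if any(d not in (0, 1, 2) for d in dosages):
        raise ValueError("dosages must be 0, 1 or 2")
    allele_freq = sum(dosages) / (2 * n_samples)
    het_fraction = sum(1 for d in dosages if d == 1) / n_samples
    return allele_freq, het_fraction


# Toy data for illustration only (not from the study).
freq, het = observed_freq_and_het([0, 1, 1, 2])
deviation = het - expected_het_fraction(freq)
```

Sites whose observed heterozygote fraction falls far from the 2p(1 - p) curve are the ones an HWE plot makes visible, for example as an excess of heterozygous calls.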