Fig. 5: Mitochondrial pangenome analysis and nuclear pangenome performance gain.

a A circular representation of the mitochondrial pangenome, detailing the position and nomenclature of annotated mitochondrial genes within the pangenome. Each bubble or loop represents a haplotype. b Mitochondrial UAE-based Arab Pangenome Reference (mtUPR) variant landscape. A bar chart showing the number of UPR-specific small variants observed across different samples in comparison to the Human Pangenome Reference Consortium (HPRC), classified as polymorphisms (dark blue) or singletons (light blue). n = 53 individuals. c Comparative analysis of variant calling performance using linear, assembly and pangenome methods. Violin plot displaying the recall of linear variant calls using assembly-based and pangenome-based methods. n = 10 UPR individuals. d Bar graph illustrating the proportion of errors in single-nucleotide polymorphism (SNP) and insertion and deletion (indel) variant calls using three different methods: assembly (red), linear (green), and pangenome (blue). e Mapping accuracy assessment. Box plot illustrating the percentage of properly paired reads in alignments of nine short-read whole-genome-sequenced Arab samples (from the UAE, Saudi Arabia, Syria, and Oman) to the UPR and HPRC genomic graphs, compared to the CHM13 reference. Box plots show the 25th and 75th percentiles (interquartile range), the center line represents the median, whiskers extend to the minimum and maximum values, and individual data points are overlaid. f Genotyping recall for SNPs. Box plot depicting the recall rates for genotyping of polymorphic variants in easy genomic regions based on CHM13 variant calls. Easy genomic regions are defined as parts of the genome excluding segmental duplications, centromeric/satellite sequences, composite repeats, satellites, chrXY sequence classes, telomeres, and palindromes/inverted repeats. n = 9 Arab individuals. g Structural variants across samples in easy genomic regions. Line graph comparing the count of structural variants (SVs) identified across Arab samples mapped to the UPR and HPRC graphs. h Line graph depicting the frequency of SV lengths across Arab samples mapped to the UPR and HPRC graphs. n = 53 UPR, 47 HPRC individuals. Source data are provided as a Source Data file.